Crux
Crux is a software toolkit for molecular phylogenetic inference that runs on (at least) Linux, FreeBSD, and Mac OS X. It is structured as a set of Python modules, which makes it possible to quickly develop Python scripts that perform unique, non-canned analyses.
The Python language is powerful and elegant, but it lacks a high-performance implementation, which makes it challenging to use for scientific computing. Crux is therefore implemented as a combination of Python, Cython, and C code. Cython is a superset of Python with features that allow compiled code to substantially outperform pure Python. The result is that Crux supports a wide variety of users:
- End users can use canned scripts, either those included with Crux or scripts developed for them by computer-savvy collaborators.
- Computer-savvy users can develop custom scripts that use the components of Crux in unforeseen ways.
- Power users can use Crux as a component of a custom software package, or extend Crux itself. The source code is available under a very permissive license, with the intent that it be used.
Features
Crux's current features include the following:
- Tree log-likelihoods can be computed under a variety of models, including all specializations of GTR+I+Γ and mixture models. Tree likelihoods can be computed in parallel via pthreads or MPI.
- Bayesian Markov chain Monte Carlo (MCMC) methods (with Metropolis coupling and MPI support for parallel computation) can sample among non-nested models using reversible model jumps. Polytomous trees can be sampled, also via reversible jumps. In fact, every non-essential model parameter that Crux's MCMC implementation estimates can be expunged via reversible jumps.
- Crux can simulate character data under any model that its likelihood engine supports. The huge range of simulation options makes a canned command line interface like that of Seq-Gen rather unwieldy, so simulation is perhaps the most compelling reason to develop Crux scripts.
- The neighbor joining (NJ) and relaxed neighbor joining (RNJ) implementations are, along with Clearcut, among the fastest in existence.
- Pairwise distances between sequences can be computed based on percent identity, or using methods that correct for multiple hits (Jukes-Cantor, Kimura, and logDet).
- Multifurcating trees can be manipulated at a low level. Various standard operations are implemented, such as canonization (ordering the tree according to a ranking of the taxa), collapsing of zero-length branches, Robinson-Foulds distance computation, and so on.
- Parsers are included for various file formats, including Newick trees, FASTA sequences, and PHYLIP distance matrices.
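To make the pairwise-distance feature above concrete, here is a minimal pure-Python sketch of reading FASTA sequences and computing an uncorrected distance together with its Jukes-Cantor correction. It does not use Crux and is not Crux's API: the names `parse_fasta`, `p_distance`, and `jukes_cantor` are hypothetical, the input is assumed to be an already-aligned set of DNA sequences, and columns containing anything other than A, C, G, or T are simply skipped. Percent identity is `1 - p` for the value returned by `p_distance`.

```python
import math
from io import StringIO


def parse_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA-formatted text stream."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Use the first whitespace-delimited token of the header as the name.
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif name is not None:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def p_distance(a, b):
    """Proportion of differing sites over columns where both are A/C/G/T."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor(p):
    """Jukes-Cantor corrected distance: d = -(3/4) * ln(1 - 4p/3).

    The correction is undefined once p reaches 3/4 (saturation), so
    infinity is returned in that case.
    """
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    # Hypothetical toy alignment, for illustration only.
    text = ">A\nACGTACGTAC\n>B\nACGTACGTTC\n"
    seqs = dict(parse_fasta(StringIO(text)))
    p = p_distance(seqs["A"], seqs["B"])
    print("p-distance:", p)
    print("Jukes-Cantor distance:", jukes_cantor(p))
```

The corrected distance is always at least as large as the uncorrected one, because the correction accounts for multiple substitutions at the same site that a simple count of differences cannot see.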
Crux 1.0.0 was the first official release of Crux, but it was actually the fourth major implementation. The earlier versions were implemented in other programming languages, and a few features have since been dropped (fast Fitch parsimony, sophisticated TBR and NNI tree transformation/enumeration).
Copyright © 2009 Jason Evans <jasone@canonware.com>.
Last updated 2009/07/23.