
Software packages and other links for genetic analyses

This list was originally made to accompany my various Phylogeography courses.

UPDATE: New programs added (and some dead links removed). Most recent update: June 18th, 2009. Please alert me if you find more problems or have suggestions for this page (not that anyone ever does, but here's my email anyway): andrew ∀ dna.ac

Data manipulation, alignment and visualization

Geneious

Geneious: I finally started using this program as an alternative to Sequencher. Geneious is far more enjoyable to use than BioEdit, and has many additional analysis and visualization features that Sequencher does not. There's still no beating Sequencher for cleaning up DNA sequences, but the price is prohibitive for many genetics labs around the world. Therefore, I give Geneious 2 thumbs up.

ClustalW and ClustalX version 2.0

Click on ClustalX and ClustalW to download a copy of either version of the paradigmatic alignment software, both re-written in September, 2007. Available for Mac, PC, Linux.

You can also run your analyses using ClustalW on the EMBL-EBI server!

ClustalX Help: A very detailed user's guide to ClustalX (but version 1).

BAli-Phy

Bayesian Alignment and Phylogeny estimation using BAli-Phy: This software breaks new ground in model-based phylogenetics, but unfortunately it is not fast. I recommend trying it first on small data sets.

BioEdit

BioEdit: A free alternative to Sequencher. Windows OS only. Runs Clustal, translates DNA to amino acid sequence, calculates reverse complement, and reads raw chromatograms.
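
If you only need the reverse complement of a sequence and have no alignment editor at hand, the operation is simple enough to sketch in a few lines of Python. This is a minimal illustration, not how BioEdit or any other package listed here implements it; the function name is made up for the example, and the handling of N (kept as N) is an assumption.

```python
# Complement table for unambiguous bases. Keeping N as N is an
# assumption of this sketch; other IUPAC codes are left untouched.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    # Complement every base, then reverse the whole string.
    return seq.translate(COMPLEMENT)[::-1]


print(reverse_complement("ATGGCC"))  # GGCCAT
```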

TreeEdit

TreeEdit: Mac OS 9 and OS X only. This software has more features than TreeView.

FigTree

FigTree: more software from Andrew Rambaut, available in Windows, Mac, and Linux flavors. This software has fewer features than TreeEdit, but makes way better graphics, and is more reliable.

Mesquite

Mesquite: Includes modules for coalescent simulations, as well as manipulating & visualizing alignments. Calculates the reverse & complement of a DNA sequence, which the new MacClade does not (as of 2006).

FaBox

FaBox provides a wide variety of DNA sequence data manipulation tools that you might need to get raw FASTA data organized how you need it and ready to input into your favorite software package. Citation available at DOI: 10.1111/j.1471-8286.2007.01821.x.
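
For a sense of what "getting FASTA data organized" involves, here is a rough Python sketch that reads FASTA-formatted text into a dictionary of header and sequence pairs. It is not FaBox code; the function name and the example records are hypothetical, and it assumes every header is unique.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of {header: sequence}.

    Assumes headers are unique; a repeated header would overwrite
    the earlier record.
    """
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            # Sequence lines may be wrapped, so collect the pieces.
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


# Hypothetical two-record example.
example = """>hap1
ACGT
ACGT
>hap2
TTGA
"""
records = parse_fasta(example)
print(records)
```

From a dictionary like this it is a short step to renaming headers, filtering by length, or writing the records back out in another format.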

Population genetics and coalescent simulators

dnaSP

dnaSP: the industry standard for population genetic analysis of DNA sequences. For Windows OS.
Version 5 released June 11, 2009.

Mesquite

Mesquite is as powerful as it is easy to use. The coalescent simulation package has wonderful graphics that let you simulate and visualize gene trees within species trees. As in SIMCOAL, you can invent and compare the historical models of your choosing. If you're new to coalescent simulations, I would start here.

IM (Isolation with Migration model analysis using MCMC)

IM and IMa: Assuming two recently separated populations/species, estimates effective pop. sizes (including the common ancestor's), asymmetric migration rates, divergence time, and relative sizes of the two founding populations. IMa allows for a likelihood ratio test of nested versions of the full 6-parameter model. For Windows/DOS or Unix (Darwin or Linux). Updated May 6, 2009.

If you are serious about IM, join the Google Group Isolation with Migration.

On 16 June 2009 Jody Hey released a coalescent simulation program designed to complement IM style analyses, called SIMDIV.
SIMCOAL2 (see below) may actually be more flexible, but SIMDIV will be parameterized similarly to IM, which may help with comparison of results.

MIMAR

Now you can analyze the popular "IM" model while accounting for intralocus recombination in the data, using MIMAR.

SIMCOAL

From the people that brought you Arlequin, a flexible coalescent simulator for Windows 2000/XP and Linux. SIMCOAL2 adds recombination and more types of markers over the original, but if you don't need those features, I strongly recommend the original SIMCOAL version 1.

SAMOVA 1.0

Another interesting package related to Arlequin, this one uses spatially explicit data and simulated annealing to locate and quantify genetic breaks across the range of samples. Windows PC only. Download SAMOVA here, and perhaps look around for possible updates.

MCMCcoal and MCcoal

MCMCcoal estimates species divergence times and ancestral population sizes from multilocus data, while MCcoal is a coalescent simulation tool. Both are provided by the famous (and patient) Ziheng Yang.

Phylogenetics

STEM

Species Tree Estimation using Maximum likelihood, aka STEM, searches for the most likely species tree from a set of gene trees under a coalescent model. Concatenation-free, multi-locus phylogenetic methods are especially (but not uniquely) useful at phylogeographic scales of analysis.

Two programs for inferring haplotype networks

TCS, the official network-building software for NCPA, uses parsimony. The root is assigned by considering relative frequencies of haplotypes, and therefore is strongly influenced by your sampling design.
Network 4.510 is a much easier program to use, and provides nice graphics. Networks are inferred by median-joining and other algorithms.

Selecting model of DNA sequence evolution

To select an appropriate standard model of DNA sequence evolution for your phylogenetic dataset, try one of the 5 options below:
Uses PhyML:
1) jModelTest (uses AIC, AICc, BIC, DT, hLRT and dLRT criteria).
2) phymltest command (implements AIC) in the R-package, ape.
Takes output of likelihood scores from PAUP*:
3) DT_ModSel (implements BIC using decision theory framework).
4) The old Modeltest (criteria include AIC, AICc, BIC and hLRT).
Used for selecting models among partitions simultaneously (I have not tried this yet):
5) kakusan3.
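
To make the criteria named above less of a black box, here is a small Python sketch of the standard formulas for AIC, AICc and BIC, applied to a set of candidate models. It is not taken from any of the programs listed. The log-likelihoods and parameter counts are invented for illustration, and using alignment length as the sample size n is an assumption (the right choice of n is debated).

```python
import math


def information_criteria(log_likelihood, k, n):
    """Return AIC, AICc and BIC for a model with k free parameters.

    n is the sample size; this sketch assumes alignment length.
    """
    aic = 2 * k - 2 * log_likelihood
    aicc = aic + (2 * k * (k + 1)) / (n - k - 1)  # small-sample correction
    bic = k * math.log(n) - 2 * log_likelihood
    return {"AIC": aic, "AICc": aicc, "BIC": bic}


# Hypothetical (log-likelihood, free parameters) per model; made-up values.
models = {
    "JC69": (-2510.0, 0),
    "HKY85": (-2450.0, 4),
    "GTR+G": (-2447.0, 9),
}
n_sites = 1000  # hypothetical alignment length

scores = {
    name: information_criteria(lnl, k, n_sites)
    for name, (lnl, k) in models.items()
}

# The preferred model has the lowest score under a given criterion.
best_aic = min(scores, key=lambda name: scores[name]["AIC"])
best_bic = min(scores, key=lambda name: scores[name]["BIC"])
print(best_aic, best_bic)
```

In real analyses the parameter count usually also includes branch lengths, so the numbers reported by the programs above will differ from a hand calculation like this one.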

RAxML

RAxML: a popular software package for ML phylogenetic inference of large data sets. Available as a download or may be run on public servers. See link for details.

MrBayes

MrBayes: always download the latest version.

A forum for MrBayes users.

Tracer, a useful program for visualizing and evaluating the quality of Bayesian MCMC analyses.

Are We There Yet?, another useful program for evaluating the quality of your MCMC sampling of the posterior distribution of phylogenetic trees, perhaps more informative than even Tracer. See the official citation at DOI: 10.1093/bioinformatics/btm388

The R package coda also does MCMC sampling diagnostics. It's less fancy but very general to any MCMC analysis.
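
As an example of what a general MCMC diagnostic looks like, here is a Python sketch of a simplified Gelman-Rubin potential scale reduction factor, which compares the variance within and between independent chains. It is a bare-bones version for illustration, not coda's implementation; the function name and the simulated chains are hypothetical. Values near 1 suggest the chains are sampling the same distribution.

```python
import math
import random
import statistics


def gelman_rubin(chains):
    """Simplified potential scale reduction factor for one parameter.

    chains: list of equal-length lists of sampled values (burn-in removed).
    """
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least 2 chains of equal length >= 2")
    chain_means = [statistics.fmean(c) for c in chains]
    # W: mean of the within-chain sample variances.
    within = statistics.fmean([statistics.variance(c) for c in chains])
    # B: n times the sample variance of the chain means.
    between = n * statistics.variance(chain_means)
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)


# Simulated chains standing in for real MCMC output (hypothetical data).
random.seed(1)
agree = [[random.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(2)]
disagree = [
    [random.gauss(0.0, 1.0) for _ in range(1000)],
    [random.gauss(5.0, 1.0) for _ in range(1000)],
]
r_agree = gelman_rubin(agree)
r_disagree = gelman_rubin(disagree)
print(r_agree, r_disagree)
```

The first pair of chains should give a value close to 1, while the second pair, stuck around different means, should give a value well above it.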

BayesPhylogenies

BayesPhylogenies: a powerful alternative to MrBayes by Pagel & Meade. BayesPhylogenies implements "mixture models": the user decides how many partitions to implement and the software estimates in what proportion each partition applies to each site. Remarkable! However, it seems a bit slower than MrBayes, and seems to have fewer features to help evaluate burnin and mixing of the (MC)MCMC run. But see diagnostic tools cited above.

Garli: Genetic Algorithm for Rapid Likelihood Inference

Find maximum likelihood solutions to even very large molecular phylogenetic datasets super fast, with Garli by Derrick Zwickl. The program uses most of the same evolutionary models as PAUP, but like PAUP it does not (I don't think) allow different models to be applied simultaneously to different partitions of the same dataset. It's so fast, I use Garli instead of NJ to make trees from large datasets (300+ OTUs). However, be sure to repeat each analysis, say, 5 times to explore variation among runs.

MEGA

MEGA version 4, for Windows.

PAUP* - SADLY, IT IS NOT FREE

PAUP* FAQ

PAUP* commands reference version 2, as PDF.

PaupUP is free software to convert the DOS (not Windows™ OS) version of PAUP* into a point-and-click menu and window driven version. See Screenshot. 80% of PAUP*'s commands are available via the menus, the other 20% are still accessible via the command line. The software also incorporates features of Modeltest and TreeView.

Suggestions for running a good first set of analyses using PAUP*.

Brian O'Meara's PAUP* instructions: the best on-line resource for using PAUP*, especially for running batch files.

SEQ-GEN

seq-gen, version 1.3: for simulating the evolution of DNA sequences on a given tree.

Divergence time estimation

multidivtime

multidivtime, Bayesian MCMC analysis of divergence time without assuming a 'molecular clock.'

BEAST

BEAST, another Bayesian analysis of divergence time without assuming a 'molecular clock.'

PATHd8

PATHd8, a new and improved algorithm for making ultrametric trees, works on even very large data sets. Does not exaggerate terminal branch lengths descending from the shorter branches on your tree, like NPRS would. However, I've seen it inexplicably (and annoyingly) elongate intermediately positioned branches. I now only recommend this if you cannot get an ultrametric tree out of BEAST or perhaps r8s.
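
In case "ultrametric" is unfamiliar: a tree is ultrametric when every tip is the same distance from the root. The Python sketch below checks that property on a toy tree. The nested-tuple tree representation and both function names are invented for this example, and the code has nothing to do with how PATHd8, BEAST or r8s work internally.

```python
def tip_depths(node, depth=0.0):
    """Return root-to-tip distances for every tip below node.

    A node is a tuple (branch_length, children); a tip has an
    empty list of children. This representation is hypothetical.
    """
    length, children = node
    depth += length
    if not children:
        return [depth]
    depths = []
    for child in children:
        depths.extend(tip_depths(child, depth))
    return depths


def is_ultrametric(root, tol=1e-9):
    """True if all root-to-tip distances agree within tol."""
    depths = tip_depths(root)
    return max(depths) - min(depths) <= tol


# Toy trees with made-up branch lengths.
clock_like = (0.0, [(1.0, [(1.0, []), (1.0, [])]), (2.0, [])])
not_clock_like = (0.0, [(1.0, []), (3.0, [])])

print(is_ultrametric(clock_like), is_ultrametric(not_clock_like))
```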

Speciation Time Estimators

STE: Use a divergence population genetic model without migration to estimate species (or population) splitting times.

More useful pages

Genetics Software Forum: Questions and answers for users of various evolutionary genetic software packages. Unfortunately it has not shown any signs of life since February 2009.


This page will be growing constantly, because there are always new software packages. Send suggestions to andrew-∀-dna.ac (replace "-∀-" with "@", obviously).

Last update: 15 July 2010.