#
**Treesim**

Sampling of genealogies from an approximate posterior under the infinite sites model with recombination.

The ancestry of a collection of individuals at a single locus is most naturally represented by a binary tree. In order to represent ancestry across multiple loci we must extend this to a graph, the ancestral recombination graph (ARG). Under some simplifying asumptions (see e.g. the Handbook of Statistical Genetics) we can show that the coalescent with recombination gives a good model for the distribution of these graphs. Given genetic data, this allows us to estimate genealogies, calculate likelihoods, and estimate parameters of the evolutionary process.

We present here a Monte Carlo approach to simulating heuristic genealogies for large regoins of the genome. These can be powerful tools for a range of applications including imputation and detection of genetic associations. Building on the approach of Fearnhead and Donnelly (Genetics, 2001) We introduce a number of heuristics which, when employed, greatly improve the speed of their calculations for a single genealogy, at the cost of exploring the posterior as evenly.

*treesim* can also be used to calculate likelihoods using importance
sampling, and approximates the optimal
importance sampler, following the approach of Stephens and Donnelly
and Fearnhead and Donnelly in
constructing an importance sampler to calculate the likelihood curves under
both the SMC and the Coalescent. The method generates a coalescent genealogy
backwards in time until the most recent common ancestor is reached
at all sites. The genealogy is constructed using the sample and is augmented
with mutations so that the probability of the sample configuration given the
augmented genealogy is 1 or 0.

The program *treesim* is written in C++ and can be run from the
command line on Windows, Linux and Mac OSX.

*Treesim*was used by Zhan Su to create a set of genealogies for the hapmap CEU and YRI populations, here

**Manual**

**Source Code (.tar.gz file)**

**Files for a test run (.tar file)**

**Helper R code - to run and parse/collate/plot output**

**Home**