CNVCALL

Accurate assignment of copy number at known copy number variant (CNV) loci is important both for understanding the structural evolution of genomes and for carrying out association studies of copy number with disease. As with calling SNP genotypes, the task can be framed as a clustering problem, but assigning copy number is much more challenging for several reasons. CNV assays have lower signal-to-noise ratios than SNP assays, often display heavy-tailed and asymmetric intensity distributions, contain outlying observations, and may exhibit systematic technical differences among cohorts. In addition, the number of copy-number classes at a CNV in the population may be unknown a priori. Because of these complications, automatic and robust assignment of copy number from array data remains a challenging problem. We developed a copy number assignment algorithm, CNVCALL, that robustly identifies both the number of distinct copy number classes at a specific locus and the relative copy number of each individual in the sample. We use a Bayesian hierarchical mixture model with specific features designed to give robust inference in the presence of the artifacts found in real data. The approach is fully automated, which is a critical requirement when analysing large numbers of CNVs. These methods perform well and were used in the Wellcome Trust Case Control Consortium's recent CNV association study.
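
To illustrate what framing copy number assignment as a clustering problem means, the sketch below fits one-dimensional Gaussian mixtures to per-sample intensities, picks the number of classes by BIC, and assigns each sample to its most probable class. It is not the CNVCALL algorithm: it substitutes a plain maximum-likelihood Gaussian mixture for the Bayesian hierarchical model, and it has none of the robustness features for heavy tails, outliers or cohort effects. All function names and parameters (`fit_gmm_1d`, `call_copy_number`, `max_k`, `min_var`) are hypothetical and chosen for this example only.

```python
import math


def _log_joint(v, weights, means, variances):
    """Log of weight * Gaussian density for each component at value v."""
    return [
        math.log(w) - 0.5 * math.log(2 * math.pi * s2) - (v - mu) ** 2 / (2 * s2)
        for w, mu, s2 in zip(weights, means, variances)
    ]


def _logsumexp(values):
    top = max(values)
    return top + math.log(sum(math.exp(t - top) for t in values))


def fit_gmm_1d(x, k, n_iter=100, min_var=1e-3):
    """Fit a k-component 1-D Gaussian mixture by EM (illustrative only)."""
    n = len(x)
    xs = sorted(x)
    # Deterministic start: means at evenly spaced quantiles of the data.
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    centre = sum(x) / n
    spread = sum((v - centre) ** 2 for v in x) / n
    variances = [max(spread / k ** 2, min_var)] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: posterior probability of each class for each sample.
        resp = []
        for v in x:
            lj = _log_joint(v, weights, means, variances)
            norm = _logsumexp(lj)
            resp.append([math.exp(t - norm) for t in lj])
        # M-step: update weights, means and (floored) variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / n
            means[j] = sum(r[j] * v for r, v in zip(resp, x)) / nj
            var_j = sum(r[j] * (v - means[j]) ** 2 for r, v in zip(resp, x)) / nj
            variances[j] = max(var_j, min_var)
    log_lik = sum(_logsumexp(_log_joint(v, weights, means, variances)) for v in x)
    return weights, means, variances, log_lik


def call_copy_number(x, max_k=4):
    """Choose the number of classes by BIC and assign each sample a class.

    Classes are numbered 0..k-1 in order of increasing mean intensity, so the
    labels give relative, not absolute, copy number.
    """
    best = None
    for k in range(1, max_k + 1):
        weights, means, variances, log_lik = fit_gmm_1d(x, k)
        # A k-component mixture has 3k - 1 free parameters.
        bic = -2.0 * log_lik + (3 * k - 1) * math.log(len(x))
        if best is None or bic < best[0]:
            best = (bic, k, weights, means, variances)
    _, k, weights, means, variances = best
    order = sorted(range(k), key=lambda j: means[j])
    rank = {component: position for position, component in enumerate(order)}
    labels = []
    for v in x:
        lj = _log_joint(v, weights, means, variances)
        labels.append(rank[max(range(k), key=lambda j: lj[j])])
    return k, labels
```

Calling `call_copy_number(intensities)` on a list of per-sample intensities at one locus returns the selected number of classes and one class label per sample. On real array data a Gaussian mixture of this kind is sensitive to exactly the complications described above, which is the motivation for the more robust model used by CNVCALL.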
