Phylogenetic techniques are increasingly applied to infer the somatic mutational history of a tumor from DNA sequencing data. However, standard phylogenetic tree reconstruction techniques do not account for the fact that bulk sequencing data measure mutations in a population of cells rather than in individual cells. We formulate and solve the multi-state perfect phylogeny mixture deconvolution problem: reconstructing a phylogenetic tree given mixtures of its leaves, under the multi-state perfect phylogeny (infinite alleles) model. Our somatic phylogeny reconstruction using combinatorial enumeration (SPRUCE) algorithm uses this model to construct phylogenetic trees jointly from single-nucleotide variants (SNVs) and copy-number aberrations (CNAs). We show that SPRUCE addresses complexities in the simultaneous analysis of SNVs and CNAs. In particular, there are often many possible phylogenetic trees consistent with the data, but this ambiguity decreases considerably as the number of samples increases. These findings have implications for tumor sequencing strategies, suggest caution in drawing strong conclusions from a single tree reconstruction, and explain the difficulties encountered when applying existing phylogenetic techniques to tumor sequencing data.
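The idea of observing "mixtures of leaves" can be illustrated with a minimal sketch. It is not the SPRUCE algorithm: it assumes a simplified two-state (mutation present or absent) setting rather than the multi-state model, and the toy clones, mixing proportions, and function names (`mix`, `may_precede`) are hypothetical. Each bulk sample is modeled as a weighted mixture of clones, and a simple necessary condition (an ancestral mutation must be at least as frequent as its descendant in every sample) shows how a second sample can rule out an ordering that a single sample leaves open.

```python
from fractions import Fraction

# Hypothetical toy clones (leaves of a two-state perfect phylogeny).
# Columns are mutations a, b, c; 1 means the clone carries the mutation.
CLONES = [
    (0, 0, 0),  # normal cells
    (1, 0, 0),  # a
    (1, 1, 0),  # a, then b
    (1, 0, 1),  # a, then c
]

def mix(proportions, clones):
    """Fraction of cells carrying each mutation in one bulk sample."""
    assert sum(proportions) == 1, "mixing proportions must sum to 1"
    n_mutations = len(clones[0])
    return [
        sum(u * clone[j] for u, clone in zip(proportions, clones))
        for j in range(n_mutations)
    ]

def may_precede(freqs, i, j):
    """Necessary (not sufficient) condition for mutation i to be ancestral
    to mutation j: i is at least as frequent as j in every sample."""
    return all(sample[i] >= sample[j] for sample in freqs)

# Assumed mixing proportions for two bulk samples of the same tumor.
sample_1 = [Fraction(1, 2), Fraction(0), Fraction(1, 2), Fraction(0)]
sample_2 = [Fraction(1, 4), Fraction(1, 4), Fraction(1, 4), Fraction(1, 4)]

one_sample = [mix(sample_1, CLONES)]
two_samples = [mix(sample_1, CLONES), mix(sample_2, CLONES)]

# With one sample, a and b have equal frequency, so either order is possible.
print(may_precede(one_sample, 0, 1), may_precede(one_sample, 1, 0))
# With two samples, only "a before b" survives the check.
print(may_precede(two_samples, 0, 1), may_precede(two_samples, 1, 0))
```

In this toy case the first sample alone cannot distinguish "a before b" from "b before a", while adding the second sample excludes the latter, which mirrors the observation that ambiguity decreases as samples are added.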
All Science Journal Classification (ASJC) codes
- Pathology and Forensic Medicine
- Cell Biology