This paper introduces phylogenetic algorithms designed to process pathogen genomic data at pandemic scale, focusing on the evolution of SARS-CoV-2. The authors address two obstacles, heavy computational demand and errors in the data, by implementing new models in the MAPLE software that account for variation in mutation rates and for recurrent sequencing errors. By distinguishing genuine evolutionary changes from technical artifacts, the framework improves the accuracy of tree reconstruction for datasets containing millions of sequences. The work culminates in a reliable global phylogeny of over two million SARS-CoV-2 genomes, which provides a high-resolution map of the virus's spread. These methodological improvements strengthen genomic epidemiology and provide tools for monitoring current and future infectious disease outbreaks.
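The core idea of separating real mutations from sequencing artifacts can be illustrated with a toy single-site calculation. The sketch below is not MAPLE's model. It assumes a Jukes-Cantor substitution process scaled by a hypothetical per-site rate multiplier `rate`, plus a hypothetical per-site error probability `error` that turns the true base into one of the other three bases uniformly. Given a parent base, a mismatching base observed at a tip, and a branch length `t`, it computes the posterior probability that the mismatch is a real mutation rather than an error. All function and parameter names are illustrative assumptions.

```python
import math

BASES = "ACGT"


def jc_prob(from_base: str, to_base: str, rate: float, t: float) -> float:
    """Jukes-Cantor transition probability over branch length t.

    ASSUMPTION: site-specific rate variation is modelled as a simple
    multiplier on the branch length (illustrative, not MAPLE's model).
    """
    decay = math.exp(-4.0 / 3.0 * rate * t)
    if from_base == to_base:
        return 0.25 + 0.75 * decay
    return 0.25 - 0.25 * decay


def emission(true_base: str, observed: str, error: float) -> float:
    """Probability of reading `observed` when the true base is `true_base`.

    ASSUMPTION: a per-site error probability, spread evenly over the
    three incorrect bases.
    """
    return 1.0 - error if true_base == observed else error / 3.0


def posterior_real_mutation(parent: str, observed: str,
                            rate: float, error: float, t: float) -> float:
    """Posterior probability that the true tip base equals the observed one.

    For an observed base that differs from `parent`, this is the probability
    that the mismatch reflects a real mutation rather than a sequencing error.
    """
    joint = {b: jc_prob(parent, b, rate, t) * emission(b, observed, error)
             for b in BASES}
    return joint[observed] / sum(joint.values())


if __name__ == "__main__":
    t = 1e-4  # hypothetical branch length in substitutions per site
    for label, rate, error in [("baseline", 1.0, 1e-4),
                               ("error-prone site", 1.0, 1e-2),
                               ("fast-evolving site", 10.0, 1e-4)]:
        p = posterior_real_mutation("A", "C", rate, error, t)
        print(f"{label}: P(real mutation) = {p:.4f}")
```

In this toy setting, raising a site's error probability pushes the same A→C mismatch toward being explained as an artifact, while raising its rate multiplier makes a genuine mutation more plausible. This shows why estimating both per-site quantities can change which mismatches a tree-inference method treats as phylogenetic signal.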
References:
- De Maio N, Willemsen M, Martin S, et al. Rate variation and recurrent sequence errors in pandemic-scale phylogenetics[J]. Nature Methods, 2026: 1-9.