This paper describes ctyper, a novel computational method for accurately genotyping sequence-resolved copy number variation (CNV), particularly in complex or medically relevant genes, by leveraging human pangenome reference assemblies and next-generation sequencing data. The authors establish that ctyper is both highly accurate and computationally efficient enough for large-scale biobank analysis, outperforming current methods in capturing phased variants and copy numbers for genes like HLA and CYP2D6. Furthermore, the application of ctyper revealed new insights into global population diversity of CNVs and significantly improved the prediction of gene expression divergence when compared to simpler aggregate copy number measurements, highlighting the functional importance of allele-specific variation. The methodology relies on an alignment-free comparison of low-copy k-mers and a recursive phylogenetic rounding approach to solve for integer copy numbers efficiently.
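
To make the idea of alignment-free, k-mer-based copy number estimation more concrete, the following is a minimal Python sketch. It is not ctyper's implementation: the function names, the toy sequences, and the `depth_per_copy` parameter are all hypothetical, and plain nearest-integer rounding stands in for the recursive phylogenetic rounding described in the paper. It only illustrates how k-mer counts taken directly from reads, without alignment, can be turned into an integer copy number for each candidate allele.

```python
from collections import Counter


def kmers(seq, k):
    """Return a generator over every k-mer of seq (forward strand only)."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def estimate_copy_numbers(alleles, reads, k, depth_per_copy):
    """Estimate an integer copy number for each allele from read k-mer counts.

    alleles:        dict of allele name -> sequence (hypothetical toy panel)
    reads:          iterable of read sequences
    depth_per_copy: assumed k-mer count contributed by one copy of an allele
    Simplifications: reverse complements and sequencing errors are ignored.
    """
    # Count k-mers directly in the reads; no alignment is performed.
    read_counts = Counter(km for read in reads for km in kmers(read, k))

    # Record which k-mers each allele contains, and how many alleles share each.
    allele_kmers = {name: set(kmers(seq, k)) for name, seq in alleles.items()}
    owners = Counter(km for kms in allele_kmers.values() for km in kms)

    estimates = {}
    for name, kms in allele_kmers.items():
        # Keep only k-mers that occur in exactly one allele of the panel.
        unique = [km for km in kms if owners[km] == 1]
        if not unique:
            estimates[name] = None  # no informative k-mers for this allele
            continue
        mean_count = sum(read_counts[km] for km in unique) / len(unique)
        # Naive rounding to the nearest integer (a stand-in, see the note above).
        estimates[name] = round(mean_count / depth_per_copy)
    return estimates


# Toy example: two copies of allele A, one copy of B, and no copy of C.
alleles = {"A": "ACGTTGCA", "B": "ACGTAGCA", "C": "ACGTCCCA"}
reads = [alleles["A"], alleles["A"], alleles["B"]]
copy_numbers = estimate_copy_numbers(alleles, reads, k=4, depth_per_copy=1)
print(copy_numbers)
```

In this toy setup each read is a full copy of an allele, so `depth_per_copy=1`. With real sequencing data the per-copy depth would have to be estimated from the data, and k-mers shared between alleles would need a joint fit rather than being discarded.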
References:
- Ma W, Chaisson MJP. Genotyping sequence-resolved copy number variation using pangenomes reveals paralog-specific global diversity and expression divergence of duplicated genes. Nature Genetics, 2025: 1-11.