The paper introduces X-Atlas/Pisces, the most extensive genome-wide CRISPRi Perturb-seq dataset created to date, comprising 25.6 million single-cell transcriptomes across 16 biological contexts. Using this resource, the researchers developed X-Cell, a diffusion language model that predicts how genetic perturbations reshape gene expression. The model integrates multi-modal biological priors, such as protein language model representations and interaction networks, through a specialized cross-attention architecture, and the authors report that these priors improve prediction accuracy. By scaling the system to 4.9 billion parameters in X-Cell-Ultra, the authors show that perturbation prediction follows power-law scaling, similar to the behavior observed in large language models. Finally, the research shows that X-Cell achieves superior zero-shot generalization to unseen cell types and primary human cells, making it a promising tool for computational drug discovery and target identification.
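A power law is a straight line in log-log space, and that is the usual way scaling exponents are estimated. The following is a minimal sketch of that idea. It assumes a generic form loss ≈ a · N^(-α), where N is the parameter count, and fits a and α by ordinary least squares on log-transformed points. The function names `fit_power_law` and `predict_loss` and the synthetic data are hypothetical and not taken from the paper. This summary does not specify the paper's evaluation metric or fitting procedure.

```python
import math


def fit_power_law(sizes, losses):
    """Fit loss ~= a * size**(-alpha) with least squares in log-log space.

    Taking logs turns the power law into a line:
        log(loss) = log(a) - alpha * log(size)
    so the slope gives -alpha and the intercept gives log(a).
    Returns the tuple (a, alpha).
    """
    if len(sizes) != len(losses) or len(sizes) < 2:
        raise ValueError("need at least two (size, loss) pairs of equal length")
    xs = [math.log(s) for s in sizes]
    ys = [math.log(v) for v in losses]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("sizes must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


def predict_loss(a, alpha, size):
    """Evaluate the power law a * size**(-alpha) at a given model size."""
    return a * size ** (-alpha)


# Hypothetical, synthetic points generated from a=10, alpha=0.1.
# These are NOT results from the X-Cell paper.
param_counts = [1e6, 1e7, 1e8, 1e9]
losses = [predict_loss(10.0, 0.1, n) for n in param_counts]
a, alpha = fit_power_law(param_counts, losses)
print(f"a={a:.3f}, alpha={alpha:.3f}")  # prints a=10.000, alpha=0.100
```

The synthetic points lie exactly on a power law, so the fit recovers the parameters that generated them. Real measurements would be noisier, and residuals should be checked. Some scaling studies also add an irreducible-loss offset term, which this sketch leaves out.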
References:
- Wang C, Karimzadeh M, Ravindra N G, et al. X-Cell: Scaling Causal Perturbation Prediction Across Diverse Cellular Contexts via Diffusion Language Models[J]. bioRxiv, 2026: 2026.03.18.712807.