CellHermes is a novel biological foundation model designed to unify diverse
omics data by leveraging the reasoning power of pretrained large language
models. Unlike traditional models trained from scratch on specific data types,
this framework translates complex transcriptomic profiles and protein-protein
interaction networks into a standardized natural language format. By
reformulating biological datasets into question-answer pairs, the system
emulates multiple self-supervised learning paradigms within a single, universal
architecture. The model serves three primary functions: an encoder for
representing genes and cells, a predictor for multi-task biological
forecasting, and an explainer that provides human-readable reasoning for its
findings. Benchmarks demonstrate that CellHermes matches or exceeds the
performance of specialized single-cell models while maintaining high
interpretability. This research establishes natural language as a versatile
medium for integrating heterogeneous biological information into a cohesive
analytical loop.
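
To make the question-answer reformulation concrete, here is a minimal sketch of how an expression profile and a protein-protein interaction edge might each be turned into a text pair. The prompt wording, the function names `cell_to_qa` and `ppi_to_qa`, the rank-ordered gene listing, and the toy values are all illustrative assumptions; the actual CellHermes templates are not described in this summary.

```python
from typing import Dict


def cell_to_qa(expression: Dict[str, float], cell_type: str, top_k: int = 3) -> Dict[str, str]:
    """Phrase one cell as a question-answer pair (hypothetical template).

    Genes are ranked by expression (ties broken alphabetically) and the
    top_k names are listed in the question. This ranking scheme is an
    assumption, not a confirmed detail of CellHermes.
    """
    ranked = sorted(expression.items(), key=lambda kv: (-kv[1], kv[0]))
    top_genes = [gene for gene, _ in ranked[:top_k]]
    question = (
        "The most highly expressed genes in this cell are: "
        + ", ".join(top_genes)
        + ". What is the cell type?"
    )
    return {"question": question, "answer": cell_type}


def ppi_to_qa(gene_a: str, gene_b: str, interacts: bool) -> Dict[str, str]:
    """Phrase one network edge as a yes/no question (hypothetical template)."""
    question = (
        f"Does the protein encoded by {gene_a} interact with "
        f"the protein encoded by {gene_b}?"
    )
    return {"question": question, "answer": "yes" if interacts else "no"}


# Toy inputs, invented for illustration only.
toy_expression = {"CD3E": 5.2, "CD19": 0.0, "IL7R": 3.1, "MS4A1": 0.1}
cell_qa = cell_to_qa(toy_expression, cell_type="T cell", top_k=2)
ppi_qa = ppi_to_qa("TP53", "MDM2", interacts=True)

print(cell_qa["question"], "->", cell_qa["answer"])
print(ppi_qa["question"], "->", ppi_qa["answer"])
```

Because both data types end up as plain question-answer strings, they could in principle be mixed in one fine-tuning corpus for a single language model, which is the kind of unification the summary describes.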
References:
- Gao Y, Wang W, Zhao Y, et al. Language may be all omics needs: Harmonizing multimodal data for omics understanding with CellHermes[J]. bioRxiv, 2025: 2025.11.07.687322.