
Episode 207: Semantic Design of de novo Genes with Evo
In this episode of PaperCast Base by Base, we explore how a genomic language model called Evo can use genomic context to design entirely new DNA sequences that encode functional genes and multi-component defence systems.
Study Highlights:
Researchers trained the Evo genomic language model on long prokaryotic and phage DNA sequences and used genomic neighbourhoods as prompts to autocomplete new genes whose functions mirror those of their neighbours. They experimentally validated Evo-designed toxin–antitoxin systems and type III toxin–antitoxin modules, discovering novel protein toxins, protein antitoxins and RNA antitoxins that strongly modulate bacterial survival despite low or absent sequence similarity to natural proteins. Using prompts from anti-CRISPR operons, they generated diverse anti-CRISPR proteins that block SpCas9 activity and protect cells from phage infection, including candidates that cannot be confidently assigned to any known protein family. Finally, they scaled this semantic design strategy to build SynGenome, a public resource of more than 120 billion base pairs of Evo-generated DNA organised by gene ontology and domain annotations to enable function-guided exploration across many biological pathways.
Conclusion:
This work shows that genomic language models can move beyond imitating nature, using semantic relationships in genomes to design de novo functional genes and systems that expand the sequence space available for protein engineering and synthetic biology.
Music:
Enjoy the music based on this article at the end of the episode.
Reference:
Merchant AT, King SH, Nguyen E, Hie BL. Semantic design of functional de novo genes from a genomic language model. Nature. 2025. https://doi.org/10.1038/s41586-025-09749-7
License:
This episode is based on an open-access article published under the Creative Commons Attribution 4.0 International License (CC BY 4.0) – https://creativecommons.org/licenses/by/4.0/
Support:
Base by Base – Stripe donations: https://donate.stripe.com/7sY4gz71B2sN3RWac5gEg00
Official website: https://basebybase.com
Castos player: https://basebybase.castos.com
On PaperCast Base by Base you’ll discover the latest in genomics, functional genomics, structural genomics, and proteomics.
By Gustavo Barra