Paper Talk

796-DIAMOND DeepClust for Protein Clustering



The paper introduces DIAMOND DeepClust, an ultra-fast software tool for clustering billions of protein sequences from across the global biosphere. By combining a cascaded clustering method with optimized alignments, it makes planetary-scale organization of the "protein universe" feasible while remaining sensitive at low sequence identity. The researchers used the tool to group 19 billion sequences into a far smaller set of clusters, significantly expanding the known diversity of protein families compared with existing databases. This reduction in data complexity helps accelerate comparative genomics and improves the accuracy of AI-driven structure prediction tools such as AlphaFold2. The method is also designed to keep pace with the Earth BioGenome Project, which aims to sequence and catalog the genomes of all known eukaryotic species.

References:

  • Buchfink B J, Barbé É, Ashkenazy H, et al. Clustering the protein universe of life using DIAMOND DeepClust[J]. Nature Methods, 2026: 1-4.

Paper Talk, by 淼淼Elva