October 19, 2022

AlphaFold2, OpenFold, Protein Language Models and Beyond | Nazim Bouatta

1 hour 1 minute

[DISCLAIMER] - For the full visual experience, we recommend you tune in through our YouTube channel to see the presented slides.

If you enjoyed this talk, consider joining the Molecular Modeling and Drug Discovery (M2D2) talks live.

Also consider joining the M2D2 Slack.

Abstract: AlphaFold2 represents a stunning advance on one of biology’s grand challenges: predicting the 3D structure of a protein from its amino acid sequence. After briefly explaining AlphaFold2’s key features, I will introduce OpenFold, our optimized, trainable, and completely open-source version of AlphaFold2. By training OpenFold from scratch, we match the accuracy of AlphaFold2. I will discuss the analysis of intermediate structures produced by OpenFold during training and report surprising insights into the model’s critical early phase of learning, as well as new relationships between data size/diversity and prediction accuracy. Despite the high prediction accuracy achieved by AlphaFold2 (and OpenFold), many challenges remain, including (1) prediction of orphan and rapidly evolving proteins and (2) rapid exploration of designed proteins. I will also report on the development of an end-to-end differentiable recurrent geometric network (RGN2) that uses a protein language model (AminoBERT) to learn latent structural information from unaligned proteins. On average, RGN2 outperforms AlphaFold2 on orphan proteins and classes of designed proteins while achieving up to a 10^6-fold reduction in compute time.

Full Paper

Speaker: Nazim Bouatta

