Building Things with Machine Learning

Ep 5: Discovering Pharmaceuticals with Machine Learning, with Ryan Emerson of A-Alpha Bio



A true “aha” conversation! Learn how deep learning techniques from natural language processing (NLP) are applied to drug discovery, specifically to protein-protein interactions. Includes a quick-and-dirty primer on just enough biology to understand the training data A-Alpha Bio uses for its ML models.

For more episodes, visit https://yaoshiang.com/podcast.html

Show Notes:

0:37 - The basics of synthetic biology for machine learning practitioners

0:50 - What are proteins and why do they matter?

1:50 - A protein is a string built from an alphabet of 20 amino acids… which means it starts looking like a Natural Language Processing problem.

2:35 - DeepMind’s AlphaFold and Meta FAIR’s ESMFold: taking as input a string of amino acids, and then predicting the 3D structure of proteins.

6:23 - Where AlphaFold got its training data: The Protein Data Bank.

8:07 - A-Alpha Bio’s product: AlphaSeq.

10:45 - The source of the name “A-Alpha Bio”: yeast genders.

11:36 - Applications of synthetic biology: pharmaceuticals, agriculture.

15:00 - Applying ML to predict protein-protein interactions.

20:30 - !!! The actual ML techniques applied: treating proteins as strings and applying NLP architectures: RNNs, LSTMs, Attention, and Transformers.

22:50 - Generating new proteins, framed as a discrete optimization problem.

28:30 - The insights behind why applying ML would work.

31:20 - The rise of deep learning in the field of computational biology.

32:50 - Ryan’s journey into machine learning and data science

35:20 - Advice for deep learning people interested in applying ML to biology
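
To make the "proteins as strings" idea from the notes above concrete, here is a minimal Python sketch of how an amino acid sequence could be turned into integer tokens, the same kind of input an NLP model consumes. It is an illustration only, not A-Alpha Bio's pipeline: the function name tokenize, the padding id 0, and the example sequence are all assumptions made for this sketch.

```python
# The 20 standard amino acids, written with their one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical vocabulary: ids 1..20 for residues, 0 reserved for padding.
PAD_ID = 0
TOKEN_IDS = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}


def tokenize(sequence, max_len):
    """Convert a protein sequence into a fixed-length list of token ids.

    Sequences longer than max_len are truncated; shorter ones are padded.
    """
    sequence = sequence.upper()
    unknown = set(sequence) - set(AMINO_ACIDS)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    ids = [TOKEN_IDS[aa] for aa in sequence[:max_len]]
    return ids + [PAD_ID] * (max_len - len(ids))


if __name__ == "__main__":
    # Made-up example sequence, used only to show the encoding.
    print(tokenize("MKTAYIAK", max_len=10))
    # prints [11, 9, 17, 1, 20, 8, 1, 9, 0, 0]
```

Once a protein is a list of token ids like this, it can be fed to an embedding layer followed by an RNN, LSTM, or Transformer, just as a sentence of word tokens would be.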

 

Additional papers covering the topic of ML in biology:

https://www.nature.com/articles/s41586-021-03819-2 - The AlphaFold paper.

https://pubmed.ncbi.nlm.nih.gov/35830864/ - A broad overview of deep learning in biology.

https://pubmed.ncbi.nlm.nih.gov/35862514/ - A paper out of the Baker lab in which the authors use deep learning to design proteins from scratch.

https://pubmed.ncbi.nlm.nih.gov/35099535/ - From Charlotte Deane’s lab with collaborators from Roche, this paper presents a deep learning approach to rapidly and accurately model the structure of antibody CDR3 loops. One of the papers mentioned in the review above.

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9129155/ - This is recent work from A-Alpha; this paper doesn’t include any ML but does include some great examples of AlphaSeq data and how it can be applied.

 

The YouTube version of this podcast is available at https://www.youtube.com/watch?v=k2OzeRQIXMs.


Building Things with Machine Learning, by Yaoshiang Ho