Molecular Modelling and Drug Discovery

Structure-Independent Peptide Binder Design via Generative Language Models | Pranam Chatterjee



[DISCLAIMER] - For the full visual experience, we recommend you tune in through our YouTube channel to see the presented slides.

Try datamol.io - the open source toolkit that simplifies molecular processing and featurization workflows for machine learning scientists working in drug discovery: https://datamol.io/

If you enjoyed this talk, consider joining the Molecular Modeling and Drug Discovery (M2D2) talks live.

Also, consider joining the M2D2 Slack.

Abstract: The ability to modulate pathogenic proteins represents a powerful treatment strategy for diseases. Unfortunately, many proteins are considered “undruggable” by small molecules, and are often intrinsically disordered, precluding the use of structure-based tools for binder design. To address these challenges, we have developed a suite of algorithms that enable the design of target-specific peptides via protein language model embeddings, without requiring 3D structures. First, we train a model that leverages ESM-2 embeddings to efficiently select high-affinity peptides from natural protein interaction interfaces. We experimentally fuse model-derived peptides to E3 ubiquitin ligases and identify candidates exhibiting robust degradation of undruggable targets in human cells. Next, we develop a high-accuracy discriminator, based on the CLIP architecture, to prioritize and screen peptides with selectivity for a specified target protein. As input to the discriminator, we create a Gaussian diffusion generator to sample an ESM-2-based latent space, fine-tuned on experimentally validated peptide sequences. Finally, to enable de novo generation of binding peptides, we train an instance of GPT-2 on protein-interacting sequences to enable peptide generation conditioned on the target sequence. Our model demonstrates low perplexities across both existing and generated peptide sequences. Together, our work lays the foundation for programmable protein targeting and editing applications.
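
For listeners who want a concrete handle on two terms in the abstract, here is a minimal Python sketch of a CLIP-style symmetric contrastive loss and of sequence perplexity. It is an illustration under stated assumptions, not the speaker's implementation: the function names, the temperature of 0.07, and the toy 2-dimensional vectors (standing in for ESM-2 embeddings of peptides and targets) are all hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def log_softmax(row):
    """Numerically stable log-softmax over a list of logits."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]

def clip_style_loss(peptide_embs, target_embs, temperature=0.07):
    """Symmetric contrastive loss; pair i is (peptide_embs[i], target_embs[i]).

    Assumption: the temperature and the use of cosine similarity are
    illustrative choices, not values taken from the talk.
    """
    n = len(peptide_embs)
    logits = [[cosine_similarity(p, t) / temperature for t in target_embs]
              for p in peptide_embs]
    # Peptide -> target direction: each row should peak on its diagonal entry.
    loss_p2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Target -> peptide direction: the same check over columns.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2p = -sum(log_softmax(columns[j])[j] for j in range(n)) / n
    return 0.5 * (loss_p2t + loss_t2p)

def perplexity(token_log_probs):
    """Perplexity = exp of the mean negative log-probability per token."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

# Toy stand-ins for embeddings (hypothetical, not real ESM-2 outputs).
peptides = [[1.0, 0.0], [0.0, 1.0]]
matched_targets = [[1.0, 0.0], [0.0, 1.0]]
swapped_targets = [[0.0, 1.0], [1.0, 0.0]]

matched_loss = clip_style_loss(peptides, matched_targets)
swapped_loss = clip_style_loss(peptides, swapped_targets)

# A model assigning probability 0.25 to every token is as uncertain as a
# uniform choice among four options.
uniform_ppl = perplexity([math.log(0.25)] * 4)
```

In this toy setup the correctly paired embeddings give a much lower loss than the swapped ones, which is the signal a CLIP-style discriminator is trained on. Lower perplexity likewise means the language model finds a peptide sequence less surprising.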


Speaker: Pranam Chatterjee

Twitter - Prudencio

Twitter - Jonny

Twitter - datamol.io


Molecular Modelling and Drug Discovery, by Valence Discovery