Data in Biotech is a fortnightly podcast exploring how companies leverage data to drive innovation in life sciences.
Every two weeks, Ross Katz, Principal and Data Science ...
By CorrDyn
The podcast currently has 29 episodes available.
This week on Data in Biotech, we’re joined by Mo Jain, the Founder and CEO of Sapient, a biomarker discovery organization that enables biopharma sponsors to go beyond the genome to accelerate precision drug development.
Mo talks us through his personal journey into the world of science, from school to working in academia to founding his business, Sapient.
He explains how and why Sapient first started and the evolution of the high-throughput mass-spectrometry service it provides to the biopharmaceutical sector.
Together, Mo and our host Ross explore the technology that has allowed scientists to examine a person’s medical history like never before via metabolome, lipidome, and proteome analysis.
They look at how the technology developed to take testing from twenty tests per blood sample to twenty thousand, and how Sapient has built its reputation in biopharmaceuticals for large-scale data projects.
They discuss Sapient’s process when working with clients on genome projects. We learn about Sapient’s relationship with its clients, how it comes to understand the targets and aims of each project, why it places so much importance on proprietary database management and quality control, and its three pillars for high-quality data discovery.
Finally, Mo takes the opportunity to give us his insights on the future of biomarker discovery and mass-spectrometry technology, and on how AI and machine learning are leading to enhanced data quality and quantity.
Data in Biotech is a fortnightly podcast exploring how companies leverage data innovation in the life sciences.
Chapter Markers
[1:33] Introduction to Mo Jain, his journey, genomics, and Sapient’s use of genomics data to accelerate medicine and drug development
[6:50] The types of data generated at Sapient via metabolome, lipidome & proteome, and why that data is generated
[12:30] How Sapient generates this data at scale, via specialist mass-spectrometry technology
[14:48] The problems Sapient can solve for pharma and biotech companies with this data
[21:03] Sapient as a service company: the questions they’re asked by pharmaceutical businesses, why they come to Sapient, and Sapient’s process for answering those questions.
[26:23] The computational frameworks and data handling side of things, and how the team interacts with the client
[29:59] Proprietary database development and quality control
[35:27] The future of biomarker discovery and mass-spectrometry technology, and how AI and machine learning are leading the way at Sapient
This week on Data in Biotech, we are joined by Parul Bordia Doshi, Chief Data Officer at Cellarity, a company that is leveraging data science to challenge traditional approaches to drug discovery.
Parul kicks off the conversation by explaining Cellarity’s mission and how it is using generative AI and single-cell multiomics to design therapies that target the entire cellular system, rather than focusing on single molecular targets.
She gives insight into the functionality of Cellarity Maps, the company’s cutting-edge visualization tool that maps the progression of disease states and bridges the gap between biologists and computational scientists.
Along with host Ross Katz, Parul walks through some of the big challenges facing Chief Data Officers, particularly for biotech organizations with data-centric propositions.
She emphasizes the importance of robust data frameworks for validating and standardizing complex data sets, and looks at some of the practical approaches that ensure data scientists can derive the maximum amount of value from all available data.
They discuss what data science teams look like within Cellarity, including the unique way the company incorporates human intervention into its processes.
Parul also emphasizes the benefits that come through hiring multilingual, multidisciplinary teams and putting a strong focus on collaboration.
Finally, we get Parul’s take on the future of data science for drug discovery, plus a look at Cellarity’s ongoing collaboration with Novo Nordisk on the development of novel therapeutics.
Chapter Markers
[1:45] Introduction to Parul, her career journey, and Cellarity’s approach to drug discovery.
[5:47] The life cycle of data at Cellarity from collection to how it is used by the organization.
[7:45] How the Cellarity Maps visualization tool is used to show the progression of disease states
[9:05] The role of a Chief Data Officer in aligning an organization’s data strategy with its company mission.
[11:46] The benefits of collaboration and multidisciplinary, cross-functional teams to drive innovation.
[14:53] Cellarity's end-to-end discovery process, including how it uses generative AI, contrastive learning techniques, and visualization tools.
[19:42] The role of humans vs the role of machines in scientific processes.
[23:05] Developing and validating models, including goal setting, benchmarking, and the need for collaboration between data teams and ML scientists.
[30:58] Generating and managing massive amounts of data, ensuring quality, and maximizing the value extracted.
[37:08] The future of data science for drug discovery, including Cellarity’s collaboration with Novo Nordisk to discover and develop a novel treatment for MASH.
This week on Data in Biotech, Ryan Mork, Director of Data Science at Evozyne, joins host Ross Katz to discuss how data science and machine learning are being used in protein engineering and drug discovery.
Ryan explains how Evozyne is utilizing large language models (LLMs) and generative AI (GenAI) to design new biomolecules, training the models with huge volumes of protein and biology data. He walks through the organization’s evolution-based design approach and how it leverages the evolutionary history of protein families.
Ross and Ryan dig into the different models being used by Evozyne, including latent variable models and embeddings. They also discuss some of the challenges around testing the functionality of models and the approaches that can be used for evaluation.
Alongside the deep dive into data and modeling topics, Ryan also discusses the importance of relationships between the wet lab and data science teams. He emphasizes the need for mutual understanding of each role to ensure the entire organization pulls together towards the same goals.
Finally, Ross asks Ryan to opine on the future of GenAI and LLMs for biotechnology and how this area will develop over the next five years. He also finds out more about the R&D roadmap at Evozyne and its plans to play a part in moving GenAI for protein engineering forward.
Chapter Markers
[1:24] Introduction to Ryan, his career to date, and the focus of Evozyne.
[2:59] How the Evozyne data science team operates and the data sources it utilizes.
[4:22] Building models to develop synthetic proteins for therapeutic uses.
[9:10] Deciding which proteins to take into the lab for experimental validation.
[10:49] Taking an evolution-based design approach to protein engineering.
[14:34] Using latent variable models and embeddings to capture evolutionary relationships.
[18:01] Evaluating the functionality of generative models and the role of auxiliary models.
[24:24] The value of tight coupling and mutual understanding between wet lab and data science teams.
[28:07] Evozyne’s approach to developing and testing new data science tools, models, and technologies.
[31:35] Predictions for future developments in Generative AI for biotechnology.
[33:41] Evozyne’s goal to increase throughput and its planned approach.
[39:09] Where to connect with Ryan and keep up to date with news from Evozyne.
This week on Data in Biotech, Ross is joined by Jonathan Eads, VP of Engineering at genomics intelligence company Genomenon, to discuss how his work supports the company’s mission to make genomic evidence actionable.
Jonathan explains his current role leading the teams focused on clinical engineering, curation engineering, and platform development, and overseeing Genomenon’s data science and AI efforts.
He gives insight into how Genomenon’s software works to scan genomics literature and index genetic variants, providing critical evidence-based guidance for those working across biotech, pharmaceutical, and medical disciplines.
Jonathan outlines the issues with inconsistent genetic data, variant nomenclature and extracting genetic variants from unstructured text, before explaining how human curators are essential to ensure accuracy of output.
Jonathan and Ross discuss the opportunities and limitations that come with using AI and natural language processing (NLP) techniques for genetic variant analysis.
Jonathan lays out the process of developing robust validation datasets and fine-tuning AI models to handle issues like syntax anomalies. He also outlines the need to balance the short-term demand for data quality with the long-term goal of advancing the platform’s AI and automation capabilities.
We hear notable success stories of how Genomenon’s platform is being used to accelerate variant interpretation, disease diagnosis, and precision medicine development.
Finally, Ross gets Jonathan’s take on the future of genomics intelligence, including the potential of end-to-end linkage of information from variants all the way out to patient populations.
Chapter Markers
[1:50] Introduction to Jonathan and his academic and career background.
[5:14] What Genomenon’s mission to ‘make genomic evidence actionable’ looks like in practice.
[14:48] The limitations of how scientists and doctors have historically been able to use literature to understand genetic variants.
[16:08] Challenges with nomenclature and indexing and how these affect access to information.
[18:11] Extracting genetic variants from scientific publications into a structured, searchable index.
[22:04] Using a combination of software processes and human curation for accurate research outputs.
[24:57] Building high functionality, complex, and accurate software processes to analyze genomic literature.
[29:45] Dealing with the challenges of AI and the role of human curators for the accuracy of genetic variant classification.
[34:37] Managing the trade-off between short-term needs for improved data and long-term goals for automation and AI development.
[38:39] Success stories using the Genomenon platform, including making an FDA case and diagnosing rare disease.
[41:55] Predictions for future advancements in literature search for genetic variant analysis.
[43:21] The potential impact of Genomenon’s acquisition of the JAX Clinical Knowledgebase (CKB).
This week on Data in Biotech, we’re joined by Amy Gamber, VP of Manufacturing at Atara Biotherapeutics, an allogeneic T-cell immunotherapy company developing off-the-shelf treatments to help achieve faster patient outcomes.
Atara’s treatments sit at the cutting edge of options available for cancer and autoimmune disease, and host Ross Katz explores how the company is able to deliver personalized medicine that can be with the patient inside a three-day window.
Amy is clear-eyed about what works well in this field and what doesn’t. We gain insight into the complexity of developing this type of cell therapy and the subsequent production challenges of manufacturing at scale. We also cover the manufacturing process, corresponding data problems Amy encounters on a day-to-day basis in her role as VP of Manufacturing, and the strategies she employs to overcome them.
Amy discusses the importance of continuous quality monitoring and the need to introduce it from an early stage to see how a program changes through the development phases. She highlights the importance of data as a tool for the ‘detective work’ needed to understand where problems arise during manufacturing.
Finally, she and Ross close the episode by discussing the future of cell therapy manufacturing, a world where modeling enables predictive QC, the possibilities of AI, and the need to standardize data.
Chapter Markers
[1:56] Introduction to Amy and the manufacturing process at Atara, including the importance of cryo storage to facilitate faster patient treatment.
[6:37] Amy and Ross discuss the challenge of donor variability in cell therapy manufacturing and how to manage it.
[12:38] Ross asks about scaling cell therapy production and the different considerations for small batch versus commercial-scale manufacturing.
[15:47] Amy discusses the importance of continuous quality monitoring, highlighting the value of tracking metrics to ensure quality control and identify improvement opportunities
[18:46] Ross moves the focus to automating data collection, as he and Amy emphasize the need for more efficient data access and analysis for timely decision-making.
[20:50] Ross and Amy explore the data challenges biotechnology companies face, including the problem with manual data processes, creating feedback loops, and regulatory compliance.
[25:16] Amy explains how Atara addressed manufacturing efficiency challenges, the importance of ‘detective work’ to understand problem causes, and the process of solving them.
[33:28] Ross and Amy examine how to use data to gather meaningful manufacturing insights, particularly identifying true signals when analyzing small datasets.
[36:33] Ross talks about predictive QC measures as a response to Amy’s point about being able to guarantee product quality from the outset.
[37:31] Amy gives her perspective on the future of biotech manufacturing, the role of AI, predictive modeling, and the need for standardization in the industry.
Download our latest white paper on “Using Machine Learning to Implement Mid-Manufacture Quality Control in the Biotech Sector.”
Visit this link: https://connect.corrdyn.com/biotech-ml
It's a holiday week here in the U.S., and the CorrDyn team is currently getting together for our annual company retreat. So, we decided to do something a little different this week.
We're re-releasing abridged versions of two of our most popular episodes.
These are dedicated to a critical workflow at the intersection of data and scientific research: Design of Experiments (DOE).
Across these interviews, we brought together two leading experts to give you a comprehensive overview of how best-practice DOE works in the biotech industry: Wolfgang Halter from Merck Life Sciences and Markus Gershater from Synthace.
This week on Data in Biotech, we’re delighted to be joined by Guru and Satya Singh, co-founders of SciSpot, a company focused on transforming biotech companies by representing biological processes more intelligently in data and software and by accelerating the R&D process.
They discuss how their respective biotech and data backgrounds led them to develop the platform and their very personal motivation behind their mission to enable data to accelerate life science research.
Guru and Satya explore the concept of giving biotech companies a “digital brain” that uses AI to learn from every experiment. They emphasize how this requires modern software principles like being API-first and data-centric.
Based on their work helping their biotech customers move towards this model, Guru and Satya discuss overcoming some of the biggest adoption challenges – instilling data competence, moving to standardized data models, and bridging the gap between wet lab scientists and computational experts.
Chapter Markers
[1:38] Guru and Satya both give a brief overview of their respective backgrounds and the industry challenges that led them to launch SciSpot.
[3:35] Guru discusses the challenges of bringing organic and inorganic intelligence together and introduces the concept of a “digital brain.”
[6:50] Ross asks about the components of the SciSpot platform and how it works for companies using it.
[9:42] Guru and Satya emphasize the challenge of educating scientists on the advantages of adopting an API-first, data science-focused system.
[12:09] Guru and Satya highlight the ‘a-ha’ moments for customers using the platform, which include standardizing data models and connecting all instruments into SciSpot.
[14:27] Satya discusses knowledge graphs and how the system enables both implicit tagging and human input to enrich the data for data science purposes.
[17:52] The discussion covers the need for flexible workflows in biotech and how SciSpot changes the way its customers think about data science workflows.
[22:44] Guru shares his views on the future of biotech companies and underlines the importance of standardized data models.
[25:59] The discussion covers the challenges of integrating biotech-specific systems into an API-first platform and the current gaps in data capabilities.
[29:45] Ross highlights the importance of a unified platform for the range of biotech personas to drive AI faster.
[31:32] Guru and Satya revisit their vision of biotech organizations with a “digital brain” and real-time, established feedback loops that will make them smarter.
[34:46] Guru and Satya share advice for biotech organizations, focusing on how they should think about data and tooling.
This week, Ross sits down with Mike Nally, CEO at Generate:Biomedicines, a pioneer in generative biology that is transforming the way medicines are developed. Mike joined the Data in Biotech podcast to discuss the AI-driven drug development landscape and how data is set to change the way every drug is made in the future.
Mike shares his journey to Generate:Biomedicines, motivated by the ambition to improve productivity and democratize the availability of drugs.
He discusses the latest in drug development trends, from how the availability of data accelerates what is possible to breakthroughs in de novo generation that allow proteins to be developed with unprecedented specificity. He shares how Generate innovates at each phase of AI-driven drug development and provides insight into Chroma, an open-source diffusion model, explaining how it allows scientists to push the boundaries of protein discovery.
Chapter Markers
[1:21] Mike gives a quick rundown of his background and the route to his current role as CEO at Generate:Biomedicines.
[4:03] Mike discusses the changes in the availability of data to advance biotechnology.
[6:37] Mike explains the process of designing new proteins and where AI fits into this.
[11:12] Mike introduces Chroma, an open-source diffusion model from Generate:Biomedicines, and explains how it allows scientists to expand the natural universe of proteins.
[16:12] Ross and Mike discuss the challenge of combining biology and computer training.
[18:09] Mike gives his view on the current status of machine learning's role in biotech R&D and how this will evolve.
[21:05] Mike emphasizes the importance of human attention in AI-driven drug discovery and outlines how technological advancements require workflow innovation.
[26:13] Mike highlights teamwork, company culture, and ambition as key differentiators for Generate:Biomedicines.
[28:05] Ross asks Mike his perspective on skepticism around AI-discovered drugs
[30:25] Mike shares updates on the two leading candidates coming out of Generate:Biomedicines.
This week, Nathan Clark, CEO at Ganymede, joins the Data in Biotech podcast to discuss the challenges of integrating lab instruments and data in the biotech industry and how Ganymede’s developer platform is helping to automate data integration and metadata management across Life Sciences.
Nathan sits down with Data in Biotech host Ross Katz to discuss the multiple factors that add to the complexity of handling lab data, from the evolutionary nature of biology to the lab instruments being used. Nathan explains the importance of collecting metadata as unique identifiers that are essential to facilitating automation and data workflows.
As the founder of Ganymede, Nathan outlines the fundamentals of the developer platform and how it has been built to deal practically with the data, workflow, and automation challenges unique to life sciences organizations. He explains the need for code to allow organizations to contextualize and consume data and how the platform is built to enable flexible last-mile integration. He also emphasizes Ganymede's vision to create tools at varying levels of the stack to glue together systems in whatever way is optimal for its specific ecosystem.
As well as giving an in-depth overview of how the Ganymede platform works, he also digs into some of the key challenges facing life sciences organizations as they undergo digital transformation journeys.
The need to engage with metadata from the outset to avoid issues down the line, how to rid organizations of secret Excel files and improve data collection, and the regulatory risks that come with poor metadata handling are all covered in this week’s episode.
Chapter Markers
[1:28] Nathan gives a quick overview of his background and the path that led him to launch Ganymede.
[5:43] Nathan gives us his perspective on where the complexity of life sciences data comes from.
[8:23] Nathan explains the importance of using code to cope with the high levels of complexity and how the Ganymede developer platform facilitates this.
[11:26] Nathan summarizes the three layers in the Ganymede platform: the ‘core platform’, ‘connectors’ or templates, and ‘transforms’, which allow data to be utilized.
[13:18] Nathan highlights the importance of associating lab data with a unique ID to facilitate data entry and automation.
[15:05] Nathan outlines the drawbacks of manual data association: it is inefficient, unreliable, and difficult to maintain.
[16:43] Nathan explains what using Ganymede to manage data and metadata looks like from inside a company.
[24:50] Ross asks Nathan to describe how Ganymede assists with workflow automation and how it can overcome organization-specific challenges.
[27:42] Nathan highlights the challenges businesses are looking to solve when they turn to a solution like Ganymede, pointing to three common scenarios.
[34:32] Nathan emphasizes the importance of laying the groundwork for a data future at an early stage.
[37:49] Nathan and Ross stress the need for a digital transformation roadmap, with smaller initiatives on the way demonstrating value in their own right.
[40:35] Nathan talks about the future for Ganymede and what is on the horizon for the company and their customers.
This week, Harshil Patel, Director of Scientific Development at Seqera, joins the Data in Biotech podcast to discuss the importance of collaborative, open-source projects in scientific research and how they support the need for reproducibility.
Harshil lifts the lid on how Nextflow has become a leading open-source workflow management tool for scientists and the benefits of using an open-source model. He talks in detail about the development of Nextflow and the wider Seqera ecosystem, the vision behind it, and the advantages and challenges of this approach to tooling.
He discusses how the nf-core community collaboratively develops and maintains over 100 pipelines using Nextflow and how the decision to constrain pipelines to one per analysis type promotes collaboration and consistency and avoids turning pipelines into the “wild west.”
We also look more practically at Nextflow adoption as Harshil delves into some of the challenges and how to overcome them.
He explores the wider Seqera ecosystem and how it helps users manage pipelines, analysis, and cloud infrastructure more efficiently, and he looks ahead to the future evolution of scientific research.
---
Chapter Markers
[1:23] Harshil shares a quick overview of his background in bioinformatics and his route to joining Seqera.
[3:37] Harshil gives an introduction to Nextflow, including its origins, development, and the benefits of using the platform for scientists.
[9:50] Harshil expands on some of the off-the-shelf pipelines available through nf-core and how the collection is continuing to expand beyond genomics.
[12:08] Harshil explains nf-core’s open-source model, the advantages of constraining pipelines to one per analysis type, and how the Nextflow community works.
[17:43] Harshil talks about Nextflow's custom DSL and the advantages it offers users
[20:23] Harshil explains how Nextflow fits into the broader Seqera ecosystem.
[26:08] Ross asks Harshil about overcoming some of the challenges that arise with parallelization and optimizing pipelines
[28:01] Harshil talks about the features of Wave, Seqera’s containerization solution.
[32:16] Ross asks Harshil to share some of the most complex and impressive things he has seen done within the Seqera ecosystem.
[35:42] Harshil gives his take on the future evolution of biotech genomics research.
---