


Over the coming weeks I will be posting here the lectures I am currently delivering for Biobanking for Data Science, a module that I lead at the University of Westminster as part of the MSc AI Digital Health course, which I also lead. For each lecture I will provide a summary and the recording.
The research presented here is available via this preprint, which is currently in revision.
Biobanks have become one of the most powerful infrastructures in modern biomedical research. By systematically collecting, storing, and cataloguing biological materials alongside rich clinical and lifestyle data, they enable discoveries that shape the way we understand disease, predict risk, and develop personalised treatments.
Over the past two decades, biobanking has evolved from static repositories into dynamic digital ecosystems. What began in the early 2000s with standardisation efforts has now entered an era of high-throughput data generation, ethical regulation, and, most recently, artificial intelligence (AI) integration. Today, biobanks are not just about sample storage; they are about data-driven health intelligence.
Biobanks underpin critical advances in both research and healthcare:
From the UK Biobank’s half a million participants to the U.S. All of Us initiative and the Estonian Biobank, these large-scale resources are shaping how science approaches both common and rare diseases.
The 2020s have brought a decisive shift: the convergence of biobanking with AI and digital health. Machine learning tools now allow us to sift through immense, heterogeneous datasets (genomes, imaging, electronic health records) to generate insights that were previously inaccessible.
This is where Large Language Models (LLMs) enter the stage. Trained on massive text corpora, LLMs like GPT and Claude are proving their value in mining biobank-related literature, summarising biomedical findings, and even benchmarking patterns across thousands of studies.
In recent work, I explored how LLMs perform when applied to UK Biobank research outputs. The findings are illuminating:
Yet challenges remain: coverage does not guarantee accuracy, and LLMs still struggle with clinical reasoning, multimodal integration, and hypothesis generation.
The promise of AI-enabled biobanking is vast, but so are the challenges:
If addressed properly, the integration of biobanking with LLM-driven analytics could revolutionise our ability to link genetics, lifestyle, environment, and health outcomes.
The future of biobanking lies in its digital transformation. AI and LLMs are not replacing the scientific process but augmenting it, helping us navigate complexity, accelerate discovery, and bring equitable precision medicine closer to reality.
The question now is not whether we will use these tools, but how responsibly and effectively we can embed them into global health research infrastructures.
By Manuel Corpas