June 03, 2026

AI in Radiology: Benchmarking LLMs, Agentic Hype, and Imaging Informatics | Satvik Tripathi

35 minutes

If you have ever watched a radiology AI demo hit 98% accuracy in testing and then wondered why nobody is actually using it in the clinic, this episode is for you. Hit subscribe so you never miss a conversation like this one.

Jason sits down with Satvik Tripathi, incoming Medical Physics and Imaging Informatics PhD student at the University of Pennsylvania, AI scientist for RAD-AID International, and one of the sharpest voices in the field on the gap between research performance and real clinical value. Satvik has been working at the intersection of AI and radiology since 2019 and brings a perspective that cuts through the noise.

They get into the hard questions: why multiple-choice benchmarks are a terrible way to evaluate medical LLMs, what data leakage is quietly doing to published performance numbers, and why a fine-tuned model is not always the winner in a clinical context. Satvik also breaks down what it actually takes to build a benchmark that means something, and shares early findings from his team's head-to-head testing of over 20 models on an internally annotated clinical dataset.

The conversation also digs into agentic AI in imaging informatics, global health deployments through RAD-AID in Botswana and India, AI-assisted oncology workflows, and why running smaller open-source models locally might be smarter than everyone thinks. Plus, Satvik makes a case that prompt engineering is not a productivity shortcut but a legitimate scientific method.

Whether you are a PACS administrator, imaging analyst, radiology IT professional, or just someone trying to figure out which AI tools are actually worth your time, this is the kind of conversation that helps you cut through the hype and think more clearly about what is coming.

If this episode is useful to you, please subscribe, leave a review, and share it with a colleague in the imaging informatics community. It makes a real difference.

And if you are working toward your CIIP credential or want to go deeper on the foundations of this field, check out the CIIP Foundations Program and the upcoming DICOM training with hands-on live imaging learning labs at nagelsconsulting.com.

Learn more at nagelsconsulting.com

Key Topics Covered

Why AI model performance metrics often fail to predict real-world clinical impact, and the two questions every AI deployment team should be asking before going live
The flaws in how medical LLMs are benchmarked today, including multiple-choice test limitations, data leakage, and the gap between controlled evaluations and actual clinical usefulness
How Satvik's team built an internal annotated dataset and tested more than 20 models head-to-head, with results that challenge conventional assumptions about fine-tuned models
The promise and current limitations of agentic AI in radiology, including what true agentic systems require versus what vendors are actually shipping
Using AI to democratize global healthcare through RAD-AID's work in Botswana and India, including Google-funded foundation model deployments and lessons that translate back to Western healthcare systems
Why prompt engineering is a scientific method, not just a productivity trick, and how structured prompting can reduce hallucinations and improve reproducibility in clinical AI applications
The practical case for smaller, on-premises open-source models over large cloud-based generalist models, including cost, privacy, sustainability, and compliance considerations


By Nagels Consulting
