July 24, 2025

Prompt Engineering LLMs for Radiology: Multi-Institutional Study on Report Annotation Accuracy

34 minutes

In this episode of Imaging Informatics Unplugged, Dr. Mana Moassefi (incoming Radiology resident at Mayo Clinic) shares insights from a groundbreaking multi-institutional study that evaluated the use of large language models (LLMs) to annotate radiology reports — without any model training.

Mana and her collaborators from institutions including Mayo Clinic, UCSF, Moffitt Cancer Center, Harvard Medical School, UC Irvine, and Emory used prompt engineering to guide commercial LLMs in extracting diagnostic labels from 3,000 radiology reports. The study found:

🧠 LLMs outperformed traditional NLP due to better contextual understanding

🛠️ Prompt engineering was critical, often outperforming chat-based approaches

📊 Structured reporting improved accuracy, but radiologist variability still introduced challenges

😵‍💫 Hallucinations occurred, especially when LLMs were uncertain — leading to verbose, off-topic answers

📈 Consistency matters: AI may not be perfect, but it’s reliably imperfect — a surprising edge over human variability

We also explore future use cases, from structured patient-facing summaries to multi-agent AI workflows that could quantify uncertainty and improve communication across clinical systems.

🔔 Don’t forget to like, subscribe, and hit the bell to stay updated on future episodes exploring the intersection of imaging, informatics, and AI.

👉 Learn more at:

🌐 https://www.nagelsconsulting.com

📘 https://learn.nagelsconsulting.com

🔗 https://www.linkedin.com/company/nagels-consulting

By Nagels Consulting
