
You might be using AI models in pathology without even knowing if they’re giving you reliable results.
Let that sink in for a second—because today, we’re fixing that.
In this episode, I walk you through the real statistics that power—and sometimes fail—AI in digital pathology. It's episode 4 of our AI series, and we’re demystifying the metrics behind both generative and non-generative AI. Why does this matter? Because accuracy isn't enough. And not every model metric tells you the whole story.
If you’ve ever been impressed by a model’s "99% accuracy," you need to hear why that might actually be a red flag. I share personal stories (yes, including my early days in Germany when I didn’t even know what a "training set" was), and we break down confusing metrics like perplexity, SSIM, FID, and BLEU scores—so you can truly understand what your models are doing and how to evaluate them correctly.
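As a quick illustration of the "99% accuracy" trap discussed in the episode, here is a minimal sketch with made-up toy numbers (not data from the show): on an imbalanced slide set where only 1% of cases are malignant, a model that calls everything benign still reports 99% accuracy while detecting zero cancers.

```python
# Toy, hypothetical example: why raw accuracy can mislead on imbalanced pathology data.

# Ground truth: 990 benign (0) and 10 malignant (1) slides.
y_true = [0] * 990 + [1] * 10

# A useless "always benign" classifier.
y_pred = [0] * 1000

# Confusion-matrix counts.
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

accuracy = (tp + tn) / len(y_true)                  # 0.99, looks impressive
sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # 0.00, misses every malignant case
specificity = tn / (tn + fp) if (tn + fp) else 0.0  # 1.00

print(f"accuracy={accuracy:.2f}  sensitivity={sensitivity:.2f}  specificity={specificity:.2f}")
```

This is exactly why metrics beyond accuracy (sensitivity, specificity, and for generative models measures like perplexity, SSIM, FID, and BLEU) matter when you evaluate AI in pathology.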
Together, we’ll uncover how model evaluation works for both generative and non-generative AI models.
Whether you're a pathologist, a scientist, or someone leading a digital transformation team—you need this knowledge to avoid misleading data, flawed models, and missed opportunities.
🕒 EPISODE HIGHLIGHTS WITH TIMESTAMPS
📘 RESOURCE FROM THIS EPISODE:
🔗 Read the full paper discussed in this episode:
"Statistics of generative and non-generative artificial intelligence models in medicine"
💬 Final Thoughts
Statistical literacy isn’t optional anymore—especially in digital pathology. AI isn’t just a buzzword; it’s a tool, and if we want to lead this field forward, we must understand the systems we rely on. This episode will help you become not just a user, but a better steward of AI.
🎙️ Tune in now and let's keep trailblazing—together.
Support the show
Get the "Digital Pathology 101" FREE E-book and join us!
By Aleksandra Zuraw, DVM, PhD
