June 11, 2026

An LLM Evaluation Framework for High-Stakes AI

16 minutes

Experimentation and validation of LLM performance are critical when building LLM-driven systems that must reliably deliver a service, from customer service chatbots to intelligence analysis tools. To help teams meet the need for rigorous evaluation methods, a research team in the SEI's AI Division led by Violet Turri has developed the Evaluating Large Language Models (ELM) library, which is built on best practices for LLM evaluation and benchmarking. In the latest episode from the Carnegie Mellon University Software Engineering Institute, Turri sits down with Katie Robinson, a design researcher also in the SEI's AI Division, to discuss the ELM library, which turns evaluation from an ad hoc process into a repeatable, extensible framework.

Software Engineering Institute (SEI) Podcast Series

By Members of Technical Staff at the Software Engineering Institute
