Thoughtworks Technology Podcast

AI testing, benchmarks and evals


Listen Later

Generative AI's popularity has led to a renewed interest in quality assurance — perhaps unsurprising given the inherent unpredictability of the technology. This is why, over the last year, the field has seen a number of techniques and approaches emerge, including evals, benchmarking and guardrails. While these terms all refer to different things, grouped together they all aim to improve the reliability and accuracy of generative AI.

To discuss these techniques and the renewed enthusiasm for testing across the industry, host Lilly Ryan is joined by Shayan Mohanty, Head of AI Research at Thoughtworks, and John Singleton, Program Manager for Thoughtworks' AI Lab. They discuss the differences between evals, benchmarking and testing and explore both what they mean for businesses venturing into generative AI and how they can be implemented effectively.

Learn more about evals, benchmarks and testing in this blog post by Shayan and John (written with Parag Mahajani): https://www.thoughtworks.com/insights/blog/generative-ai/LLM-benchmarks,-evals,-and-tests

...more
View all episodesView all episodes
Download on the App Store

Thoughtworks Technology PodcastBy Thoughtworks

  • 4.5
  • 4.5
  • 4.5
  • 4.5
  • 4.5

4.5

40 ratings


More shows like Thoughtworks Technology Podcast

View all
Hanselminutes with Scott Hanselman by Scott Hanselman

Hanselminutes with Scott Hanselman

378 Listeners

Software Engineering Radio - the podcast for professional software developers by se-radio@computer.org

Software Engineering Radio - the podcast for professional software developers

265 Listeners

The Changelog: Software Development, Open Source by Changelog Media

The Changelog: Software Development, Open Source

285 Listeners

The Cloudcast by Massive Studios

The Cloudcast

155 Listeners

Talk Python To Me by Michael Kennedy

Talk Python To Me

580 Listeners

Software Engineering Daily by Software Engineering Daily

Software Engineering Daily

624 Listeners

Soft Skills Engineering by Jamison Dance and Dave Smith

Soft Skills Engineering

271 Listeners

AWS Podcast by Amazon Web Services

AWS Podcast

203 Listeners

Super Data Science: ML & AI Podcast with Jon Krohn by Jon Krohn

Super Data Science: ML & AI Podcast with Jon Krohn

295 Listeners

Data Engineering Podcast by Tobias Macey

Data Engineering Podcast

139 Listeners

Syntax - Tasty Web Development Treats by Wes Bos & Scott Tolinski - Full Stack JavaScript Web Developers

Syntax - Tasty Web Development Treats

987 Listeners

CoRecursive: Coding Stories by Adam Gordon Bell - Software Developer

CoRecursive: Coding Stories

185 Listeners

Kubernetes Podcast from Google by Abdel Sghiouar, Kaslin Fields

Kubernetes Podcast from Google

184 Listeners

Practical AI by Practical AI LLC

Practical AI

196 Listeners

Pragmatism in Practice by Thoughtworks

Pragmatism in Practice

10 Listeners

The Stack Overflow Podcast by The Stack Overflow Podcast

The Stack Overflow Podcast

62 Listeners

Tecnología y Negocios by Thoughtworks

Tecnología y Negocios

1 Listeners

Hablando de software by Thoughtworks

Hablando de software

0 Listeners