Deep Papers

LibreEval: The Largest Open Source Benchmark for RAG Hallucination Detection



For this week's paper read, we actually dive into our own research.

We wanted to create a replicable, evolving dataset that can keep pace with model training, so you always know you're testing with data your model has never seen before. We also saw how prohibitively expensive it is to run LLM evals at scale, so we used our data to fine-tune a series of SLMs that perform just as well as their base LLM counterparts at 1/10 the cost.

So, over the past few weeks, the Arize team generated the largest public dataset of hallucinations, as well as a series of fine-tuned evaluation models.

We talk about what we built, the process we followed, and the bottom-line results.

📃 Read the paper: https://arize.com/llm-hallucination-dataset/

Learn more about AI observability and evaluation, join the Arize AI Slack community, or get the latest on LinkedIn and X.


Deep Papers by Arize AI


