


For this week's paper read, we dive into our own research.
We wanted to create a replicable, evolving dataset that can keep pace with model training so that you always know you're testing with data your model has never seen before. We also saw how prohibitively expensive it is to run LLM evals at scale, so we used our data to fine-tune a series of small language models (SLMs) that perform just as well as their base LLM counterparts at one-tenth the cost.
So, over the past few weeks, the Arize team generated the largest public dataset of hallucinations, as well as a series of fine-tuned evaluation models.
We talk about what we built, the process we followed, and the bottom-line results. You can read the recap of LibreEval here. Dive into the research, or sign up to join us next time.
Learn more about AI observability and evaluation, join the Arize AI Slack community or get the latest on LinkedIn and X.
By Arize AI
