<p>Evaluating models on benchmarks, passing a model vibe check, formal reasoning to synthesize datasets, and what type of datasets researchers prefer</p>

Evaluation metrics for reasoning models

Ten years after studying at Stanford, two friends have somehow become AI experts. One builds startups, the other studies at Cambridge. Together they break down LLMs and machine learning with zero BS and maximum banter.
