
Sign up to save your podcasts
Or


Evaluating models on benchmarks, passing a model vibe check, formal reasoning to synthesize datasets, and what type of datasets researchers prefer
By Pierce Freeman & Richard Diehl MartinezEvaluating models on benchmarks, passing a model vibe check, formal reasoning to synthesize datasets, and what type of datasets researchers prefer