Pretrained

Evaluation metrics for reasoning models



Evaluating models on benchmarks, passing a model vibe check, using formal reasoning to synthesize datasets, and what types of datasets researchers prefer.


Pretrained, by Pierce Freeman & Richard Diehl Martinez