
Read the full post here: https://www.interconnects.ai/p/building-on-evaluation-quicksand
Chapters
00:00 Building on evaluation quicksand
01:26 The causes of closed evaluation silos
06:35 The challenge facing open evaluation tools
10:47 Frontiers in evaluation
11:32 New types of synthetic data contamination
13:57 Building harder evaluations
Figures
Fig 1: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/manual/openai-predictions.webp
By Nathan Lambert
