
Sign up to save your podcasts
Or
Read the full post here: https://www.interconnects.ai/p/building-on-evaluation-quicksand
Chapters
00:00 Building on evaluation quicksand
01:26 The causes of closed evaluation silos
06:35 The challenge facing open evaluation tools
10:47 Frontiers in evaluation
11:32 New types of synthetic data contamination
13:57 Building harder evaluations
Figures
Fig 1: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/manual/openai-predictions.webp
4.1
99 ratings
Read the full post here: https://www.interconnects.ai/p/building-on-evaluation-quicksand
Chapters
00:00 Building on evaluation quicksand
01:26 The causes of closed evaluation silos
06:35 The challenge facing open evaluation tools
10:47 Frontiers in evaluation
11:32 New types of synthetic data contamination
13:57 Building harder evaluations
Figures
Fig 1: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/manual/openai-predictions.webp
1,003 Listeners
512 Listeners
270 Listeners
193 Listeners
199 Listeners
279 Listeners
88 Listeners
348 Listeners
123 Listeners
190 Listeners
62 Listeners
138 Listeners
445 Listeners
29 Listeners
31 Listeners