


This study investigates dataset contamination in large language models for mathematical reasoning using the Grade School Math 1000 (GSM1k) benchmark, finding evidence of overfitting and potential memorization of benchmark questions.
https://arxiv.org/abs/2405.00332
YouTube: https://www.youtube.com/@ArxivPapers
TikTok: https://www.tiktok.com/@arxiv_papers
Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016
Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers
By Igor Melnyk
