
Beyond ChatGPT and Gemini: Anthropic's Claude and the $4 billion Amazon investment. How AI industry benchmarks work, including LMSYS Arena Elo and MMLU (Measuring Massive Multitask Language Understanding). How benchmarks are constructed, what they measure, and how to use them to evaluate LLMs. Solo episode.
Anthropic's Claude
https://claude.ai [Note: I am not sponsored by Anthropic]
LMSYS Leaderboard
https://chat.lmsys.org/?leaderboard
To stay in touch, sign up for our newsletter at https://www.superprompt.fm
By Tony Wan
