Sigmatic Science Podcast

Humanity's Last Exam: The Test AI Keeps Failing



2,500 questions no AI can Google. GPT-4o scored 2.7%, humans hit 90%. Inside the hardest AI benchmark and its 30% error rate.

By Sigmatic