Intelligence Unbound

GDPval: AI Model Performance on Economic Tasks



This episode introduces GDPval, a new benchmark created by OpenAI to evaluate AI model performance on real-world, economically valuable tasks. The tasks are derived from the work of industry experts across the top nine sectors contributing to U.S. GDP and span 44 occupations. By using multi-modal inputs and subjective grading by human experts, the evaluation is intended to provide a more realistic assessment of AI capabilities than traditional academic benchmarks.


Intelligence Unbound, by Fourth Mind