


This week on GAEA Talks Live from HumanX, Graeme Scott sits down with Alex Ratner - co-founder and CEO of Snorkel AI, affiliate assistant professor at the Paul G. Allen School of Computer Science at the University of Washington, and one of the most influential voices in the world on data-centric AI.

Alex has been working in AI data for fifteen years. He earned his AB in Honors Physics at Harvard, completed his PhD in computer science at Stanford under Christopher Ré, and led the open source Snorkel project that came out of his thesis. He co-founded Snorkel AI in 2019 to commercialise that research. Snorkel is now a frontier data lab supporting most of the major frontier AI labs and a growing number of vertical AI companies and enterprises with the data sets, environments and benchmarks that AI is actually trained, evaluated and improved on.

In this episode, recorded live at HumanX 2026 in San Francisco, Alex argues that compute, talent and data are the three legs of the AI stool, and that data is the leg most people still underestimate. He explains why the more powerful and black-boxed models become, the more upstream data and context problems get hidden behind layers of abstraction. He walks Graeme through how pre-training, post-training and reinforcement learning are all really just stages of giving a model the right context. He shares why every enterprise will run their own data-centric loop in the next few years, why benchmarks are getting "benchmaxed" before they are useful, and why Snorkel has just committed three million dollars in Open Benchmarks Grants to fund the academic and open source community building the next generation of evaluation tools.

What you'll take away from this conversation:

• Why compute, talent and data are the three legs of AI - and why data is the most underestimated of the three
• The "data is everyone else's problem" myth - and why it has held the field back for fifteen years
• Why the more powerful and black-boxed AI becomes, the more dangerous the upstream data problems hidden underneath get
• A working definition of context - from prompt context to pre-training mix to post-training and reinforcement learning
• Why generalist and specialist models will coexist - and why your unique data is your specialisation edge
• The Liverpool versus Jersey Shore thought experiment - and how subtle data biases shape model behaviour in ways we still cannot fully predict
• Why benchmarks are critical, why they keep getting "benchmaxed", and why Snorkel is funding three million dollars of Open Benchmarks Grants for academia and open source
• Moravec's paradox - and why we still confuse what is hard for humans with what is hard for AI
• The jagged frontier of intelligence - and why understanding where AI fails is now a safety question, not just a capability question
• Why coding agents look superhuman on contest problems but still fall down on long, messy, real-world software work
• The Feynman plate-spinning anecdote - and why curiosity and "unimportant" problems are still where the breakthroughs come from
• The data-centric loop - measure with data, find the gaps, build more data to fill them - and why every enterprise will be running it
• Alex's single piece of advice for anyone serious about AI - do not forget the data, do not forget the context
By GAEA Talks