Everyone talks about the magic of AI, but the real war is over data. This episode pulls back the curtain on the messy, multi-billion-dollar process of finding, cleaning, and filtering the information that trains large language models. We explore why the era of simply "hoovering" the internet is over, how deduplication and quality filtering work, and why the "well of high-quality data" might be running dry.

AI's Data Kitchen: From Hoovering to Fine-Tuning

A man, a sloth, and a donkey collaborate to create a podcast (with a little help from AI). No question is too obscure, no rabbit hole too deep. My Weird Prompts celebrates curiosity in all its forms. Daniel, the human, asks the questions that pop into his head at inconvenient moments. Corn the Sloth offers laid-back, thoughtful takes. Herman the Donkey brings boundless enthusiasm and energy. Together, they explore topics ranging from the mundane to the mind-bending. Each episode begins with a real voice memo from Daniel, processed through an AI pipeline that generates scripts, synthesizes voices, and assembles the final podcast. Stay curious.
