Disseminate: The Computer Science Research Podcast

Till Döhmen | DuckDQ: A Python library for data quality checks in ML pipelines


Listen Later

In this episode we kick off our DuckDB in Research series with Till Döhmen, a software engineer at MotherDuck, where he leads AI efforts. Till shares insights into DuckDQ, a Python library designed for efficient data quality validation in machine learning pipelines, leveraging DuckDB’s high-performance querying capabilities.


We discuss the challenges of ensuring data integrity in ML workflows, the inefficiencies of existing solutions, and how DuckDQ provides a lightweight, drop-in replacement that seamlessly integrates with scikit-learn. Till also reflects on his research journey, the impact of DuckDB’s optimizations, and the future potential of data quality tooling. Plus, we explore how AI tools like ChatGPT are reshaping research and productivity. Tune in for a deep dive into the intersection of databases, machine learning, and data validation!


Resources:

  • GitHub
  • Paper
  • Slides
  • Till's Homepage
  • datasketches extension (released by a DuckDB community member 2 weeks after we recorded!)

Hosted on Acast. See acast.com/privacy for more information.

...more
View all episodesView all episodes
Download on the App Store

Disseminate: The Computer Science Research PodcastBy Jack Waudby

  • 5
  • 5
  • 5
  • 5
  • 5

5

6 ratings


More shows like Disseminate: The Computer Science Research Podcast

View all
The Changelog: Software Development, Open Source by Changelog Media

The Changelog: Software Development, Open Source

288 Listeners

Software Engineering Daily by Software Engineering Daily

Software Engineering Daily

625 Listeners

Kubernetes Podcast from Google by Abdel Sghiouar, Kaslin Fields

Kubernetes Podcast from Google

182 Listeners

Hard Fork by The New York Times

Hard Fork

5,497 Listeners

Developer Voices by Kris Jenkins

Developer Voices

30 Listeners

Complex Systems with Patrick McKenzie (patio11) by Patrick McKenzie

Complex Systems with Patrick McKenzie (patio11)

131 Listeners