Test & Code

33: Katharine Jarmul - Testing in Data Science



A discussion with Katharine Jarmul, aka kjam, about some of the challenges of data science with respect to testing.

Some of the topics we discuss:

  • experimentation vs testing
  • testing pipelines and pipeline changes
  • automating data validation
  • property based testing
  • schema validation and detecting schema changes
  • using unit test techniques to test data pipeline stages
  • testing nodes and transitions in DAGs
  • testing expected and unexpected data
  • missing data and non-signals
  • corrupting a dataset with noise
  • fuzz testing for both data pipelines and web APIs
  • datafuzz
  • hypothesis
  • testing internal interfaces
  • documenting and sharing domain expertise to build good reasonableness checks
  • intermediary data and stages
  • neural networks
  • speaking at conferences

Special Guest: Katharine Jarmul.
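Schema validation and detecting schema changes, from the topics above, can be illustrated with a minimal standard-library sketch. It assumes rows arrive as Python dicts and that the expected schema is a hypothetical mapping of column names to allowed types; `EXPECTED_SCHEMA` and `schema_problems` are illustrative names, not part of datafuzz, Hypothesis, or any other library mentioned in the episode.

```python
from typing import Any

# Hypothetical expected schema: column name -> tuple of allowed Python types.
# Including type(None) marks a column as nullable (missing values allowed).
EXPECTED_SCHEMA: dict[str, tuple[type, ...]] = {
    "user_id": (int,),
    "age": (int, type(None)),
    "country": (str,),
}


def schema_problems(row: dict[str, Any], schema: dict[str, tuple[type, ...]]) -> list[str]:
    """Compare one row against a schema; an empty list means the row conforms."""
    problems = []
    # Columns that disappeared or appeared signal an upstream schema change.
    for col in sorted(schema.keys() - row.keys()):
        problems.append(f"missing column: {col}")
    for col in sorted(row.keys() - schema.keys()):
        problems.append(f"unexpected column: {col}")
    # Type checks only for columns present in both row and schema, in schema order.
    for col, allowed in schema.items():
        if col in row and not isinstance(row[col], allowed):
            expected = "/".join(t.__name__ for t in allowed)
            problems.append(f"{col}: expected {expected}, got {type(row[col]).__name__}")
    return problems


# Usage: a conforming row, then a row from an upstream source whose schema drifted.
good = {"user_id": 1, "age": None, "country": "DE"}
drifted = {"user_id": "1", "country": "DE", "signup": "2018-05-01"}
print(schema_problems(good, EXPECTED_SCHEMA))     # []
print(schema_problems(drifted, EXPECTED_SCHEMA))  # ['missing column: age', 'unexpected column: signup', 'user_id: expected int, got str']
```

Run against a sample batch at each pipeline stage, a check like this turns a silent schema change into a failing test. It only covers column presence and type; value ranges and other reasonableness checks would need their own rules.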

    Sponsored By:

    • Python Testing with pytest: Simple, Rapid, Effective, and Scalable. The fastest way to learn pytest, from 0 to expert in under 200 pages.
    • Patreon Supporters: Help support the show with as little as $1 per month. Funds help pay for expenses associated with the show.

    Support Test & Code - Software Testing, Development, Python

    Links:

    • @kjam on Twitter — Data Magic and Computer Sorcery
    • Kjamistan: Data Science
    • datafuzz’s Python library — The goal of datafuzz is to give you the ability to test your data science code and models with BAD data.
    • Hypothesis Python library — Hypothesis is a Python library for finding edge cases in your code you wouldn’t have thought to look for.
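The datafuzz and Hypothesis links above share one idea: feed code generated or deliberately damaged data and check properties that must always hold. The sketch below shows that idea, including missing data and noise injection, using only the standard library; it does not use the API of either library. `corrupt`, `mean_ignoring_missing`, and `check_stage_survives_bad_data` are hypothetical names, and the noise scale and the even split between missing and noisy values are arbitrary assumptions.

```python
import random
from typing import Optional


def corrupt(values: list[float], rate: float, rng: random.Random) -> list[Optional[float]]:
    """Return a corrupted copy: about rate/2 of values become None (missing data)
    and about rate/2 get Gaussian noise added (noisy measurements)."""
    out: list[Optional[float]] = []
    for v in values:
        r = rng.random()
        if r < rate / 2:
            out.append(None)
        elif r < rate:
            out.append(v + rng.gauss(0, 10))  # noise scale is an arbitrary assumption
        else:
            out.append(v)
    return out


def mean_ignoring_missing(values: list[Optional[float]]) -> Optional[float]:
    """Hypothetical pipeline stage under test: mean of the non-missing values."""
    present = [v for v in values if v is not None]
    if not present:
        return None
    return sum(present) / len(present)


def check_stage_survives_bad_data(trials: int = 200, seed: int = 0) -> int:
    """Property check over many randomly corrupted inputs; returns trials run."""
    rng = random.Random(seed)  # fixed seed so any failure is reproducible
    for _ in range(trials):
        clean = [rng.uniform(-100, 100) for _ in range(rng.randint(0, 50))]
        dirty = corrupt(clean, rate=rng.random(), rng=rng)
        result = mean_ignoring_missing(dirty)
        present = [v for v in dirty if v is not None]
        if not present:
            # Property 1: empty or all-missing input yields None instead of crashing.
            assert result is None
        else:
            # Property 2: the mean lies within the observed range
            # (small tolerance for floating-point rounding).
            assert min(present) - 1e-9 <= result <= max(present) + 1e-9
    return trials


print(check_stage_survives_bad_data())  # 200
```

Hypothesis would generate the inputs itself and shrink a failing case down to a minimal example, which this hand-rolled loop does not do; the fixed seed is the cheap substitute for reproducing a failure.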

    Test & Code by Brian Okken
