Test & Code

33: Katharine Jarmul - Testing in Data Science


A discussion with Katharine Jarmul, aka kjam, about some of the challenges of data science with respect to testing.

Some of the topics we discuss:

  • experimentation vs testing
  • testing pipelines and pipeline changes
  • automating data validation
  • property-based testing
  • schema validation and detecting schema changes
  • using unit test techniques to test data pipeline stages
  • testing nodes and transitions in DAGs
  • testing expected and unexpected data
  • missing data and non-signals
  • corrupting a dataset with noise
  • fuzz testing for both data pipelines and web APIs
  • datafuzz
  • hypothesis
  • testing internal interfaces
  • documenting and sharing domain expertise to build a good sense of what is reasonable
  • intermediary data and stages
  • neural networks
  • speaking at conferences

Special Guest: Katharine Jarmul.

    Sponsored By:

    • Python Testing with pytest: Simple, Rapid, Effective, and Scalable. The fastest way to learn pytest. From 0 to expert in under 200 pages.
    • Patreon Supporters: Help support the show with as little as $1 per month. Funds help pay for expenses associated with the show.

    Support Test & Code - Software Testing, Development, Python

    Links:

    • @kjam on Twitter — Data Magic and Computer Sorcery
    • Kjamistan: Data Science
    • datafuzz’s Python library — The goal of datafuzz is to give you the ability to test your data science code and models with BAD data.
    • Hypothesis Python library — Hypothesis is a Python library for finding edge cases in your code you wouldn’t have thought to look for.
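
Several of the topics above (corrupting a dataset with noise, testing expected and unexpected data, using unit test techniques on pipeline stages) can be illustrated with a minimal sketch that uses only the Python standard library. This does not use the datafuzz or Hypothesis APIs. The helper names `corrupt_rows`, `clean_stage`, and `check_clean_stage_on_noisy_data`, and the `user_id` and `amount` fields, are all hypothetical and chosen only for illustration.

```python
import math
import random


def corrupt_rows(rows, fraction, seed=0):
    """Return a copy of rows with roughly `fraction` of values set to None or NaN.

    Hypothetical helper for illustration; not part of datafuzz or Hypothesis.
    """
    rng = random.Random(seed)  # seeded so a failing case can be reproduced
    noisy = []
    for row in rows:
        new_row = dict(row)  # copy so the original dataset is left untouched
        for key in new_row:
            if rng.random() < fraction:
                new_row[key] = rng.choice([None, float("nan")])
        noisy.append(new_row)
    return noisy


def clean_stage(rows, required=("user_id", "amount")):
    """Example pipeline stage: drop rows with a missing or non-finite required field."""
    def is_valid(value):
        if value is None:
            return False
        if isinstance(value, float) and not math.isfinite(value):
            return False
        return True

    return [row for row in rows if all(is_valid(row.get(k)) for k in required)]


def check_clean_stage_on_noisy_data(trials=50):
    """Property-style check: whatever noise goes in, no bad values come out."""
    base = [{"user_id": i, "amount": float(i) * 1.5} for i in range(20)]
    for seed in range(trials):
        noisy = corrupt_rows(base, fraction=0.3, seed=seed)
        cleaned = clean_stage(noisy)
        assert len(cleaned) <= len(base)
        for row in cleaned:
            assert isinstance(row["user_id"], int)
            assert math.isfinite(row["amount"])
    return True
```

Calling `check_clean_stage_on_noisy_data()` from a pytest test function would run the stage against fifty differently corrupted copies of the same small dataset. A property-based testing library such as Hypothesis could take over the job of generating the inputs, and would also shrink a failing input to a smaller one.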
    Test & Code, by Brian Okken
