November 30, 2017

33: Katharine Jarmul - Testing in Data Science

37 minutes

A discussion with Katharine Jarmul, aka kjam, about some of the challenges of data science with respect to testing.

Some of the topics we discuss:

experimentation vs testing

testing pipelines and pipeline changes

automating data validation

property based testing

schema validation and detecting schema changes

using unit test techniques to test data pipeline stages

testing nodes and transitions in DAGs

testing expected and unexpected data

missing data and non-signals

corrupting a dataset with noise

fuzz testing for both data pipelines and web APIs

datafuzz

hypothesis

testing internal interfaces

documenting and sharing domain expertise to build good reasonableness

intermediary data and stages

neural networks

speaking at conferences

Special Guest: Katharine Jarmul.

Links:

@kjam on Twitter - Data Magic and Computer Sorcery
Kjamistan: Data Science
datafuzz Python library - The goal of datafuzz is to give you the ability to test your data science code and models with BAD data.
Hypothesis Python library - Hypothesis is a Python library for finding edge cases in your code you wouldn’t have thought to look for.

Test & Code

By Brian Okken
