A discussion with Katharine Jarmul, aka kjam, about some of the challenges of data science with respect to testing.
Some of the topics we discuss:
- experimentation vs testing
- testing pipelines and pipeline changes
- automating data validation
- property based testing
- schema validation and detecting schema changes
- using unit test techniques to test data pipeline stages
- testing nodes and transitions in DAGs
- testing expected and unexpected data
- missing data and non-signals
- corrupting a dataset with noise
- fuzz testing for both data pipelines and web APIs
- datafuzz
- hypothesis
- testing internal interfaces
- documenting and sharing domain expertise to build good reasonableness
- intermediary data and stages
- neural networks
- speaking at conferences
Special Guest: Katharine Jarmul.
Sponsored By:
- Python Testing with pytest, 2nd edition: The fastest way to learn pytest and practical testing practices.
- Patreon Supporters: Help support the show with as little as $1 per month and be the first to know when new episodes come out.
Links:
- @kjam on Twitter — Data Magic and Computer Sorcery
- Kjamistan: Data Science
- datafuzz’s Python library — The goal of datafuzz is to give you the ability to test your data science code and models with BAD data.
- Hypothesis Python library — Hypothesis is a Python library for finding edge cases in your code you wouldn’t have thought to look for.
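For a feel of what "testing with bad data" can look like, here is a minimal, standard-library-only sketch. It does not use the datafuzz or Hypothesis APIs; the `clean_ages` stage, the `add_noise` helper, and the record layout are all hypothetical and exist only for illustration. The idea is to corrupt a known-good dataset with noise and then check properties that should hold whichever rows were damaged.

```python
import random


def clean_ages(records):
    """Hypothetical pipeline stage: keep rows whose 'age' is an int in 0..120."""
    cleaned = []
    for row in records:
        age = row.get("age")
        # bool is a subclass of int, so exclude it explicitly
        if isinstance(age, int) and not isinstance(age, bool) and 0 <= age <= 120:
            cleaned.append({"name": row.get("name"), "age": age})
    return cleaned


def add_noise(records, rate, rng):
    """Return a copy of records with roughly `rate` of the 'age' values corrupted."""
    bad_values = [None, "", "forty", -1, 10**9, float("nan")]
    noisy = []
    for row in records:
        row = dict(row)  # copy so the clean dataset is left untouched
        if rng.random() < rate:
            row["age"] = rng.choice(bad_values)
        noisy.append(row)
    return noisy


rng = random.Random(0)  # fixed seed keeps the run reproducible
clean = [{"name": f"user{i}", "age": rng.randint(0, 120)} for i in range(200)]
noisy = add_noise(clean, rate=0.3, rng=rng)
result = clean_ages(noisy)

# Properties that should hold no matter which rows were corrupted
assert len(result) <= len(noisy)
assert all(isinstance(r["age"], int) and 0 <= r["age"] <= 120 for r in result)
# Clean input should pass through the stage unchanged
assert clean_ages(clean) == clean
```

A dedicated library would generate far more varied inputs than this fixed list of bad values; the sketch only shows the shape of the test.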