Test & Code

95: Data Science Pipeline Testing with Great Expectations - Abe Gong



Data science and machine learning are affecting more of our lives every day. Decisions based on them depend heavily on the quality of the data and on the quality of the data pipeline.

Some of the software in the pipeline can be tested to some extent with traditional testing tools, like pytest.

But what about the data? The data entering the pipeline, and at various stages along the pipeline, should be validated.

That's where pipeline tests come in.

Pipeline tests are applied to data. Pipeline tests help you guard against upstream data changes and monitor data quality.
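To make the idea concrete, here is a minimal sketch of what a pipeline test could look like in plain Python. It is not the Great Expectations API; the helper names (`check_not_null`, `check_between`, `validate_batch`), the column names, and the sample batch are all hypothetical and exist only for illustration. The point is that the assertions run against a batch of data rather than against code.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def check_not_null(rows: List[Row], column: str) -> List[int]:
    """Return indices of rows where `column` is missing or None."""
    return [i for i, row in enumerate(rows) if row.get(column) is None]


def check_between(rows: List[Row], column: str, low: float, high: float) -> List[int]:
    """Return indices of rows whose non-null `column` value is outside [low, high]."""
    return [
        i
        for i, row in enumerate(rows)
        if row.get(column) is not None and not (low <= row[column] <= high)
    ]


def validate_batch(rows: List[Row]) -> Dict[str, List[int]]:
    """Run every check on one batch and map each check name to its failing row indices."""
    return {
        "user_id is not null": check_not_null(rows, "user_id"),
        "age is between 0 and 120": check_between(rows, "age", 0, 120),
    }


# Hypothetical batch arriving from an upstream source.
batch = [
    {"user_id": 1, "age": 34},
    {"user_id": None, "age": 29},
    {"user_id": 3, "age": 212},
]

# Keep only the checks that found at least one bad row.
failures = {name: rows for name, rows in validate_batch(batch).items() if rows}
print(failures)
```

A pipeline stage could run a function like `validate_batch` on each incoming batch and stop, or raise an alert, when `failures` is non-empty. That is one way to catch an upstream data change before it reaches a model or a report.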

Abe Gong and Superconductive are building an open source project called Great Expectations. It's a tool to help you build pipeline tests.

This is quite an interesting idea, and I hope it gains traction and takes off.

Special Guest: Abe Gong.


Links:

  • Great Expectations

Test & Code, by Brian Okken
