O'Reilly Data Show Podcast

Enabling end-to-end machine learning pipelines in real-world applications

06.20.2019 - By O'Reilly Media

In this episode of the Data Show, I spoke with Nick Pentreath, principal engineer at IBM. Pentreath was an early and avid user of Apache Spark, and he subsequently became a Spark committer and PMC member. Most recently his focus has been on machine learning, particularly deep learning, and he is part of a group within IBM focused on building open source tools that enable end-to-end machine learning pipelines.

We had a great conversation spanning many topics, including:

AI Fairness 360 (AIF360), a set of fairness metrics for data sets and machine learning models.

Adversarial Robustness Toolbox (ART), a Python library for adversarial attacks and defenses.

Model Asset eXchange (MAX), a curated and standardized collection of free and open source deep learning models.

Tools for model development, governance, and operations, including MLflow, Seldon Core, and Fabric for Deep Learning (FfDL).

Reinforcement learning in the enterprise, and the emergence of relevant open source tools like Ray.

Related resources:

“Modern Deep Learning: Tools and Techniques”—a new tutorial at the Artificial Intelligence conference in San Jose

Harish Doddi on “Simplifying machine learning lifecycle management”

Sharad Goel and Sam Corbett-Davies on “Why it’s hard to design fair machine learning models”

“Managing risk in machine learning”: considerations for a world where ML models are becoming mission critical

“The evolution and expanding utility of Ray”

“Local Interpretable Model-Agnostic Explanations (LIME): An Introduction”

Forough Poursabzi Sangdeh on why “It’s time for data scientists to collaborate with researchers in other disciplines”
