O'Reilly Data Show Podcast

Simplifying machine learning lifecycle management



In this episode of the Data Show, I spoke with Harish Doddi, co-founder and CEO of Datatron, a startup focused on helping companies deploy and manage machine learning models. As companies move from machine learning prototypes to products and services, tools and best practices for productionizing and managing models are just starting to emerge. Today’s data science and data engineering teams work with a variety of machine learning libraries, data ingestion tools, and data storage technologies. Risk and compliance considerations mean that, in certain application domains, the ability to reproduce machine learning workflows is essential for passing audits. And as data science and data engineering teams continue to expand, tools need to enable and facilitate collaboration.
As someone who specializes in helping teams turn machine learning prototypes into production-ready services, I wanted to hear what Doddi has learned while working with organizations that aspire to “become machine learning companies.”
Here are some highlights from our conversation:
A central platform for building, deploying, and managing machine learning models
In one of the companies where I worked, we had built infrastructure related to Spark. We were a heavy Spark shop. So we built everything around Spark and other components. But later, when that organization grew, a lot of people came from a TensorFlow background. That suddenly created a little bit of frustration in the team because everybody wanted to move to TensorFlow. But we had invested a lot of time, effort and energy in building the infrastructure for Spark.
… We suddenly had hidden technical debt that needed to be addressed. … Let’s say right now you have two models running in production and you know that in the next two or three years you are going to deploy 20 to 30 models. You need to start thinking about this ahead of time.
… That’s why these days I observed that organizations are creating centralized teams. The centralized team is responsible for maintaining flexible machine learning infrastructure that can be used to deploy, operate, and monitor many models simultaneously.
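To make the idea of shared, flexible infrastructure concrete, here is a minimal sketch of a framework-agnostic registry that deploys, serves, and monitors several models behind one interface. It is an illustration only, not Datatron's design: `ModelRegistry`, the model names, and the lambda "models" are hypothetical stand-ins for real Spark or TensorFlow predictors.

```python
from typing import Callable, Dict, Sequence

# Any model, whatever library trained it, is wrapped as a plain callable.
Predictor = Callable[[Sequence[float]], float]


class ModelRegistry:
    """Hypothetical central registry: deploy, serve, and monitor many models."""

    def __init__(self) -> None:
        self._models: Dict[str, Predictor] = {}
        self._frameworks: Dict[str, str] = {}
        self._call_counts: Dict[str, int] = {}

    def deploy(self, name: str, framework: str, predictor: Predictor) -> None:
        # Registering under a name hides the framework from callers.
        self._models[name] = predictor
        self._frameworks[name] = framework
        self._call_counts[name] = 0

    def predict(self, name: str, features: Sequence[float]) -> float:
        # Count every call so usage can be monitored per model.
        self._call_counts[name] += 1
        return self._models[name](features)

    def report(self) -> Dict[str, Dict[str, object]]:
        # One monitoring view across all deployed models.
        return {
            name: {"framework": self._frameworks[name], "calls": self._call_counts[name]}
            for name in sorted(self._models)
        }


registry = ModelRegistry()
# Stand-in predictors; real ones would call into Spark or TensorFlow.
registry.deploy("churn_model", "spark", lambda f: sum(f) / len(f))
registry.deploy("ranking_model", "tensorflow", lambda f: max(f))

churn_score = registry.predict("churn_model", [1.0, 3.0])
rank_score = registry.predict("ranking_model", [1.0, 3.0])
usage = registry.report()
```

Because callers only see the registry's `predict` method, a team could add models built with a new library without rebuilding the serving layer, which is the kind of flexibility described above.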
Feature store: Create, manage, and share canonical features
When I talk to companies these days, everybody knows that their data scientists are duplicating work because they don’t have a centralized feature store. Everybody I talk to really wants to build or even buy a feature store, depending on what is easiest for them.
… The number of data scientists within most companies is increasing. And one of the pain points I’ve observed is when a new data scientist joins an organization, there is an extreme amount of ramp-up period. A new data scientist needs to figure out what the data sets are, what the features are, so on and so forth. But if an organization had a feature store, the ramp-up period can be much faster.
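As a rough illustration of what a feature store provides, the sketch below keeps one canonical definition per feature, with an owner and a description, so that a new team member can browse the catalog and reuse existing features instead of rewriting them. All names here (`FeatureStore`, `FeatureDefinition`, the example features and record fields) are hypothetical; production feature stores also handle storage, versioning, and serving, which this sketch leaves out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class FeatureDefinition:
    name: str
    owner: str
    description: str
    compute: Callable[[dict], float]  # derives the feature from a raw record


class FeatureStore:
    """Hypothetical in-memory catalog of canonical feature definitions."""

    def __init__(self) -> None:
        self._features: Dict[str, FeatureDefinition] = {}

    def register(self, feature: FeatureDefinition) -> None:
        # Refuse duplicates so each feature has exactly one definition.
        if feature.name in self._features:
            raise ValueError(f"feature {feature.name!r} is already registered")
        self._features[feature.name] = feature

    def catalog(self) -> List[str]:
        # What a newly hired data scientist would browse first.
        return sorted(self._features)

    def describe(self, name: str) -> str:
        f = self._features[name]
        return f"{f.name} (owner: {f.owner}): {f.description}"

    def build_vector(self, record: dict, names: List[str]) -> Dict[str, float]:
        # Compute the requested features from one raw record.
        return {n: self._features[n].compute(record) for n in names}


store = FeatureStore()
store.register(FeatureDefinition(
    name="order_total_usd",
    owner="pricing-team",
    description="Quantity multiplied by unit price",
    compute=lambda r: float(r["quantity"] * r["unit_price"]),
))
store.register(FeatureDefinition(
    name="is_bulk_order",
    owner="logistics-team",
    description="1.0 when the order has 10 or more items, else 0.0",
    compute=lambda r: 1.0 if r["quantity"] >= 10 else 0.0,
))

vector = store.build_vector(
    {"quantity": 12, "unit_price": 2.5},
    ["order_total_usd", "is_bulk_order"],
)
```

Rejecting duplicate names is the design choice that addresses the duplicated work mentioned above: a second team that wants `order_total_usd` finds the existing definition rather than creating a slightly different one.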
Related resources:
“Lessons learned turning machine learning models into real products and services”
“What are machine learning engineers?”: examining a new role focused on creating data products and making data science work in production
“MLflow: A Platform for Managing the Machine Learning Lifecycle”
“Managing risk in machine learning models”: Andrew Burt and Steven Touw on how companies can manage models they cannot fully explain
“We need to build machine learning tools to augment machine learning engineers”
“When models go rogue”: David Talby on hard-earned lessons about using machine learning in production

By O'Reilly Media
