Simplifying machine learning lifecycle management

08.16.2018 - By O'Reilly Media

In this episode of the Data Show, I spoke with Harish Doddi, co-founder and CEO of Datatron, a startup focused on helping companies deploy and manage machine learning models. As companies move from machine learning prototypes to products and services, tools and best practices for productionizing and managing models are just starting to emerge. Today’s data science and data engineering teams work with a variety of machine learning libraries and data ingestion and storage technologies. In certain application domains, risk and compliance considerations mean that the ability to reproduce machine learning workflows is essential for passing audits. And as data science and data engineering teams continue to expand, tools need to enable and facilitate collaboration.

Because Doddi specializes in helping teams turn machine learning prototypes into production-ready services, I wanted to hear what he has learned while working with organizations that aspire to “become machine learning companies.”

Here are some highlights from our conversation:

A central platform for building, deploying, and managing machine learning models

In one of the companies where I worked, we had built infrastructure related to Spark. We were a heavy Spark shop. So we built everything around Spark and other components. But later, when that organization grew, a lot of people came from a TensorFlow background. That suddenly created a little bit of frustration in the team because everybody wanted to move to TensorFlow. But we had invested a lot of time, effort and energy in building the infrastructure for Spark.

… We suddenly had hidden technical debt that needed to be addressed. … Let’s say right now you have two models running in production and you know that in the next two or three years you are going to deploy 20 to 30 models. You need to start thinking about this ahead of time.

… That’s why these days I’ve observed that organizations are creating centralized teams. The centralized team is responsible for maintaining flexible machine learning infrastructure that can be used to deploy, operate, and monitor many models simultaneously.
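
As an illustration only (this is not from the conversation and not Datatron's product), the idea of infrastructure that does not lock a team into one framework can be sketched as a small registry that treats every model as a named prediction function. The names `ModelRegistry`, `ModelEntry`, `deploy`, and `stats` are hypothetical, and the two "models" are stand-in functions rather than real Spark or TensorFlow models.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class ModelEntry:
    """One deployed model, described independently of its framework."""
    name: str
    framework: str                    # free-form label, e.g. "spark" or "tensorflow"
    predict: Callable[[Any], Any]     # the only thing the platform needs to call
    call_count: int = 0               # minimal monitoring: number of requests served


class ModelRegistry:
    """Hypothetical central registry for deploying and monitoring many models."""

    def __init__(self) -> None:
        self._models: Dict[str, ModelEntry] = {}

    def deploy(self, name: str, framework: str,
               predict: Callable[[Any], Any]) -> None:
        if name in self._models:
            raise ValueError(f"model {name!r} is already deployed")
        self._models[name] = ModelEntry(name, framework, predict)

    def predict(self, name: str, inputs: Any) -> Any:
        entry = self._models[name]
        entry.call_count += 1
        return entry.predict(inputs)

    def stats(self) -> Dict[str, int]:
        """Request counts per model, as a stand-in for real monitoring."""
        return {name: entry.call_count for name, entry in self._models.items()}


# Stand-in prediction functions; real ones would wrap framework-specific models.
registry = ModelRegistry()
registry.deploy("churn", "spark", lambda x: x > 0.5)
registry.deploy("ranking", "tensorflow", lambda xs: sorted(xs, reverse=True))

churn_result = registry.predict("churn", 0.7)
ranking_result = registry.predict("ranking", [1, 3, 2])
```

Because the registry only depends on a callable, adding a model built with a different library does not require changing the platform code, which is the kind of flexibility the passage above describes.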

Feature store: Create, manage, and share canonical features

When I talk to companies these days, everybody knows that their data scientists are duplicating work because they don’t have a centralized feature store. Everybody I talk to really wants to build or even buy a feature store, depending on what is easiest for them.

… The number of data scientists within most companies is increasing. And one of the pain points I’ve observed is that when a new data scientist joins an organization, there is an extremely long ramp-up period. A new data scientist needs to figure out what the data sets are, what the features are, and so on. But if an organization has a feature store, the ramp-up can be much faster.
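
To make the feature store idea concrete, here is a minimal sketch, assuming an in-memory store in which each canonical feature is registered once with a description and an owner. It is an illustration rather than a description of any particular product; `FeatureStore`, `FeatureDefinition`, and the example features and record layout are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class FeatureDefinition:
    """A canonical feature: one shared definition instead of many private copies."""
    name: str
    description: str
    owner: str
    compute: Callable[[Dict[str, Any]], Any]   # maps a raw record to a feature value


class FeatureStore:
    """Hypothetical in-memory feature store: create, discover, and reuse features."""

    def __init__(self) -> None:
        self._features: Dict[str, FeatureDefinition] = {}

    def register(self, feature: FeatureDefinition) -> None:
        if feature.name in self._features:
            raise ValueError(f"feature {feature.name!r} already exists")
        self._features[feature.name] = feature

    def list_features(self) -> List[str]:
        """Lets a new team member see which features already exist."""
        return sorted(self._features)

    def describe(self, name: str) -> str:
        feature = self._features[name]
        return f"{feature.name} (owner: {feature.owner}): {feature.description}"

    def get_features(self, names: List[str],
                     record: Dict[str, Any]) -> Dict[str, Any]:
        """Compute the requested features for one raw record."""
        return {name: self._features[name].compute(record) for name in names}


store = FeatureStore()
store.register(FeatureDefinition(
    name="order_count",
    description="Number of orders placed by the customer",
    owner="growth-team",
    compute=lambda record: len(record["orders"]),
))
store.register(FeatureDefinition(
    name="total_spend",
    description="Sum of all order amounts",
    owner="finance-team",
    compute=lambda record: sum(record["orders"]),
))

customer = {"orders": [20.0, 35.5]}   # made-up record for illustration
features = store.get_features(["order_count", "total_spend"], customer)
```

A newcomer can call `store.list_features()` and `store.describe("order_count")` to discover existing work before writing a new feature, which is where the faster ramp-up would come from.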

Related resources:

“Lessons learned turning machine learning models into real products and services”

“What are machine learning engineers?”: examining a new role focused on creating data products and making data science work in production

“MLflow: A Platform for Managing the Machine Learning Lifecycle”

“Managing risk in machine learning models”: Andrew Burt and Steven Touw on how companies can manage models they cannot fully explain

“We need to build machine learning tools to augment machine learning engineers”

When models go rogue: David Talby on hard-earned lessons about using machine learning in production
