O'Reilly Data Show Podcast

Building tools for enterprise data science



In this episode of the Data Show, I spoke with Vitaly Gordon, VP of data science and engineering at Salesforce. As the use of machine learning becomes more widespread, we need tools that will allow data scientists to scale so they can tackle many more problems and help many more people. We need automation tools for the many stages involved in data science, including data preparation, feature engineering, model selection and hyperparameter tuning, as well as monitoring.
I wanted the perspective of someone who is already faced with having to support many models in production. The proliferation of models is still a theoretical consideration for many data science teams, but Gordon and his colleagues at Salesforce already support hundreds of thousands of customers who need custom models built on custom data. They recently took their learnings public and open sourced TransmogrifAI, a library for automated machine learning for structured data, which sits on top of Apache Spark.
Here are some highlights from our conversation:
The need for an internal data science platform
It’s more about how much commonality there is between every single data science use case—how many of the problems are redundant and repeatable.
… A lot of data scientists solve problems that honestly have a lot to do with engineering, a lot to do with things that are not pure modeling.
TransmogrifAI
TransmogrifAI is an automated machine learning library for mostly structured data, and the problem that it aims to solve is that we at Salesforce have hundreds of thousands of customers. While all of them share a common set of data, the Salesforce platform itself is extremely customizable. In fact, 80% of the data inside the Salesforce platform sits in what we refer to as custom objects, which one can think of as custom tables in a database.
… We don’t build models that are shared between customers. We always use a single customer’s data. We have hundreds of thousands of models potentially that we need to build, and because of that, we needed to automate the entire process. We just cannot throw people at the problem. We basically created TransmogrifAI to automate the entire end-to-end process for creating a model for a user and we decided to open source it a couple months ago.
Related resources:
“What machine learning means for software development”
“We need to build machine learning tools to augment machine learning engineers”
Francesca Lazzeri and Jaya Mathew on “Lessons learned while helping enterprises adopt machine learning”
Tim Kraska on “How machine learning will accelerate data management systems”
“Managing risk in machine learning models”: Andrew Burt and Steven Touw on how companies can manage models they cannot fully explain.
“Lessons learned turning machine learning models into real products and services”

O'Reilly Data Show Podcast, by O'Reilly Media
