MLOps.community

MLOps Coffee Sessions #10 Analyzing the Article “Continuous Delivery and Automation Pipelines in Machine Learning” // Part 2



In this second installment, David and Demetrios review the Google paper on continuous training and automated pipelines. They dive deep into machine learning monitoring and into what continuous training actually entails. Some key highlights:

Automatically retraining and serving the models:
When to do it?
Outlier detection
Drift detection

Outlier detection:
What is it?
How do you deal with it?
Drift detection:
Individual features may start to drift. This could be a bug, or it could be perfectly normal behavior that indicates that the world has changed, requiring the model to be retrained.
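
A minimal sketch of per-feature drift detection, assuming you keep a reference sample of each feature from training time and compare it with a recent production sample. It uses a hand-rolled two-sample Kolmogorov-Smirnov statistic (the largest gap between the two empirical distributions). The feature names, the data, and the 0.3 threshold are made up for illustration and are not from the episode or the paper.

```python
from bisect import bisect_right

def ks_statistic(reference, current):
    """Largest absolute gap between the two empirical CDFs."""
    ref = sorted(reference)
    cur = sorted(current)
    return max(
        abs(bisect_right(ref, x) / len(ref) - bisect_right(cur, x) / len(cur))
        for x in ref + cur
    )

def drift_report(reference, current, threshold=0.3):
    """Map each feature name to (statistic, drifted?).

    `threshold` is an arbitrary illustrative cutoff, not a recommended value.
    """
    report = {}
    for name, ref_values in reference.items():
        stat = ks_statistic(ref_values, current[name])
        report[name] = (stat, stat > threshold)
    return report

# Hypothetical feature samples: training time vs. recent production traffic.
reference = {
    "age": [20, 25, 30, 35, 40, 45, 50, 55],
    "basket_size": [1, 2, 2, 3, 3, 4, 4, 5],
}
current = {
    "age": [21, 26, 29, 36, 41, 44, 51, 54],
    "basket_size": [4, 5, 5, 6, 6, 7, 7, 8],
}

report = drift_report(reference, current)
for name, (stat, drifted) in report.items():
    print(f"{name}: ks={stat:.3f} drifted={drifted}")
```

A flagged feature only says the distribution moved. Whether that is a bug or a real change in the world still needs a human (or further checks) to decide.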

Example changes:
shifts in people’s preferences
marketing campaigns
competitor moves
the weather
the news cycle
locations
time
devices (clients)

If the world you're working with is changing over time, model deployment should be treated as a continuous process. What this tells me is that you should keep the data scientists and engineers working on the model instead of immediately moving to another project.

Deeper dive into concept drift
Feature/target distributions change


An overview of concept drift applications: “.. data analysis applications, data evolve over time and must be analyzed in near real time. Patterns and relations in such data often evolve over time; thus, models built for analyzing such data quickly become obsolete over time. In machine learning and data mining, this phenomenon is referred to as concept drift.”
https://www.win.tue.nl/~mpechen/publications/pubs/CD_applications15.pdf
https://www-ai.cs.tu-dortmund.de/LEHRE/FACHPROJEKT/SS12/paper/concept-drift/tsymbal2004.pdf


Types of concept drift:
Sudden
Gradual

Google, in some way, is trying to address this concern: the world is changing, and you want your ML system to change with it, so that it avoids degraded performance and also improves over time and adapts to its environment. This sort of robustness is necessary for certain domains.
Continuous delivery and automation of pipelines (data, training, prediction service) were built with this in mind. The goals are to minimize the commit-to-deploy interval and to maximize the velocity of software delivery and its components: maintainability, extensibility, and testability.
Once the pipeline is ready, you can run it, and you can do so continuously. After the pipeline is deployed to the production environment, it is executed automatically and repeatedly to produce a trained model that is stored in a central model registry.
This pipeline should be able to run on a schedule or in response to triggers: events that you configure for your business domain, such as the arrival of new data or a drop in the production model's performance.
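
The three kinds of trigger mentioned here (schedule, new data, performance drop) could be sketched roughly like this. `PipelineState`, `retraining_reasons`, and every threshold below are hypothetical names and values chosen for illustration. A real setup would read these numbers from its own monitoring and orchestration tools.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class PipelineState:
    """Hypothetical snapshot of what monitoring knows about the prod model."""
    last_run: datetime
    new_rows_since_last_run: int
    baseline_accuracy: float  # accuracy measured when the model was deployed
    current_accuracy: float   # accuracy measured on recent labelled traffic

def retraining_reasons(
    state,
    now,
    max_age=timedelta(days=7),   # assumed schedule: at most a week between runs
    min_new_rows=10_000,         # assumed "enough new data" threshold
    max_accuracy_drop=0.05,      # assumed tolerated drop in accuracy
):
    """Return the list of triggers that fired (empty means: do nothing)."""
    reasons = []
    if now - state.last_run >= max_age:
        reasons.append("schedule")
    if state.new_rows_since_last_run >= min_new_rows:
        reasons.append("new_data")
    if state.baseline_accuracy - state.current_accuracy > max_accuracy_drop:
        reasons.append("performance_drop")
    return reasons

state = PipelineState(
    last_run=datetime(2020, 9, 1),
    new_rows_since_last_run=2_500,
    baseline_accuracy=0.91,
    current_accuracy=0.84,
)
reasons = retraining_reasons(state, now=datetime(2020, 9, 3))
print(reasons)  # only the performance trigger fires for this state
```

Returning the reasons rather than a bare boolean makes it easy to log why a run was started, which helps later when tracing a model back to its training run.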
The link between a model artifact and the pipeline that produced it is never severed. Which pipeline trained the model? What data was extracted and validated, and how was it prepared? What was the training configuration, and how was the model evaluated? Metrics are key here, and so is lineage tracking!
Keeping a close tie between the dev/experiment pipeline and the continuous production pipeline helps avoid inconsistencies between the model artifacts the pipeline produces and the models being served. Such inconsistencies are hard to debug.
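
One way to picture lineage tracking is a registry entry that answers those questions for every model version. The sketch below uses an in-memory dict as a stand-in for a central model registry. The `ModelLineage` fields, the `register` helper, and all example values are assumptions made for illustration, not the schema of any particular registry product.

```python
import hashlib
import json
from dataclasses import asdict, dataclass

@dataclass(frozen=True)
class ModelLineage:
    """Hypothetical lineage record stored next to each model artifact."""
    model_version: str
    pipeline_id: str          # which pipeline trained the model
    pipeline_commit: str      # which version of the pipeline code
    data_fingerprint: str     # which data it was trained on
    training_config: dict     # how it was trained
    evaluation_metrics: dict  # how it was evaluated

def fingerprint(rows):
    """Stable SHA-256 hash of the training rows (assumes JSON-serialisable data)."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

# Stand-in for a central model registry.
registry = {}

def register(lineage):
    registry[lineage.model_version] = asdict(lineage)

rows = [{"user": 1, "clicks": 3}, {"user": 2, "clicks": 0}]
register(
    ModelLineage(
        model_version="v1",
        pipeline_id="training-pipeline",  # made-up identifier
        pipeline_commit="abc1234",        # made-up commit hash
        data_fingerprint=fingerprint(rows),
        training_config={"learning_rate": 0.1, "epochs": 5},
        evaluation_metrics={"accuracy": 0.91},
    )
)
print(json.dumps(registry["v1"], indent=2))
```

With a record like this, a served model can be traced back to the exact pipeline version, data, and configuration that produced it.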


Join our Slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register


Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with David on LinkedIn: https://www.linkedin.com/in/aponteanalytics/
Connect with Chris Sterry on LinkedIn: https://www.linkedin.com/in/chrissterry/

MLOps.community, by Demetrios


