MLOps.community

Machine Learning SRE // Niall Murphy // MLOps Coffee Sessions #54



Coffee Sessions #54 with Niall Murphy, Machine Learning SRE.


// Abstract
SRE is making its way into the machine learning world. Software engineering for machine learning requires reliability, performance, and maintainability. Site reliability engineering is the field that deals with reliability and ensuring constant, real-time performance. Niall Murphy, most recently Global Head of SRE at Microsoft Azure, helps us understand what SRE can do for modern ML products and teams.

Building machine learning teams requires a diverse set of technical experiences, and Niall shares his thoughts on how to do that most effectively. Machine learning organizations need to start taking advantage of SRE best practices such as SLOs, which Niall walks through. Production machine learning depends on high-quality software engineering, and we get Niall's take on how to ensure that in a machine learning context.

// Bio
Niall Murphy has been interested in Internet infrastructure since the mid-1990s. He has worked with all of the major cloud providers from their Dublin, Ireland offices - most recently at Microsoft, where he was global head of Azure Site Reliability Engineering (SRE). His books have sold approximately a quarter of a million copies worldwide, most notably the award-winning Site Reliability Engineering, and he is probably one of the few people in the world to hold degrees in Computer Science, Mathematics, and Poetry Studies. He lives in Dublin, Ireland, with his wife and two children.

--------------- ✌️Connect With Us ✌️ -------------
Join our Slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register

Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with David on LinkedIn: https://www.linkedin.com/in/aponteanalytics/
Connect with Vishnu on LinkedIn: https://www.linkedin.com/in/vrachakonda/
Connect with Niall on LinkedIn: https://www.linkedin.com/in/niallm/

Timestamps:
[00:00] Introduction to Niall Murphy
[00:36] Transition from an SRE background to the machine learning space
[07:10] SLOs being a challenge in the ML space
[09:42] SRE Hiring Investments
[15:10] Behavior of teams concept
[17:45] Challenges dealing with ML production
[18:27] Update on Reliable Machine Learning book
[22:46] Monitoring
[25:05] Difference between ML and SRE
[29:18] Incident response in Machine Learning
[34:46] Rollbacks
[35:50] Machine learning burden over time
[42:42] Niall's journey to the SRE space and his focus on developing himself


MLOps.community by Demetrios


4.9 (20 ratings)

