June 30, 2023

Scaling Machine Learning with Spark • Adi Polak & Holden Karau

40 minutes

This interview was recorded for the GOTO Book Club.
gotopia.tech/bookclub

Adi Polak - VP of Developer Experience at Treeverse & Contributor to lakeFS OSS
Holden Karau - Co-Author of "Kubeflow for Machine Learning" & many more books & Open Source Engineer at Netflix

DESCRIPTION
Learn how to build end-to-end scalable machine learning solutions with Apache Spark. With this practical guide, author Adi Polak introduces data and ML practitioners to creative solutions that supersede today's traditional methods. You'll learn a more holistic approach that takes you beyond specific requirements and organizational goals, allowing data and ML practitioners to collaborate and understand each other better.

Scaling Machine Learning with Spark examines several technologies for building end-to-end distributed ML workflows based on the Apache Spark ecosystem with Spark MLlib, MLflow, TensorFlow, and PyTorch. If you're a data scientist who works with machine learning, this book shows you when and why to use each technology.

You will:
• Explore machine learning, including distributed computing concepts and terminology
• Manage the ML lifecycle with MLflow
• Ingest data and perform basic preprocessing with Spark
• Explore feature engineering, and use Spark to extract features
• Train a model with MLlib and build a pipeline to reproduce it
• Build a data system to combine the power of Spark with deep learning
• Get a step-by-step example of working with distributed TensorFlow
• Use PyTorch to scale machine learning, and understand its internal architecture

* Book description: © O’Reilly

The interview is based on the book "Scaling Machine Learning with Spark".

RECOMMENDED BOOKS
Adi Polak • Scaling Machine Learning with Spark
Holden Karau, Trevor Grant, Boris Lublinsky, Richard Liu & Ilan Filonenko • Kubeflow for Machine Learning
Holden Karau • Distributed Computing 4 Kids
Holden Karau • Scaling Python with Dask
Holden Karau & Boris Lublinsky • Scaling Python with Ray
Holden Karau & Rachel Warren • High Performance Spark
Holden Karau, Andy Konwinski, Patrick Wendell & Matei Zaharia • Learning Spark
Holden Karau & Krishna Sankar • Fast Data Processing with Spark 2nd Edition
