Google Cloud Platform Podcast

Beam and Spark with Holden Karau



Holden Karau is on the podcast this week to talk with Mark and Melanie all about Spark and Beam, two open source tools that help process data at scale.

Holden Karau

Holden Karau is a transgender Canadian open source developer advocate @ Google with a focus on Apache Spark, Beam, and related “big data” tools. She is the co-author of Learning Spark, High Performance Spark, and another Spark book that’s a bit more out of date. She is a committer and PMC member on Apache Spark and a committer on the SystemML & Mahout projects. She was tricked into the world of big data while trying to improve search and recommendation systems and has long since forgotten her original goal.

Cool things of the week
  • Twitter’s collaboration with Google Cloud blog & tweet
  • Kaggle CERN TrackML Particle Tracking Challenge Competition site
  • Open-sourcing gVisor, a sandboxed container runtime blog & repo
  • Announcing Stackdriver Kubernetes Monitoring blog
  • MLPerf: collaborative effort to standardize ML benchmarks site
Interview
  • Spark site & community site
  • Beam site
  • Cloud Dataflow site & docs
  • Cloud Dataproc site & docs
  • Using Spark on Kubernetes Engine blog
  • Testing future Apache Spark releases and changes on Google Kubernetes Engine and Cloud Dataproc blog
  • Spark Packages site
  • Spark testing base repo
  • Flink site
  • Arrow site

Upcoming Talks:

  • PyCon 2018 & Debugging PySpark talk
  • Scala Days & Keeping the “fun” in Spark talk
  • Strata London & Understanding Spark tuning with auto-tuning talk
  • J on the Beach & General Purpose Big Data Systems are eating the world talk
  • Spark Summit 2018 & Accelerating TF with Apache Arrow on Spark talk
Question of the week

I have a continuous integration build process set up with Container Builder, but it’s all sequential. I want to speed things up by processing parts of it in parallel. How do I do that?

  • Configure Build Step Order docs
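
As a rough illustration of what the linked docs describe, build steps can be given an `id` and a `waitFor` list so that independent steps run concurrently. The sketch below is a minimal, hypothetical build config: the step ids, directory names, and image names are placeholders and do not come from the episode.

```yaml
# cloudbuild.yaml: illustrative sketch only; ids, paths, and image names are assumptions.
steps:
# waitFor: ['-'] tells the builder not to wait on any earlier step,
# so these two image builds can start at the same time.
- id: 'build-frontend'
  name: 'gcr.io/cloud-builders/docker'
  args: ['build', '-t', 'gcr.io/$PROJECT_ID/frontend', './frontend']
  waitFor: ['-']
- id: 'build-backend'
  name: 'gcr.io/cloud-builders/docker'
  args: ['build', '-t', 'gcr.io/$PROJECT_ID/backend', './backend']
  waitFor: ['-']
# This step names both builds in waitFor, so it starts only after both finish.
- id: 'after-builds'
  name: 'ubuntu'
  args: ['echo', 'both images built']
  waitFor: ['build-frontend', 'build-backend']
```

A step with no `waitFor` field waits for all previously listed steps, which is why a config without these fields runs sequentially.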
Where can you find us next?

Mark can be found streaming Agones development on Twitch.

Melanie is speaking at the Internet2 Global Summit on May 9th in San Diego, and will also be talking at the Understand Risk Forum on May 17th in Mexico City.

Special shout out: Google I/O and PyCon are both happening this week
