The New Stack Podcast

Is Apache Spark Too Costly? An Amazon Engineer Tells His Story



Is Apache Spark too costly? Amazon Principal Engineer Patrick Ames tackled this question during an interview with The New Stack Makers, sharing insights into transitioning from Spark to Ray for managing large-scale data. Ames, described as a "go-to" engineer for exabyte-scale projects, emphasized a goal-driven approach to solving complex engineering problems, from simplifying daily chores to optimizing software solutions.

Initially, Spark was chosen at Amazon for its simplicity and open-source flexibility, allowing efficient merging of data with minimal SQL code. The team leveraged Spark in a decoupled architecture over S3 storage, scaling it to handle thousands of jobs daily. However, as data volumes grew to hundreds of terabytes and beyond, Spark’s limitations became apparent. Long processing times and high costs prompted a search for alternatives.

Enter Ray—a unified framework designed for scaling AI and Python applications. After experimentation, Ames and his team noted significant efficiency improvements, driving the shift from Spark to Ray to meet scalability and cost-efficiency needs.

Learn more from The New Stack about Apache Spark and Ray: 

Amazon to Save Millions Moving From Apache Spark to Ray

How Ray, a Distributed AI Framework, Helps Power ChatGPT 


By The New Stack


