The New Stack Podcast

Is Apache Spark Too Costly? An Amazon Engineer Tells His Story


Listen Later

Is Apache Spark too costly? Amazon Principal Engineer Patrick Ames tackled this question during an interview with The New Stack Makers, sharing insights into transitioning from Spark to Ray for managing large-scale data. Ames, described as a "go-to" engineer for exabyte-scale projects, emphasized a goal-driven approach to solving complex engineering problems, from simplifying daily chores to optimizing software solutions.

Initially, Spark was chosen at Amazon for its simplicity and open-source flexibility, allowing efficient merging of data with minimal SQL code. The team leveraged Spark in a decoupled architecture over S3 storage, scaling it to handle thousands of jobs daily. However, as data volumes grew to hundreds of terabytes and beyond, Spark’s limitations became apparent. Long processing times and high costs prompted a search for alternatives.

Enter Ray—a unified framework designed for scaling AI and Python applications. After experimentation, Ames and his team noted significant efficiency improvements, driving the shift from Spark to Ray to meet scalability and cost-efficiency needs.

Learn more from The New Stack about Apache Spark and Ray: 

Amazon to Save Millions Moving From Apache Spark to Ray

How Ray, a Distributed AI Framework, Helps Power ChatGPT 

Join our community of newsletter subscribers to stay on top of the news and at the top of your game


Hosted by Simplecast, an AdsWizz company. See pcm.adswizz.com for information about our collection and use of personal data for advertising.

...more
View all episodesView all episodes
Download on the App Store

The New Stack PodcastBy The New Stack

  • 4.3
  • 4.3
  • 4.3
  • 4.3
  • 4.3

4.3

31 ratings


More shows like The New Stack Podcast

View all
The New Stack Analysts by The New Stack

The New Stack Analysts

9 Listeners

The New Stack @ Scale by The New Stack

The New Stack @ Scale

3 Listeners

The Changelog: Software Development, Open Source by Changelog Media

The Changelog: Software Development, Open Source

289 Listeners

The a16z Show by Andreessen Horowitz

The a16z Show

1,087 Listeners

Software Engineering Daily by Software Engineering Daily

Software Engineering Daily

626 Listeners

Thoughtworks Technology Podcast by Thoughtworks

Thoughtworks Technology Podcast

43 Listeners

The New Stack Context by The New Stack

The New Stack Context

4 Listeners

Y Combinator Startup Podcast by Y Combinator

Y Combinator Startup Podcast

226 Listeners

Syntax - Tasty Web Development Treats by Wes Bos & Scott Tolinski - Full Stack JavaScript Web Developers

Syntax - Tasty Web Development Treats

988 Listeners

CoRecursive: Coding Stories by Adam Gordon Bell - Software Developer

CoRecursive: Coding Stories

190 Listeners

Practical AI by Practical AI LLC

Practical AI

211 Listeners

AWS Podcast by Amazon Web Services

AWS Podcast

202 Listeners

The Stack Overflow Podcast by The Stack Overflow Podcast

The Stack Overflow Podcast

64 Listeners

Dwarkesh Podcast by Dwarkesh Patel

Dwarkesh Podcast

501 Listeners

Big Technology Podcast by Alex Kantrowitz

Big Technology Podcast

494 Listeners

AI and I by Dan Shipper

AI and I

33 Listeners

BG2Pod with Brad Gerstner and Bill Gurley by BG2Pod

BG2Pod with Brad Gerstner and Bill Gurley

467 Listeners

AI + a16z by a16z

AI + a16z

35 Listeners