Is Apache Spark too costly? Amazon Principal Engineer Patrick Ames tackled this question during an interview with The New Stack Makers, sharing insights into transitioning from Spark to Ray for managing large-scale data. Ames, described as a "go-to" engineer for exabyte-scale projects, emphasized a goal-driven approach to solving complex engineering problems, from simplifying daily chores to optimizing software solutions.
Initially, Spark was chosen at Amazon for its simplicity and open-source flexibility, allowing efficient merging of data with minimal SQL code. The team leveraged Spark in a decoupled architecture over S3 storage, scaling it to handle thousands of jobs daily. However, as data volumes grew to hundreds of terabytes and beyond, Spark’s limitations became apparent. Long processing times and high costs prompted a search for alternatives.
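To illustrate the kind of workload described above, here is a minimal PySpark sketch of a merge over S3-backed data. It is not Amazon's actual pipeline; the bucket paths, table names, and the pk/updated_at columns are hypothetical placeholders.

from pyspark.sql import SparkSession

# Minimal sketch: merge incoming records into an existing S3-backed dataset.
# All paths, table names, and column names are hypothetical.
spark = SparkSession.builder.appName("s3-merge-sketch").getOrCreate()

existing = spark.read.parquet("s3://example-bucket/table/current/")
updates = spark.read.parquet("s3://example-bucket/table/updates/")

existing.createOrReplaceTempView("current")
updates.createOrReplaceTempView("updates")

# Keep only the latest version of each record, identified by its primary key.
merged = spark.sql("""
    SELECT * FROM (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY pk ORDER BY updated_at DESC) AS rn
        FROM (SELECT * FROM current UNION ALL SELECT * FROM updates)
    ) WHERE rn = 1
""").drop("rn")

merged.write.mode("overwrite").parquet("s3://example-bucket/table/merged/")

A handful of SQL like this, run as thousands of decoupled jobs over S3, is the pattern that worked well until data volumes pushed past hundreds of terabytes.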
Enter Ray—a unified framework designed for scaling AI and Python applications. After experimentation, Ames and his team noted significant efficiency improvements, driving the shift from Spark to Ray to meet scalability and cost-efficiency needs.
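For readers unfamiliar with Ray, the sketch below shows its basic task-parallel pattern: fan out Python functions as remote tasks and gather the results. It is a minimal illustration only, assuming a hypothetical process_partition step, not the team's actual compaction code.

import ray

ray.init()

# Minimal sketch: fan out per-partition work as Ray tasks and gather results.
# process_partition is a hypothetical stand-in for reading, deduplicating,
# and rewriting one data partition.
@ray.remote
def process_partition(partition_id: int) -> int:
    return partition_id * 2

futures = [process_partition.remote(i) for i in range(8)]
results = ray.get(futures)
print(sum(results))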
Learn more from The New Stack about Apache Spark and Ray:
Amazon to Save Millions Moving From Apache Spark to Ray
How Ray, a Distributed AI Framework, Helps Power ChatGPT
Join our community of newsletter subscribers to stay on top of the news and at the top of your game.