Disseminate: The Computer Science Research Podcast

Hani Al-Sayeh | Juggler: Autonomous Cost Optimization and Performance Prediction of Big Data Applications | #6



Summary:

Distributed in-memory processing frameworks accelerate iterative workloads by caching suitable datasets in memory rather than recomputing them in each iteration. Selecting appropriate datasets to cache and allocating a suitable cluster configuration for caching them both play a crucial role in achieving optimal performance. In practice, both are tedious, time-consuming tasks that end users often neglect, since they are typically unaware of workload semantics, the sizes of intermediate data, and cluster specifications. To address these problems, Hani and his colleagues developed Juggler, an end-to-end framework that autonomously selects appropriate datasets for caching and recommends a correspondingly suitable cluster configuration to end users, with the aim of achieving optimal execution time and cost.
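
As a rough illustration of why caching matters for iterative workloads, the toy Python sketch below counts how often an intermediate dataset is derived with and without caching. It is not Juggler's method and does not use any real framework's API: `CountingSource`, `derive`, and `run_iterative_job` are hypothetical names invented for this example, and the "dataset" is just a small in-memory list.

```python
class CountingSource:
    """Hypothetical stand-in for an expensive intermediate dataset."""

    def __init__(self):
        self.computations = 0  # how many times the dataset was derived

    def derive(self, n):
        # Pretend this is a costly transformation over a large input.
        self.computations += 1
        return [i * i for i in range(n)]


def run_iterative_job(source, iterations, n, use_cache):
    """Run a toy iterative job that reads the same dataset every iteration."""
    cached = None
    total = 0
    for _ in range(iterations):
        if use_cache:
            # Derive once, then reuse the in-memory copy.
            if cached is None:
                cached = source.derive(n)
            data = cached
        else:
            # Recompute the dataset from scratch in each iteration.
            data = source.derive(n)
        total += sum(data)
    return total


if __name__ == "__main__":
    for use_cache in (False, True):
        source = CountingSource()
        result = run_iterative_job(source, iterations=10, n=100, use_cache=use_cache)
        print(f"use_cache={use_cache}: result={result}, derivations={source.computations}")
```

Both runs produce the same result, but the cached run derives the dataset only once. In a real cluster the cached copy also occupies memory, which is why the choice of what to cache and the choice of cluster configuration are linked.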


Questions:

1:02 - Can you introduce your work and describe the current workflow for developing big data applications in the cloud?

2:49 - What is the challenge (maybe hidden challenge) facing application developers in this workflow? What harms performance?

5:36 - How does Juggler solve this problem?

11:55 - As an end user, how do I interact with Juggler?

14:07 - Can you talk us through your evaluation of Juggler? What were the key insights?

16:30 - What other tools are similar to Juggler? How do they compare?

18:17 - What are the limitations of Juggler?

21:57 - Who will find Juggler the most useful? Who is it for?

24:05 - Is Juggler publicly available?

24:23 - What is the most interesting (maybe unexpected) lesson you learned while working on this topic?

27:50 - What is next for Juggler? What do you have planned for future research?

28:49 - What attracted you to this research area? 

29:45 - What do you think is the biggest challenge now in this area?


Links:
  • Juggler: Autonomous Cost Optimization and Performance Prediction of Big Data Applications (SIGMOD 2022 paper)
  • Juggler SIGMOD 22 presentation
  • CherryPick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics (NSDI 2017 paper)
  • Ernest: Efficient Performance Prediction for Large-Scale Advanced Analytics (NSDI 2016 paper)


Disseminate: The Computer Science Research Podcast, by Jack Waudby
