July 11, 2022

Hani Al-Sayeh | Juggler: Autonomous Cost Optimization and Performance Prediction of Big Data Applications | #6

32 minutes

Summary:

Distributed in-memory processing frameworks accelerate iterative workloads by caching suitable datasets in memory rather than recomputing them in each iteration. Both selecting appropriate datasets to cache and allocating a suitable cluster configuration for caching them play a crucial role in achieving optimal performance. In practice, both are tedious, time-consuming tasks that end users often neglect, because they are typically unaware of the workload semantics, the sizes of intermediate data, and the cluster specifications. To address these problems, Hani and his colleagues developed Juggler, an end-to-end framework that autonomously selects appropriate datasets for caching and recommends a correspondingly suitable cluster configuration to end users, with the aim of achieving optimal execution time and cost.

Questions:

1:02 - Can you introduce your work and describe the current workflow for developing big data applications in the cloud?

2:49 - What is the (perhaps hidden) challenge facing application developers in this workflow? What harms performance?

5:36 - How does Juggler solve this problem?

11:55 - As an end user, how do I interact with Juggler?

14:07 - Can you talk us through your evaluation of Juggler? What were the key insights?

16:30 - What other tools are similar to Juggler? How do they compare?

18:17 - What are the limitations of Juggler?

21:57 - Who will find Juggler the most useful? Who is it for?

24:05 - Is Juggler publicly available?

24:23 - What is the most interesting (maybe unexpected) lesson you learned while working on this topic?

27:50 - What is next for Juggler? What do you have planned for future research?

28:49 - What attracted you to this research area?

29:45 - What do you think is the biggest challenge now in this area?

Links:

Juggler: Autonomous Cost Optimization and Performance Prediction of Big Data Applications (SIGMOD 2022 paper)
Juggler SIGMOD 2022 presentation
CherryPick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics (NSDI 2017 paper)
Ernest: Efficient Performance Prediction for Large-Scale Advanced Analytics (NSDI 2016 paper)

Contact:

Email: [email protected]
LinkedIn
TU Ilmenau Database and Information Systems Group

Hosted on Acast. See acast.com/privacy for more information.

By Jack Waudby