Disseminate: The Computer Science Research Podcast

Michael Abebe | Proteus: Autonomous Adaptive Storage for Mixed Workloads | #7



Summary:

Enterprises use distributed database systems to meet the demands of mixed or hybrid transaction/analytical processing (HTAP) workloads that contain both transactional (OLTP) and analytical (OLAP) requests. Distributed HTAP systems typically maintain one complete copy of the data in a row-oriented storage format that is well-suited for OLTP workloads and a second complete copy in a column-oriented storage format optimized for OLAP workloads. Maintaining both copies consumes significant storage space and system resources. Conversely, if a system stores data in a single format, either OLTP or OLAP workload performance suffers.
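
To make the two storage formats concrete, here is a minimal Python sketch of the same toy table held in a row-oriented and a column-oriented layout. The table contents and the helper names (read_record, total_amount) are assumptions made up for illustration; they are not taken from Proteus or the paper.

```python
# Toy table with three attributes; names and values are illustrative only.
rows = [
    {"id": 1, "customer": "a", "amount": 10.0},
    {"id": 2, "customer": "b", "amount": 25.5},
    {"id": 3, "customer": "a", "amount": 4.5},
]

# Row-oriented layout: all attributes of one record sit together.
row_store = [(r["id"], r["customer"], r["amount"]) for r in rows]

# Column-oriented layout: all values of one attribute sit together.
column_store = {
    "id": [r["id"] for r in rows],
    "customer": [r["customer"] for r in rows],
    "amount": [r["amount"] for r in rows],
}


def read_record(store, position):
    """OLTP-style access: fetch one whole record from the row store."""
    return store[position]


def total_amount(store):
    """OLAP-style access: aggregate a single attribute from the column store."""
    return sum(store["amount"])
```

A point read touches one tuple in the row store, while the aggregate only walks the single "amount" list in the column store, which is the intuition behind keeping one copy per workload type.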


In this interview, Michael talks about Proteus, a distributed HTAP database system that adaptively and autonomously selects and changes its storage layout to optimize for mixed workloads. Proteus generates physical execution plans that utilize storage-aware operators for efficient transaction execution. For HTAP workloads, Proteus delivers superior performance while providing OLTP and OLAP performance on par with designs specialized for either type of workload.
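
As a rough illustration of what choosing a layout from the workload could look like, the sketch below picks whichever layout has the lower estimated cost for a given mix of point operations and scans. This is a toy model under stated assumptions, not Proteus's actual decision procedure: the function choose_layout and every number in HYPOTHETICAL_COSTS are invented for the example.

```python
# Hypothetical per-operation costs: layout -> (cost per point op, cost per scan).
# The numbers are made up purely for illustration.
HYPOTHETICAL_COSTS = {
    "row": (1.0, 50.0),
    "column": (4.0, 5.0),
}


def choose_layout(point_ops, scan_ops, costs=HYPOTHETICAL_COSTS):
    """Return the layout name with the lowest estimated total cost."""
    estimates = {
        layout: point_ops * per_point + scan_ops * per_scan
        for layout, (per_point, per_scan) in costs.items()
    }
    return min(estimates, key=estimates.get)
```

With these assumed costs, a workload dominated by point operations favors the row layout and a scan-heavy workload favors the column layout. The episode discusses how Proteus actually predicts latency and weighs the cost of a layout change.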


Questions:

0:56: Can you start off by explaining what a mixed workload is? 

1:58: What is the challenge database systems face in trying to support these mixed workloads? 

3:23: How have previous database systems tried to support mixed workloads? 

5:19: What are the design goals of Proteus? 

7:23: Can you elaborate more on the architecture of Proteus and how it makes decisions? 

8:46: Can you dig into how you predict transaction latency? What is the mechanism behind this?

10:35: It feels to me that you are accumulating a lot of metadata, which must have some overhead. How does this impact performance?

12:08: It sounds like the Adaptive Storage Advisor is a centralized coordinator. What are the limitations of this design choice?

13:35: Are we in the context of a single data center here, or can Proteus handle a geo-distributed deployment?

14:34: Changing the storage layout has some implicit cost. How does Proteus decide whether a storage layout change is good or bad?

16:57: How does Proteus predict what the transaction is going to be?

18:46: How did you evaluate Proteus?

20:20: If you had to summarize your work, what is the one key insight the listener can take away?

21:07: Is Proteus publicly available? 

21:39: What are the next steps? 

22:57: What is the most unexpected lesson you have learned whilst working on distributed database systems? 

24:21: Do you think a single system catering for both workload types is better than two specialized engines? 

26:10: What attracted you to work on this topic?


Links:
  • Paper: https://cs.uwaterloo.ca/~mtabebe/publications/abebeProteus2022SIGMOD.pdf 
  • Presentation: https://www.youtube.com/watch?v=qbe29viYTas
  • University of Waterloo Data Systems Group: https://uwaterloo.ca/data-systems-group/


Contact:


Hosted on Acast. See acast.com/privacy for more information.


Disseminate: The Computer Science Research Podcast, by Jack Waudby
