Data Engineering Podcast

Pushing The Limits Of Scalability And User Experience For Data Processing With Jignesh Patel



Summary

Data processing technologies have dramatically improved in their sophistication and raw throughput. Unfortunately, the volumes of data that are being generated continue to double, requiring further advancements in the platform capabilities to keep up. As the sophistication increases, so does the complexity, leading to challenges for user experience. Jignesh Patel has been researching these areas for several years in his work as a professor at Carnegie Mellon University. In this episode he illuminates the landscape of problems that we are faced with and how his research is aimed at helping to solve these problems.

Announcements
  • Hello and welcome to the Data Engineering Podcast, the show about modern data management
  • Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics. Trusted by teams of all sizes, including Comcast and Doordash, Starburst is a data lake analytics platform that delivers the adaptability and flexibility a lakehouse ecosystem promises. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino.
  • Your host is Tobias Macey and today I'm interviewing Jignesh Patel about the research that he is conducting on technical scalability and user experience improvements around data management

Interview
  • Introduction
  • How did you get involved in the area of data management?
  • Can you start by summarizing your current areas of research and the motivations behind them?
  • What are the open questions today in technical scalability of data engines?
    • What are the experimental methods that you are using to gain understanding in the opportunities and practical limits of those systems?
  • As you strive to push the limits of technical capacity in data systems, how does that impact the usability of the resulting systems?
    • When performing research and building prototypes of the projects, what is your process for incorporating user experience into the implementation of the product?
  • What are the main sources of tension between technical scalability and user experience/ease of comprehension?
  • What are some of the positive synergies that you have been able to realize between your teaching, research, and corporate activities?
    • In what ways do they produce conflict, whether personally or technically?
  • What are the most interesting, innovative, or unexpected ways that you have seen your research used?
  • What are the most interesting, unexpected, or challenging lessons that you have learned while working on research of the scalability limits of data systems?
  • What is your heuristic for when a given research project needs to be terminated or productionized?
  • What do you have planned for the future of your academic research?

Contact Info
  • Website
  • LinkedIn

Parting Question
  • From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements
  • Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning.
  • Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
  • If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.
  • To help other people find the show, please leave a review on Apple Podcasts and tell your friends and co-workers.

Links
  • Carnegie Mellon University
  • Parallel Databases
  • Genomics
  • Proteomics
  • Moore's Law
  • Dennard Scaling
  • Generative AI
  • Quantum Computing
  • Voltron Data
    • Podcast Episode
  • Von Neumann Architecture
  • Two's Complement
  • Ottertune
    • Podcast Episode
  • dbt
  • Informatica
  • Mozart Data
    • Podcast Episode
  • DataChat
  • Von Neumann Bottleneck

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Sponsored By:
  • Starburst: This episode is brought to you by Starburst, a data lake analytics platform for data engineers who are battling to build and scale high quality data pipelines on the data lake. Powered by Trino, Starburst runs petabyte-scale SQL analytics fast at a fraction of the cost of traditional methods, helping you meet all your data needs ranging from AI/ML workloads to data applications to complete analytics. Trusted by the teams at Comcast and Doordash, Starburst delivers the adaptability and flexibility a lakehouse ecosystem promises, while providing a single point of access for your data and all your data governance, allowing you to discover, transform, govern, and secure all in one place. Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data. Want to see Starburst in action? Try Starburst Galaxy today, the easiest and fastest way to get started using Trino, and get $500 of credits free. [dataengineeringpodcast.com/starburst](https://www.dataengineeringpodcast.com/starburst)
