Data Engineering Podcast

Pushing The Limits Of Scalability And User Experience For Data Processing With Jignesh Patel



Summary

Data processing technologies have dramatically improved in their sophistication and raw throughput. Unfortunately, the volumes of data that are being generated continue to double, requiring further advancements in the platform capabilities to keep up. As the sophistication increases, so does the complexity, leading to challenges for user experience. Jignesh Patel has been researching these areas for several years in his work as a professor at Carnegie Mellon University. In this episode he illuminates the landscape of problems that we are faced with and how his research is aimed at helping to solve these problems.

Announcements
  • Hello and welcome to the Data Engineering Podcast, the show about modern data management
  • Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics. Trusted by teams of all sizes, including Comcast and Doordash, Starburst is a data lake analytics platform that delivers the adaptability and flexibility a lakehouse ecosystem promises. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino.
  • Your host is Tobias Macey and today I'm interviewing Jignesh Patel about the research that he is conducting on technical scalability and user experience improvements around data management

Interview
  • Introduction
  • How did you get involved in the area of data management?
  • Can you start by summarizing your current areas of research and the motivations behind them?
  • What are the open questions today in technical scalability of data engines?
    • What are the experimental methods that you are using to gain understanding in the opportunities and practical limits of those systems?
  • As you strive to push the limits of technical capacity in data systems, how does that impact the usability of the resulting systems?
    • When performing research and building prototypes of the projects, what is your process for incorporating user experience into the implementation of the product?
    • What are the main sources of tension between technical scalability and user experience/ease of comprehension?
  • What are some of the positive synergies that you have been able to realize between your teaching, research, and corporate activities?
    • In what ways do they produce conflict, whether personally or technically?
  • What are the most interesting, innovative, or unexpected ways that you have seen your research used?
  • What are the most interesting, unexpected, or challenging lessons that you have learned while working on research of the scalability limits of data systems?
  • What is your heuristic for when a given research project needs to be terminated or productionized?
  • What do you have planned for the future of your academic research?

Contact Info
  • Website
  • LinkedIn

Parting Question
  • From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements
  • Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning.
  • Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
  • If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.
  • To help other people find the show please leave a review on Apple Podcasts and tell your friends and co-workers.

Links
  • Carnegie Mellon University
  • Parallel Databases
  • Genomics
  • Proteomics
  • Moore's Law
  • Dennard Scaling
  • Generative AI
  • Quantum Computing
  • Voltron Data
    • Podcast Episode
  • Von Neumann Architecture
  • Two's Complement
  • Ottertune
    • Podcast Episode
  • dbt
  • Informatica
  • Mozart Data
    • Podcast Episode
  • DataChat
  • Von Neumann Bottleneck

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Sponsored By:

  • Starburst: This episode is brought to you by Starburst - a data lake analytics platform for data engineers who are battling to build and scale high quality data pipelines on the data lake. Powered by Trino, Starburst runs petabyte-scale SQL analytics fast at a fraction of the cost of traditional methods, helping you meet all your data needs ranging from AI/ML workloads to data applications to complete analytics. Trusted by the teams at Comcast and Doordash, Starburst delivers the adaptability and flexibility a lakehouse ecosystem promises, while providing a single point of access for your data and all your data governance allowing you to discover, transform, govern, and secure all in one place. Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data. Want to see Starburst in action? Try Starburst Galaxy today, the easiest and fastest way to get started using Trino, and get $500 of credits free. [dataengineeringpodcast.com/starburst](https://www.dataengineeringpodcast.com/starburst)
