Data Engineering Podcast

Fast And Flexible Headless Data Analytics With Cube.JS



Summary

One of the perennial challenges of data analytics is having a consistent set of definitions, along with a flexible and performant API endpoint for querying them. In this episode Artyom Keydunov and Pavel Tiunov share their work on Cube.js and the various ways that it is being used in the open source community.

Announcements
  • Hello and welcome to the Data Engineering Podcast, the show about modern data management
  • When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
  • Atlan is a collaborative workspace for data-driven teams, like GitHub for engineering or Figma for design teams. By acting as a virtual hub for data assets ranging from tables and dashboards to SQL snippets & code, Atlan enables teams to create a single source of truth for all their data assets, and collaborate across the modern data stack through deep integrations with tools like Snowflake, Slack, Looker and more. Go to dataengineeringpodcast.com/atlan today and sign up for a free trial. If you’re a Data Engineering Podcast listener, you get credits worth $3000 on an annual subscription.
  • Modern Data teams are dealing with a lot of complexity in their data pipelines and analytical code. Monitoring data quality, tracing incidents, and testing changes can be daunting and often takes hours to days. Datafold helps Data teams gain visibility and confidence in the quality of their analytical data through data profiling, column-level lineage and intelligent anomaly detection. Datafold also helps automate regression testing of ETL code with its Data Diff feature that instantly shows how a change in ETL or BI code affects the produced data, both on a statistical level and down to individual rows and values. Datafold integrates with all major data warehouses as well as frameworks such as Airflow & dbt and seamlessly plugs into CI workflows. Go to dataengineeringpodcast.com/datafold today to start a 30-day trial of Datafold.
  • Your host is Tobias Macey and today I’m interviewing Artyom Keydunov and Pavel Tiunov about Cube.js, a framework for building analytics APIs to power your applications and BI dashboards.

Interview
  • Introduction
  • How did you get involved in the area of data management?
  • Can you describe what Cube is and the story behind it?
  • What are the main use cases and platform architectures that you are focused on?
    • Who are the target personas that will be using and managing Cube.js?
  • The name comes from the concept of an OLAP cube. Can you discuss the applications of OLAP cubes and their role in the current state of the data ecosystem?
    • How does the idea of an OLAP cube compare to the recent focus on a dedicated metrics layer?
  • What are the pieces of a data platform that might be replaced by Cube.js?
  • Can you describe the design and architecture of the Cube platform?
    • How has the focus and target use case for the Cube platform evolved since you first started working on it?
    • One of the perpetually hard problems in computer science is cache management. How have you approached that challenge in the pre-aggregation layer of the Cube framework?
  • What is your overarching design philosophy for the API of the Cube system?
  • Can you talk through the workflow of someone building a cube and querying it from a downstream system?
    • What do the iteration cycles look like as you go from initial proof of concept to a more sophisticated usage of Cube.js?
    • What are some of the data modeling steps that are needed in the source systems?
  • The perennial problem of embedding SQL into another host language or DSL is how to deal with validation and developer tooling. What are the utilities that you and the community have built to reduce friction while writing the definitions of a cube?
  • What are the methods available for maintaining visibility across all of the cubes defined within and across installations of Cube.js?
    • What are the opportunities for composing multiple cubes together to form a higher level aggregation?
  • What are the most interesting, innovative, or unexpected ways that you have seen Cube.js used?
  • What are the most interesting, unexpected, or challenging lessons that you have learned while working on Cube?
  • When is Cube the wrong choice?
  • What do you have planned for the future of Cube?

Contact Info
  • Artyom
    • keydunov on GitHub
    • @keydunov on Twitter
    • LinkedIn
  • Pavel
    • LinkedIn
    • @paveltiunov87 on Twitter
    • paveltiunov on GitHub

Parting Question
  • From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements
  • Thank you for listening! Don’t forget to check out our other show, Podcast.__init__, to learn about the Python language, its community, and the innovative ways it is being used.
  • Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
  • If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.
  • To help other people find the show, please leave a review on iTunes and tell your friends and co-workers.

Links
  • Cube.js
  • Statsbot
  • chart.js
  • Highcharts
  • D3
  • OLAP Cube
  • dbt
  • Superset
    • Podcast Episode
  • Streamlit
    • Podcast.__init__ Episode
  • Parquet
  • Hasura
  • kSQLDB
    • Podcast Episode
  • Materialize
    • Podcast Episode
  • Meltano
    • Podcast Episode

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
