Data Engineering Podcast

Migrate And Modify Your Data Platform Confidently With Compilerworks


Summary

A major concern that comes up when selecting a vendor or technology for storing and managing your data is vendor lock-in. What happens if the vendor fails? What if the technology can’t do what I need it to? Compilerworks set out to reduce the pain and complexity of migrating between platforms, and in the process added an advanced lineage tracking capability. In this episode Shevek, CTO of Compilerworks, takes us on an interesting journey through the many technical and social complexities that are involved in evolving your data platform and the system that they have built to make it a manageable task.

Announcements
  • Hello and welcome to the Data Engineering Podcast, the show about modern data management
  • When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
  • Schema changes, missing data, and volume anomalies caused by your data sources can happen without any advance notice if you lack visibility into your data-in-motion. That leaves DataOps reactive to data quality issues and can make your consumers lose confidence in your data. By connecting to your pipeline orchestrator like Apache Airflow and centralizing your end-to-end metadata, Databand.ai lets you identify data quality issues and their root causes from a single dashboard. With Databand.ai, you’ll know whether the data moving from your sources to your warehouse will be available, accurate, and usable when it arrives. Go to dataengineeringpodcast.com/databand to sign up for a free 30-day trial of Databand.ai and take control of your data quality today.
  • We’ve all been asked to help with an ad-hoc request for data by the sales and marketing team. Then it becomes a critical report that they need updated every week or every day. Then what do you do? Send a CSV via email? Write some Python scripts to automate it? But what about incremental sync, API quotas, error handling, and all of the other details that eat up your time? Today, there is a better way. With Census, just write SQL or plug in your dbt models and start syncing your cloud warehouse to SaaS applications like Salesforce, Marketo, Hubspot, and many more. Go to dataengineeringpodcast.com/census today to get a free 14-day trial.
  • Your host is Tobias Macey and today I’m interviewing Shevek about Compilerworks and his work on writing compilers to automate data lineage tracking from your SQL code

Interview
  • Introduction
  • How did you get involved in the area of data management?
  • Can you describe what Compilerworks is and the story behind it?
  • What is a compiler?
    • How are you applying compilers to the challenges of data processing systems?
  • What are some use cases that Compilerworks is uniquely well suited to?
  • There are a number of other methods and systems available for tracking and/or computing data lineage. What are the benefits of the approach that you are taking with Compilerworks?
  • Can you describe the design and implementation of the Compilerworks platform?
    • How has the system changed or evolved since you first began working on it?
  • What programming languages and SQL dialects do you currently support?
    • Which have been the most challenging to work with?
  • How do you handle verification/validation of the algebraic representation of SQL code given the variability of implementations and the flexibility of the specification?
  • Can you talk through the process of getting Compilerworks integrated into a customer’s infrastructure?
    • What is a typical workflow for someone using Compilerworks to manage their data lineage?
  • How does Compilerworks simplify the process of migrating between data warehouses/processing platforms?
  • What are the most interesting, innovative, or unexpected ways that you have seen Compilerworks used?
  • What are the most interesting, unexpected, or challenging lessons that you have learned while working on Compilerworks?
  • When is Compilerworks the wrong choice?
  • What do you have planned for the future of Compilerworks?

Contact Info
  • @shevek on GitHub
  • Website

Parting Question
  • From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links
  • Compilerworks
  • Compiler
  • ANSI SQL
  • Spark SQL
  • Google Flume Paper
  • SAS
  • Informatica
  • Trie Data Structure
  • Satisfiability Solver
  • Lisp
  • Scheme
  • Snooker
  • Qemu Java API

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
