Data Engineering Weekly

DEW #122: dbt Reimagined, Change Data Capture @ Brex, on Data Products and how to describe them



DBT Reimagined by Pedram Navid

https://pedram.substack.com/p/dbt-reimagined


With Jinja templating, I found two challenges. The first is that it is evaluated at runtime, so you have to build the model and then run some simulations to find out whether you wrote it correctly.

Jinja templates also add cognitive load. Developers have to know how both the Jinja template and the SQL will work, which makes the code harder to read and understand.


In this conversation with Aswin, we discuss the article "DBT Reimagined" by Pedram Navid. We talk about the strengths and weaknesses of DBT and what we would like to see in a future version of the tool.

Aswin agrees with Pedram Navid that a DSL would be better than a templated language for DBT. He also points out that the Jinja templating system can be difficult to read and understand.

I agree with both Aswin and Pedram Navid. A DSL would be a great way to improve DBT. It would make the tool more powerful and easier to use.

I'm also interested in a native programming language for DBT. It would allow developers to write their own custom functions and operators, giving them even more flexibility in using the tool.

The conversation then shifts to the advantages of a DSL over templated code, and we discuss other tools such as SQLMesh, Malloy, and an internal tool built at Criteo. I believe that more experimentation with SQL is needed.
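To make the DSL-versus-template point concrete, here is a minimal sketch. It is not how DBT, SQLMesh, or Malloy work internally: `string.Template` stands in for Jinja so the example needs only the standard library, and `Table`, `render_templated`, and the `orders` model are hypothetical names invented for illustration.

```python
from dataclasses import dataclass
from string import Template
from typing import Tuple

# Templated approach: the SQL is opaque text until it is rendered and executed.
# string.Template is a stand-in for Jinja (an assumption for this sketch).
TEMPLATED_MODEL = Template("select $column from $table")


def render_templated(column: str, table: str) -> str:
    # Rendering succeeds for any strings. A misspelled column is only
    # caught later, when the warehouse actually runs the query.
    return TEMPLATED_MODEL.substitute(column=column, table=table)


# DSL-style approach: the model knows its columns, so mistakes can be
# rejected while the query is being built, before anything runs.
@dataclass(frozen=True)
class Table:
    name: str
    columns: Tuple[str, ...]

    def select(self, *columns: str) -> str:
        unknown = [c for c in columns if c not in self.columns]
        if unknown:
            raise ValueError(f"unknown column(s) on {self.name}: {unknown}")
        return f"select {', '.join(columns)} from {self.name}"


orders = Table("orders", ("order_id", "amount"))

# The typo "amuont" passes silently through the template...
print(render_templated("amuont", "orders"))

# ...but the DSL-style builder refuses it at build time.
try:
    orders.select("amuont")
except ValueError as err:
    print(err)
```

The trade-off is that the builder needs schema information up front, which a plain text template never asks for.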

Overall, the article "DBT Reimagined" is a valuable contribution to the discussion about the future of data transformation tools. It raises some important questions about the strengths and weaknesses of DBT and offers some interesting ideas for how to improve it.


Change Data Capture at Brex by Jun Zhao

https://medium.com/brexeng/change-data-capture-at-brex-c71263616dd7

Aswin provided a great definition of CDC, explaining it as a mechanism that listens to database replication logs to capture, stream, and reproduce data changes in real time 🕒. He shared his first encounter with CDC back in 2013, when he worked on a Proof of Concept (POC) for a bank 🏦.

Aswin explains that CDC is a way to capture changes made to data in a database. This can be useful for a variety of reasons, such as:

  • Auditing: CDC can be used to track changes made to data, which can be useful for auditing purposes.

  • Compliance: CDC can be used to ensure that data complies with regulations.

  • Data replication: CDC can replicate data from one database to another.

  • Data integration: CDC can be used to integrate data from multiple sources.
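The replication use case above can be sketched in a few lines. This is a simplified illustration, not the pipeline described in the Brex article: the event shape (`op`, `key`, `row`) and the `apply_changes` helper are assumptions made for this example, and a real CDC system would read such events from the database's replication log rather than from a Python list.

```python
from typing import Any, Dict, Iterable

# Hypothetical change-event shape: {"op": ..., "key": ..., "row": ...}
ChangeEvent = Dict[str, Any]


def apply_changes(
    replica: Dict[int, Dict[str, Any]], events: Iterable[ChangeEvent]
) -> Dict[int, Dict[str, Any]]:
    """Replay change events, in order, onto an in-memory replica."""
    for event in events:
        op = event["op"]
        if op in ("insert", "update"):
            # Inserts and updates both carry the full new row in this sketch.
            replica[event["key"]] = dict(event["row"])
        elif op == "delete":
            replica.pop(event["key"], None)
        else:
            raise ValueError(f"unsupported operation: {op}")
    return replica


events = [
    {"op": "insert", "key": 1, "row": {"status": "pending"}},
    {"op": "insert", "key": 2, "row": {"status": "pending"}},
    {"op": "update", "key": 1, "row": {"status": "settled"}},
    {"op": "delete", "key": 2, "row": None},
]

replica = apply_changes({}, events)
print(replica)
```

Replaying the events in log order is what keeps the replica consistent with the source; applying them out of order could, for example, resurrect a deleted row.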

Aswin also discusses some of the challenges of using CDC, such as:

  • Complexity: CDC can be a complex process to implement.

  • Cost: CDC can be a costly process to implement.

  • Performance: CDC can impact the performance of the database.

On Data Products and How to describe them by Max Illis

https://medium.com/@maxillis/on-data-products-and-how-to-describe-them-76ae1b7abda4

The library example is close to Aswin's heart since his father started his career as a librarian! 📖

👨‍💻 Aswin highlights Max's broad definition of data products, including data sets, tables, views, APIs, and machine learning models. Ananth agrees that BI dashboards can also be data products. 📊

🔍 We emphasize the importance of exposing tribal knowledge and democratizing the data product world. Max's journey from skeptic to believer in data products is very admirable. 🌟

📝 We dive into data products' structural and behavioral properties and Max's detailed description of build-time and runtime properties. We also appreciate the idea of reference queries to facilitate data consumption. 🧩
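As a rough illustration of what such a description could look like in code, here is a hypothetical descriptor. The class name, field names, and sample values are all assumptions made for this sketch and are not taken from Max's post; only the split into build-time properties, runtime properties, and reference queries follows the discussion above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataProductDescriptor:
    """Hypothetical description of a data product (illustrative fields only)."""

    name: str
    kind: str  # e.g. "table", "view", "api", "ml_model", "dashboard"
    build_time: Dict[str, str] = field(default_factory=dict)  # known when built
    runtime: Dict[str, str] = field(default_factory=dict)  # observed while serving
    reference_queries: List[str] = field(default_factory=list)  # starter queries

    def describe(self) -> str:
        lines = [f"{self.name} ({self.kind})"]
        lines += [f"  build-time {k}: {v}" for k, v in sorted(self.build_time.items())]
        lines += [f"  runtime {k}: {v}" for k, v in sorted(self.runtime.items())]
        lines += [f"  reference query: {q}" for q in self.reference_queries]
        return "\n".join(lines)


# Sample values are invented, loosely following the library example.
loans = DataProductDescriptor(
    name="library_loans",
    kind="table",
    build_time={"owner": "circulation-team", "schema": "loan_id, book_id, due_date"},
    runtime={"freshness": "updated daily"},
    reference_queries=[
        "select count(*) from library_loans where due_date < current_date"
    ],
)

print(loans.describe())
```

Keeping reference queries next to the other properties gives a consumer a working starting point without having to ask the owning team.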

🚀 In conclusion, Max's blog post is one of the best write-ups on data products around! Big thanks to Max for sharing his thoughts! 🙌


Data Engineering Weekly by Ananth Packkildurai
