Data Engineering Weekly

DEW #120: The Case for Data Contracts, Action-Position data quality assessment framework & Stop emphasizing the Data Catalog


Please read Data Engineering Weekly Edition #120 

Topic 1: Colin Campbell: The Case for Data Contracts - Preventative data quality rather than reactive data quality

In this episode, we focus on how data contracts prevent data quality issues. We discuss an article by Colin Campbell that highlights the need for a data catalog and the market scope for data contract solutions. We also touch on the idea that data creation will be a decentralized process and on the role of tools like data contracts in making decentralized data modeling succeed. The goal is high-quality data, and reaching it takes both technological and organizational solutions.
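To make the "preventative rather than reactive" idea concrete, here is a minimal sketch of a contract check that runs at the producer, before a record is ever published. It is not taken from the article or from any specific tool: `FieldRule`, `ORDER_CONTRACT`, `violations`, and `publish` are hypothetical names chosen for illustration, and a real contract would usually cover more than field presence and type.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class FieldRule:
    """One field in a (hypothetical) data contract."""
    name: str
    type_: type
    required: bool = True


# Assumed contract for an illustrative "orders" event.
ORDER_CONTRACT = [
    FieldRule("order_id", str),
    FieldRule("amount", float),
    FieldRule("coupon", str, required=False),
]


def violations(record: dict[str, Any], contract: list[FieldRule]) -> list[str]:
    """Return a human-readable list of contract violations (empty if none)."""
    problems = []
    for rule in contract:
        value = record.get(rule.name)
        if value is None:
            if rule.required:
                problems.append(f"missing required field: {rule.name}")
            continue
        if not isinstance(value, rule.type_):
            problems.append(
                f"{rule.name}: expected {rule.type_.__name__}, "
                f"got {type(value).__name__}"
            )
    return problems


def publish(record: dict[str, Any], contract: list[FieldRule], sink: list) -> None:
    """Reject the record at the source instead of cleaning it up downstream."""
    problems = violations(record, contract)
    if problems:
        raise ValueError("; ".join(problems))
    sink.append(record)
```

The design choice worth noting is where the check sits: the producer fails fast, so a bad record never reaches consumers, which is the preventive structure the quotes below describe.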

Key highlights of the conversation

  • "Preventative data quality rather than reactive data quality. It should start with contracts." - Colin Campbell, author of the article.
  • "Contracts put a preventive structure in place" - Ashwin.
  • "The successful data-driven companies all do one thing very well. They create high-quality data." - Ananth.
  • Links:

    https://uncomfortablyidiosyncratic.substack.com/p/the-case-for-data-contracts

    https://www.dataengineeringweekly.com/p/introducing-schemata-a-decentralized


Topic 2: Yerachmiel Feltzman: Action-Position data quality assessment framework

In this conversation, we discuss a framework for data quality assessment called the Action-Position framework. The framework helps define which action to take based on the severity of a data quality problem. We also discuss two patterns for data quality: Write-Audit-Publish (WAP) and Audit-Write-Publish (AWP). In WAP, data is written first, then audited, and published only after the audit; in AWP, the audit runs before the data is written and published. We encourage readers to share their best practices for addressing data quality issues.
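The difference between the two patterns is only the position of the audit step, which a small sketch can show. This is an illustration under simplifying assumptions, not the implementation from the article or the Dremio webinar: plain Python lists stand in for staging and published tables, and `no_null_ids`, `write_audit_publish`, and `audit_write_publish` are hypothetical names.

```python
from typing import Callable

Row = dict
Audit = Callable[[list[Row]], bool]


def no_null_ids(rows: list[Row]) -> bool:
    """Example audit: every row must carry a non-null id."""
    return all(row.get("id") is not None for row in rows)


def write_audit_publish(rows: list[Row], staging: list[Row],
                        published: list[Row], audit: Audit) -> bool:
    """WAP: write to staging, audit the staged copy, publish only on pass."""
    staging.clear()
    staging.extend(rows)           # write
    if not audit(staging):         # audit what was actually written
        return False               # bad data stays in staging, unpublished
    published.extend(staging)      # publish
    return True


def audit_write_publish(rows: list[Row], staging: list[Row],
                        published: list[Row], audit: Audit) -> bool:
    """AWP: audit the in-flight batch first, then write and publish."""
    if not audit(rows):            # audit before anything is written
        return False               # nothing lands in staging at all
    staging.clear()
    staging.extend(rows)           # write
    published.extend(staging)      # publish
    return True
```

In both functions a failed audit keeps bad rows away from consumers. With WAP the rejected batch remains in staging, where it can be inspected; with AWP it is never written.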

Are you using a data quality framework in your organization? Do you have any best practices for addressing data quality issues? What do you think of the Action-Position data quality framework? Please add your comments in the Substack chat.

Links:

https://medium.com/everything-full-stack/action-position-data-quality-assessment-framework-d833f6b77b7

Dremio WAP pattern: https://www.dremio.com/resources/webinars/the-write-audit-publish-pattern-via-apache-iceberg/


Topic 3: Guy Fighel: Stop emphasizing the Data Catalog

We discuss the limitations of data catalogs and the author's view of the semantic layer as an alternative. The author argues that data catalogs are passive and quickly become outdated, and that a stronger contract with enforced data quality could be a better solution. We also highlight the cost of implementing a data catalog and suggest that a more decentralized approach may be necessary to keep up with the growing number of data sources. Innovation is needed in this space to improve how organizations discover and consume data assets.

Links:

https://www.linkedin.com/pulse/stop-emphasizing-data-catalog-guy-fighel/

https://www.dataengineeringweekly.com/p/data-catalog-a-broken-promise


Data Engineering Weekly, by Ananth Packkildurai
