O'Reilly Data Show Podcast

Acquiring and sharing high-quality data



In this episode of the Data Show, I spoke with Roger Chen, co-founder and CEO of Computable Labs, a startup focused on building tools for the creation of data networks and data exchanges. Chen has also served as co-chair of O’Reilly’s Artificial Intelligence Conference since its inception in 2016. This conversation took place the day after Chen and his collaborators released an interesting new white paper, “Fair value and decentralized governance of data.”

Current-generation AI and machine learning technologies rely on large amounts of data, and to the extent they can use their large user bases to create “data silos,” large companies in large countries (like the U.S. and China) enjoy a competitive advantage. With that said, we are awash in articles about the dangers posed by these data silos. Privacy and security, disinformation, bias, and a lack of transparency and control are just some of the issues that have plagued the perceived owners of “data monopolies.”

In recent years, researchers and practitioners have begun building tools focused on helping organizations acquire, build, and share high-quality data. Chen and his collaborators are doing some of the most interesting work in this space, and I recommend their new white paper and accompanying open source projects.

Sequence of basic market transactions in the Computable Labs protocol. Source: Roger Chen, used with permission.

We had a great conversation spanning many topics, including:

  • Why he chose to focus on data governance and data markets.
  • The unique and fundamental challenges in accurately pricing data.
  • The importance of data lineage and provenance, and the approach they took in their proposed protocol.
  • What cooperative governance is and why it’s necessary.
  • How their protocol discourages an unscrupulous user from just scraping all data available in a data market.

Related resources:

    • Roger Chen: “Data liquidity in the age of inference”
    • Ihab Ilyas and Ben Lorica on “The quest for high-quality data”
    • Chris Ré: “Software 2.0 and Snorkel”
    • Alex Ratner on “Creating large training data sets quickly”
    • Jeff Jonas on “Real-time entity resolution made accessible”
    • “Data collection and data markets in the age of privacy and machine learning”
    • Guillaume Chaslot on “The importance of transparency and user control in machine learning”

O'Reilly Data Show Podcast, by O'Reilly Media
