DataTalks.Club

Large-Scale Entity Resolution - Sonal Goyal



We talked about:

  • Sonal’s background
  • How the idea for Zingg came about
  • What Zingg is
  • The difference between entity resolution and identity resolution
  • How duplicate detection relates to entity resolution
  • How Sonal decided to start working on Zingg
  • How Zingg works
  • What Zingg runs on
  • Switching from consultancy to working on a new open source solution
  • Why Zingg is open source
  • Open source licensing
  • Working on Zingg initially vs now
  • Zingg’s current and future team
  • Sonal’s biggest current challenge
  • Avoiding problems with entity/identity resolution through database design
  • Identity resolution vs basic joins, data fusions, and fuzzy joins
  • Deterministic matching vs probabilistic machine learning
  • Identity and entity resolution applications for fraud detection
  • Graph algorithms vs classic ML in entity resolution
  • Identity resolution success stories
  • What Sonal would do differently given the chance to start over with Zingg
  • Advice for those seeking to realize their own solution to a data problem
  • Reading suggestion from Sonal
  • Conclusion

Links:

  • Open-Source Spotlight demo "Zingg": https://www.youtube.com/watch?v=zOabyZxN9b0
  • Book "Creative Selection: Inside Apple's Design Process During the Golden Age of Steve Jobs": https://www.amazon.com/Creative-Selection-Inside-Apples-Process/dp/1250194466
  • ML Zoomcamp: https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html
