DataTalks.Club

Dataset Creation and Curation - Christiaan Swart



We talked about:

  • Christiaan’s background
  • Usual ways of collecting and curating data
  • Getting buy-in from experts and executives
  • Starting an annotation booklet
  • Pre-labeling
  • Dataset collection
  • Human-level baseline and feedback
  • Using the annotation booklet to boost annotation productivity
  • Putting yourself in the shoes of annotators (and measuring performance)
  • Active learning
  • Distant supervision
  • Weak labeling
  • Dataset collection in career positioning and project portfolios
  • IPython widgets
  • GDPR compliance and non-English NLP
  • Finding Christiaan online

  • Links:

    • My personal blog: https://useml.net/
    • Comtura, my company: https://comtura.ai/
    • LinkedIn: https://www.linkedin.com/in/christiaan-swart-51a68967/
    • Twitter: https://twitter.com/swartchris8/

    • ML Zoomcamp: https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp

      Join DataTalks.Club: https://datatalks.club/slack.html

      Our events: https://datatalks.club/events.html
