The Five Nine

What is data curation and why does it matter?


This week we’re going back down the AI rabbit hole, but we’re venturing down a new tunnel to talk about something called data curation. 

Though AI is still a developing technology, it’s well known at this point that models are only as good as the data they’re trained on. But for enterprises looking to fine-tune publicly available models, it can be a challenge to make sure they’re making the right data available. Why? Well, the vast majority of enterprise data is what is known as unstructured data. That includes any data that doesn’t fit a predefined schema, such as the rows and columns of a database: photos, videos, emails, PDFs, you name it.

Enter data curation – which is basically just the process of sorting through all this data to decide what is relevant to train the model and what’s not. Today this is mostly a tedious, manual process. But is it even worth the hassle? 

We spoke to Vincent Chen, Director of Product and Founding Engineer at Snorkel AI, to get the lowdown on how data curation works and why it matters.

To learn more about the topics in this episode: 

  • Snorkel AI dives into hot market of data curation https://www.fierce-network.com/cloud/snorkel-ai-dives-hot-market-data-curation 
  • Data storage gets spicy with help from AI https://www.fierce-network.com/ai/data-storage-gets-spicy-help-ai 
  • GenAI could illuminate decades worth of dark data https://www.fierce-network.com/cloud/unstructured-data-pandoras-box-genai-its-key  

See omnystudio.com/listener for privacy information.
