Machine Learning Archives - Software Engineering Daily

Snorkel: Training Dataset Management with Braden Hancock



Machine learning models require training data, and that data needs to be labeled. Today, we have high-quality machine learning infrastructure such as TensorFlow, but we don't have large, high-quality labeled datasets. For many applications, the standard practice is still to label training examples by hand and feed them into the training process.

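Snorkel is a system for scaling the creation of labeled training data. Instead of labeling examples one at a time, human subject-matter experts write labeling functions: small programs that each encode a heuristic and either assign a label to an example or abstain. Snorkel applies these functions to large quantities of unlabeled data and combines their outputs into training labels.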

For example, if I want to generate training data about spam emails, I don't have to hire 1,000 email experts to read emails and decide whether each one is spam. I can hire just a few email experts and have them write labeling functions that indicate whether an email is spam. If that doesn't make sense, don't worry; we discuss it in more detail in this episode.
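To make the spam example concrete, here is a minimal sketch in plain Python, assuming a binary spam/ham task. The function names, the keyword heuristics, and the majority-vote combiner are all hypothetical illustrations, not Snorkel's API. Snorkel itself replaces the simple vote with a learned label model that estimates how accurate each labeling function is.

```python
from collections import Counter
from typing import Callable, List

# Hypothetical label convention for this sketch: -1 means "abstain".
ABSTAIN = -1
HAM = 0
SPAM = 1

# Each hypothetical labeling function encodes one expert heuristic.
# It returns a label, or ABSTAIN when the heuristic does not apply.
def lf_contains_free_money(email: str) -> int:
    return SPAM if "free money" in email.lower() else ABSTAIN

def lf_many_exclamations(email: str) -> int:
    return SPAM if email.count("!") >= 3 else ABSTAIN

def lf_mentions_meeting(email: str) -> int:
    return HAM if "meeting" in email.lower() else ABSTAIN

LABELING_FUNCTIONS: List[Callable[[str], int]] = [
    lf_contains_free_money,
    lf_many_exclamations,
    lf_mentions_meeting,
]

def majority_vote(votes: List[int]) -> int:
    """Combine non-abstaining votes; a tie or all-abstain yields ABSTAIN."""
    counts = Counter(v for v in votes if v != ABSTAIN)
    if not counts:
        return ABSTAIN
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # conflicting labels with equal support: leave unlabeled
    return ranked[0][0]

def label_dataset(emails: List[str]) -> List[int]:
    """Apply every labeling function to every email, then combine the votes."""
    return [majority_vote([lf(e) for lf in LABELING_FUNCTIONS]) for e in emails]

if __name__ == "__main__":
    emails = [
        "Claim your FREE MONEY now!!!",
        "Agenda for tomorrow's meeting",
        "Lunch on Friday?",
    ]
    print(label_dataset(emails))  # [1, 0, -1]
```

The third email matches no heuristic, so it stays unlabeled rather than receiving a guessed label. A handful of experts only has to write and refine the functions, while the functions themselves do the per-email labeling.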

Braden Hancock works on Snorkel, and he joins the show to talk about the labeling problems in machine learning, and how Snorkel helps alleviate those problems. We have done many shows on machine learning in the past, which you can find on SoftwareDaily.com. Also, if you are interested in writing about machine learning, we have a new writing feature that you can check out by going to SoftwareDaily.com/write.


The post Snorkel: Training Dataset Management with Braden Hancock appeared first on Software Engineering Daily.
