June 13, 2024

Common Crawl Faces Backlash from Publishers Over AI Training Data

4 minutes

Common Crawl, a nonprofit web archive, is facing backlash from publishers, including Danish media outlets, over its role as a source of AI training data. The publishers are demanding that Common Crawl remove copies of their articles from past data sets and stop crawling their websites. Common Crawl plans to comply, saying it cannot afford costly legal battles. The controversy has significant implications for academic research, which relies heavily on Common Crawl's data sets, and raises concerns about the future of innovation in AI.

---

Send in a voice message: https://podcasters.spotify.com/pod/show/tonyphoang/message

The Artificial Intelligence Podcast

By Dr. Tony Hoang
