


Common Crawl, a nonprofit web archive, is facing backlash from publishers, including Danish media outlets, over its role in AI training data. The publishers are demanding that Common Crawl remove copies of their articles from past data sets and stop crawling their websites. Common Crawl plans to comply, citing its inability to engage in costly legal battles. This controversy has significant implications for academic research, which heavily relies on Common Crawl's data sets, and raises concerns about the future of innovation in the AI field.
By Dr. Tony Hoang