
Common Crawl, a nonprofit web archive, is facing backlash from publishers, including Danish media outlets, over its role in AI training data. The publishers are demanding that Common Crawl remove copies of their articles from past data sets and stop crawling their websites. Common Crawl plans to comply, citing its inability to engage in costly legal battles. This controversy has significant implications for academic research, which heavily relies on Common Crawl's data sets, and raises concerns about the future of innovation in the AI field.