Panel - Strategies for responsible AI training in media with Natali Helberger, Paul Keller, Hanneke Holthuis, Daan Odijk & Dayana Spagnuelo
The use of media content for training Generative AI models is subject to intense
controversy. Some media organisations are closing deals with companies such as
OpenAI about the use of their content for training (like Springer), others go
to court to object to their data being used (New York Times v. OpenAI), and
still others might consider training their own models. All of them struggle with
the challenge of sorting out the conditions for using media data for training AI
models in a way that, on the one hand, respects the (intellectual) property
rights and the economic and competitive interests of the media and, on the other
hand, contributes to more responsible models, trained on high-quality and
multilingual content. Looking more broadly, the provision of high-quality,
publicly available data for AI training is seen as a measure needed to break up
concentrations of power, since these largely depend on asymmetries in access to
data. Media is one such category of public interest data sources, alongside
research or heritage data.
The goal of this panel is to map the different competing interests and
considerations and explore the extent to which regulations such as the AI Act or
the Directive on Copyright in the Digital Single Market offer workable solutions and
where there is room for improvement. As Europe aims to build common data spaces
for media, the panel will map out different potential strategies.
Natali Helberger, Paul Keller, Hanneke Holthuis, Daan Odijk, Dayana Spagnuelo
https://conference.publicspaces.net/en/session/media-content-for-responsible-ai