This paper argues that compensating human labor for training data is the largest cost in developing Large Language Models, significantly exceeding model training expenses, and suggests fairer practices for the future.
https://arxiv.org/abs//2504.12427
YouTube: https://www.youtube.com/@ArxivPapers
TikTok: https://www.tiktok.com/@arxiv_papers
Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016
Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers