A marketplace for high-quality training data models for AI is about to emerge, and it will be extremely lucrative.
A good data model is supposed to represent something in the real world. However, many data models are based on data exclusively from the internet. Just imagine the downstream consequences of that.

For example, a data model based upon social media user-generated content will be full of bias, mistruths and half-truths, opinions (some of them dangerous), and invalid data with no sources and no peer review.

If a data model is built on bad data, and then that data is used to train an AI, that AI will contain the same bias, mistruths, dangerous opinions, etc.

Getting clean data to drive good decisions, be they human or AI, is becoming increasingly difficult. We are swamped in data, but the signal-to-noise ratio is low. The garbage in/garbage out problem has never been greater, and thanks to AI, the downstream stakes have never been higher.

The business opportunity here is great, however: a marketplace for high-quality training data models for AI is about to emerge, and it will be extremely lucrative. Ironically, a return to offline data such as peer-reviewed papers and books may be the solution. Such legacy silos of data will become the new gold rush. In such a marketplace, the quality of an AI will be judged by the quality of its training data.

What I am working on this week:
Designing an internet search indexer for the Alpha Framework.

Media I am enjoying this week:
Diaspora by Greg Egan.

Notes and subscription links are here: https://techleader.pro/a/638-Tech-Leader-Pro-podcast-2024-week-12,-the-AI-training-data-market-place