March 23, 2024

Tech Leader Pro podcast 2024 week 12, the AI training data market place

5 minutes

A marketplace for high-quality training data models for AI is about to emerge, and it will be extremely lucrative.

Notes:

A good data model is supposed to represent something in the real world.

However, many data models are based on data exclusively from the internet.

Just imagine the downstream consequences of that.

For example, a data model based upon social media user-generated content will be full of:

Bias.

Mistruths and half-truths.

Opinions (some of them dangerous).

Invalid data with no sources, no peer review...

If a data model is built on bad data, and that data is then used to train an AI, the AI will inherit the same bias, mistruths, dangerous opinions, etc.

Getting clean data to drive good decisions, be they human or AI, is becoming increasingly difficult.

We are swamped in data, but the signal-to-noise ratio is low.

The garbage in/garbage out problem has never been greater, and thanks to AI, the stakes downstream have never been higher.

However, the business opportunity here is great: a marketplace for high-quality training data models for AI is about to emerge, and it will be extremely lucrative.

Ironically, a return to offline data such as peer-reviewed papers and books may be the solution.

Such legacy silos of data will become the new gold rush.

In such a marketplace, the quality of an AI will be judged by the quality of its training data.

What I am working on this week:

Designing an internet search indexer for the Alpha Framework.

Media I am enjoying this week:

Diaspora by Greg Egan.

Notes and subscription links are here: https://techleader.pro/a/638-Tech-Leader-Pro-podcast-2024-week-12,-the-AI-training-data-market-place


By John Collins
