ClickAI Radio

CAIR 31: I Don't Have Enough Data To Use AI !!!



We look at the question: how can I use AI if I don't have enough data?
How much data do I need to get value from AI?

 

Hi everybody, welcome to another episode of ClickAI Radio. This is Grant. All right, so this is a very common problem, especially in the AI and machine learning world, right? You've got a business, you want to be able to apply AI to it, but you look at your data and you're like, I don't have enough data. Or do I? That's always a big question around AI: how much data does one actually need? We were working with a newly formed company that was providing coaching to their clients. They wanted insights on their client base so that they could improve their messaging as well as their client acquisition. However, they only had a handful of clients, so they didn't really have a sufficient amount of data to leverage AI. And so we went about the business of building out a framework, and I'll introduce that framework to you later as I walk through this.

First and foremost, what is the lack-of-data problem? Well, one of the biggest problems around AI is that those darn computers need a ton of information before they can start extracting patterns and predictions from it. The very nature of AI requires this. Now, what would be some ways to get around it? One of the folks over at KDnuggets stated, hey, if you want to overcome this problem of low data, you can always reduce the number of classifiers. Holy smokes, what does that mean? That is such nerdy talk.

What that really means is that when you're looking at your business and breaking down the business problem, you want to reduce the number of categories. You might have sales, for example, and you might say, well, I want to look at sales by geography, or I want to look at sales by salesperson; each of those would represent a category. The fewer categories you have, typically the less data you need. So you could reduce your categories significantly, work with a smaller amount of data, and get started sooner rather than later. Now, as time goes by and the volume of your data grows, you can certainly introduce more categories. Each category itself needs a sufficient amount of information, or data, in order for the AI to be effective at it. So what would some of those numbers be?
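Before we get to the numbers, here is a minimal sketch of what reducing categories can look like in practice. The file name and the salesperson, region, and amount columns are assumptions made up for illustration; the point is simply that collapsing a fine-grained category into a coarser one leaves more rows behind each category.

```python
import pandas as pd

# Hypothetical sales export -- the file and column names are assumptions.
sales = pd.read_csv("sales.csv")  # e.g. columns: date, salesperson, region, amount

# Fine-grained: one category per salesperson -- many categories, few rows each.
per_person = sales.groupby("salesperson")["amount"].agg(["count", "sum"])

# Coarser: one category per region -- fewer categories, more rows behind each,
# so the AI has a better chance of finding reliable patterns with less data.
per_region = sales.groupby("region")["amount"].agg(["count", "sum"])

print(per_person.head())
print(per_region.head())
```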

Well, here are some examples of those numbers. If you're going to do some classical machine learning AI work, typically you'll need maybe 1,000 rows of data. Now, I'm not talking about AI that's used for images, or voice, or text, or things like that; what I'm really talking about is AI specifically for business information. That might be sales information, or information coming from manufacturing, or payments, or whatever it might be. So at the very least, for classical machine learning where you're looking at linear dependencies and you want to understand the input and output relationships, generally 1,000 rows of information is the lowest amount you would go with. Now, if you're going to go for some of these other kinds of AI, where you're looking to do advanced neural networks, then you're going to need well over 1,000 pieces of information, whether that's images or whatever; you're going to need 10,000, 50,000, or 100,000 of those.
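As a rough, hypothetical sketch of that "classical machine learning on about 1,000 rows" case, a simple linear model looking for input/output relationships might look like the following. The transactions.csv file and the ad_spend, discount, and units_sold columns are made up for illustration, not taken from the episode.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical business table with roughly 1,000 rows of transactions.
df = pd.read_csv("transactions.csv")  # assumed columns: ad_spend, discount, units_sold

X = df[["ad_spend", "discount"]]   # inputs
y = df["units_sold"]               # output we want to predict

# Hold some rows back so we can check how well the model generalizes.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = LinearRegression().fit(X_train, y_train)
print("R^2 on held-out rows:", model.score(X_test, y_test))
```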

But for the purposes of this conversation, what I'm going to focus on is not the image style of AI, but rather the AI that helps someone with, say, a sales problem or a refund problem, something that pulls insights out of their business transactions and the information they're getting from them. Now, the problem with a small amount of data is that it creates a problem called overfitting. Overfitting creates a situation where the predictions become too tightly coupled to a very small amount of information; the algorithms are too tightly connected to just a small set of data. What it means is that you're getting predictions and analysis that are not generalized enough, right? Hence, they're overfitted. All right, so what can one do to address it? The first thing is to reduce the number of categories, as I've mentioned. There's another technique as well, and it's called data generation, or data augmentation.
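One simple way to see the overfitting Grant describes is to compare how a model scores on the rows it trained on versus rows it has never seen; a large gap between the two is the warning sign. This is only an illustrative sketch on synthetic data, not a recipe from the episode.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Tiny synthetic stand-in for a small business dataset (illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))                         # only 60 rows, two features
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(size=60)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# A flexible model trained on very little data tends to memorize it.
model = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)

train_r2 = model.score(X_train, y_train)  # fit on rows it has seen
test_r2 = model.score(X_test, y_test)     # fit on rows it has not seen

print(f"train R^2 = {train_r2:.2f}, test R^2 = {test_r2:.2f}")
# A near-perfect train score with a much lower test score is the overfitting
# pattern described above: predictions too tightly coupled to a small sample.
```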

And data augmentation is a technique being used by some organizations where, let's say you had 1,000 rows of information, you then apply some techniques that generate, augment, or artificially create additional data based off the set of data you already have. This also allows organizations to get started sooner rather than later. In fact, the OpenAI group produced something called GPT-2 and GPT-3, which generate human language and can be used for data augmentation. But I'm not talking about that; I'm talking about business information as it relates, in this case, to your sales or your refunds or your payments and payment processing. This data augmentation technique is certainly something you can do. But one of the things I've found most useful is to avoid getting overly complex for the business, because the techniques I've mentioned briefly, such as data augmentation, are a fairly invasive set of activities. A lot of businesses, especially small and medium businesses, don't want to, nor should they, get deeply entrenched in that; it takes up a lot of time.
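For the business-data case, one very simple flavor of data augmentation is to create extra synthetic rows by slightly jittering the numeric columns of rows you already have. This is only a sketch of the idea; real augmentation pipelines are far more careful about which columns can safely be perturbed, which is part of why it gets invasive.

```python
import numpy as np
import pandas as pd

def augment_numeric_rows(df: pd.DataFrame, copies: int = 2,
                         noise: float = 0.02, seed: int = 0) -> pd.DataFrame:
    """Create extra synthetic rows by jittering numeric columns slightly.

    A deliberately simple illustration of tabular data augmentation: non-numeric
    columns are copied as-is, numeric columns get small relative noise.
    """
    rng = np.random.default_rng(seed)
    numeric_cols = df.select_dtypes("number").columns
    synthetic = []
    for _ in range(copies):
        jittered = df.copy()
        jittered[numeric_cols] *= 1 + rng.normal(0, noise, size=df[numeric_cols].shape)
        synthetic.append(jittered)
    return pd.concat([df, *synthetic], ignore_index=True)

# Hypothetical usage: turn ~1,000 rows of sales data into roughly 3,000.
# sales = pd.read_csv("sales.csv")
# bigger = augment_numeric_rows(sales)
```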

So there's another approach that we've developed, which I call the continuous feed framework. The continuous feed framework's purpose is to line up your business for AI. Think of it like a drip system in your garden, where you're continually dripping little bits of water to a plant. The approach is that you connect your business systems up to the AI engines and, like that drip system in your garden, you slowly start dripping data into the AI engines. The AI engines then, over time, continue to run and rerun on the data that's available. But what it's looking for is overfitting; it's looking to determine, do I have enough information to in fact have a generalized set of patterns that are now predictable and reliable? So when the model begins to develop this accuracy and these predictability characteristics, then AI analysis and prediction activities can begin to be harvested, which produces insights for your business.

And I know I sound like Cloudy with a Chance of Meatballs right there. You know, it's science, science, science, science, right? All the guy hears is bigger, right? Bigger is better. Let me say it another way. If you have a small amount of data in your business, then you can easily and non-invasively get started with AI by connecting your business systems up to these AI engines and then, like I said, like a drip system, slowly building up that information. And when the AI starts to exhibit non-overfitting, highly accurate models, that's when you know, all right, I have enough that I can start taking the insights out of this and using it to impact my business. Which actually, ironically, has the effect of producing more data for you in the future, so it starts to become this nice, iterative, symbiotic relationship. All right, everybody, thank you for joining, and until next time, get a continuous feed framework.
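To be clear, the continuous feed framework itself is Grant's approach and isn't spelled out as code in the episode. Purely as a sketch of the drip idea, though, a loop that accumulates each new batch of data, retrains, and only "harvests" predictions once accuracy is decent and the train/validation gap is small might look like this; every name, feature, and threshold below is made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def fetch_new_rows(n: int = 25) -> pd.DataFrame:
    """Stand-in for the 'drip': in practice this would pull whatever new rows your
    business systems produced since the last run. Here it just simulates them."""
    X = rng.normal(size=(n, 2))
    y = 5 * X[:, 0] - 3 * X[:, 1] + rng.normal(scale=0.5, size=n)
    return pd.DataFrame({"feature_a": X[:, 0], "feature_b": X[:, 1], "target": y})

accumulated = pd.DataFrame()

for week in range(1, 21):                      # each pass is one "drip" of new data
    accumulated = pd.concat([accumulated, fetch_new_rows()], ignore_index=True)
    if len(accumulated) < 200:                 # too little data to bother training on
        continue

    X = accumulated[["feature_a", "feature_b"]]
    y = accumulated["target"]
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

    model = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)
    train_r2, val_r2 = model.score(X_tr, y_tr), model.score(X_val, y_val)

    # Harvest insights only once the model generalizes: decent accuracy and no
    # large train/validation gap (the overfitting signal discussed above).
    ready = val_r2 > 0.7 and (train_r2 - val_r2) < 0.1
    print(f"week {week}: rows={len(accumulated)}, val R^2={val_r2:.2f}, ready={ready}")
```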

Thank you for joining Grant on ClickAI Radio. Don't forget to subscribe and leave feedback. And remember to download your FREE eBook: visit ClickAIRadio.com now.
