Tech Barometer – From The Forecast by Nutanix

Measuring the Prime Ingredient in Enterprise AI

In this Tech Barometer podcast, MLCommons Co-founder David Kanter talks about creating the MLPerf Storage benchmark to help enterprises understand the AI workload performance of various data storage technologies.

Find more enterprise cloud news, feature stories and profiles at The Forecast.

Transcript:

David Kanter: Data is the prime ingredient in modern machine learning. I think there was an Economist headline saying data is the new oil. Of course, where does that oil live? That oil lives on storage. Storage is kind of like Baskin-Robbins, except it’s possible there might be more than 31 flavors.

Jason Lopez: David Kanter is the co-founder of MLCommons. This is the Tech Barometer podcast; I’m Jason Lopez. On this podcast: storage and the critical role it plays in the age of AI. In this episode, David Kanter explains MLPerf Storage, a benchmark suite from MLCommons designed to measure machine learning workloads in the context of the storage systems that hold the data used to train models. He says that to unlock the full potential of AI, storage systems need to be tailored to the process of machine learning, which is critical to AI training. And this is how the idea for the MLPerf Storage benchmark came about. Data fuels AI’s ability to learn and make decisions: the more data you have, the more powerful your models and the greater the breakthroughs. This is at the heart of the AI scaling laws developed by Greg Diamos, a co-founder of MLCommons. His work on AI scaling laws, neural network optimization, and GPU acceleration has had a big influence on the field of AI development.

[Related: Get a Grip on Data Storage in Quest for Enterprise AI]

David Kanter: The lesson of the scaling laws is that if you get enough data and you get a big enough model and enough compute to combine those together, that’s when you really get these qualitatively different outcomes, whether it’s a self-driving car or the ability to recognize an image better than a human. Data is really the top priority here.

Jason Lopez: But it’s not just about having data.

David Kanter: It’s how you process the data, how you manipulate the data. And there’s a ton of work that really shows that data is a first-class citizen. And of course, data needs a place to live, and that’s storage. You’re going to take batches of data and feed it into your compute system. And then you’re going to compute a forward pass and see how good the model is at predicting on that batch of data, compute the errors, and then adjust the model so it gets better, and then over time, the model will hopefully and often converge to an answer.
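
To make the loop Kanter describes concrete, here is a minimal sketch in PyTorch-style Python; the model, loader, and loss function are illustrative placeholders, not part of any MLPerf code.

```python
# A minimal training loop of the kind described above (PyTorch-style sketch).
# Storage feeds the data loader; compute does the forward/backward passes.
def train(model, loader, optimizer, loss_fn, epochs=10):
    for _ in range(epochs):
        for batch, labels in loader:        # batches of data stream in from storage
            optimizer.zero_grad()
            preds = model(batch)            # forward pass: how well does it predict?
            loss = loss_fn(preds, labels)   # compute the errors
            loss.backward()                 # backpropagate the errors
            optimizer.step()                # adjust the model so it gets better
    return model                            # over many iterations, hopefully converges
```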

Jason Lopez: After enough iterations, the model becomes good at recognizing whatever you’re training it to recognize; it has converged to a solution. But that can take a lot of data, which means a lot of storage. And to take it one step further, the faster and more efficient the data storage, the better for training, inference, and tuning models. In MLPerf’s first storage evaluations, the Nutanix Unified Storage Platform was a benchmark leader, and Kanter commented on these benchmark tools.

David Kanter: We built up a great set of infrastructure using some tooling from Argonne National Lab to help us measure sort of that data loading for AI. And critically, we can do it without having to have accelerators. You can actually run our benchmark and ask the question, hey, what would it take to feed 4,000 accelerators without having to shell out for 4,000 accelerators?
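
The core trick is emulating the consumption rate of accelerators that are not actually present. A rough sketch of the idea follows, with illustrative numbers and a hypothetical read_batch() standing in for the benchmark’s real I/O path (which builds on Argonne’s DLIO tooling).

```python
import time

# Rough sketch: pretend N accelerators each need a batch every compute step,
# and check whether storage delivers the data before compute would finish.
# read_batch(nbytes) is a hypothetical stand-in for the benchmark's I/O path.
def emulated_utilization(read_batch, n_accelerators=4000,
                         batch_bytes=25_000_000, step_time_s=0.05, steps=100):
    fed = 0
    for _ in range(steps):
        start = time.monotonic()
        read_batch(n_accelerators * batch_bytes)   # aggregate demand per step
        if time.monotonic() - start <= step_time_s:
            fed += 1                               # accelerators stayed busy this step
    return fed / steps                             # fraction of steps storage kept up
```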

Jason Lopez: That’s how artificial intelligence is changing the way we process and analyze information. Making a model to see how something works is nothing new, but AI is raising the bar dramatically. It does this by requiring massive amounts of data to train the system.

David Kanter: I think the really critical thing is the size and type of data. We’ve got three different workloads in the benchmark: 3D images, 2D images, and a scientific workload. So if we’re doing image recognition for, say, smaller images, you know, each image might be 100 kilobytes or so. When you want to feed your compute system, you’re going to be pulling in a batch of images, so your storage system has to be able to keep up. If you have really big images, then each image fetch is going to be a lot of data all in one fell swoop. But if you have smaller data, like with large language models, you’re going to be working on a lot of text. That’s much smaller data, and it’s going to have a pretty different impact on the storage system.
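
Some back-of-the-envelope arithmetic makes the point; all numbers below are illustrative, not taken from the benchmark.

```python
# Required storage bandwidth is roughly sample size x samples consumed per second.
def required_bandwidth_gbps(sample_bytes, samples_per_sec):
    return sample_bytes * samples_per_sec / 1e9   # GB/s the storage must sustain

print(required_bandwidth_gbps(100_000, 10_000))   # 100 KB images at 10k/s -> 1.0 GB/s
print(required_bandwidth_gbps(150_000_000, 20))   # big 3D volumes at 20/s -> 3.0 GB/s
print(required_bandwidth_gbps(2_000, 500_000))    # small text samples     -> 1.0 GB/s
```

Note that the text and small-image cases land at similar aggregate bandwidth but very different access patterns: many tiny reads instead of a few large sequential ones, which stresses a storage system quite differently. That is Kanter’s point.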

Jason Lopez: Which leads to this insight. Image size is an issue, but it’s not just about images.

David Kanter: It’s actually about how many data samples, you know, whether it’s a 3D volume, whether it’s a 2D image, whether it’s a sentence, whatever it is, how many of these samples are you getting? And critically, how many accelerators can you keep busy with a given storage system? If you’ve got a system that has, you know, 65 of this generation of accelerators, you know, five nodes of Nutanix may be right for you. Or maybe you’re looking to build a cluster that’s a bit more expandable, and maybe you should get 10. But that’s ultimately what the benchmark is telling us.
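
In simplified terms, the benchmark’s headline number answers a sizing question like the one below. This is a sketch with illustrative figures; the real MLPerf Storage rules additionally require a minimum accelerator-utilization threshold.

```python
# Simplified sizing: how many accelerators can a storage system keep busy,
# given measured storage throughput and per-accelerator consumption?
def accelerators_supported(storage_samples_per_sec, per_accel_samples_per_sec):
    return storage_samples_per_sec // per_accel_samples_per_sec

# e.g., storage sustaining 65,000 samples/s against accelerators that each
# consume 1,000 samples/s keeps 65 of them fed (illustrative numbers).
print(accelerators_supported(65_000, 1_000))   # -> 65
```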

Jason Lopez: What the benchmark reveals helps users understand how their system choices work together. That’s the big overview. But when you zoom in, the benchmarks evaluate a range of things, such as the safety of generative AI chatbot systems, measured in the AILuminate benchmark, or the performance of large language models and other AI workloads on PCs, measured in the MLPerf Client benchmark.

David Kanter: An initiative that we’ve started recently at MLCommons is trying to turn our expertise in AI measurement to how can we make sure that these AI systems are going to be responsible and doing the right thing for us as we intend.

Jason Lopez: One intention, which gets a lot of headlines around AI, is safety. But another is building systems that don’t break the bank and don’t gobble up inordinate amounts of power. Data center owners benefit from the MLPerf Storage benchmark’s insights into how to manage costs and how to build for optimum performance. Today, companies are rethinking how they build data centers.

David Kanter: As we’re shifting into the AI era, a lot of these systems use so much power that we may have to double the footprint of data centers. Obviously, that’s a big deal. And so we’re seeing people looking for new sources of energy and thinking more about where data centers are with energy in mind.

Jason Lopez: Older data centers often lack the electrical capacity and cooling infrastructure needed to support hardware like GPUs and to train large-scale models, which use far more power than traditional computing. The next generation of data centers needs to be both powerful and efficient.

David Kanter: Some of those older-generation data centers that we built in the ’90s and 2000s just don’t work for AI. I think part of what we’re seeing is we need new data centers that can do AI. And that’s, you know, stressing the whole system. But the great news is they’re actually way more efficient than what we had before. And the systems we build are more efficient. Every day we’re discovering new things.

Jason Lopez: Kanter says you can’t optimize what you don’t measure, in this case, energy use. By one widely cited estimate, training a single large AI model can produce as much carbon as five cars over their lifetimes. MLPerf benchmarks measure AI performance per watt, helping developers choose more efficient hardware and software.

David Kanter: From a sustainability standpoint, we focused on measuring energy usage because you can measure it. We want to measure the power consumption of inference and training systems. I’m thrilled that we delivered inference power measurement a couple of years ago, and then training, which is a bit more complicated because those are bigger systems. We were able to get the first power measurements for AI training systems late last year, and we’ve hopefully got more coming soon with MLPerf Training. The goal of my organization, in many ways, is to measure things in the AI world and help make AI better for everyone, where better means faster, more capable, more energy-efficient, and safer.
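
The efficiency metric that falls out of such power measurements is straightforward; here is a hypothetical illustration with made-up numbers.

```python
# Performance per watt: the same throughput at lower power is more efficient.
# samples/sec divided by watts (joules/sec) yields samples per joule.
def samples_per_joule(samples_per_sec, watts):
    return samples_per_sec / watts

print(samples_per_joule(10_000, 5_000))   # system A: 2.0 samples per joule
print(samples_per_joule(10_000, 8_000))   # system B: 1.25 samples per joule
```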

Jason Lopez: David Kanter is the co-founder of MLCommons, the organization that developed the MLPerf Storage benchmark suite. We’ve previously interviewed some of the key players in this organization for Tech Barometer: Debo Dutta, chief AI officer at Nutanix; Greg Diamos, one of the key developers behind generative AI apps and the AI scaling laws; and Alex Karargyris, co-founder and co-chair of the MLCommons Medical Working Group.

Look for these podcasts and print stories at our Forecast news page, theforecastbynutanix.com. That’s all one word, theforecastbynutanix.com. Tech Barometer is a production of The Forecast. I’m Jason Lopez. Thanks for listening.
