
In this video interview, Liqid CTO Sumit Puri explains why dynamic IT infrastructure that taps into pools of GPUs and scale-up memory can quickly and efficiently run virtual machines, containers and AI workloads on-premises and at the edge.
Transcript:
Jason Lopez: Liqid is a software-defined infrastructure company that enables more efficient use of high-cost, high-power resources like GPUs in data centers. Instead of installing GPUs directly into each server, Liqid creates centralized GPU pools and dynamically allocates them to servers based on workload demands. This composable approach maximizes GPU utilization, reduces waste and improves performance.
Sumit Puri: We saw this vision of GPUs being important in the data center many years back, and then all of a sudden this thing called ChatGPT burst upon the scene and made this front and center in everyone’s mind. And so we’ve been focusing on pooling and sharing these resources for a long time, and now all of a sudden AI is forcing itself into the mainstream, especially in the areas that we focus on, things like the enterprise, and now all of it is kind of coming to market for us in a very, very good way. And so we ended up building the product, and the market ended up coming our way.
[Related: Nutanix CEO Stokes Surge in IT Ecosystem Partnerships]
Jason Lopez: Liqid addresses the growing demand for AI inference workloads, especially for enterprises that want to keep their data on-prem instead of moving to the cloud.
Sumit Puri: What Liqid focuses on is building power-efficient, cost-efficient solutions for inference. And it’s interesting: whether you’re on-prem or in the cloud, a lot of the data that this AI is driven on lives on-prem. Eighty-three percent of the data is actually on-prem. So one of two things must happen: we either must move the data into the cloud, or we must bring the GPUs on-prem. We think there are a lot of customers who are not willing to wholesale move their data to the public cloud, and so we want to build efficient solutions that allow them to process their data on-prem.
[Related: Swarms of AI Agents Powering Businesses and Daily Life]
Jason Lopez: There are advantages to pooling and sharing GPUs instead of deploying them directly in each server.
Sumit Puri: There are three primary reasons why somebody takes this journey of pooling and sharing GPUs. One is performance. I need that server to have a lot of GPUs because I need it to run very fast. We’re not limited to two or four or eight in a box; we can compose 30 GPUs to a server and give you the fastest servers on the planet. That’s one reason. The other is cost. If I have to deploy 30 GPUs, do I want to buy four servers, put eight GPUs in every server and buy a bunch of networking? Or do I want to buy a single server, deploy 30 GPUs, reduce my cost, reduce my power and have a more efficient way of deploying these resources? And the third reason is agility. It’s very hard to predict: will my workload need one GPU? Will it need two? Will it need four? Will it need eight? Do I use an H100? Do I use an A100? Do I use an L40S? There are too many choices, so we try to take the guesswork out of it. Let’s put whatever device types we need into a centralized pool and pick the right tool for the right job at the right time.
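To make the pooling idea concrete, here is a minimal sketch of a GPU pool that attaches devices to a server on demand and takes them back when the workload finishes. The `GpuPool` class, its method names and the device counts are hypothetical illustrations under assumed numbers, not Liqid’s actual API.

```python
# Minimal sketch of the pooling idea described above. GpuPool, its methods
# and the device counts are hypothetical, not Liqid's actual API.
from dataclasses import dataclass, field


@dataclass
class GpuPool:
    """A centralized pool of GPUs, tracked by device type."""
    free: dict = field(default_factory=lambda: {"H100": 32, "A100": 16, "L40S": 12})

    def compose(self, server: str, device: str, count: int) -> list:
        """Attach `count` GPUs of type `device` to `server`, if available."""
        if self.free.get(device, 0) < count:
            raise RuntimeError(f"only {self.free.get(device, 0)} x {device} free")
        self.free[device] -= count
        return [f"{server}:{device}-{i}" for i in range(count)]

    def release(self, device: str, count: int) -> None:
        """Return GPUs to the pool when the workload finishes."""
        self.free[device] += count


pool = GpuPool()
gpus = pool.compose(server="node-07", device="H100", count=30)  # one fast 30-GPU server
print(len(gpus), "GPUs attached;", pool.free["H100"], "H100s left in the pool")
pool.release("H100", 30)  # hand them back for the next workload
```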
[Related: Search for VMware Alternatives That Meet Existing and Future Needs]
Jason Lopez: The AI infrastructure landscape is evolving from training to inference, especially for companies that are not building foundational models.
Sumit Puri: The first chapter of this entire AI journey was very much focused on training and building these foundational models. The way that we see it going forward, there are probably only going to be 10 companies on the planet that can afford to build these massively large 100,000- or 200,000-GPU clusters to build a foundational model. The other 100,000 customers that are out there are going to take these open-source models like Llama, bring them on-prem, fine-tune that model, do RAG, do inference, and that part of the journey is just now starting. We think that by the end of the decade, inference is actually going to be a larger piece of the overall AI pie than the training portion.
Jason Lopez: Liqid makes AI inference more accessible and efficient for enterprises, especially with Kubernetes and model deployments.
Sumit Puri: NVIDIA has a big push for something called NIM, NVIDIA Inference Microservices, which is basically a container. What they’ve said is that it’s very, very difficult for people to get all the layers of the stack perfectly right to deploy these models, so we’re going to containerize these models, and that is the way enterprises are going to deploy this. We have a plugin for our solution where we pull the container in, we probe it, we figure out what type and quantity of GPU it needs on the backend, and we connect that GPU resource to the specific server in the Kubernetes cluster. Then we deploy that container on that machine, so you have this perfect matching of hardware to container, and we’ve automated the entire process. We’re at a point now where we have one-click deployment of inference. You say, give me Llama 7B, go, and we will automate the entire process on the backend and, within two minutes, give you a model that you can speak with. When you’re done with it, you hit the delete button, and we’ll rip those GPUs off and put them back into a centralized pool so that the next container, the next model you’re looking to deploy, has resources to use. We think that’s how you get there: you have to make it easy for enterprises to deploy these things. They don’t have large armies of data scientists to do this on their own, so the more we can automate this, the easier it is for those companies to consume. And the more efficient we can make it, the better. The way we think about efficiency is tokens per dollar and tokens per watt, because that’s what enterprises are limited by: power and money. If you’re a hyperscaler and you have unlimited power and unlimited money, we can’t help you. But if you’re an enterprise, those are the metrics you need to think about. Tokens per dollar, tokens per watt, automation, ease of use: those are the things that are needed to get AI to scale.
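As a rough illustration of that one-click flow (probe the container, compose matching GPUs from the pool, deploy, then tear down and return the GPUs), here is a small sketch. The image name, requirement metadata, pool contents and function names are hypothetical placeholders, not Liqid’s plugin or NVIDIA’s NIM APIs.

```python
# Rough sketch of the probe -> compose -> deploy -> teardown loop described
# above. Everything here (image name, metadata, pool contents) is a
# hypothetical placeholder, not Liqid's plugin or NVIDIA's NIM APIs.

GPU_POOL = {"H100": 8, "L40S": 12}          # central pool: device type -> free count

# Pretend result of probing each container image for its GPU requirements.
IMAGE_REQUIREMENTS = {
    "llama-7b-instruct": {"device": "L40S", "count": 2},
}

def deploy_inference(image: str, node: str) -> dict:
    """Probe the image, compose matching GPUs onto the node, then 'deploy'."""
    needs = IMAGE_REQUIREMENTS[image]                 # 1. probe the container
    device, count = needs["device"], needs["count"]
    if GPU_POOL[device] < count:                      # 2. compose from the pool
        raise RuntimeError(f"not enough {device} GPUs free")
    GPU_POOL[device] -= count
    return {"image": image, "node": node, "device": device, "count": count}  # 3. deploy

def teardown(deployment: dict) -> None:
    """Delete the deployment and return its GPUs to the central pool."""
    GPU_POOL[deployment["device"]] += deployment["count"]

dep = deploy_inference("llama-7b-instruct", node="k8s-worker-03")
print("deployed:", dep, "| pool:", GPU_POOL)
teardown(dep)
print("after teardown:", GPU_POOL)
```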
[Related: Bracing Data Centers for Wave of AI Workloads]
Jason Lopez: The company supports virtualization and dynamic GPU allocation in enterprise environments.
Sumit Puri: We are a platform for a variety of different applications. One of the applications we are very well suited for is virtualization. If we think about VMs for a second, it’s very difficult for enterprises to predict which VM they’re going to deploy at what time and what resources that VM might need. We deploy infrastructure for three to five years, and making that long-term prediction is very difficult. We’ve partnered with Nutanix so that we can put a centralized pool of GPUs inside a Nutanix cluster and, depending on the requirements of a specific VM, match the GPUs on the fly, dynamically hot-plugging GPUs into servers to meet the requirements of the VMs on Nutanix.