
While AI training garners most of the spotlight — and investment — the demands of AI inference are shaping up to be an even bigger challenge. In this episode of The New Stack Makers, Sid Sheth, founder and CEO of d-Matrix, argues that inference is anything but one-size-fits-all. Different use cases — from low-cost to high-interactivity or throughput-optimized — require tailored hardware, and existing GPU architectures aren’t built to address all these needs simultaneously.
“The world of inference is going to be truly heterogeneous,” Sheth said, meaning specialized hardware will be required to meet diverse performance profiles. A major bottleneck? The distance between memory and compute. Inference, especially in generative AI and agentic workflows, requires constant memory access, so minimizing the distance data must travel is key to improving performance and reducing cost.
To address this, d-Matrix developed Corsair, a modular platform where memory and compute are vertically stacked — “like pancakes” — enabling faster, more efficient inference. The result is scalable, flexible AI infrastructure purpose-built for inference at scale.
Learn more from The New Stack about inference compute and AI:
Scaling AI Inference at the Edge with Distributed PostgreSQL
Deep Infra Is Building an AI Inference Cloud for Developers
Join our community of newsletter subscribers to stay on top of the news and at the top of your game