Tech Talks Daily

d-Matrix - Ultra-low Latency Batched Inference for Gen AI


What happens when the real bottleneck in artificial intelligence is no longer training models, but actually running them at scale?

In this episode of Tech Talks Daily, I sit down with Satyam Srivastava from d-Matrix to explore a shift that is quietly reshaping the entire AI infrastructure landscape. While much of the early AI race focused on training ever larger models, the next phase of AI adoption is increasingly defined by inference. That is the moment when trained models are deployed and used to generate real-world results millions of times a day.

Satyam brings a unique perspective shaped by years of experience in signal processing, machine learning, and hardware architecture, including time spent at NVIDIA and Intel working on graphics, media technologies, and AI systems. Now at d-Matrix, he is helping design next-generation computing architectures focused on one of the biggest challenges facing the AI industry today: efficiently running large language models without overwhelming data centers with unsustainable power and infrastructure demands.

During our conversation, we explored why the industry underestimated the infrastructure implications of inference at scale. While training large models grabs headlines, the real operational pressure often comes later when those models must serve millions of queries in real time. That shift places enormous strain on memory bandwidth, energy consumption, and data movement inside modern data centers.

Satyam explains how d-Matrix identified this challenge years before generative AI exploded into the mainstream. Instead of focusing on training hardware like many AI startups at the time, the company concentrated on inference efficiency. That decision is becoming increasingly relevant as organizations begin to realize that simply adding more GPUs to data centers is not a sustainable long-term strategy.

We also discuss the growing power constraints surrounding AI infrastructure, and why efficiency-driven design may be the only realistic path forward. With electricity supply, cooling capacity, and semiconductor availability all becoming limiting factors, the industry is being forced to rethink how AI systems are architected. Custom silicon, purpose-built accelerators, and heterogeneous computing environments are now emerging as key pieces of the puzzle.

The conversation also touches on the geopolitical and economic importance of AI semiconductor leadership, and why the relationship between frontier AI labs, infrastructure providers, and chip designers is becoming increasingly strategic. As governments and companies compete to maintain technological leadership, the question of who controls the hardware powering AI may prove just as important as the models themselves.

Looking ahead, Satyam shares his perspective on how the role of engineers will evolve as AI infrastructure becomes more specialized and energy-aware. Foundational engineering skills remain essential, but the next generation of engineers will also need to think in terms of entire systems, combining software, hardware, and AI tools to build more efficient computing environments.

As AI continues to move from research labs into everyday products and services, are organizations prepared for the infrastructure shift that comes with an inference-driven future? And could efficiency, rather than raw computing power, become the defining metric of the next phase of the AI race?

Tech Talks Daily, by Neil C. Hughes

Rated 5 out of 5 (200 ratings)

