Complex Systems with Patrick McKenzie (patio11)

Inference engineering and the real-world deployment of LLMs, with Philip Kiely


Patrick McKenzie (patio11) and Philip Kiely, early employee at Baseten, discuss the inference stack: the critical layer of software and hardware that sits between a model’s weights and a user’s prompt. They cover inference engineering, how the intermediate layers are evolving in a technical stack that changes every six months, and how sophisticated organizations are actually consuming LLMs beyond just typing questions into chatbot apps.

Full transcript available here: www.complexsystemspodcast.com/inference-engineering-with-philip-kiely/


Presenting Sponsors: Mercury, Meter, & Granola


Complex Systems is presented by Mercury—radically better banking for founders. Mercury offers the best wire experience anywhere: fast, reliable, and free for domestic U.S. wires, so you can stay focused on growing your business. Apply online in minutes at mercury.com.

Networking infrastructure has a way of accumulating technical debt faster than almost anything else in IT. Meter handles the full stack (wired, wireless, and cellular) as a single integrated solution: designed, deployed, and managed end-to-end so there's only one vendor to call when something goes wrong. Visit meter.com/complexsystems to book a demo. 


If meetings consistently leave you with hazy action items and lost context, Granola handles the transcription so you can actually participate and gives you searchable notes afterward. Try it free at granola.ai/complexsystems with code COMPLEXSYSTEMS

Links:

  • Download Inference Engineering: https://www.baseten.com/inference-engineering/ 
  • Philip's website: https://philipkiely.com/ 
  • Stripe's Emily Sands on Complex Systems: https://www.complexsystemspodcast.com/episodes/the-past-present-and-future-of-ai-with-stripe/ 
  • Des Traynor on Complex Systems: https://www.complexsystemspodcast.com/episodes/des-traynor/  

Timestamps:
(00:00) Intro
(00:30) The AI deployment pipeline
(03:04) Evolution of abstraction layers in engineering
(05:14) Defining inference and model weights
(08:45) Architecture of language and diffusion models
(10:11) AI adoption in the broader economy
(11:30) The shift toward agentic workflows and RL
(14:55) Function calling and real-world actions
(20:10) Sponsors: Mercury | Meter
(22:59) Technologies for agentic tools: MCP and skills
(25:32) The craft of writing a harness
(29:56) Using AI for automated proofreading and tool creation
(34:12) Balancing LLMs with deterministic code
(37:31) Observability and chain of thought reasoning
(39:31) Sponsor: Granola
(41:21) Observability and chain of thought reasoning
(50:45) Speculative decoding and hidden states
(55:37) The value of smaller, task-specific models
(59:55) Internal competencies versus buying solutions
(01:09:27) Self-publishing a technical book in record time
(01:23:20) Wrap
