The AI Research Deep Dive

Defeating Nondeterminism in LLM Inference



Link: https://thinkingmachines.ai/blog/defeating-nondeterminism-in-llm-inference/

This episode of "The AI Research Deep Dive" explores a blog post from Thinking Machines Lab that tackles a long-standing puzzle: why large language models return different answers to the same prompt even with deterministic sampling settings. The host explains how the authors debunk the commonly cited explanation of random, concurrency-induced floating-point error, identifying the true culprit instead as a lack of "batch invariance" in modern inference libraries. Listeners will learn how the way a user's request happens to be batched with others changes the underlying GPU reduction strategy, and therefore the result. The episode covers the team's solution, custom GPU kernels engineered to enforce batch invariance, and discusses the implications for bitwise reproducibility and for more stable, "truly on-policy" reinforcement learning.
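The batching effect described here can be mimicked on the CPU: floating-point addition is not associative, so a reduction whose chunking depends on batch size (as some GPU split-reduction heuristics do) can yield slightly different results for the same request, while a fixed reduction order gives bit-identical outputs. A minimal illustrative sketch in plain NumPy, where both function names and the chunking heuristic are hypothetical stand-ins for real GPU kernels:

```python
import numpy as np

def sum_batch_dependent(row, batch_size):
    """Toy kernel: the chunk count depends on batch size (mimicking a
    split-reduction heuristic), so the order of float32 additions --
    and hence the result -- can vary with how requests were batched."""
    chunks = np.array_split(row, max(1, batch_size))
    acc = np.float32(0.0)
    for c in chunks:
        acc += np.float32(c.sum())
    return acc

def sum_batch_invariant(row, batch_size):
    """Toy 'batch-invariant' kernel: a fixed chunking regardless of
    batch size, so the same row always reduces in the same order."""
    chunks = np.array_split(row, 8)  # fixed split, ignores batch_size
    acc = np.float32(0.0)
    for c in chunks:
        acc += np.float32(c.sum())
    return acc

rng = np.random.default_rng(0)
row = rng.standard_normal(4096).astype(np.float32)

# Same request processed under two different batch sizes:
a = sum_batch_dependent(row, batch_size=1)
b = sum_batch_dependent(row, batch_size=64)
print("batch-dependent results identical:", a == b)

c = sum_batch_invariant(row, batch_size=1)
d = sum_batch_invariant(row, batch_size=64)
print("batch-invariant results identical:", c == d)  # always True
```

The batch-dependent sums typically differ only in the low-order bits, but in an autoregressive model such tiny differences can flip a token choice and then compound across the generation.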

