The AI Research Deep Dive

Defeating Nondeterminism in LLM Inference



Link: https://thinkingmachines.ai/blog/defeating-nondeterminism-in-llm-inference/

This episode of "The AI Research Deep Dive" explores a blog post from Thinking Machines Lab that tackles a long-standing puzzle: why large language models return different answers to the same prompt even with deterministic sampling settings. The host explains how the authors debunk the commonly cited explanation of random, concurrency-induced floating-point error, identifying the true culprit instead as a lack of "batch invariance" in modern inference libraries. Listeners will learn how the way a user's request happens to be batched with others changes the underlying GPU reduction strategy, and therefore the result. The episode covers the team's solution, custom GPU kernels engineered to enforce batch invariance, and discusses the implications for bitwise reproducibility and for more stable, "truly on-policy" reinforcement learning.
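The batching effect described here can be mimicked on the CPU: floating-point addition is not associative, so a reduction whose chunking depends on batch size (as some GPU split-reduction heuristics do) can yield slightly different results for the same request, while a fixed reduction order gives bit-identical outputs. A minimal illustrative sketch in plain NumPy, where both function names and the chunking heuristic are hypothetical stand-ins for real GPU kernels:

```python
import numpy as np

def sum_batch_dependent(row, batch_size):
    """Toy kernel: the chunk count depends on batch size (mimicking a
    split-reduction heuristic), so the order of float32 additions --
    and hence the result -- can vary with how requests were batched."""
    chunks = np.array_split(row, max(1, batch_size))
    acc = np.float32(0.0)
    for c in chunks:
        acc += np.float32(c.sum())
    return acc

def sum_batch_invariant(row, batch_size):
    """Toy 'batch-invariant' kernel: a fixed chunking regardless of
    batch size, so the same row always reduces in the same order."""
    chunks = np.array_split(row, 8)  # fixed split, ignores batch_size
    acc = np.float32(0.0)
    for c in chunks:
        acc += np.float32(c.sum())
    return acc

rng = np.random.default_rng(0)
row = rng.standard_normal(4096).astype(np.float32)

# Same request processed under two different batch sizes:
a = sum_batch_dependent(row, batch_size=1)
b = sum_batch_dependent(row, batch_size=64)
print("batch-dependent results identical:", a == b)

c = sum_batch_invariant(row, batch_size=1)
d = sum_batch_invariant(row, batch_size=64)
print("batch-invariant results identical:", c == d)  # always True
```

The batch-dependent sums typically differ only in the low-order bits, but in an autoregressive model such tiny differences can flip a token choice and then compound across the generation.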

