Best AI papers explained

∇-Reasoner: LLM Reasoning via Test-Time Gradient Descent in Latent Space


This paper introduces ∇-Reasoner, a framework that improves Large Language Model (LLM) reasoning by applying gradient-based optimization during inference. Unlike methods that rely on random sampling or discrete search, the approach uses Differentiable Textual Optimization (DTO) to refine token logits with first-order gradients derived from reward models and likelihood signals. By iteratively updating textual representations in latent space, the system allows bidirectional information flow, enabling the model to correct its reasoning chains on the fly. To keep inference efficient, the framework incorporates gradient caching and rejection sampling, which reduce the computational burden typically associated with backpropagation. Empirical results show that ∇-Reasoner significantly boosts accuracy on complex mathematical benchmarks while requiring fewer model calls than existing search-based baselines. Ultimately, the work establishes a theoretical and practical shift toward treating test-time reasoning as a continuous optimization problem rather than a purely stochastic generation task.
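To make the core idea concrete, here is a minimal toy sketch of gradient-based refinement of token logits. It is not the paper's implementation: the vocabulary, the fixed per-token `vocab_scores` standing in for a reward model, and the `refine_logits` helper are all hypothetical, and the gradient is derived by hand for a simple expected-reward objective rather than backpropagated through an LLM.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a logit vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

def refine_logits(logits, token_scores, lr=0.5, steps=50):
    """Gradient ascent on the expected reward E_p[score] w.r.t. the logits.

    For f(z) = sum_j softmax(z)_j * s_j, the gradient is
    df/dz_i = p_i * (s_i - sum_j p_j * s_j).
    """
    z = logits.copy()
    for _ in range(steps):
        p = softmax(z)
        grad = p * (token_scores - p @ token_scores)
        z += lr * grad
    return z

# Hypothetical reward model output: one scalar score per vocabulary token.
vocab_scores = np.array([0.1, 0.9, 0.3, 0.2])
initial = np.zeros(4)  # start from a uniform distribution over tokens
refined = refine_logits(initial, vocab_scores)
best_token = int(np.argmax(refined))  # probability mass shifts to the highest-reward token
```

The real system optimizes full latent text representations with autodiff through reward and likelihood models, but the mechanic is the same: the "text" stays continuous and differentiable during refinement, and is only discretized afterward.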


By Enoch H. Kang