


On a recent episode of The New Stack Agents, Inception Labs CEO Stefano Ermon introduced Mercury 2, a large language model built on diffusion rather than the standard autoregressive approach. Traditional LLMs generate text token by token from left to right, which Ermon describes as “fancy autocomplete.” In contrast, diffusion models begin with a rough draft and refine it in parallel, similar to image systems like Stable Diffusion.
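To make the contrast concrete, here is a toy sketch of the two decoding styles. It is not Mercury 2's algorithm: the vocabulary, the function names, and the random "model" are all hypothetical stand-ins, and the only thing it illustrates is how many sequential passes each style needs for the same output length.

```python
import random

# Hypothetical toy vocabulary; a real model has tens of thousands of tokens.
VOCAB = ["the", "cat", "sat", "on", "a", "mat"]


def autoregressive_decode(length, rng):
    """Left-to-right decoding: one sequential pass per generated token."""
    tokens, passes = [], 0
    for _ in range(length):
        # Stand-in for a model call that conditions on tokens so far.
        tokens.append(rng.choice(VOCAB))
        passes += 1
    return tokens, passes


def diffusion_style_decode(length, steps, rng):
    """Draft-and-refine decoding: every position is revised in each pass."""
    # Start from a rough draft (here: random tokens).
    draft = [rng.choice(VOCAB) for _ in range(length)]
    passes = 0
    for _ in range(steps):
        # Stand-in for one batched pass that updates all positions at once.
        draft = [rng.choice(VOCAB) for _ in draft]
        passes += 1
    return draft, passes


if __name__ == "__main__":
    rng = random.Random(0)
    _, ar_passes = autoregressive_decode(32, rng)
    _, diff_passes = diffusion_style_decode(32, steps=4, rng=rng)
    print("autoregressive passes:", ar_passes)   # 32
    print("diffusion-style passes:", diff_passes)  # 4
```

The step count of 4 is an arbitrary choice for illustration. The point is only that the number of sequential passes in the second function does not grow with the output length, which is what lets the work be spread across a GPU.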
This parallel process allows Mercury 2 to produce over 1,000 tokens per second, which company tests put at five to ten times faster than optimized models from labs such as OpenAI, Anthropic, and Google. Ermon argues that diffusion models make better use of GPUs, and Inception Labs has support from investor Nvidia in optimizing performance.
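As a rough back-of-the-envelope reading of those figures, the sketch below works out what "five to ten times faster" implies for wait time. The 2,000-token response length is a hypothetical example, the baseline speeds are derived only from the stated ratio rather than from any measured benchmark, and 1,000 tokens per second is treated as a floor because the claim is "over 1,000."

```python
def seconds_to_generate(num_tokens, tokens_per_second):
    """Time to stream a response at a steady generation rate."""
    return num_tokens / tokens_per_second


# Claimed throughput from the episode ("over 1,000 tokens per second").
MERCURY_TPS = 1_000

# Baselines implied by "five to ten times faster" (derived, not measured).
BASELINE_FAST_TPS = MERCURY_TPS / 5    # 200 tokens per second
BASELINE_SLOW_TPS = MERCURY_TPS / 10   # 100 tokens per second

# Hypothetical response length chosen for illustration.
RESPONSE_TOKENS = 2_000

if __name__ == "__main__":
    print(seconds_to_generate(RESPONSE_TOKENS, MERCURY_TPS))        # 2.0
    print(seconds_to_generate(RESPONSE_TOKENS, BASELINE_FAST_TPS))  # 10.0
    print(seconds_to_generate(RESPONSE_TOKENS, BASELINE_SLOW_TPS))  # 20.0
```

Under these assumptions, a response that takes about 2 seconds at the claimed rate would take 10 to 20 seconds at the implied baseline rates.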
While Mercury 2 matches mid-tier models like Claude Haiku and Google Flash rather than top systems such as Claude Opus or GPT-4, Ermon believes diffusion’s speed and economic advantages will become increasingly compelling as AI applications scale.
Learn more from The New Stack about the latest developments around large language models built on diffusion:
How Diffusion-Based LLM AI Speeds Up Reasoning
Get Ready for Faster Text Generation With Diffusion LLMs
Join our community of newsletter subscribers to stay on top of the news and at the top of your game.
By The New Stack