Daily Tech Feed: From the Labs

dLLM: Diffusion Gets a Framework



Episode 0021: dLLM: Diffusion Gets a Framework

Why it matters. Every major language model in production today — GPT, Claude, Gemini, Llama — generates text the same way: left to right, one token at a time. That sequential assumption has been so productive for so long that most researchers treat it as fixed. A team at UC Berkeley and the University of Illinois just published dLLM: Simple Diffusion Language Modeling, a unified open-source framework that refuses to take autoregression for granted. Diffusion language models generate entire sequences through iterative denoising — bidirectionally, in parallel — and dLLM is the infrastructure that lets the field measure, compare, and build on them systematically for the first time.
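To make "iterative denoising" concrete, here is a minimal, hypothetical sketch of how a masked diffusion model generates text: start from a fully masked sequence and, at each step, commit the most confident predictions in parallel until nothing is masked. The `toy_denoiser` below is a random stand-in, not dLLM's model; a real DLM would score every masked position in one bidirectional forward pass.

```python
import random

MASK = "<mask>"

def toy_denoiser(tokens):
    # Hypothetical stand-in for a bidirectional denoiser: propose a
    # (token, confidence) guess for each masked position. A real DLM
    # scores all positions in a single parallel forward pass.
    vocab = ["the", "cat", "sat", "on", "mat"]
    return {i: (random.choice(vocab), random.random())
            for i, tok in enumerate(tokens) if tok == MASK}

def diffusion_generate(length=5, steps=3):
    """Iterative denoising: begin fully masked, then unmask the most
    confident positions each step (in parallel, not left to right)."""
    tokens = [MASK] * length
    for _ in range(steps):
        guesses = toy_denoiser(tokens)
        if not guesses:
            break
        # Commit only the top half of guesses by confidence this step;
        # the rest stay masked and are re-predicted next iteration.
        k = max(1, len(guesses) // 2)
        best = sorted(guesses.items(), key=lambda kv: -kv[1][1])[:k]
        for i, (tok, _conf) in best:
            tokens[i] = tok
    # Final pass: fill any positions still masked.
    for i, (tok, _conf) in toy_denoiser(tokens).items():
        tokens[i] = tok
    return tokens
```

The key contrast with autoregression: every step conditions on context from both directions, and multiple tokens can be committed per step.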

Where to find it. The paper comes out of UC Berkeley and the University of Illinois Urbana-Champaign. The full paper is at arXiv:2602.22661. The framework — including pretrained checkpoints — is live at github.com/ZHZisZZ/dllm. The paper's HuggingFace page, with community discussion and model cards, is at huggingface.co/papers/2602.22661.

The Researchers. Zhanhui Zhou (UC Berkeley, equal contributor) and Lingjie Chen (UIUC, equal contributor) led the engineering. Hanghang Tong is a professor at UIUC whose work spans graph learning and large-scale data mining. Dawn Song is a MacArthur Fellow and professor at UC Berkeley, known for foundational work in security, deep learning, and the intersection of the two — she co-founded Oasis Labs and has been a central figure in open ML research for two decades.

Key Technical Concepts. dLLM unifies two discrete diffusion paradigms that have been developing in isolated codebases. Masked Diffusion Language Modeling (MDLM) formalizes the masking objective pioneered by BERT (arXiv:1810.04805) within a principled probabilistic diffusion framework, extending ideas from Simplified and Generalized Masked Diffusion for Discrete Data (arXiv:2406.04329). Block Diffusion (BD3LM) hybridizes autoregressive and diffusion generation — sequences are divided into blocks, causal across blocks and jointly denoised within each block, offering a tunable dial between the two paradigms. Both trainers are built natively on HuggingFace Accelerate, PEFT, and FSDP, making them immediately accessible to anyone already in that ecosystem. The framework's unified evaluation harness solves a reproducibility crisis that has plagued the DLM literature since discrete diffusion models for text were first formalized (arXiv:2107.03006). Perhaps most consequentially, dLLM includes conversion recipes for adapting existing pretrained checkpoints — BERT-style models via continued training under MDLM, and causal Transformer (arXiv:1706.03762) models via attention mask modification — dramatically lowering the compute barrier for community experimentation on open-weight models like Llama, Qwen, and Mistral.
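The "tunable dial" of Block Diffusion comes down to the attention mask. The sketch below is an illustrative reconstruction under stated assumptions, not dLLM's actual implementation: each position attends bidirectionally within its own block and causally to all earlier blocks.

```python
import numpy as np

def block_diffusion_mask(seq_len: int, block_size: int) -> np.ndarray:
    """Boolean attention mask for a BD3LM-style hybrid (illustrative
    sketch, not dLLM's code). Entry [q, k] is True when query position
    q may attend to key position k: bidirectional inside a block,
    causal across blocks."""
    blocks = np.arange(seq_len) // block_size
    # q attends to k iff k's block is no later than q's block.
    return blocks[None, :] <= blocks[:, None]
```

The two extremes recover the two paradigms: `block_size=1` yields a standard causal (lower-triangular) mask, while `block_size=seq_len` yields fully bidirectional attention — which is also the intuition behind the causal-checkpoint conversion recipe, since moving a pretrained causal Transformer toward diffusion is largely a matter of relaxing this mask.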

Daily Tech Feed: From the Labs is available on Apple Podcasts, Spotify, and wherever fine podcasts are distributed. Visit us at pod.c457.org for all our shows. New episodes daily.
