


Arxiv: https://arxiv.org/abs/2507.22058
This episode of "The AI Research Deep Dive" unpacks "X-Omni," a paper from Tencent that makes a bold claim: reinforcement learning can make autoregressive image models "great again." The host explains how this method tackles the historical weaknesses of autoregressive models, like blurry images and notoriously bad spelling. Listeners will learn about X-Omni's clever three-part architecture, which uses a large language model as a high-level planner, a semantic tokenizer for visual concepts, and a powerful diffusion model as a renderer. The episode's core focus is the sophisticated reinforcement learning loop that fine-tunes the model using a panel of "expert" reward models—including an "art critic" and a "spelling bee judge"—to achieve state-of-the-art results in generating coherent images with long, perfectly-spelled text.
By The AI Research Deep Dive