The AI Research Deep Dive

X-Omni: Reinforcement Learning Makes Discrete Autoregressive Image Generative Models Great Again



arXiv: https://arxiv.org/abs/2507.22058

This episode of "The AI Research Deep Dive" unpacks "X-Omni," a paper from Tencent that makes a bold claim: reinforcement learning can make discrete autoregressive image models "great again." The host explains how the method tackles the historical weaknesses of autoregressive image generation, such as blurry images and notoriously bad spelling when rendering text inside images. Listeners will learn about X-Omni's three-part architecture: a large language model that acts as a high-level planner, a semantic tokenizer that represents visual concepts as discrete tokens, and a powerful diffusion model that acts as the renderer. The episode's core focus is the reinforcement learning loop that fine-tunes the model using a panel of "expert" reward models, including an "art critic" and a "spelling bee judge," to achieve state-of-the-art results in generating coherent images with long, perfectly spelled text.
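To make the "panel of experts" idea concrete, here is a toy sketch, not X-Omni's actual code. It assumes each reward model returns a score already scaled to [0, 1], that the scores are combined with a weighted sum, and that each combined reward is then compared against other images sampled for the same prompt (a GRPO-style group normalization). The reward names ("aesthetic", "ocr") and the weights are hypothetical placeholders.

```python
from statistics import mean, pstdev

# Hypothetical weights for two "expert" reward models; the paper's real
# reward mix and weighting are NOT reproduced here.
WEIGHTS = {"aesthetic": 0.5, "ocr": 0.5}


def combined_reward(scores: dict[str, float]) -> float:
    """Weighted sum of per-expert scores (assumed to lie in [0, 1])."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Score each sample relative to its group (same prompt):
    subtract the group mean and divide by the group's standard deviation."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Three hypothetical images generated for one prompt.
    samples = [
        {"aesthetic": 0.8, "ocr": 0.2},  # pretty, but text is misspelled
        {"aesthetic": 0.6, "ocr": 0.9},  # plainer, but text is correct
        {"aesthetic": 0.7, "ocr": 0.5},
    ]
    rewards = [combined_reward(s) for s in samples]
    print("combined rewards:", rewards)
    print("advantages:", group_advantages(rewards))
```

In this sketch, images that beat their siblings on the combined score get a positive advantage and would be reinforced, so a pretty image with garbled text can lose out to a plainer one that spells everything correctly.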


By The AI Research Deep Dive