Today's stories explore how artificial intelligence is becoming more culturally aware and creative, with new systems that better represent Southeast Asian cultures, generate endless talking videos from voice commands, and compose full-length songs with lyrics. These breakthroughs highlight both the promise and the challenge of making AI more inclusive and expressive, while raising questions about how these technologies might reshape entertainment, cultural representation, and human creativity.
Links to all the papers we discussed:
Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia
LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL
YuE: Scaling Open Foundation Models for Long-Form Music Generation
MagicInfinite: Generating Infinite Talking Videos with Your Words and Voice
UniF^2ace: Fine-grained Face Understanding and Generation with Unified Multimodal Models
SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories