GenAI Level UP

How AI Learned to Chat About Pictures: Inside the MoshiVis Model



How do you teach a sophisticated speech AI to understand and discuss images, especially when paired image-speech data is rare?


This episode unpacks MoshiVis, a new model built on the Moshi speech LLM that does exactly that. We explore the challenges of building Vision-Speech Models and how MoshiVis overcomes them with a one-stage training pipeline, synthetic dialogues, and efficient "perceptual augmentation" techniques.


Join us for a deep dive into the tech that lets AI see, speak, and converse fluidly about the visual world.

By GenAI Level UP