April 02, 2025

How AI Learned to Chat About Pictures: Inside the MoshiVis Model

14 minutes

How do you teach a sophisticated speech AI to understand and discuss images, especially when paired image-speech data is rare?

This episode unpacks MoshiVis, a new model that does exactly that. We explore the challenges of building Vision-Speech Models and how MoshiVis, built on the Moshi speech LLM, overcomes them with a unique one-stage training pipeline, synthetic dialogues, and efficient "perceptual augmentation" techniques.

Join us for a deep dive into the tech that lets AI see, speak, and converse fluidly about the visual world.

By GenAI Level UP
