
Sign up to save your podcasts
Or


Ani Baddepudi, Gemini Model Behavior Product Lead, joins host Logan Kilpatrick for a deep dive into Gemini's multimodal capabilities. Their conversation explores why Gemini was built as a natively multimodal model from day one, the future of proactive AI assistants, and how we are moving towards a world where "everything is vision." Learn about the differences between video and image understanding and token representations, higher FPS video sampling, and more.
Chapters:
0:00 - Intro
1:12 - Why Gemini is natively multimodal
2:23 - The technology behind multimodal models
5:15 - Video understanding with Gemini 2.5
9:25 - Deciding what to build next
13:23 - Building new product experiences with multimodal AI
17:15 - The vision for proactive assistants
24:13 - Improving video usability with variable FPS and frame tokenization
27:35 - What’s next for Gemini’s multimodal development
31:47 - Deep dive on Gemini’s document understanding capabilities
37:56 - The teamwork and collaboration behind Gemini
40:56 - What’s next with model behavior
Watch on YouTube: https://www.youtube.com/watch?v=K4vXvaRV0dw
By Google AI5
77 ratings
Ani Baddepudi, Gemini Model Behavior Product Lead, joins host Logan Kilpatrick for a deep dive into Gemini's multimodal capabilities. Their conversation explores why Gemini was built as a natively multimodal model from day one, the future of proactive AI assistants, and how we are moving towards a world where "everything is vision." Learn about the differences between video and image understanding and token representations, higher FPS video sampling, and more.
Chapters:
0:00 - Intro
1:12 - Why Gemini is natively multimodal
2:23 - The technology behind multimodal models
5:15 - Video understanding with Gemini 2.5
9:25 - Deciding what to build next
13:23 - Building new product experiences with multimodal AI
17:15 - The vision for proactive assistants
24:13 - Improving video usability with variable FPS and frame tokenization
27:35 - What’s next for Gemini’s multimodal development
31:47 - Deep dive on Gemini’s document understanding capabilities
37:56 - The teamwork and collaboration behind Gemini
40:56 - What’s next with model behavior
Watch on YouTube: https://www.youtube.com/watch?v=K4vXvaRV0dw

1,982 Listeners

1,094 Listeners

226 Listeners

8,580 Listeners

9,716 Listeners

198 Listeners

10,189 Listeners

5,538 Listeners

3,237 Listeners

16,216 Listeners

211 Listeners

688 Listeners

74 Listeners

111 Listeners

32 Listeners