January 21, 2025

Episode 59: Teaching AI to Watch Videos Like Humans

32 minutes

What if machines could watch and understand videos just like we do? In this episode, we explore how cutting-edge models like Tarsier2 are breaking barriers in Video AI, redefining how machines perceive and analyze video content. From automatically detecting crucial moments in sports to enhancing security systems, discover how these breakthroughs are transforming our world.

🎯 Episode Highlights:

Beyond object detection: How AI now understands complex video scenes

Game-changing applications in sports analytics and security

Inside the technology: Frame-by-frame video comprehension

The future of automated video understanding and accessibility

Whether you're a tech enthusiast or industry professional, learn how Video AI is bridging the gap between machine perception and human understanding. No advanced ML knowledge needed!

📚 Based on groundbreaking research: Tarsier2, Video Instruction Tuning, and Moondream2

References for main topic:

[2501.07888] Tarsier2: Advancing Large Vision-Language Models from Detailed Video Description to Comprehensive Video Understanding

GitHub - bytedance/tarsier: Tarsier -- a family of large-scale video-language models, which is designed to generate high-quality video descriptions, together with good capability of general video understanding.

[2410.02713] Video Instruction Tuning With Synthetic Data

vikhyatk/moondream2 · Hugging Face


By Saugata Chatterjee
