AI Tinkerers - "One-Shot"

How Vik Built Moondream—A Tiny Vision Model with Big Power



Vik from Moondream AI joins Joe to demo a vision-language model that runs locally—on your laptop, your phone, even a Raspberry Pi.

From visual question answering to gaze detection and UI automation, Vik shows how Moondream is redefining edge computer vision—no cloud required.

Whether you're into robotics, home automation, or lightweight AI, this “One-Shot” is packed with insights for builders.

Try it yourself at moondream.ai 🚀

00:00 - Intro to Moondream’s compression tech for 2B parameter models

00:22 - Joe welcomes Vik from Moondream

01:53 - Shift from traditional CV to promptable vision-language models

03:23 - Playground demo: Visual Question Answering (VQA)

04:42 - VQA demo results: speed, structure, and accuracy

05:03 - Object detection, pointing, and captioning demos

07:57 - Prompts that push reasoning: uniform detection

10:07 - Cross-task benefits: gaze detection improves directional reasoning

11:02 - Comparing Moondream’s VQA to Qwen’s visual reasoning model

13:21 - Why edge deployment still matters in vision

15:21 - 0.5B model runs on a Raspberry Pi in 816MB with int4 quantization

16:15 - HAL 9000 setup: Moondream + TinyLlama + Coqui TTS

20:55 - Texas rancher uses drone and Moondream for cow detection

21:53 - Commercial use in air-gapped environments: retail, safety

23:09 - UI automation and button detection with pointing feature

29:41 - Vision for ambient agents and local inference

30:27 - Power efficiency: 10x less energy than 7B/20B cloud models

31:01 - Moondream API & Hugging Face transformers integration

36:11 - Vik’s background: From AWS to machine learning

40:56 - Discovering AI Tinkerers and global meetups

#AITinkerers #MoondreamAI #EdgeAI #ComputerVision #LLM #OpenSource #OneShot


By Joe Heitzeberg