Ernesto Garcia, Front-end Product Engineer, Doist
Thomas Jost, Backend Software Engineer, Doist
Hugo Fauquenoi, Product Manager, Doist

How Doist's 2-3 month AI exploration phase led to Ramble, and why voice-to-task emerged as the top contender
The user research insight behind Ramble: people using pen and paper or ChatGPT voice to brainstorm tasks before committing them to Todoist
Why Ramble skips transcription entirely and processes raw audio directly with a Gemini live audio model
How the model makes tool calls (add task, edit task, delete task) in real time while the user is still speaking, with no text output at all
Designing for the driving use case: sound effects as audio confirmation cues alongside visual task cards
The challenge of teaching an LLM to capture tasks literally without over-interpreting or doing them, and how temperature tuning played a role
Date handling complexity: injecting the current date, normalizing to days vs. months, and always outputting dates in English for the natural language parser
Building an LLM-judge eval system with 20+ language recordings from 100+ employees across 35 countries to catch prompt regressions
Why Doist chose to inject the full project/label list into the system prompt instead of building a RAG pipeline, and why it worked
How easy correction beats perfect first-time accuracy in natural language interfaces
What's next: multimodal task capture from images and text blobs, Apple Watch support, and automation integrations

Todoist
Doist
Google Vertex AI (Gemini)

00:00 Meet the Doist Team
01:40 What Doist Builds
02:27 Ramble Voice to Tasks
04:16 Why Voice Matters
07:42 Brain Dump Insight
09:46 Prototyping With LLMs
11:08 Live Audio Workflow
14:32 Driving Friendly UX
18:47 Tool Only Architecture
26:06 Evals and Multilingual Testing
28:41 Taming Dates and Time
33:28 Fixing Date Confusion
33:43 Defining Task Boundaries
34:34 Capture Versus Do
37:17 Tuning Creativity Levels
39:01 Evals Across Languages
41:23 Feedback and Regressions
44:09 Model Upgrades Over Time
46:33 Projects Labels Context
51:40 Handling Ambiguous Names
54:23 What's Next Multimodal
58:48 From Capture to Execution