Alex: Hello and welcome to The Generative AI Group Digest for the week of 05 Jan 2025!
Maya: We're Alex and Maya.
Alex: First up, we’re talking about multi-token prediction and training efficiency in large language models.
Maya: Multi-token prediction? How is that different from regular token prediction?
Alex: Good question! Normally models predict one token at a time. Multi-token prediction lets the model predict multiple tokens sequentially in one go.
Maya: Does that mean it speeds up training or inference?
Alex: Mostly training. As Rajiv and Tokenbender discussed, predicting several tokens ahead gives the model richer feedback per example. Paras Chopra’s analogy was that it’s like letting the model look ahead while it learns.
Maya: So the model learns better because it sees more predictions at once?
Alex: Exactly. Rajiv noted it should help with chain-of-thought reasoning too, which is vital for complex tasks.
Maya: Interesting! That’s a neat way to boost training without increasing costs drastically.
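Alex: For listeners who want to see the shape of it, here’s a toy sketch for the show notes. This is our own illustration, not any specific paper’s implementation: a shared trunk feeds k extra linear heads, and head i is trained to predict the token i+1 positions ahead.

```python
import torch.nn as nn
import torch.nn.functional as F

class MultiTokenHeads(nn.Module):
    """Toy multi-token prediction: k heads share one trunk,
    and head i predicts the token i+1 positions ahead."""
    def __init__(self, hidden_dim, vocab_size, k=4):
        super().__init__()
        self.k = k
        self.heads = nn.ModuleList(
            [nn.Linear(hidden_dim, vocab_size) for _ in range(k)]
        )

    def loss(self, hidden, targets):
        # hidden: (batch, seq, hidden_dim) from the shared trunk
        # targets: (batch, seq) token ids
        total = 0.0
        for i, head in enumerate(self.heads):
            offset = i + 1
            logits = head(hidden[:, :-offset])  # predict position t + offset
            labels = targets[:, offset:]
            total = total + F.cross_entropy(
                logits.reshape(-1, logits.size(-1)),
                labels.reshape(-1),
            )
        return total / self.k  # richer feedback per example than one head
```

Maya: And as I understand it, a common choice is to drop the extra heads at inference, so you pay the cost only during training.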
Alex: Next, let’s move on to real-time model updates: Manan observed ChatGPT models improving their answers within 24 hours.
Maya: Wait, models learn and update that fast now?
Alex: Seems like it. Manan noticed GPT-4o PRO answered a question correctly one day; a day later, the smaller o1 and o1-mini models also got it right.
Maya: Does that happen via continuous training or something else?
Alex: Manan suspects they might constantly learn from chat interactions, though API data usage is uncertain.
Maya: So it’s almost like the models auto-correct over time based on user queries?
Alex: Precisely! And the improved answers showed up in the API versions too. Shows how dynamic these AI systems are today.
Maya: Next, let’s dive into Hindi transcription and TTS models.
Alex: Lavish Saluja asked about the best Hindi speech-to-text models. Folks recommended Indic Whisper, Deepgram’s nova-2, and Gemini 2.0 Flash for affordable quality.
Maya: Are these models good at recognizing local Indian address details?
Alex: That’s exactly the challenge. Deepgram and Sarvam AI came up as popular choices with good context understanding for Indian languages.
Maya: Makes sense, local context and dialects need specialized models.
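Alex: For anyone who wants to kick the tires, here’s a minimal local sketch for the show notes using the open-source whisper package; the fine-tuned Indic checkpoints the group mentioned slot into a very similar pipeline, and the audio file name is just a placeholder.

```python
# pip install openai-whisper
import whisper

# A multilingual Whisper checkpoint; fine-tuned Indic variants
# are drop-in alternatives in pipelines like this.
model = whisper.load_model("small")

# "customer_call.mp3" is a placeholder file name. Forcing
# language="hi" avoids misdetection on code-mixed Hindi audio.
result = model.transcribe("customer_call.mp3", language="hi")
print(result["text"])
```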
Alex: Moving on, Rajat’s question about customer support chatbots for offline businesses going online caught attention.
Maya: That’s tricky. Such bots need to handle calls, CRM integration, scheduling, and be easy for non-tech users.
Alex: Jyotirmay advised buying existing solutions instead of building from scratch, with CallFluent highlighted as a rare option supporting agent actions.
Maya: So verticalized voicebots with agent capabilities are much harder than regular chatbots?
Alex: Exactly. Many tools just do retrieval augmented generation (RAG) on knowledge bases but can’t handle calls or schedule meetings fully.
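Alex: To make that distinction concrete, here’s roughly what “just RAG” amounts to, a minimal sketch for the show notes. The embedding model and the llm callable are placeholders, and notice there’s nothing here that can place a call or book a meeting, which is the hard part Jyotirmay was pointing at.

```python
# pip install sentence-transformers
from sentence_transformers import SentenceTransformer, util

docs = [
    "We are open Mon-Sat, 10am to 8pm.",
    "Returns are accepted within 7 days with a receipt.",
]

encoder = SentenceTransformer("all-MiniLM-L6-v2")
doc_emb = encoder.encode(docs, convert_to_tensor=True)

def answer(question, llm):
    # llm is a placeholder callable (prompt -> text); plug in any chat model.
    q_emb = encoder.encode(question, convert_to_tensor=True)
    best = util.cos_sim(q_emb, doc_emb).argmax().item()
    prompt = f"Answer using only this context:\n{docs[best]}\n\nQ: {question}"
    return llm(prompt)
```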
Maya: Next, let's explore text classification with few labeled examples.
Alex: Aman had tested DSPy for few-shot reasoning and SetFit, and asked about newer options.
Maya: I see Nirant mentioned ModernBERT as a sample-efficient alternative, while Abhishek introduced FusionSent, which showed better F1 scores.
Alex: FusionSent did especially well on scientific paper classification, though those results await wider confirmation.
Maya: That shows how specialized models tailored for few-shot cases can outperform generic ones.
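Alex: If you want to try the SetFit route, the documented pattern is only a few lines. Here’s a sketch for the show notes; the library’s API has shifted across versions, so treat the trainer class as approximate, and the tiny training set is invented.

```python
# pip install setfit datasets
from datasets import Dataset
from setfit import SetFitModel, SetFitTrainer

# Invented 8-example training set; SetFit targets exactly this
# few-labeled-examples regime (1 = positive, 0 = negative).
train_ds = Dataset.from_dict({
    "text": ["great product", "terrible support", "fast delivery",
             "never buying again", "love it", "refund please",
             "works as described", "arrived broken"],
    "label": [1, 0, 1, 0, 1, 0, 1, 0],
})

model = SetFitModel.from_pretrained(
    "sentence-transformers/paraphrase-mpnet-base-v2"
)
trainer = SetFitTrainer(model=model, train_dataset=train_ds)
trainer.train()

print(model.predict(["the delivery was quick"]))  # expect a positive label
```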
Alex: Now, let's talk about unstructured data handling and structured output with LLMs.
Maya: I love this topic! Bharath shared how you can dump info like a name and age into unstructured text, pass it between services, then decode it reliably into structured XML using models like Claude-sonnet.
Alex: Right, Somya pointed out OpenAI’s recent evals showing GPT-4o’s improved adherence to structured outputs, which is crucial for scalable real-world apps.
Maya: This reduces the need for strict API schemas, making development faster and more flexible.
Alex: Sounds like LLMs are simplifying how services exchange data by understanding natural text context.
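Alex: Here’s the pattern in miniature for the show notes, a sketch using the OpenAI Python SDK’s JSON mode. The messy string and the key names are our invented example; the same idea works with Claude by asking for XML tags, as Bharath described.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Invented example of messy, unstructured input.
messy = "hi, this is Asha Rao, I'm 34, moving to 12 MG Road next month"

resp = client.chat.completions.create(
    model="gpt-4o",
    response_format={"type": "json_object"},  # JSON mode
    messages=[
        {"role": "system",
         "content": "Extract the person's details as a JSON object "
                    "with keys name, age, and address."},
        {"role": "user", "content": messy},
    ],
)
print(resp.choices[0].message.content)
# e.g. {"name": "Asha Rao", "age": 34, "address": "12 MG Road"}
```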
Maya: Next up, Ambika’s fun experiment running a small LM on a solar-powered Raspberry Pi in Udaipur!
Alex: So cool! She’s using ollama with a lightweight web chat UI. Paras called it fun, and Sud shared a ready-to-use docker container.
Maya: And she confirmed it works even without docker! Plus, she’s documenting her setup with solar power circuits.
Alex: That’s a delightful mix of tech and sustainability, inspiring for personal AI experiments.
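Maya: And if you want to replicate the software half, it’s a short script: ollama serves a local REST API on port 11434. A sketch for the show notes; the model tag is just an example of something small enough for a Pi.

```python
# Talk to a local ollama server (default port 11434).
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "qwen2.5:0.5b",  # example tag; fetch it first with `ollama pull`
        "prompt": "Say hello from a solar-powered Raspberry Pi!",
        "stream": False,          # return one JSON object instead of a stream
    },
)
print(resp.json()["response"])
```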
Maya: Before we wrap, let’s mention some interesting tools and libraries shared this week.
Alex: Sure! Nirant recommended gitingest and LLMs.txt for letting tools like GitHub Copilot better understand docs.
Maya: And Aman pointed us to n8n, an open-source workflow engine great for building node-based API or Python script workflows.
Alex: Plus, the DeepLoups leaderboard shared by Ashwin highlights GitHub stars and memory needs of various LLM frameworks—AutoGPT tops the list.
Maya: Also, Mayank loves tools like n8n but wants better stateful conversation support for email bots; that’s still a gap in today’s tooling.
Alex: That’s a lot of great stuff this week.
Maya: Here’s a pro tip you can try today: if you’re building a chatbot or a data pipeline, consider relaxing strict data schemas and relying on LLM-powered unstructured data parsing instead. It can speed up your development.
Maya: Alex, how would you use that?
Alex: I’d prototype APIs quickly by letting the model handle messy input, then convert it internally to structured formats. Saves lots of upfront schema design.
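Alex: Concretely, I’d validate at the boundary. A small sketch for the show notes using Pydantic; the Contact schema and the sample string are our invented example.

```python
from pydantic import BaseModel, ValidationError

class Contact(BaseModel):
    name: str
    age: int

# Imagine this JSON string came back from the LLM extraction step.
llm_json = '{"name": "Asha Rao", "age": 34}'

try:
    contact = Contact.model_validate_json(llm_json)  # Pydantic v2
    print(contact.name, contact.age)
except ValidationError as err:
    # Bad extraction: retry the prompt or route to manual review.
    print(err)
```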
Maya: Perfect! To wrap up…
Alex: Remember, multi-token prediction and smart loss functions can boost model training efficiency and pave the way for smarter reasoning.
Maya: Don’t forget, AI models are evolving fast, even updating behind the scenes daily to improve answers.
Maya: That’s all for this week’s digest.
Alex: See you next time!