Alex: Hello and welcome to The Generative AI Group Digest for the week of 25 May 2025!
Maya: We're Alex and Maya.
Alex: First up, we’re talking about tackling hallucinations in large language models. Shreyash Nigam shared some smart approaches.
Maya: Hallucinations, meaning when the model makes stuff up? How are they catching that live?
Alex: Good question! Shreyash mentioned an effective method that uses two prompts: one to generate multiple candidate answers, and a second to rank them by quality and accuracy. Then you pick the top-ranked one.
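Alex: For anyone following along in the show notes, here's a rough Python sketch of that two-prompt pattern. It assumes the OpenAI SDK, and the model name and prompts are placeholders rather than Shreyash's actual setup:
```python
# Two-prompt hallucination check: generate several candidates, then rank them.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def answer_with_ranking(question: str, n: int = 3) -> str:
    # Prompt 1: generate multiple candidate answers.
    candidates = [ask(f"Answer concisely: {question}") for _ in range(n)]
    # Prompt 2: rank the candidates by quality and accuracy, then pick the best.
    numbered = "\n".join(f"{i}: {c}" for i, c in enumerate(candidates))
    choice = ask(
        "Rank these candidate answers by quality and accuracy. "
        f"Reply with only the number of the best one.\n{numbered}"
    )
    try:
        return candidates[int(choice.strip())]
    except (ValueError, IndexError):
        return candidates[0]  # fall back if the ranking reply isn't a clean index
```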
Maya: So it’s like having a debate within the AI before picking its best answer. That’s clever! What about Nitin Kalra’s method with citations?
Alex: Right! Nitin tags each retrieved source with a document ID and asks the LLM to cite those IDs in JSON format. If a response lacks IDs, it’s probably hallucinating. He then runs a classification check to see whether the answer is even helpful before showing it.
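Alex: As a sketch, the flow looks something like this. The `llm` helper and the prompts are stand-ins, not Nitin's actual code:
```python
# Citation-ID check: tag each retrieved chunk with an ID, require the model to
# cite IDs in JSON, and treat ID-free or unhelpful answers as suspect.
import json

def answer_with_citations(question: str, docs: dict[str, str], llm) -> str | None:
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in docs.items())
    raw = llm(
        "Answer using only the sources below. Respond as JSON: "
        '{"answer": "...", "citations": ["doc_id", ...]}\n'
        f"Sources:\n{context}\n\nQuestion: {question}"
    )
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return None  # malformed output: don't show it
    cited = [c for c in parsed.get("citations", []) if c in docs]
    if not cited:
        return None  # no valid document IDs cited: likely hallucinated
    # Final gate: a cheap classification pass on whether the answer is helpful.
    verdict = llm(
        f"Is this answer helpful for the question '{question}'? Reply yes or no.\n"
        f"{parsed['answer']}"
    )
    return parsed["answer"] if verdict.strip().lower().startswith("yes") else None
```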
Maya: That feels like adding a “source check” to the AI’s answer. Very useful for production use. Moving on, I saw some talk about video streaming with Gemini. Shree brought it up.
Alex: Yes! Shree was building on Gemini’s live video streaming API but hit stuttering after a few minutes. Palash suggested it might be context overflow and mentioned that keeping the fps low helps.
Maya: Sounds like real-time streaming strains the context window. Sanjeed recommended WebRTC over WebSockets for live use cases too.
Alex: Exactly. And the Gemini API also introduced context window compression, which helps maintain longer connections without stuttering.
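Alex: For the curious, enabling it looks roughly like this with the google-genai Python SDK. Treat the field and model names as assumptions and check the current docs:
```python
# Sketch: a Gemini Live session with context window compression turned on,
# so long streams don't overflow the context. Names may differ by SDK version.
import asyncio
from google import genai
from google.genai import types

client = genai.Client()  # reads the Gemini API key from the environment

config = types.LiveConnectConfig(
    response_modalities=["TEXT"],
    # Sliding-window compression trims old turns instead of letting the
    # session stutter or drop once the context window fills up.
    context_window_compression=types.ContextWindowCompressionConfig(
        sliding_window=types.SlidingWindow(),
    ),
)

async def main():
    async with client.aio.live.connect(
        model="gemini-2.0-flash-live-001",  # placeholder live model
        config=config,
    ) as session:
        # Stream frames here; keeping fps low also limits token growth.
        pass

asyncio.run(main())
```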
Maya: Next, let’s dive into Agentic AI versus AI agents. Navanit shared a great paper on that.
Alex: The key idea is that AI agents are single-purpose autonomous bots, while Agentic AI is a broader system with multiple agents working together on complex tasks.
Maya: Ah, so Agentic AI is like a whole team of AI agents collaborating. Makes sense. And there was some debate about this analogy with soda versus carbonated water—a fun way to highlight the confusion.
Alex: Definitely. It's evolving terminology but understanding it helps us design smarter AI systems.
Maya: Next, Sparsh shared a challenge: training large models that need 28 TB of storage on VAST.AI.
Alex: That's huge! Aashay recommended streaming datasets from cloud buckets instead of provisioning massive storage clusters. Streaming means loading the data in chunks during training, so you never need all 28 TB on disk at once.
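Alex: A minimal sketch with the Hugging Face datasets library, using a public dataset as a stand-in for Sparsh's corpus:
```python
# Streaming a dataset instead of downloading all of it up front.
from datasets import load_dataset

# streaming=True returns an IterableDataset: shards are fetched and decoded
# lazily, so local disk only holds the chunk currently being consumed. The
# same flag also works when data_files points at s3:// or gs:// bucket paths.
ds = load_dataset("allenai/c4", "en", split="train", streaming=True)

for i, example in enumerate(ds):
    print(example["text"][:80])  # feed into the tokenizer/training loop here
    if i >= 2:
        break
```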
Maya: Smart move to optimize cost and scale. Okay, now about cheating in AI-assisted interviews—Vaibhav Bhargava raised concerns.
Alex: Yes, the rise of AI tools makes it tricky. Some companies emphasize in-person rounds or design questions that are tough to answer directly with AI.
Maya: Shan Shah made a good point: maybe the interview should focus on how candidates use AI, not just what answers they get, testing skills like reasoning and creativity.
Alex: And others use voice or live feedback to flag suspicious behavior. It’s a growing tension between AI assistance and authentic assessment.
Maya: Here's something practical: Saivignan mentioned Firecrawl with MCP for custom LLM integrations and web scraping capabilities.
Alex: Firecrawl sounds promising for iterative scraping and data extraction. Worth exploring for projects needing multi-page entity mining.
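Alex: Here's a tiny sketch with the firecrawl-py SDK, using a placeholder key and URL. The SDK's call signature has changed across versions, so double-check the current docs; the MCP server wraps the same endpoints for LLM clients:
```python
# Scrape a single page with Firecrawl; loop over discovered links for
# multi-page entity mining.
from firecrawl import FirecrawlApp

app = FirecrawlApp(api_key="fc-YOUR-KEY")  # placeholder key
result = app.scrape_url("https://example.com")  # placeholder URL
print(result)  # scraped content, typically including a markdown rendering
```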
Maya: Next, there was a great update from the Sarvam team about their new 24B model, Sarvam-M, built on Mistral 24B.
Alex: Right, their fine-tuning efforts really shine. The model is Apache 2.0 licensed—encouraging for open usage—and beats some competitors on benchmarks, especially for Indic languages.
Maya: That definitely boosts the Indian AI ecosystem with strong open models.
Alex: Another hot topic: the new Claude 4 Opus and Sonnet models from Anthropic. Nirant K praised them for being better on code benchmarks and having features like web search and Python sandboxing.
Maya: Anthropic’s integration of web search and file APIs sounds like a powerful developer experience, especially for AI agents.
Alex: And circling back to Gemini, Shree's discussion of context window compression ties into broader API improvements that enable longer, smoother streaming sessions.
Maya: We also can’t miss the Apple AI story. Pulkit Gupta and Rajesh RS debated whether Apple’s opening its models to developers will revive its innovation.
Alex: They noted Apple’s strengths in hardware and its ecosystem, but raised concerns that its AI product integration is lagging behind Google and Meta.
Maya: That interplay between product innovation and AI advancement will be fascinating to watch.
Alex: Alright, Maya, here’s a quick tip for our listeners: if you’re dealing with hallucinations in your LLM apps, consider the two-prompt ranking system Shreyash mentioned. Generate multiple candidates, then rank to pick the best answer. It helps catch and reduce misleading outputs.
Maya: Love that! Alex, how would you use this in your projects?
Alex: I’d build a lightweight validator that produces multiple answers with slight prompt variations and has a small ranking model or heuristic pick the most factual one before showing users, something like the sketch below. It’s like quality control for your AI.
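Alex: Here's that sketch, using majority agreement across prompt variations as the heuristic. The `llm` helper and the variations are stand-ins:
```python
# Lightweight validator: generate answers under slightly varied prompts,
# then keep an answer only if the variations agree.
from collections import Counter

VARIATIONS = [
    "Answer briefly: {q}",
    "Answer briefly, stating only facts you are sure of: {q}",
    "Answer briefly, naming the key fact your answer rests on: {q}",
]

def validated_answer(question: str, llm) -> str | None:
    answers = [llm(v.format(q=question)).strip() for v in VARIATIONS]
    # Heuristic: answers that agree across variations are more likely grounded;
    # a lone outlier is a hallucination signal. Exact-match agreement is crude
    # for free text; a ranking model or embedding similarity would be the
    # production-grade version.
    counts = Counter(a.lower() for a in answers)
    best, freq = counts.most_common(1)[0]
    if freq < 2:
        return None  # no agreement: hold the answer back for review
    return next(a for a in answers if a.lower() == best)
```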
Maya: Great thoughts! To wrap up: AI agents and Agentic AI are evolving fast, but understanding how multiple agents collaborate prepares us for the complex tasks ahead.
Alex: And remember, new open models like Sarvam-M and Claude 4 show open-source and developer experience are frontiers where the AI game is heating up.
Maya: That’s all for this week’s digest.
Alex: See you next time!