May 18, 2025

Week of 2025-05-18

4 minutes

Alex: Hello and welcome to The Generative AI Group Digest for the week of 18 May 2025!

Maya: We're Alex and Maya.

Alex: First up, we’re talking about tackling hallucinations in large language models, or LLMs.

Maya: Hallucinations happen when the AI makes stuff up, right? How do folks catch that?

Alex: Great point! Shreyash Nigam shared that if you want to just spot hallucinations for logging, you can have an evaluator model check answers. But catching and fixing them live is trickier.

Maya: So how do you fix them on the fly?

Alex: Shreyash mentioned a clever method with two prompts: the first creates multiple possible answers, and the second ranks those answers by quality and accuracy. Then you pick the best one.
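The two-prompt idea Alex describes can be sketched roughly as follows. This is a minimal illustration, not Shreyash's actual implementation: `best_of_n`, `generate`, and `rank` are hypothetical names, and the two callables stand in for whatever LLM calls you would use for the "create answers" prompt and the "rank answers" prompt.

```python
from typing import Callable, List


def best_of_n(
    question: str,
    generate: Callable[[str], str],
    rank: Callable[[str, List[str]], List[float]],
    n: int = 3,
) -> str:
    """Generate n candidate answers, score them, and return the top one.

    `generate` is a stand-in for the first prompt (one answer per call).
    `rank` is a stand-in for the second prompt; it is assumed to return
    one quality/accuracy score per candidate, higher meaning better.
    """
    if n < 1:
        raise ValueError("n must be at least 1")

    # Prompt 1: collect several possible answers to the same question.
    candidates = [generate(question) for _ in range(n)]

    # Prompt 2: ask the ranker to score every candidate.
    scores = rank(question, candidates)
    if len(scores) != len(candidates):
        raise ValueError("ranker must return one score per candidate")

    # Pick the candidate with the highest score.
    best_index = max(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best_index]
```

Passing the model calls in as plain functions keeps the sketch independent of any particular provider and makes it easy to test with stubs. The cost is n generation calls plus one ranking call per user question.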

Maya: That’s smart—kind of like brainstorming and then picking the best idea?

Alex: Exactly! And Nitin Kalra takes another approach in production: he uses citations by passing document IDs that the model should include in its response in JSON format. If the LLM doesn’t cite a source ID, it probably hallucinated.

Maya: I love that—putting a “footnote” in the AI’s answer to keep it honest. What if the answer without citation is still useful?

Alex: Then he runs a classifier to check if it's helpful. If yes, show it; if not, display a polite sorry message. This approach balances accuracy with user experience.
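The citation check and the helpfulness fallback could look something like the sketch below. The JSON field names `answer` and `source_ids`, the function names, and the wording of the sorry message are assumptions for illustration; the discussion only says that document IDs are passed in and should come back in the JSON response. The `is_helpful` callable stands in for the classifier.

```python
import json
from typing import Callable, Iterable, List

# Hypothetical fallback text; the real wording is not given in the discussion.
SORRY_MESSAGE = "Sorry, I couldn't find a reliable answer to that."


def parse_response(raw: str) -> dict:
    """Parse the model output as a JSON object; return {} if it is malformed."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return {}
    return payload if isinstance(payload, dict) else {}


def valid_citations(payload: dict, allowed_ids: Iterable[str]) -> List[str]:
    """Keep only cited IDs that were actually supplied to the model."""
    allowed = set(allowed_ids)
    cited = payload.get("source_ids", [])  # assumed field name
    if not isinstance(cited, list):
        return []
    return [d for d in cited if isinstance(d, str) and d in allowed]


def choose_reply(
    raw: str,
    allowed_ids: Iterable[str],
    is_helpful: Callable[[str], bool],
) -> str:
    """Show cited answers; otherwise defer to the helpfulness classifier."""
    payload = parse_response(raw)
    answer = payload.get("answer", "")  # assumed field name
    if not isinstance(answer, str) or not answer:
        return SORRY_MESSAGE
    if valid_citations(payload, allowed_ids):
        return answer
    # No usable citation: treat as a possible hallucination and only show
    # the answer if the classifier still judges it helpful.
    return answer if is_helpful(answer) else SORRY_MESSAGE
```

Checking cited IDs against the set that was actually supplied also catches the case where the model invents an ID that looks plausible but was never in the context.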

Maya: That’s practical for real-world apps. Next, let’s move on to web crawling tools for LLMs.

Alex: SaiVignanMalyala recommended Firecrawl with customizations and LLMs. They even added an MCP server recently, meaning a Model Context Protocol integration.

Maya: Firecrawl sounds neat! Have you tried Crawl4AI? I heard it can scrape multiple pages iteratively.

Alex: SaiVignan wanted to know that too; he’s curious if it can target specific entities across multiple pages starting from a main URL.

Maya: Sanjeed chimed in that Together AI also has a repo but no official framework yet.

Alex: So plenty of options if you want to feed web knowledge into your model smartly.

Maya: Next topic: Claude Code versus Codex for coding AI.

Alex: Shree asked if anyone has reviews comparing Claude Code to Codex. Palash said Codex CLI didn’t work nearly as well as Claude Code.

Maya: Rushabh agreed, and Shree wondered if the hype on Twitter might be paid influencer marketing.

Alex: Palash thought so too but wasn’t sure.

Maya: Kartik Khare prefers Codex because it’s open source and more flexible, though Claude Code has a better user experience.

Alex: So it’s a trade-off between UX polish and open-source flexibility.

Maya: Definitely. Now, let’s talk about Google Gemini's video streaming API.

Alex: Shree has been building with the Gemini Live API web console but ran into audio stuttering after 3-4 minutes.

Maya: That’s frustrating. Any clues why?

Alex: Palash and friends used Gemini realtime for a hackathon and didn’t notice much stuttering. Shree suspects that either context overflow in the model or the frame rate is the issue.

Maya: Context overflow meaning the AI’s memory or the data it’s processing gets too large?

Alex: Exactly. Shree tested running 5-minute sessions at 1 fps with the AI speaking and doing commentary. A lower fps helps avoid stutters but might make the video feel less smooth.

Maya: Sounds like a good balance between quality and stability.

Alex: Shree plans to test lower fps settings to improve stability. Fingers crossed it works well!
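One way to experiment with lower frame rates is to thin out frames on the client before they are sent. The sketch below is independent of any Gemini SDK call and makes no claim about how the Live API handles frames internally; `throttle_frames` and the `(timestamp, frame)` tuple shape are assumptions for illustration.

```python
from typing import Iterable, Iterator, Tuple, TypeVar

Frame = TypeVar("Frame")


def throttle_frames(
    frames: Iterable[Tuple[float, Frame]],
    max_fps: float = 1.0,
) -> Iterator[Tuple[float, Frame]]:
    """Yield frames so that no two yielded frames are closer than 1/max_fps.

    `frames` is assumed to be an iterable of (timestamp_in_seconds, frame)
    pairs in increasing timestamp order.
    """
    if max_fps <= 0:
        raise ValueError("max_fps must be positive")

    min_gap = 1.0 / max_fps
    last_sent = None
    for timestamp, frame in frames:
        # Always keep the first frame; afterwards drop anything that
        # arrives sooner than min_gap after the last frame we kept.
        if last_sent is None or timestamp - last_sent >= min_gap:
            last_sent = timestamp
            yield timestamp, frame
```

With a 4 fps capture and `max_fps=1.0`, this keeps roughly one frame per second, which matches the 1 fps setting Shree tested.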

Maya: Great collaboration in this group! Now for a listener tip.

Maya: Here’s a pro tip you can try today: if you’re worried about AI hallucinations, try the two-prompt approach—generate multiple answers, then rank and pick the best.

Maya: Alex, how would you use that in a chatbot or app?

Alex: I'd use it to make answers more reliable by having the model critique itself before replying—kind of like peer review for AI!

Maya: Love that. It can go a long way toward building user trust.

Alex: To wrap up, remember: clever prompts and citation checks can help catch and reduce hallucinations.

Maya: Don’t forget to balance usability and stability when working with streaming AI APIs like Gemini—try adjusting frame rates to keep things smooth.

Maya: That’s all for this week’s digest.

Alex: See you next time!
