June 01, 2025

Week of 2025-06-01

8 minutes

Alex: Hello and welcome to The Generative AI Group Digest for the week of 01 Jun 2025!

Maya: We're Alex and Maya. Excited to dive into yet another packed week of AI news and community chatter!

Alex: First up, we’re talking about WhatsApp Business API, bot-building, and verification headaches.

Maya: Oh yes, that’s always a hot topic. Why do folks struggle so much to get verified with Meta’s WhatsApp API?

Alex: Maruti Agarwal shared that it took his team almost a year to get Meta’s approval, despite having Meta investors pushing internally — but it didn’t help much.

Maya: Wow, nearly a year! That’s a huge barrier. Is using third-party providers like Gupshup or Wati an easier route then?

Alex: It seems so. Maruti mentioned Gupshup offers the best rates in India, while Wati recently started selling their API. But many pointed out that you lose a lot of support and control compared to being a Meta partner, which is slow to approve.

Maya: And I heard switching your WhatsApp number from a third party provider to Meta directly can be tricky?

Alex: Exactly. Anuruddh noted that when moving a number from a BSP (Business Service Provider) to Meta, you lose the number’s sending reputation and have to start fresh. So some recommend starting fresh with Meta if you want ultimate flexibility, but that’s tougher.

Maya: For cost, Meta charges in INR at about 78 paise per message, whereas others charge in USD, making them more expensive for local Indian businesses.

Alex: Yup, so one strategy is to start with a BSP like Gupshup and later migrate to Meta once verified to save costs.

Maya: Okay, next, let's move on to…

Alex: Let’s talk about advances in AI model orchestration and prompt engineering! Nirant K shared a new official guide from Cursor on how to handle non-LLM libraries effectively using @Docs indexing, @Web for rarely used docs, and MCP for very company-specific tools.

Maya: That sounds super practical. Alex, what’s the benefit of classifying docs this way?

Alex: It helps AI agents efficiently tap into internal and external resources without overloading the context window. Plus, it reduces hallucination by grounding responses in accurate, updated docs.

Maya: Also, I saw shoutouts to DeepWiki and context7 MCP for third-party integration—those are apparently outperforming some default docs approaches.

Alex: Exactly. By integrating documentation smartly, we can build stronger, more reliable assistants. Makes AI workflows smoother in production.

Maya: Next up, let's move on to…

Alex: The discussion on system prompts and Constitutional AI! Ankit Sharma shared a great article on Claude’s system prompts.

Maya: Oh, I love this! Sid explained that Anthropic’s idea is to evolve the system prompt into a "constitution" — a solid set of rules that the model must follow.

Alex: Right, and that can break down the massive model prompt into smaller, manageable chunks of instructions.

Maya: Vikram had interesting questions – is there an optimal system prompt size or a rule of thumb for how large it should be?

Alex: Sid mentioned that the model itself is post-trained to follow those global and local instructions, but you still want to compress and balance prompt verbosity to avoid hurting performance.

Maya: So it’s kind of a partnership—model providers handle broad, “constitutional” rules, and developers craft precise, succinct custom prompts.

Alex: Exactly. On to the next topic.

Maya: Let’s talk about agentic retrieval versus traditional RAG (retrieval-augmented generation) in codegen workflows.

Alex: Yes, this came up in a lively thread. The idea is that instead of embedding everything in vector stores and running similarity search (which RAG does), agents can “rawdog” or directly explore the codebase.

Maya: Rawdog?

Alex: Basically, let the agent “grep” or search files itself, understand the structure by direct tool usage, rather than relying on vector search embeddings. Alok Bishoyi shared this approach.

Maya: Makes sense for complex codebases where vector search might miss context. Plus, markdown or XML-based prompts are reportedly better for code generation than plain text prompts.

Alex: Yup, OpenAI Cookbook and Claude docs even promote XML tags in system prompts for clarity and structure.

Maya: Pretty cool. Next, let’s move on to…

Alex: The ongoing debate around LLM evaluation and using LLMs as judges. Vetrivel PS kicked off a conversation about fine-tuned LLMs paired with RAG for evaluation tasks.

Maya: That’s something many teams wrestle with — how reliable are LLM metrics for accuracy?

Alex: Abhiram R summarized a practical approach: use standard metrics for faithfulness and relevance, plus continuous human-curated datasets, because current evaluation metrics can be noisy and fluctuate.

Maya: Cost is also a factor, since running heavy reasoning models daily for eval is expensive.

Alex: Right, some do a dedicated eval deployment and keep datasets manageable, balancing cost with value.

Maya: And multiple LLM judges per criterion can add complexity, so some opt for fine-tuning a cheaper LLM as a judge to keep costs down.

Alex: Great transition — next up, let's move on to…

Maya: AI stack and product tools: from voice agents to orchestration frameworks.

Alex: Lots of folks shared details. For Indian English/Hindi TTS, ElevanLabs and Sarvam’s Bulbul v2 got praise for natural voices.

Maya: Plus open source options like ai4bharat’s indic-parler-tts are available with speaker controls.

Alex: On voice agent building, platforms like Exotel and Avaya support Indian phone numbers, but Twilio does not yet fully.

Maya: And on orchestration frameworks similar to n8n for custom automations, folks mentioned Superblocks, Kestra (Apache 2.0), Motia.dev, and Langgraph for MCP server connections.

Alex: Lots of work on making agentic flows customizable and plug-and-play, especially for enterprises.

Maya: Awesome. Moving on…

Alex: There was a lively debate on LLM subscriptions and user experiences. Sankalp and Cheril shared that ChatGPT still offers the best mix for research and coding, but Gemini shines in deep research and some coding tasks.

Maya: And Claude excels in conversation and code meta prompting with strong document integration tools like Cursor.

Alex: Many folks rotate subscriptions based on their task—math, coding, deep research.

Maya: So it boils down to task-dependent usage and budget.

Alex: Last but not least, we must talk about RLVR—reinforcement learning with value-based rewards.

Maya: Yeah, Alok Bishoyi wondered why self-critic or doubling down on a model’s own preferences boosts performance.

Alex: Amit Bhor and Vinod explained that the model uses its own confidence as a reward signal—sharpening probability peaks it’s already confident about.

Maya: So it “gasps up” its own beliefs post-training, boosting accuracy without external verification.

Alex: Neat trick, but it’s not perfect—sometimes majority votes are still better.

Maya: Definitely a promising technique to explore further.

Alex: Maya, here’s a pro tip you can try today. Inspired by the WhatsApp bot discussion: If you’re planning to build a chatbot on WhatsApp, start with a provider like Gupshup for faster onboarding and lower costs, then migrate your number to Meta’s official API once verified to gain better rates and control.

Maya: That’s solid advice! Alex, how would you use that?

Alex: I’d definitely prototype on Gupshup to validate the bot quickly, then prepare for Meta’s verification for scale and cost savings.

Maya: Perfect. As always, the best way is practical experimentation coupled with smart migration.

Alex: Remember, when managing prompts for codegen or LLM orchestration, consider using structured formats like markdown or XML to improve clarity and model performance.

Maya: Don’t forget to balance human-curated datasets and automated metrics for reliable AI evaluation.

Maya: That’s all for this week’s digest.

Alex: See you next time!

...more

View all episodes