Alex: Hello and welcome to The Generative AI Group Digest for the week of 09 November 2025!
Maya: We're Alex and Maya.
Alex: Big week in the group — a lot of threads, but a few themes stood out. Ready to dive in?
Maya: Always. Let’s start with the “no-code / vibe coding” conversation that kicked off with Paras Chopra sharing nokode.
Alex: Right — Paras shared the nokode repo and Ankur Pandey called it exactly what Andrej Karpathy meant by “vibe coding.” That idea is basically: let big models stitch interfaces and glue code together with very little human typing. Sounds magical, but Abhiram R raised the classic worry — reliability. He said every run can give you a different interface, which is a huge UX and maintenance problem.
Maya: That’s the core trade-off — speed and creativity versus reproducibility. The group had some practical fixes. Paras suggested an easy solve: get the LLM to write actual code files, so outputs are consistent. Nilesh expanded that into a smart architecture: use a hierarchy of intelligence — reuse previously generated code, have the LLM write fresh code and cache it, push complex cases to a deep research agent, and if all else fails, send it to a human inbox.
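Maya: To make Nilesh's hierarchy concrete, here's a rough Python sketch for the show notes; the llm and deep_agent callables are hypothetical stand-ins, not any particular framework.

```python
from typing import Callable, Optional

# Minimal sketch of the escalation ladder. The llm and deep_agent callables are
# hypothetical stand-ins you would wire up yourself; the cache and human queue
# are just in-memory structures for illustration.

code_cache: dict[str, str] = {}          # previously generated (and tested) code artifacts
human_queue: list[tuple[str, str]] = []  # fallback inbox for a person to review

def handle(task_key: str, spec: str,
           llm: Callable[[str], Optional[str]],
           deep_agent: Callable[[str], Optional[str]]) -> str:
    # 1. Reuse code we already generated and checked in.
    if task_key in code_cache:
        return code_cache[task_key]
    # 2. Ask the LLM to write fresh code, then cache the artifact.
    code = llm(spec)
    if code:
        code_cache[task_key] = code       # in real life: run tests and version it first
        return code
    # 3. Push the hard cases to a slower, more thorough deep-research agent.
    answer = deep_agent(spec)
    if answer:
        return answer
    # 4. If all else fails, park it in a human inbox.
    human_queue.append((task_key, spec))
    return "escalated-to-human"
```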
Alex: I loved that. It’s basically turning the model into a developer that produces artifacts you can test and version. For non-technical listeners: think of “vibe coding” as asking an assistant to assemble a small app — but without discipline, it’s like asking a painter to redraw the Mona Lisa differently each time. Making the model produce files we check into CI is how we get predictable results.
Maya: Why this matters: lots of teams will try no-code or LLM-driven UI generation because it accelerates iteration. But if you don’t lock outputs into file artifacts, tests, and caching, you’ll end up with flaky products and angry users. Non-obvious takeaway: treat generated code like any third-party dependency — version it, run unit and integration tests on it, and create a fallback path that’s human-reviewable.
Alex: Practical ideas: use nokode or similar generators, then pipe generated code through automated tests and snapshot UI outputs. Add a cache or artifact store (a versioned zip or container) so you can redeploy the exact generated build. If there's persistent churn, run a small "stabilizer" model that normalizes filenames, export formats, and APIs before committing. Also log model inputs and outputs so you can audit why something changed.
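Alex: For listeners who want to see that artifact step, here's a minimal sketch for the show notes; nothing here is nokode's actual API, it's just the shape of the idea.

```python
import hashlib
import json
import pathlib
from datetime import datetime, timezone

# Minimal sketch of the artifact step: write whatever the generator produced to
# real files, hash them so the exact build is reproducible, and keep an audit
# log of the prompt that produced them. The generated_files dict stands in for
# the output of nokode or any other generator.

def store_artifact(prompt: str, generated_files: dict[str, str],
                   root: str = "artifacts") -> str:
    payload = json.dumps(generated_files, sort_keys=True).encode()
    build_id = hashlib.sha256(payload).hexdigest()[:12]   # stable id for this exact build
    build_dir = pathlib.Path(root) / build_id
    build_dir.mkdir(parents=True, exist_ok=True)

    # Plain code files you can test, diff, snapshot, and redeploy later.
    for name, source in generated_files.items():
        (build_dir / name).write_text(source)

    # Audit trail: what we asked for, when, and which files came back.
    (build_dir / "audit.json").write_text(json.dumps({
        "prompt": prompt,
        "created_at": datetime.now(timezone.utc).isoformat(),
        "files": sorted(generated_files),
    }, indent=2))
    return build_id
```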
Maya: One more thought: Nilesh quipped that “we won’t need code” will be like “we won’t need RAM because we have hard disks”, which is a memorable line. The point is people will stop thinking about code if the toolchain hides it, so design guardrails early.
Alex: Next big thread — voice AI for inbound and outbound calls. Ashwin Ramaswamy asked whether calls are really switching to AI. The group had a nuanced split.
Maya: Yes. Several folks said the tech is great for certain niches but operationally hard. Bargava in healthcare called it “a grind.” Cheril pointed out many YC startups are already using 11labs for TTS and are willing to pay because human labor in the US is expensive.
Alex: But India-specific economics are different and important. Mayank Shekhar laid out the math: human telecallers cost ~21–30k INR/month; an API at ~4 Rs/min ends up near that monthly number for heavy-minute scenarios. He also noted quality expectations can be lower for bulk outbound calls, so AI can be viable for lead qualification where conversion economics matter.
Maya: Nirant reframed the metric nicely: cost per conversion. Humans may convert around 50%, while voice agents are hitting ~18–20% completion for some workflows. So if AI is dramatically cheaper per minute, it can still be viable despite lower conversion. In India, companies are using hybrid patterns: voice to WhatsApp handoff, voice for bulk qualification, then humans for the high-value follow-ups.
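Maya: Here's the back-of-the-envelope version for the show notes; only the rates and conversion figures come from the thread, the call volume and talk time are made up for illustration.

```python
# Back-of-the-envelope cost-per-conversion comparison. The rates and conversion
# figures echo the thread; calls_per_month and avg_call_minutes are invented
# purely for illustration.

calls_per_month = 4000        # assumed outbound calls in a month
avg_call_minutes = 2.0        # assumed average talk time per call

human_monthly_cost = 25_000   # INR, roughly the 21-30k telecaller range
human_conversion = 0.50       # ~50% conversion for humans
ai_rate_per_minute = 4        # ~4 INR/min API pricing
ai_conversion = 0.19          # ~18-20% completion for voice agents

ai_monthly_cost = calls_per_month * avg_call_minutes * ai_rate_per_minute

human_cpc = human_monthly_cost / (calls_per_month * human_conversion)
ai_cpc = ai_monthly_cost / (calls_per_month * ai_conversion)

print(f"Human: ~{human_cpc:.0f} INR per conversion")   # ~12 INR with these inputs
print(f"AI:    ~{ai_cpc:.0f} INR per conversion")      # ~42 INR with these inputs
# With these particular assumptions the human funnel wins; change the talk time,
# per-minute rate, or conversion and the answer flips, which is exactly why you
# price pilots on conversions rather than minutes.
```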
Alex: Practical implications: don’t think “replace humans” — think “reallocate humans.” Use voice AI to triage, do bulk notifications, or schedule appointments, and design handoffs to humans for nuanced conversations. Local TTS quality matters — Eleven Labs now offers India data residency, and there are local players like Sarvam, Gnani AI, Greylabs and Riverline working on Indic speech and domain deployments.
Maya: Non-obvious tip: measure cost per conversion from day one. Run small pilots that compare a human-only funnel and an AI-assisted funnel and price based on outcomes rather than raw minutes. Also invest in dialect-specific fine-tuning or small local models for TTS if your region mixes many accents — that lifts perceived quality more than raw model size.
Alex: And UX matters — better human-in-the-loop workflows, graceful handoffs, and “agent laughter” or appropriate tone were called out as differentiators. If you can improve unit economics and handle interruptions and tone variation, you win.
Maya: Another big topic was agent frameworks and interoperability — Nipun asked about A2A and whether to build or buy.
Alex: Right. G Kuppuram and others said A2A is a promising protocol — it lets agents advertise capabilities and security metadata (an “Agent Card”), which helps multi-agent ecosystems talk to one another. The consensus: A2A is basically plumbing; it’s useful, but it doesn’t magically solve prompt engineering or optimization.
Maya: For people building agent systems: you can treat A2A as an interoperability layer. Use existing frameworks where possible — LangChain, the OpenAI agents python examples, LiteLLM for self-hosted abstraction, or Semantic Kernel if you need specific integrations. Anshul and others warned about the constant SDK churn across providers (OpenAI, Claude, etc.), so design adapter layers that map provider-specific syntax and structured output formats into a common schema.
Alex: Non-obvious takeaway: keep agents provider-agnostic internally. Define a small, tested adapter interface that converts your canonical agent messages into provider calls. Use something like pydantic-ai for structured outputs so downstream systems don’t have to parse free text. And if you need security or auditability, include an “agent card” or manifest — who can call what, what data the agent accesses, and how it logs activity.
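Alex: For the show notes, here's a sketch of what that adapter layer could look like; the schema and the FakeAdapter are invented for illustration, only the pydantic calls are real.

```python
from typing import Protocol
from pydantic import BaseModel

# Sketch of a provider-agnostic core: the agent logic only ever sees
# AgentMessage and LeadSummary; each provider gets its own adapter that maps
# these onto SDK-specific calls. FakeAdapter below is a stand-in, not real
# provider code.

class AgentMessage(BaseModel):
    role: str      # "system" | "user" | "assistant"
    content: str

class LeadSummary(BaseModel):
    # One example of a canonical structured output the rest of the system uses.
    name: str
    interested: bool
    follow_up_notes: str

class ProviderAdapter(Protocol):
    def complete(self, messages: list[AgentMessage]) -> LeadSummary: ...

class FakeAdapter:
    """Stand-in adapter; a real one would call the provider SDK and parse its
    structured-output format back into LeadSummary."""
    def complete(self, messages: list[AgentMessage]) -> LeadSummary:
        raw = '{"name": "Asha", "interested": true, "follow_up_notes": "call Tuesday"}'
        return LeadSummary.model_validate_json(raw)

def qualify_lead(adapter: ProviderAdapter, transcript: str) -> LeadSummary:
    messages = [
        AgentMessage(role="system", content="Summarise this call as a LeadSummary."),
        AgentMessage(role="user", content=transcript),
    ]
    return adapter.complete(messages)   # swap adapters without touching business logic
```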
Maya: Start small: compose a few micro-agents that do one job each, then orchestrate them with A2A or a simple orchestrator. That way you can swap providers without rewriting your business logic.
Alex: Related to agents and apps is prompt management and RAG. Varun asked: where do prompts live — Git, DB, or something else?
Maya: The group had practical suggestions. Puspesh uses versioned YAML in Git so non-devs can edit without accidental code changes. Amitav pointed out that prompts need context — tools the LLM can call, product context, and a way to measure the impact of prompt edits.
Alex: So for product teams: think of prompts as product configuration and documentation. Two practical patterns emerged: keep business documents and long-form context separate (and editable by non-tech people), and keep the canonical prompt files versioned in Git or a prompt store with access controls. Build a small internal UI that writes to Git if you want non-devs to edit safely.
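Alex: If you want to picture the versioned-YAML pattern, there's a small sketch in the show notes; the field names are just one possible layout, and the YAML is inlined as a string so the example stands alone.

```python
import yaml  # pip install pyyaml

# Sketch of "prompts as versioned config": in practice this YAML lives in Git
# (or a prompt store) where non-devs can edit it behind a small UI; it is
# inlined here only to keep the example self-contained. Field names are one
# possible layout, not a standard.

PROMPT_YAML = """
name: support_triage
version: 3
model: gpt-4o              # which model this prompt was last tuned against
tools: [search_orders, escalate_ticket]
expected_output: json      # downstream parsers rely on this
template: |
  You are a support triage assistant.
  Use the product context below and answer in JSON.
  Context: {context}
  Customer message: {message}
"""

def render_prompt(context: str, message: str) -> str:
    cfg = yaml.safe_load(PROMPT_YAML)
    return cfg["template"].format(context=context, message=message)
```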
Maya: Also, pair prompt changes with tests: regression checks on sample inputs and golden outputs. And on RAG — Google’s Gemini File Search Tool got a shoutout; Shan Shah said it’s easy and free for storage and great as a RAG quick start. But some skepticism remains about RAG itself: don't rely on retrieval as a crutch without quality control.
Alex: Best practices: 1) store prompts with metadata (model, tools available, expected outputs), 2) test changes automatically, 3) keep prompt docs that business folks can update, and 4) use managed RAG tools like Gemini’s File Search when you want fast iteration, but monitor hallucination risk.
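Alex: And for point 2, a rough golden-output regression check is in the show notes; call_model and the file path are placeholders for however your stack actually invokes the LLM.

```python
import json
import pathlib

# Sketch of a golden-output regression test for prompt changes: run a fixed set
# of sample inputs and compare against stored expected outputs. call_model() and
# the golden file path are placeholders for your own stack.

GOLDEN = pathlib.Path("tests/golden_outputs.json")   # {"sample input": "expected output"}

def call_model(prompt_input: str) -> str:
    raise NotImplementedError("wire this to your provider, or to recorded fixtures")

def test_prompt_regressions():
    cases = json.loads(GOLDEN.read_text())
    failures = []
    for sample_input, expected in cases.items():
        actual = call_model(sample_input)
        if actual.strip() != expected.strip():   # or swap in a semantic-similarity check
            failures.append(sample_input)
    assert not failures, f"prompt drift on {len(failures)} golden cases: {failures}"
```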
Maya: A short but important strand: compute and infrastructure. Anubhav shared a great explainer about building datacenters, and there was the Microsoft AI Diffusion Report and a Hugging Face blog on the shifting compute landscape. Plus, Google announced TPUs headed to space.
Alex: The headline is that compute is strategic and ownership is changing. Private equity and big firms are investing in data center builders, which affects access to capacity. If you run models, plan for variability in pricing and consider multi-cloud, edge, or self-hosted options like LiteLLM or vLLM.
Maya: Non-obvious point: startups should consider model choice and localization early — smaller, efficient models running on nearby infrastructure often beat naive reliance on the largest clouds. Also watch the emerging tooling like the Muon optimizer added to PyTorch — optimizer improvements can give practical speedups without changing model size.
Alex: Quick callouts from the week: Moonshot’s Kimi K2 and its “thinking” mode got praise, Terence Tao’s experiments with AlphaEvolve point to scaled math exploration, the TOON format (a compact alternative to JSON) can save 30–50% of tokens, Andon Labs released Butter-Bench for robotics LLM evaluation, and SGLang Diffusion added native inference support for diffusion models.
Maya: Lots of great stuff to follow up on. If you like quick wins: try the TOON format for compact structured prompts, read Hugging Face’s shifting-compute blog for strategy, and test the Muon optimizer if you’re doing PyTorch training.
Alex: Okay — listener tips time. I’ll go first. Tip: if you’re experimenting with LLM-generated code or UI, immediately add an artifact step that emits plain code files and a snapshot of the UI or API contract. Put those artifacts in version control or an artifact store, run automated tests, and add a fallback queue for human review. Maya, how would you apply that?
Maya: I’d apply it to any prototype we spin up. Before shipping an LLM-driven feature, I’d require a “stabilize” PR that contains the generated files and tests. On the product side, I’d make sure there’s a monitoring dashboard that shows when generated outputs change and route anomalies to a human inbox.
Alex: Great. My second quick tip: if you’re trying voice AI for calls, don’t buy by minutes — define a pilot tied to conversion metrics. Run a three-way A/B: human, AI, and hybrid. Measure cost per conversion, not just cost per minute.
Maya: I’ll add a tip of my own. For prompt management: treat prompts like product config. Store them in Git with versioned YAML, but expose a lightweight web UI for business folks to edit the human-facing docs that prompts reference. Pair every prompt change with a unit test and an experiment that measures output drift. Alex, how would you apply that?
Alex: I’d make the UI write to a branch, run the tests automatically, run a small canary against production traffic, and if it passes, merge. That keeps non-dev edits safe while preserving audit trails.
Maya: That’s it from us for this week. Lots more to dive into, but we’ll keep watching nokode, A2A adoption, voice AI economics, and the compute landscape.
Alex: Thanks to everyone in the group who started these threads: Paras Chopra, Ankur Pandey, Abhiram R, Nilesh, Ashwin Ramaswamy, Mayank Shekhar, G Kuppuram, Nipun, Varun Jain and many others. We’ll be back next week.
Maya: Bye for now — take a small experiment from the show and try it this week. See you next time!
Alex: Goodbye from Alex.
Maya: Goodbye from Maya.