Generative AI Group Podcast

Week of 2025-04-20



Alex: Hello and welcome to The Generative AI Group Digest for the week of 20 Apr 2025!
Maya: We're Alex and Maya.
Alex: First up, we’re talking about integrating MCP tools with Dify. Aankit Roy asked if anyone has done that, and Shobhitic shared a blog on turning a Dify app into an MCP server. But Aankit wanted to add an MCP tool as one of the agent tools, which got some interesting responses.
Maya: What exactly is an MCP tool again, Alex? Why would someone want to integrate it with Dify?
Alex: MCP stands for "Model Context Protocol", an open standard that lets agents discover and call external tools and data sources through a common interface. Integrating it with Dify, a platform for building AI agents and workflows, means the agent can use multiple custom tools smoothly. Kuldeep Pisda suggested custom tools are the better approach for production, wrapping APIs like RapidAPI and scrapers as custom tools.
Maya: So does that mean there isn’t an off-the-shelf MCP tool plug-and-play for Dify yet?
Alex: Exactly. Some have deployed MCP in production for non-critical tasks, but authentication remains a challenge. Shobhitic noted that auth workflows still need improvement because clients don't yet support the new auth spec. Rohit even built a custom Kubernetes MCP server for daily tasks.
Maya: Sounds like building custom tools within Dify is currently the more practical route for critical production use.
Alex: Definitely. If you’re building agents today, wrapping APIs as custom tools in Dify is a proven method. That was a great dive into agent tool integrations!
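For listeners who want to try the custom-tool route Kuldeep described, here is a minimal sketch of the idea: wrap an external API in a tiny HTTP service and register it in Dify as a custom tool via its OpenAPI schema. FastAPI is assumed here because it serves that schema automatically at /openapi.json; the endpoint and upstream scraping API are illustrative, not from the discussion.

```python
# Sketch: wrap an external API as a small HTTP service that Dify can
# register as a custom tool via its OpenAPI schema (FastAPI exposes one
# at /openapi.json). The endpoint and upstream URL are hypothetical.
import httpx
from fastapi import FastAPI

app = FastAPI(title="scraper-tool")

@app.get("/scrape")
async def scrape(url: str) -> dict:
    """Fetch a page through an upstream scraping API and return its text."""
    async with httpx.AsyncClient(timeout=30) as client:
        resp = await client.get(
            "https://example-scraper-api.invalid/extract",  # hypothetical upstream
            params={"url": url},
        )
        resp.raise_for_status()
        return {"url": url, "content": resp.json().get("text", "")}
```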
Maya: Next, let’s move on to AI adoption for coding in non-Python or JavaScript languages. Alok Bishoyi asked about languages like Go or Rust where underlying models might have less data.
Alex: Yes, and quite a few chimed in. Abhishek and Rajesh confirmed Rust works surprisingly well with Claude, while Shan Shah vouched for Swift and Kishore for C/C++ on Linux. Sandeep Srinivasa recommended using FFI modules—basically small bridges—to integrate Go, Rust, or C++ code into Python, combining strengths.
Maya: That makes sense—leveraging Python’s ecosystem while retaining performance or existing codebases in other languages.
Alex: Exactly, Bharath and others agreed it’s smarter to use FFI integration rather than abandoning Python wholesale. So if your project uses those less common languages, consider wrappers to tap Python AI tools without sacrificing your codebase.
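A minimal sketch of that FFI bridge idea, assuming you have compiled a Rust or C library that exports a `sum_squares` function to a shared object (the library and function names are illustrative):

```python
# Sketch: calling native code from Python via ctypes. Assumes a compiled
# Rust or C library exposing `int64_t sum_squares(int64_t n)` as libfast.so.
import ctypes

lib = ctypes.CDLL("./libfast.so")            # path to your compiled library
lib.sum_squares.argtypes = [ctypes.c_int64]
lib.sum_squares.restype = ctypes.c_int64

def sum_squares(n: int) -> int:
    """The hot loop runs in native code; Python just orchestrates."""
    return lib.sum_squares(n)

if __name__ == "__main__":
    print(sum_squares(10_000))
```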
Maya: Next, Akshat Khare asked about hosting virtual browsers for robotic process automation; he's paying a lot for Windows VMs on GCP. What are the cost-effective cloud options?
Alex: Akshat wanted something like Browserbase that handles Cloudflare and anti-bot detection. Right now he's running a logged-in local Chrome on a Windows VM, which is breaking the bank. No clear best answer emerged, but options like serverless Playwright in the cloud or specialized RPA services could help. Rohit's custom Kubernetes MCP server, mentioned earlier, might also scale better for this kind of automation.
Maya: Browser automation at scale definitely needs smarter infrastructure choices to control costs while maintaining reliability.
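As a rough illustration of the serverless-Playwright direction, here is a headless Chromium sketch that runs fine in a cheap Linux container instead of a Windows VM. It does nothing on its own about Cloudflare or anti-bot checks, which is the part services like Browserbase handle; the target URL is a placeholder.

```python
# Sketch: headless browser automation with Playwright in a Linux container.
from playwright.sync_api import sync_playwright

def fetch_title(url: str) -> str:
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")
        title = page.title()
        browser.close()
        return title

if __name__ == "__main__":
    print(fetch_title("https://example.com"))
```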
Alex: Moving on, Navanit asked about open source MLOps or LLM ops tools besides MLflow. Pratik suggested Comet ML, and Ashish said he's used Comet's Opik in production with good support and reasonable cost. So Comet seems a solid alternative or complement to MLflow.
Maya: Having choices like MLflow and Comet helps teams find the best fit for tracking model training, deployment, and governance.
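For anyone new to experiment tracking, here is a minimal MLflow example showing what "tracking" means in practice (MLflow is the baseline the group compared Comet and Opik against); the parameters and metric values are placeholders:

```python
# Sketch: minimal experiment tracking with MLflow. Values are placeholders.
import mlflow

mlflow.set_experiment("resume-scorer")

with mlflow.start_run(run_name="baseline"):
    mlflow.log_param("model", "deepseek-r1")
    mlflow.log_param("temperature", 0.2)
    mlflow.log_metric("eval_accuracy", 0.87)   # placeholder value
    mlflow.log_metric("avg_latency_ms", 420)   # placeholder value
```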
Alex: Now, an interesting conversation around Claude Code and similar tools. Sud asked where Claude Code's real source code lives, since the GitHub repo is a dummy. Palash explained it's not open source, but alternatives like OpenAI Codex and Anon Kode exist. Abhinav Lal even pointed out an Indian startup's similar tool, Forge.
Maya: It shows while Claude Code isn’t open source, there’s a healthy ecosystem of agent loops that plan, code, and modify existing codebases directly from the terminal.
Alex: Exactly. It’s a hot area for developers wanting to automate coding workflows.
Maya: Okay, here's a great tip inspired by the discussion on using agents to filter job applicants. Ashish Dogra shared how he automates resume parsing by linking Google Form responses to a Google Sheet and scoring candidates with DeepSeek R1 running on a serverless API.
Maya: So here's a pro tip you can try today: link your Google Form to a sheet that triggers AI scoring of each response, saving tons of manual sorting.
Maya: Alex, how would you use that approach in your projects?
Alex: I’d definitely build a lightweight pipeline combining Google Forms, Sheets, and serverless AI scoring. It’s easy to scale, low cost, and integrates well with existing Google Workspace tools. Perfect for quick automation without heavy engineering.
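A sketch of the pipeline Ashish described, assuming gspread with a service account for the sheet and an OpenAI-compatible serverless endpoint hosting DeepSeek R1; the sheet name, column names, base URL, and prompt are illustrative:

```python
# Sketch of the Forms -> Sheets -> LLM scoring pipeline. Assumes gspread
# service-account credentials and an OpenAI-compatible endpoint for
# DeepSeek R1; names below are hypothetical.
import gspread
from openai import OpenAI

gc = gspread.service_account()                       # uses local credentials file
rows = gc.open("Job Applications").sheet1.get_all_records()

llm = OpenAI(base_url="https://your-serverless-host.example/v1", api_key="...")

def score(resume_text: str) -> str:
    resp = llm.chat.completions.create(
        model="deepseek-r1",                         # model name as exposed by your host
        messages=[
            {"role": "system", "content": "Score this resume 1-10 for a backend role. Reply with the number only."},
            {"role": "user", "content": resume_text},
        ],
    )
    return resp.choices[0].message.content.strip()

for row in rows:
    print(row.get("Name"), score(row.get("Resume text", "")))
```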
Maya: Next up, let's talk about improving speech synthesis with emotion, especially for Indian and European languages. Ashish Dogra asked about the best high-quality speech synthesis models. Suggestions included Smallest AI's voice clones for emotion mimicry, ElevenLabs for Hindi/English, and Hume for European languages.
Alex: Right, but emotion generation in Indian languages is still a tough problem. ElevenLabs is reasonable in Hindi but better in English. AudioPod AI offers TTS in multiple Indian languages with special pre-processing to handle numbers and symbols well.
Maya: So if you want natural, emotional speech in Indian languages, trying several providers and checking pre-processing steps is key.
Alex: Also, some Chinese voice cloning models reportedly perform well for Indic languages. So exploring those could unlock new possibilities.
Maya: Next, let’s jump into chunking PDF documents for retrieval-augmented generation (RAG). Akshat asked for the best tools to chunk PDFs with tables, graphs, images.
Alex: Nirant suggested parsing tables with MarkItDown from Microsoft, or paid options like Azure Document Intelligence, Gemini Flash for low-cost structured extraction, and Mistral OCR for markdown output with image support. PyMuPDF is popular for extraction, combined with Chonkie for chunking.
Maya: And some folks have creative workflows like converting PDFs to HTML and parsing further. Though krypticmouse advised, “Don’t try this.”
Alex: The key takeaway: combining lightweight open source extraction with specialized parsers balances accuracy, speed, and cost.
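A small sketch of the PyMuPDF plus Chonkie combination mentioned above, assuming Chonkie's TokenChunker interface; the file name and chunk sizes are illustrative, and tables or images still need one of the heavier parsers from the list.

```python
# Sketch: extract text with PyMuPDF (the `fitz` module) and chunk it with
# Chonkie for a RAG index. Chunk sizes and file name are placeholders.
import fitz                      # PyMuPDF
from chonkie import TokenChunker

doc = fitz.open("report.pdf")
text = "\n".join(page.get_text() for page in doc)

chunker = TokenChunker(chunk_size=512, chunk_overlap=64)
chunks = chunker.chunk(text)

for chunk in chunks[:3]:
    print(len(chunk.text), chunk.text[:80])
```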
Maya: Moving on to coding agents and productivity. Marmik Pandya shared data showing only about 10% of lines of code in PRs were written by Cursor agents at his company. Palash and others discussed trust issues because agents can miss nuance in big codebases.
Alex: Kuldeep said Cursor often writes SDE1-level code unless you prompt it carefully. Shan noted it's useful for new code or small bug fixes but struggles with big legacy code. Srihari added it frees devs to focus on planning and design, improving overall outcomes.
Maya: Sounds like coding agents boost efficiency mostly on boilerplate or isolated tasks, not yet ready to replace experienced devs on complex code.
Alex: Adoption and trust remain big hurdles in enterprise. So it’s crucial to set good test coverage and carefully integrate AI assistance.
Maya: Last topic: Hadi Khan shared insights on Google Search's strong growth despite competition from AI chat. The group discussed cost-per-click rising steadily, driven by Google's auction optimizations and ad policies. So even with AI web search and chatbots, traditional search ads still thrive.
Alex: Yeah, Google’s been able to increase ad prices and maintain volume. New AI tools challenge search UX but monetization remains strong. Interesting dynamics to watch here.
Maya: That covers the big topics from this week!
Maya: Here's a quick listener tip: when running multi-turn LLM conversations that need to end within a set time, add a summary prompt near the end to encourage a neat wrap-up. It makes interactions feel more natural.
Maya: Alex, how would you apply this in your chatbots or voice assistants?
Alex: I'd definitely build a timer that triggers a system prompt with a chat history summary to gracefully close the conversation, improving user experience and reducing abrupt endings.
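A minimal sketch of that timed wrap-up idea, assuming an OpenAI-compatible chat client; the model name, time budget, and wrap-up wording are placeholders:

```python
# Sketch: once the session's time budget is nearly spent, inject a system
# message asking the model to summarise and close gracefully.
import time
from openai import OpenAI

client = OpenAI()
SESSION_SECONDS = 300        # placeholder time budget
WRAP_UP_MARGIN = 45          # start wrapping up with this many seconds left

start = time.monotonic()
messages = [{"role": "system", "content": "You are a helpful voice assistant."}]

def reply(user_text: str) -> str:
    messages.append({"role": "user", "content": user_text})
    remaining = SESSION_SECONDS - (time.monotonic() - start)
    if remaining < WRAP_UP_MARGIN:
        messages.append({
            "role": "system",
            "content": "Time is almost up. Briefly summarise the conversation and close politely.",
        })
    resp = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    answer = resp.choices[0].message.content
    messages.append({"role": "assistant", "content": answer})
    return answer
```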
Alex: For wrap-up, I’ll say: Remember, custom-built AI tools for your specific workflows often outperform general-purpose solutions, especially in agent integrations.
Maya: Don’t forget to keep testing and iterating your AI helpers in real-world tasks—their value grows as you teach and guide them.
Maya: That’s all for this week’s digest.
Alex: See you next time!