Generative AI Group Podcast

Week of 2025-09-07


Alex: Hello and welcome to The Generative AI Group Digest for the week of 07 Sep 2025!
Maya: We're Alex and Maya.
Alex: First up, we’re talking about the best ways to handle highly structured documents.
Maya: Hey Alex, do you think rules-based methods can still win against fancy AI for this?
Alex: Good question! Pulkit Gupta mentioned that with specific formats, simple preprocessing plus rules and regex gives high reliability without model finetuning.
Maya: So it’s like a practical shortcut instead of training complex models?
Alex: Exactly! Pulkit said, “basic preprocessing and extracting data followed by simple rules-based logic where you check for specific entities within detected headers and regex should help.”
Maya: That means less need for annotated data or extra training—great for many business cases.
Alex: Plus, it keeps things deterministic and transparent. Very helpful if you want predictable outcomes.
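A minimal sketch of the rules-plus-regex idea Pulkit described. The document format, header names, and patterns here are hypothetical; the point is detecting headers first, then applying entity-specific regex rules only within the expected section:

```python
import re

# Hypothetical highly structured document: fixed headers, predictable fields.
DOC = """INVOICE DETAILS
Invoice No: INV-2025-0042
Date: 2025-09-07

BILLING
Amount Due: $1,250.00
"""

# Step 1: basic preprocessing — split the document into (header, body) sections.
sections = {}
current = None
for line in DOC.splitlines():
    if line.strip() and line.isupper():  # crude all-caps header heuristic
        current = line.strip()
        sections[current] = []
    elif current:
        sections[current].append(line)

# Step 2: rules-based logic — look for a specific entity only under
# the header where it is expected to appear.
invoice_text = "\n".join(sections.get("INVOICE DETAILS", []))
match = re.search(r"Invoice No:\s*(INV-\d{4}-\d{4})", invoice_text)
invoice_no = match.group(1) if match else None
print(invoice_no)
```

Because every step is a deterministic string operation, the pipeline is fully transparent and needs no annotated data or finetuning.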
Maya: Next, let’s move on to how to handle neural image similarity engines.
Alex: Luci from Manish raised an interesting question: should you just use image embeddings to measure similarity or explain contrastive learning during interviews?
Maya: Hmm, Alex, what’s contrastive learning again?
Alex: It’s a training method that teaches models to bring similar items closer in embedding space and push dissimilar ones apart—critical for models like CLIP.
Maya: Right, but Mohsin pointed out that for images like “dog running in park”, image embeddings alone don’t capture context well. Text descriptions can help.
Alex: Exactly! Nirant K even suggested combining text and image embeddings to improve similarity search.
Maya: And Nirant said, “Contrastive loss e.g. SigLIP style is usually considered necessary for this use case,” especially in interviews.
Alex: So the takeaway? It’s good to know the theory behind contrastive learning but also practical to mention using pre-trained models and embedding concatenation.
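To make the embedding-concatenation idea concrete, here is a toy sketch. The vectors below are made-up stand-ins; in practice the image embedding would come from a model like CLIP and the text embedding from a sentence encoder:

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cosine(a, b):
    """Cosine similarity; assumes both vectors are already unit-norm."""
    return sum(x * y for x, y in zip(a, b))

# Hypothetical pre-computed embeddings for two images and their captions.
img_a, txt_a = [0.9, 0.1], [0.2, 0.8]
img_b, txt_b = [0.8, 0.3], [0.1, 0.9]

# Concatenate the normalized image and text embeddings, then re-normalize,
# so both modalities contribute equally to the similarity score.
vec_a = normalize(normalize(img_a) + normalize(txt_a))
vec_b = normalize(normalize(img_b) + normalize(txt_b))

sim = cosine(vec_a, vec_b)
print(round(sim, 3))
```

This is the practical "pre-trained models plus concatenation" answer; the theory answer is that contrastive training is what makes those embedding spaces meaningful for similarity in the first place.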
Maya: Next up—watermarks in AI-generated images and whether autoencoders can remove them.
Alex: Yash wondered if training an autoencoder on images with watermarks might erase or corrupt them.
Maya: But Swapnil noted an autoencoder isn’t quite the right tool for that, and Shan mentioned there are already tools like Dewatermark.AI and WatermarkRemover.io out there.
Alex: So watermark removal is an active space now, but requires specialized approaches beyond basic autoencoders.
Maya: Moving along, let’s dive into recent DeepMind research on embedding-based retrieval.
Alex: SP shared a paper highlighting limitations of embedding models—some document combinations can't be retrieved no matter the query.
Maya: Tanisha noted cross-encoders and multi-vector models can help but with tradeoffs.
Alex: Bharat Shetty stressed hybrid systems—using BM25 as a classic baseline combined with embedding methods—are essential.
Maya: Remind me, Alex, what's BM25?
Alex: It’s a fundamental information retrieval scoring algorithm that balances word frequency, inverse document frequency, and document length to rank documents efficiently.
Maya: So you shouldn’t just trust embeddings blindly; start with BM25 baseline and then improve.
Alex: Right. Bharat says many skip this step and jump straight into fancy semantic models.
Maya: Next, let’s talk keyword extractors.
Alex: Yash was looking for ultra-lightweight, unsupervised keyword extraction from paper abstracts.
Maya: Bharat suggested langextract.io, while Amit recommended KeyBERT which uses sentence embeddings and can run at scale.
Alex: KeyBERT also supports a KeyLLM class that uses LLM prompts, plus spaCy-based rule extractors—nice mix of approaches.
Maya: So mix and match to see what works best for your dataset.
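As a sense of what "ultra-lightweight and unsupervised" can mean at the simplest end of the spectrum, here is a toy frequency-based extractor. This is a baseline sketch, not KeyBERT; the stopword list and abstract are illustrative:

```python
import re
from collections import Counter

# A tiny illustrative stopword list; real pipelines use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "to",
             "for", "we", "with", "is", "that", "this", "show"}

def extract_keywords(text, top_k=3):
    """Ultra-lightweight unsupervised extraction: rank non-stopword
    terms by raw frequency. No model, no training data."""
    words = re.findall(r"[a-z][a-z-]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [w for w, _ in counts.most_common(top_k)]

abstract = ("We study retrieval with embedding models and show that "
            "embedding dimensionality limits retrieval quality.")
keywords = extract_keywords(abstract)
print(keywords)
```

Tools like KeyBERT replace the raw frequency ranking with embedding similarity between candidate terms and the document, which usually yields far more relevant keywords at modest extra cost.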
Alex: Now, on to Microsoft’s VibeVoice, a new large TTS model demonstrated by Mohamed Yasser.
Maya: Alex, what’s VibeVoice?
Alex: It’s a text-to-speech model that can generate multi-speaker podcasts quickly. Mohamed shared demos with his own voice cloned.
Maya: Cool that it supports English, Chinese, and even Hindi beyond official claims.
Alex: And it’s available open-source on Hugging Face, letting people experiment easily.
Maya: Up next—the idea of community meetups to discuss AI papers and ideas.
Alex: Paras Chopra proposed coffee clubs in Bengaluru with inclusion criteria for deeper technical discussion.
Maya: I love that! Alex, do you think physical clubs still matter in our digital age?
Alex: Absolutely. Paras said good collaborations need friendships built by spending time together. Sync and async both matter.
Maya: Plus some want this in Delhi, Mumbai, Dubai too. The vibe for face-to-face is definitely back.
Alex: Switching gears, let’s mention Samvaad—a WhatsApp-first multi-modal LLM by Sarvam.
Maya: Nirant K praised it as a killer deployment, though it’s invite-only for now.
Alex: Exciting example of LLM integration with social apps.
Maya: Then, Tanisha introduced EmbeddingGemma and Matryoshka Representation Learning.
Alex: Right, embeddings nested like Russian dolls—flexible dimensions used depending on speed or accuracy needs.
Maya: Chaitanya shared training notebooks with MatryoshkaLoss. A fresh way to customize embeddings for retrieval.
Alex: Worth experimenting if you want smart memory vs speed trade-offs.
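The "nested dolls" property boils down to one operation at inference time: keep only the leading dimensions of the embedding and re-normalize. The 8-dim vector below is hypothetical; a Matryoshka-trained model packs the most important information into those leading dimensions:

```python
import math

def truncate_embedding(vec, dims):
    """Matryoshka-style usage: keep the first `dims` dimensions and
    re-normalize, trading a little accuracy for speed and memory."""
    head = vec[:dims]
    n = math.sqrt(sum(x * x for x in head))
    return [x / n for x in head]

# Hypothetical full-size embedding from a Matryoshka-trained model.
full = [0.52, 0.31, -0.40, 0.22, 0.11, -0.08, 0.05, 0.02]

# Use the small prefix for a fast first-pass search, the full vector
# for accurate re-ranking of the shortlist.
small = truncate_embedding(full, 4)
print(len(small))
```

With an ordinary embedding model this truncation would discard information arbitrarily; the MatryoshkaLoss training Chaitanya shared is what makes the prefix a usable embedding on its own.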
Maya: Lastly, Hadi Khan shared a neat example showing Perplexity AI outperforming others in answering video lecture questions with timestamps.
Alex: That was impressive! Perplexity pulled raw transcripts and gave cross-references, unlike ChatGPT, Claude, or Gemini.
Maya: I guess different retrieval and search methods still matter.
Alex: Here’s your listener tip! Maya?
Maya: Here’s a pro tip you can try today: when building embedding-based search, always start with a BM25 baseline to see if semantic retrieval adds value. Alex, how would you use that in your projects?
Alex: I’d first run BM25 to catch easy matches. Then, overlay embeddings to improve recall and semantic matches for complex queries. Saves compute and improves results.
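One simple way to realize that overlay step is score fusion: normalize the BM25 scores and the embedding similarities separately, then blend them. The weighting scheme and toy scores here are illustrative, not a prescribed method:

```python
def fuse_scores(bm25, dense, alpha=0.5):
    """Hybrid ranking sketch: min-max normalize each score list, then
    blend lexical (BM25) and semantic (embedding) scores linearly."""
    def norm(xs):
        lo, hi = min(xs), max(xs)
        return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]
    return [alpha * b + (1 - alpha) * d
            for b, d in zip(norm(bm25), norm(dense))]

# Toy scores for three documents from each retriever.
bm25_scores = [2.1, 0.0, 1.3]
dense_scores = [0.4, 0.9, 0.7]
fused = fuse_scores(bm25_scores, dense_scores)
print(fused)
```

Running BM25 alone first also tells you whether the dense retriever is actually adding anything before you pay for embedding the whole corpus.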
Maya: Perfect! To wrap up:
Alex: Remember, simple rules and classic methods like regex and BM25 still play an important role alongside newer AI.
Maya: Don’t forget, combining models and modalities—like text plus image embeddings—or nesting embeddings can yield better performance with efficient tradeoffs.
Maya: That’s all for this week’s digest.
Alex: See you next time!