January 26, 2025

Week of 2025-01-26

6 minutes

Alex: Hello and welcome to The Generative AI Group Digest for the week of 26 Jan 2025!

Maya: We're Alex and Maya.

Alex: First up, we’re talking about that wild news on Perplexity AI’s bid to merge with TikTok US.

Maya: Wait, Perplexity and TikTok? What would that even look like?

Alex: Right? Rohan Saxena shared the headline wondering about the timeline and user impact.

Maya: And some say it’s a bad idea, right? Rajesh RS worried about disinformation spreading.

Alex: Exactly. Rajesh feels it’s better to let TikTok US fade or merge with platforms like X, where Musk might block propaganda.

Maya: Plus, Bharat mentioned Musk and Zuckerberg probably won’t allow the deal; zero political capital there.

Alex: Good point. Sathvik asked why TikTok would sell anyway, since their algorithm’s their real crown jewel.

Maya: So, this merger raises questions about data ethics and control over AI-driven platforms. Quite a saga.

Alex: Next, let’s move on to vision models handling multiple images during inference.

Maya: Handling more than one image at a time? That sounds like a tricky resource challenge.

Alex: It is. Luci from the group needed to process multiple images with context. Pulkit Gupta confirmed Google’s Gemini supports multi-image inputs, which helped a lot.

Maya: But it came with out-of-memory errors on other models like Qwen 2.0, right?

Alex: Yes, and they tweaked pixel input sizes and even shifted to Gemini's free tier for testing. So model choice and memory management are key.

Maya: Important takeaway for anyone working with multimodal models: check model docs and manage input sizes carefully!
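
As a rough illustration of the input-size tweak mentioned above, here is a minimal sketch that shrinks an image's dimensions to fit a pixel budget before it is sent to a vision model. The function name `fit_to_pixel_budget` and the budget value are assumptions for illustration only; the group did not share the settings they actually used, and each model documents its own limits.

```python
import math

def fit_to_pixel_budget(width, height, max_pixels):
    """Return (width, height) scaled down, keeping aspect ratio, so the area fits max_pixels."""
    if width * height <= max_pixels:
        # Already within budget: never upscale.
        return width, height
    # Scaling both sides by s scales the area by s squared.
    scale = math.sqrt(max_pixels / (width * height))
    # Flooring keeps the resulting area at or below the budget.
    new_width = max(1, int(width * scale))
    new_height = max(1, int(height * scale))
    return new_width, new_height

# Hypothetical example: a 12-megapixel photo and a 1-megapixel budget per image.
resized = fit_to_pixel_budget(4000, 3000, 1_000_000)
small = fit_to_pixel_budget(640, 480, 1_000_000)
```

With several images in one request, the budget applies per image, so total memory use grows with the image count as well as with each image's size.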

Alex: Up next, OpenAI’s launch of scheduled tasks in ChatGPT for task automation. Abhijeet shared a helpful article on it.

Maya: Automating repetitive tasks in ChatGPT? Sounds like a productivity boost.

Alex: Definitely. Users can now schedule and automate workflows, which means less manual hassle and more consistent outcomes.

Maya: Okay, let’s jump to DeepSeek’s big release—R1 model and its impact.

Alex: DeepSeek’s R1 model is impressive, especially on reasoning tasks. Paras Chopra called their paper fascinating, and community members are excited about its distillation efficiency.

Maya: But some concerns popped up about data contamination and privacy because their API collects everything.

Alex: True, there’s speculation around that. Tokenbender noted it’s MIT licensed, meaning you can self-host if privacy is a concern.

Maya: Interesting contrast with the OpenAI models, which focus on unique eval datasets as a competitive moat.

Alex: Speaking of models and evals, Paras Chopra and others discussed how results on frontier math benchmarks deserve skepticism, since the training and validation data share a distribution.

Maya: So out-of-distribution generalization still remains a big open challenge.

Alex: Exactly. Paras mentioned deep networks act as function approximators — they excel in patterns they’ve seen but might struggle with novel cases without breakthroughs.

Maya: Next, the huge AI infrastructure buildout dubbed the “Stargate Project” caught a lot of attention.

Alex: That’s right. OpenAI and partners like SoftBank, led by Masayoshi Son (Masa), plan to invest up to $500 billion to build huge datacenters with new chips and energy setups.

Maya: That dwarfs historical tech projects! Some see it as a strategic move with deep government and political ties.

Alex: It’s quite a leap for domestic and global AI leadership, and early indications suggest it is mostly private investment with complex partnerships.

Maya: Meanwhile, Gemini 2.0 is getting updates, as Bharat Shetty shared some performance improvements on math and multimodal reasoning benchmarks.

Alex: Yes, with accuracies around 73 to 75 percent on key tests. This shows steady progress in reasoning capabilities.

Maya: On tooling, Rajesh RS asked about low-code platforms for building AI agents for non-technical folks.

Alex: Bharat Shetty recommended Langflow for integrations but noted it needs some AI-fluency to guide the build. Others liked Dify as self-hosted and user-friendly.

Maya: So non-tech teams have options but may still need AI-savvy support.

Alex: Next, let’s talk about log probabilities from GPT models for classification by Shivansh and Rishabh.

Maya: Using logprobs to estimate class probabilities sounds like a neat way to get model confidence.

Alex: Rishabh confirmed gpt-4o logprobs are reliable, and users can get calibrated probabilities from them. He even shared a blog post from last year explaining this.

Maya: Great practical tip for folks fine-tuning models for classification tasks.
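
A minimal sketch of the logprobs idea discussed above, assuming you already have the top log probabilities for the first output token of a classification prompt. The values in `example_logprobs` are made up for illustration and `class_probabilities` is a hypothetical helper, not something from Rishabh's blog post.

```python
import math

def class_probabilities(top_logprobs, labels):
    """Turn token log probabilities into a normalized distribution over the given labels."""
    # exp(logprob) is the probability mass the model put on that token;
    # a label missing from the top-k list is treated as having zero mass.
    raw = {
        label: math.exp(top_logprobs[label]) if label in top_logprobs else 0.0
        for label in labels
    }
    total = sum(raw.values())
    if total == 0.0:
        raise ValueError("none of the labels appear in top_logprobs")
    # Renormalize so the label probabilities sum to 1, ignoring non-label tokens.
    return {label: mass / total for label, mass in raw.items()}

# Hypothetical top logprobs for the first token of a sentiment classifier's answer.
example_logprobs = {"positive": -0.22, "negative": -1.90, "neutral": -3.10, "The": -6.0}
probs = class_probabilities(example_logprobs, ["positive", "negative", "neutral"])
prediction = max(probs, key=probs.get)
```

This works most cleanly when each label maps to a single token; multi-token labels would need their token logprobs summed first.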

Alex: Now, a big congrats and shout-out to Paras Chopra and team on their $200 million exit with Wingify!

Maya: That’s awesome! From bootstrapped origins to a major acquisition — impressive and inspiring for the ecosystem.

Alex: Folks celebrated it all over the group with lots of kudos and reflections on the Indian tech scene.

Maya: Lastly, prompt engineering came up, specifically moving from OpenAI’s GPT-4o mini to Gemini 2.0 Flash.

Alex: Vrushank from Portkey shared that Gemini tends to “overthink” more, so simple porting of prompts may not work well. Better to test and tune gradually.

Maya: Plus, Paras Chopra suggested that model makers could do a better job creating prompt guidance documents and interactive tools.

Alex: Absolutely. Users want transparency, not magic, in why some prompts work better.

Maya: And with that, here’s a pro tip you can try today: Take advantage of model-specific prompt guides or create simple evals when switching LLMs.

Maya: Alex, how would you use that?

Alex: I’d start small with a few key prompts, run test queries on both old and new models, and tweak prompts based on output quality differences before scaling.

Maya: That’s smart. Helps avoid surprises and wasted compute.
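
The approach Alex describes can be sketched as a tiny eval harness. This is only an illustration under stated assumptions: `old_model` and `new_model` are stand-in stubs rather than real API calls, and the two test cases are invented. In practice each stub would be replaced by a call to the model you are migrating from or to.

```python
from typing import Callable, Dict, List, Tuple

# Each case pairs a prompt with a check that decides whether an output is acceptable.
EvalCase = Tuple[str, Callable[[str], bool]]

def pass_rate(model: Callable[[str], str], cases: List[EvalCase]) -> float:
    """Fraction of cases whose output passes its check."""
    passed = sum(1 for prompt, check in cases if check(model(prompt)))
    return passed / len(cases)

def compare(models: Dict[str, Callable[[str], str]], cases: List[EvalCase]) -> Dict[str, float]:
    """Run the same cases against every model and collect pass rates by name."""
    return {name: pass_rate(fn, cases) for name, fn in models.items()}

# Stub standing in for the current model: follows the format instruction.
def old_model(prompt: str) -> str:
    return "4" if "2 + 2" in prompt else "Paris"

# Stub standing in for the new model: "overthinks" one answer, so it breaks the format check.
def new_model(prompt: str) -> str:
    if "2 + 2" in prompt:
        return "Let me think step by step. The answer is 4."
    return "Paris"

cases: List[EvalCase] = [
    ("What is 2 + 2? Reply with the number only.", lambda out: out.strip() == "4"),
    ("What is the capital of France? Reply with one word.", lambda out: out.strip() == "Paris"),
]

scores = compare({"old": old_model, "new": new_model}, cases)
```

A drop in pass rate on the new model points at the prompts that need tuning before a wider rollout.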

Alex: Remember, AI is progressing fast but thoughtful tuning keeps your projects on track.

Maya: Don’t forget, community collaboration and sharing resources make breakthroughs possible.

Maya: That’s all for this week’s digest.

Alex: See you next time!
