Generative AI Group Podcast

Week of 2024-12-08


Alex: Hello and welcome to The Generative AI Group Digest for the week of 08 Dec 2024!
Maya: We're Alex and Maya.
Alex: First up, we’re talking about a great question from Sid about how to get large language models, or LLMs, to generate diffs—like changes—based on instructions applied to an existing object. This could be code, a document, or even a markup language.
Maya: Oh, like when you want the AI to tell you exactly what changed between two files, not just rewrite everything?
Alex: Exactly! Sid wanted to know how people are tackling this. Nirant K pointed us towards "fill in the middle" tasks, which are AI training methods where the model learns to generate missing parts of text or code. He suggested that right now you might have to tweak an open-source model or use Mistral models for this.
Maya: So, is this something that's commonly supported out of the box, or are we in experimental territory?
Alex: Mostly experimental. Abhinav mentioned that Codestral, Mistral's code model, supports this, which shows some progress. The key takeaway is that generating precise diffs is tricky, but training models specifically on "fill in the middle" tasks is the way forward, because those tasks teach the model to produce just the changed span rather than rewrite the content entirely.
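Here is a minimal sketch of what a fill-in-the-middle prompt can look like. It assumes StarCoder-style sentinel tokens; Codestral and other FIM-capable models use their own tokens or a dedicated endpoint, so check the model card before reusing it.

```python
# Sketch: build a fill-in-the-middle (FIM) prompt so the model generates only
# the missing span instead of rewriting the whole file.
# Assumes StarCoder-style sentinel tokens; other models use different ones.

def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Everything before and after the edit region stays fixed; the model
    is asked to produce only the middle."""
    return f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

before = "def greet(name):\n"
after = "\n    return message\n"

prompt = build_fim_prompt(before, after)
# Send `prompt` to any FIM-capable completion endpoint; the completion is the
# localized change, which can then be turned into a diff against the original.
print(prompt)
```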
Maya: That’s neat! Plus, finding models trained for these tasks could save developers lots of time. Next, let’s move on to the buzz around the new Llama 4 training and Meta’s GPU advantage.
Alex: Yup! Pratik Desai shared that Llama 4 is already in training, and Meta’s setup with over 16,000 H100 GPUs gives them a huge computing edge for better language and coding support.
Maya: Wait, why does Meta have so many GPUs but not OpenAI or Anthropic? Is it just money?
Alex: According to Abhiram and Bharat, Meta’s massive GPU fleet was originally built to support their ad-algorithm shift after TikTok’s rise, not only to train models. So it’s tied to their big ad engine called Andromeda, which personalizes ads faster by scoring millions of candidate ads in real time.
Maya: Interesting! So GPU power isn’t only for AI but also for data-heavy ad tech. That’s a nice angle.
Alex: Absolutely, and these investments create leverage for Meta’s AI work. Speaking of routing AI requests, Bharat asked about routing in Llama workflows. Ravi Theja explained it works with any function-calling enabled LLM but defaults to OpenAI’s GPT-4.
Maya: So it’s flexible and can switch between different AI engines depending on needs. Cool!
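If the routing in question is LlamaIndex's router (an assumption on our part), a minimal sketch looks like this; it presumes a recent llama-index release, an OPENAI_API_KEY in the environment, and made-up document folders and descriptions:

```python
# Sketch of LLM-driven routing in LlamaIndex: a selector LLM (GPT-4 family by
# default, but any function-calling LLM can be configured) picks which query
# engine should answer. Folder names and tool descriptions are illustrative.
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector
from llama_index.core.tools import QueryEngineTool

code_docs = SimpleDirectoryReader("./code_docs").load_data()
api_docs = SimpleDirectoryReader("./api_docs").load_data()

code_tool = QueryEngineTool.from_defaults(
    query_engine=VectorStoreIndex.from_documents(code_docs).as_query_engine(),
    description="Questions about the codebase internals",
)
api_tool = QueryEngineTool.from_defaults(
    query_engine=VectorStoreIndex.from_documents(api_docs).as_query_engine(),
    description="Questions about the public API",
)

router = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),  # the configured LLM chooses a tool
    query_engine_tools=[code_tool, api_tool],
)
print(router.query("Which endpoint changed in the latest release?"))
```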
Alex: Exactly. Moving on, Anubhav and Nirant discussed the mysterious “o1” series models. Anubhav wondered what changed between o1 preview, o1, and o1 pro versions. Nirant said these likely evolved from Codex to GPT-3.5, then GPT-4, possibly the vision-capable GPT-4o.
Maya: Are these models bigger or just smarter?
Alex: Paras suggested o1 models are larger. Ojasvi pointed out that "o1 style" models tend to be slower in inference, making them less suitable when low latency is needed, but they focus on quality over speed.
Maya: So it’s a tradeoff—more powerful but slower, like choosing between a sports car and a comfortable cruiser.
Alex: Exactly. On a related note, Aashay brought up reward modeling challenges in multi-turn dialogue, asking about how DPO (Direct Preference Optimization) might work. Paras chimed in on techniques like masking unwanted rewards but noted that giving proper rewards at each conversation step is tricky.
Maya: That sounds complicated! Rewarding partial progress step-by-step requires smart design.
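One common flavor of that masking idea is to let only the assistant's turns contribute to the training signal. A simplified sketch, not the exact recipe discussed in the group, assuming a Hugging Face tokenizer and the usual -100 ignore index:

```python
# Sketch: mask out user turns so only assistant tokens carry loss/reward.
# -100 is the ignore index used by PyTorch cross-entropy and HF trainers.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # any tokenizer works for the sketch

dialogue = [
    {"role": "user", "content": "Summarize this PR."},
    {"role": "assistant", "content": "It refactors the auth middleware."},
    {"role": "user", "content": "Any breaking changes?"},
    {"role": "assistant", "content": "No, the public API is unchanged."},
]

input_ids, labels = [], []
for turn in dialogue:
    ids = tok.encode(turn["content"] + "\n")
    input_ids.extend(ids)
    # Credit only the assistant's tokens; everything else is masked.
    labels.extend(ids if turn["role"] == "assistant" else [-100] * len(ids))

kept = sum(label != -100 for label in labels)
print(f"{kept} of {len(labels)} tokens receive a training signal")
```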
Alex: For sure. Switching gears, we had a cool question from Luv Singh about analyzing short TV reels with features like characters and emotions to find what boosts user engagement.
Maya: Sounds like a fun data project! What methods did folks suggest?
Alex: Not many direct answers, but this is a classic use case for time series analysis and correlation tools. You could also leverage vector search on feature embeddings per episode to detect patterns.
Maya: That’s a smart idea—embedding features lets you compare episodes in a meaningful way.
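As a rough illustration of the correlation idea, picture one row per reel with hand-built features and an engagement metric (all values below are made up):

```python
# Sketch: which reel-level features move together with engagement?
import pandas as pd

reels = pd.DataFrame({
    "num_characters": [2, 4, 1, 3, 5],
    "positive_emotion_ratio": [0.6, 0.8, 0.3, 0.7, 0.9],
    "runtime_seconds": [45, 60, 30, 50, 75],
    "engagement_rate": [0.12, 0.21, 0.05, 0.18, 0.26],
})

# Pairwise correlation against engagement, strongest first.
corr = reels.corr(numeric_only=True)["engagement_rate"].sort_values(ascending=False)
print(corr)
```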
Alex: Exactly! Speaking of embeddings, Ishita suggested breaking down data into vector chunks, then using vector search plus reranking to handle diff or document-chunk queries, which ties back to Sid’s diff discussion.
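A bare-bones version of that chunk, search, then rerank pipeline might look like the following; the embedding and reranking models named here are common defaults, not necessarily what Ishita had in mind:

```python
# Sketch: cheap vector search over all chunks, then a cross-encoder reranks
# the shortlist. Chunks and model names are illustrative.
from sentence_transformers import CrossEncoder, SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

chunks = [
    "def login(user): ...",
    "The auth middleware validates JWTs before each request.",
    "Release notes: dark mode was added to the settings page.",
]
query = "How does authentication work?"

# Stage 1: embed once, score every chunk against the query.
scores = util.cos_sim(embedder.encode(query), embedder.encode(chunks))[0].tolist()
shortlist = sorted(range(len(chunks)), key=lambda i: -scores[i])[:2]

# Stage 2: rerank the shortlist with the slower, more accurate cross-encoder.
ranked = sorted(shortlist, key=lambda i: -reranker.predict([(query, chunks[i])])[0])
print([chunks[i] for i in ranked])
```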
Maya: That’s a nice connection! Also, on the speech frontier, Sreeraag noted that indic-parler-tts struggles reading numbers, but Aashay recommended Sarvam-TTS which handles digits well.
Alex: Great practical tip there for anyone working on Indian language TTS systems!
Maya: Finally, Kashyap shared results from the 2024 ARC challenge, a tough AI benchmark aimed at general intelligence, where the state of the art jumped from 33% to 55%, but the $600K grand prize went unclaimed because no entry reached the 85% target.
Alex: Paras wasn’t a participant this year but expects Francois Chollet, the founder, to join next year. This shows progress but also the huge challenge ahead in building truly general AI.
Maya: Inspiring stuff! Now, here’s a pro tip you can try today: If you're working with large documents or code, break them into smaller chunks and index them as vectors using tools like LlamaIndex or Pinecone. This makes it easier to do precise searches or diffs.
Maya: Alex, how would you use that?
Alex: I’d combine chunked vectors with a function-calling LLM like OpenAI’s GPT-4 to generate smart diffs or summaries on specific parts. It’s a powerful way to handle big content without overwhelming the model.
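A sketch of that last step, asking a function-calling model for a structured edit on a single retrieved chunk; the tool schema, model name, and chunk are stand-ins, and it assumes the openai v1 SDK with an API key in the environment:

```python
# Sketch: get a targeted, structured edit for one chunk instead of a rewrite
# of the whole document. Schema and prompt are illustrative only.
from openai import OpenAI

client = OpenAI()
chunk = "def greet(name):\n    return 'Hello ' + name\n"

edit_tool = {
    "type": "function",
    "function": {
        "name": "propose_edit",
        "description": "Return a targeted edit for the given code chunk.",
        "parameters": {
            "type": "object",
            "properties": {
                "old_text": {"type": "string"},
                "new_text": {"type": "string"},
                "rationale": {"type": "string"},
            },
            "required": ["old_text", "new_text"],
        },
    },
}

resp = client.chat.completions.create(
    model="gpt-4o",  # any function-calling model works here
    messages=[
        {"role": "system", "content": "Edit only the chunk you are given."},
        {"role": "user", "content": f"Add a type hint to this chunk:\n{chunk}"},
    ],
    tools=[edit_tool],
    tool_choice={"type": "function", "function": {"name": "propose_edit"}},
)
# The arguments are a JSON string: old_text / new_text you can apply as a diff.
print(resp.choices[0].message.tool_calls[0].function.arguments)
```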
Maya: Great approach! To wrap up…
Alex: Remember, training LLMs on targeted tasks like “fill in the middle” can unlock smarter, more precise generation—key for diffs and edits.
Maya: And don’t forget, massive GPU power isn’t just for AI—it also fuels next-gen ad engines, which indirectly boost AI progress.
Maya: That’s all for this week’s digest.
Alex: See you next time!