ThursdAI - The top AI news from the past week

🌉 ThursdAI Dec 7th - Gemini is out-ish, Grok is out, OSS AI Event in SF, Waymo rides, and more AI news from the past week 👏


Listen Later

ThursdAI December 7th TL;DR

Greetins of the day everyone (as our panelist Akshay likes to sometimes say) and Happy first candle of Hannukah for those who celebrate! 🕎

I'm writing this newsletter from the back of an Waymo self driving car, in SF, as I'm here for just a few nights (again) to participate in the Open Source AI meetup, that was co-organized by Ollama and Nous Research, Alignment Labs and hosted by A16Z in their SF office.

This event was the highlight of this trip, it was quite a packed meetup in terms of AI talent, and I got to meet quite a few ThursdAI listeners, mutuals on X, and AI celebs

We also recorded the podcast this week from the arena, thanks to Swyx and Alessio from latentspace pod for hosting ThursdAI this week form their newly built out pod studio (and apologies everyone for the rocky start and the cutting out issues, luckily we had local recordings so the pod version sounds good!)

Google finally teases Gemini Ultra (and gives us Pro)

What a week folks, what a week, as I was boarding the flight to SF to meet with Open Source folks, Google announced (finally!) the release of Gemini, their long rumored, highly performant model with a LOT of fanfare!

Blogposts authored by Sundar and Demis Hassabis, beautiful demos of unseen before capabilities, comparisons to GPT-4V which the Ultra version of Gemini outperforms on several benchmarks, and rumors that Sergey Brin, the guy who's net worth is north of 100Bn is listed as the core contributor on the paper and reports on benchmarks (somewhat skewed) show Ultra beaing GPT-4 on many coding and reasoning evaluations!

We've been waiting for Gemini for such a long time, that we spend the first hour of the podcast discussing it and it's implications basically. We were also fairly disillusioned by the sleight of hand tricks Google marketing department played with the initial launch video, where it purportedly shows Gemini being a fully multi-modal AI, that reacts to a camera feed + user voice in real time, when in fact, it was quickly clear (from their developer blog) that it was not video+audio but rather images+text (the same two modalities we already have in GPT-4V and given some prompting, it's quite easy to replicate most of it. We've also discussed how we again, got a tease, and not even a waitlist for the "super cool" stuff, while getting a GPT3.5 level of a model today in Bard upgrade.

To me, the most mind-blowing demo video was actually one of the other ones in the announcement, which showed that Gemini has agentic behavior in understanding user intent, asks for clarifications, creates a PRD (Product Requirement Document) for itself, and then, generates Flutter code to create a UI on the fly, based on what the use asked it! This is pretty wild, as we all should expect that Just In Time UI will come to many of these big models!

Tune in to the episode if you want to hear more takes, opinions and frustrations as none of us actually got to use Gemini Ultra, and the experience with Gemini Pro (which is now live on Bard) was at least for me, underwhelming

This weeks buzz (What I learned in Weights & Biases this week)

I actually had a blast talking about W&B to many of the open source and fine-tuners community this and past week. I already learned that W&B doesn't only help huge companies (like OpenAI, Anthropic, Meta, Mistral and tons more) to train their foundational models, but is widely used by the open source fine-tuners community as well. I've met with folks like Wing Lian (aka Caseus), maintainer of Axolotl, who uses W&B together with Axolotl, and got to geek out about W&B, met with Teknium and LDJ (Nous Research, Alignment Labs) and in fact, got LDJ to walk me through some of the ways he uses and used W&B in the past, including how it's used to track model runs, show artifacts in the middle of runs, and run mini-benchmarks and evaluations for LLMS as they finetune.

If you're interested in this, here's an episode of a new “series” of me learning publicly (from scratch) so if you want to learn from scratch with me, welcome to check it out:

Open Source AI in SF meetup

This meetup was the reason I flew in to SF, I was invited by dear friends in the open source community, and couldn't miss it! There was such a talent density there, it was quite remarkable. Andrej Karpathy who's video about LLM I just finished re-watching, Jeremy Howard, folks from Mistral, A16Z, and tons of other startups, open source collectives, and enthusiasts, all came together to listen to a few lightning talks, but mostly to mingle and connect and share ideas.

Nous Research announced that they are a company (not anymore just a discord collective of rag tag open sourcers!) and that they are working on Forge, a product offering of theirs, that runs local AI, has a platform for agent behavior, and is very interesting to keep an eye for.

I've spent most of my time going around, hearing what folks are using (Hint: a LOT of axolotl), what they are finetuning (mostly Mistral) and what is the future (everyone's waiting for next Llama or next Mistral). Funnily enough, there was not a LOT of conversation about Gemini there at all, at least not among the folks that I talked to!

Overall this was really really fun, and of course, being in SF, at least for me, especially now as an AI Evangelist, feels like coming home! So expect more trip reports!

Here's a recap and a few more things that happened this week in AI:

* Open Source LLMs

* Apple released MLX - machine learning framework on apple silicon

* Mamba - transformers alternative architecture from Tri Dao

* Big CO LLMs + APIs

* Google Gemini beats GPT-4V on a BUNCH of metrics, shows cool fake multimodal demo

* Demo was embellished per the google developer blog

* Multimodal capabilities are real

* Dense model vs MOE

* Multimodel on the output as well

* For 5-shot, GPT-4 outperforms Gemini Ultra on MMLU

* AlphaCode 2 is here and Google claims it performs better than 85% competitive programmers in the world and it performs even better, collaborating with a competitive programmer.

* Long context prompting for Claude 2 shows 27% - 98% increase by using prompt techniques

* X.ai finally released grok to many premium+ X subscribers. (link)

* Vision

* OpenHermes Vision finally released - something there was not right there, back to drawing board

* Voice

* Apparently Gemini beats Whisper v3! As part of a unified model no less

* AI Art & Diffusion

* Meta - releases a standalong EMU AI art generator websites https://imagine.meta.com

* Tools

* Jetbrains finally releases their own AI native companion + subscription

That's it for me this week, this Waymo ride took extra long as it seems that in SF, during night rush hour, AI is at a disadvatage against human drivers. Maybe I'll take an Uber next time.

P.S - here’s Grok roasting ThursdAI

See you next week, and if you've scrolled all the way here for the emoji of the week, it's hidden in the middle of the article, send me that to let me know you read through 😉



This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe
...more
View all episodesView all episodes
Download on the App Store

ThursdAI - The top AI news from the past weekBy From Weights & Biases, Join AI Evangelist Alex Volkov and a panel of experts to cover everything important that happened in the world of AI from the past week

  • 4.9
  • 4.9
  • 4.9
  • 4.9
  • 4.9

4.9

12 ratings


More shows like ThursdAI - The top AI news from the past week

View all
a16z Podcast by Andreessen Horowitz

a16z Podcast

1,008 Listeners

The Twenty Minute VC (20VC): Venture Capital | Startup Funding | The Pitch by Harry Stebbings

The Twenty Minute VC (20VC): Venture Capital | Startup Funding | The Pitch

514 Listeners

Practical AI by Practical AI LLC

Practical AI

188 Listeners

Last Week in AI by Skynet Today

Last Week in AI

280 Listeners

Machine Learning Street Talk (MLST) by Machine Learning Street Talk (MLST)

Machine Learning Street Talk (MLST)

90 Listeners

Dwarkesh Podcast by Dwarkesh Patel

Dwarkesh Podcast

356 Listeners

Sharp Tech with Ben Thompson by Ben Thompson

Sharp Tech with Ben Thompson

90 Listeners

No Priors: Artificial Intelligence | Technology | Startups by Conviction

No Priors: Artificial Intelligence | Technology | Startups

128 Listeners

This Day in AI Podcast by Michael Sharkey, Chris Sharkey

This Day in AI Podcast

196 Listeners

Latent Space: The AI Engineer Podcast by swyx + Alessio

Latent Space: The AI Engineer Podcast

72 Listeners

The AI Daily Brief (Formerly The AI Breakdown): Artificial Intelligence News and Analysis by Nathaniel Whittemore

The AI Daily Brief (Formerly The AI Breakdown): Artificial Intelligence News and Analysis

430 Listeners

AI For Humans: Making Artificial Intelligence Fun & Practical by Kevin Pereira & Gavin Purcell

AI For Humans: Making Artificial Intelligence Fun & Practical

235 Listeners

BG2Pod with Brad Gerstner and Bill Gurley by BG2Pod

BG2Pod with Brad Gerstner and Bill Gurley

439 Listeners

AI + a16z by a16z

AI + a16z

32 Listeners

Training Data by Sequoia Capital

Training Data

37 Listeners