Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The One and a Half Gemini, published by Zvi on February 22, 2024 on LessWrong.
Previously: I hit send on The Third Gemini, and within half an hour DeepMind announced Gemini 1.5.
So this covers Gemini 1.5. One million tokens, and we are promised overall Gemini Advanced or GPT-4 levels of performance on Gemini Pro levels of compute.
This post does not cover the issues with Gemini's image generation, and what it is and is not willing to generate. I am on top of that situation and will get to it soon.
One Million Tokens
Our teams continue pushing the frontiers of our latest models with safety at the core. They are making rapid progress. In fact, we're ready to introduce the next generation: Gemini 1.5. It shows dramatic improvements across a number of dimensions and 1.5 Pro achieves comparable quality to 1.0 Ultra, while using less compute.
It is truly bizarre to launch Gemini Advanced as a paid service, and then about a week later announce the new Gemini Pro 1.5 is now about as good as Gemini Advanced. Yes, actually, I do feel the acceleration, hot damn.
And that's not all!
This new generation also delivers a breakthrough in long-context understanding. We've been able to significantly increase the amount of information our models can process - running up to 1 million tokens consistently, achieving the longest context window of any large-scale foundation model yet.
One million is a lot of tokens. That covers every individual document I have ever asked an LLM to examine. That is enough to cover my entire set of AI columns for the entire year, in case I ever need to look something up, although presumably Google's NotebookLM is The Way to do that.
A potential future 10 million would be even more.
Soon Gemini will be able to watch a one-hour video or read 700k words, whereas right now if I use the Gemini Advanced web interface all I can upload is a photo.
The standard will be to give people 128k tokens to start; you can then pay for more than that. A million tokens is not cheap inference, even for Google.
Oriol Vinyals (VP of R&D DeepMind): Gemini 1.5 has arrived. Pro 1.5 with 1M tokens available as an experimental feature via AI Studio and Vertex AI in private preview.
Then there's this: In our research, we tested Gemini 1.5 on up to 2M tokens for audio, 2.8M tokens for video, and 10M tokens for text. From Shannon's 1950s bi-gram models (2 tokens), and after being mesmerized by LSTMs many years ago able to model 200 tokens, it feels almost impossible that I would be talking about hundreds of thousands of tokens in context length, let alone millions.
Jeff Dean (Chief Scientist, Google DeepMind): Multineedle in haystack test: We also created a generalized version of the needle in a haystack test, where the model must retrieve 100 different needles hidden in the context window. For this, we see that Gemini 1.5 Pro's performance is above that of GPT-4 Turbo at small context lengths and remains relatively steady across the entire 1M context window, while the GPT-4 Turbo model drops off more quickly (and cannot go past 128k tokens).
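For concreteness, here is a minimal sketch of what a multi-needle retrieval eval looks like: plant distinct facts at random positions in a long stretch of filler text, ask the model to report them all, and score recall. This is an illustration only, not DeepMind's actual harness; the filler text, needle format, and the call_your_model placeholder are all assumptions.

```python
import random

def build_multineedle_haystack(needles, filler_sentence, target_tokens, tokens_per_sentence=10):
    """Scatter the needle sentences at random positions among repeated filler
    sentences until the context is roughly target_tokens long."""
    n_filler = max(0, target_tokens // tokens_per_sentence - len(needles))
    sentences = [filler_sentence] * n_filler
    positions = sorted(random.sample(range(len(sentences) + 1), len(needles)))
    for offset, (pos, needle) in enumerate(zip(positions, needles)):
        sentences.insert(pos + offset, needle)
    return " ".join(sentences)

def needle_recall(model_answer, payloads):
    """Fraction of needle payloads that show up in the model's answer."""
    return sum(1 for p in payloads if p in model_answer) / len(payloads)

# 100 distinct needles, each carrying a unique payload keyed by an index.
needle_payloads = [str(1000 + i) for i in range(100)]
needles = [f"The magic number for item {i} is {p}." for i, p in enumerate(needle_payloads)]
haystack = build_multineedle_haystack(
    needles,
    filler_sentence="The grass is green and the sky is blue.",
    target_tokens=1_000_000,
)
prompt = haystack + "\n\nList every magic number mentioned above, with its item index."
# answer = call_your_model(prompt)   # placeholder: plug in whatever model API you use
# print(needle_recall(answer, needle_payloads))
```

The single-needle version is the same idea with one planted fact; the 100-needle version mostly tests whether retrieval quality holds up when the model has to surface many scattered facts at once rather than one.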
Guido Appenzeller (responding to similar post): Is this really done with a monolithic model? For a 10M token window, input state would be many Gigabytes. Seems crazy expensive to run on today's hardware.
Sholto Douglas (DeepMind): It would honestly have been difficult to do at decent latency without TPUs (and their interconnect). They're an underappreciated but critical piece of this story.
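To put rough numbers on Appenzeller's worry: the "input state" is essentially the KV cache, since a plain transformer keeps keys and values for every prior token at every layer. A back-of-the-envelope estimate, with every model dimension invented for illustration (Google has not published Gemini 1.5 Pro's architecture), looks like this:

```python
# Back-of-the-envelope KV-cache size for a 10M-token context.
# Every dimension below is an assumption for illustration only;
# Gemini 1.5 Pro's real architecture has not been published.
tokens = 10_000_000
layers = 60          # assumed transformer depth
kv_heads = 16        # assumed number of key/value heads (GQA/MQA would shrink this)
head_dim = 128       # assumed per-head dimension
bytes_per_value = 2  # bf16

# Keys and values each store layers * kv_heads * head_dim numbers per token.
kv_cache_bytes = 2 * tokens * layers * kv_heads * head_dim * bytes_per_value
print(f"{kv_cache_bytes / 1e9:,.0f} GB")  # ~4,915 GB under these assumptions
```

Even if aggressive multi-query attention cut the key/value heads down to one, that would still be on the order of hundreds of gigabytes per request under these assumed dimensions, which is consistent with Douglas pointing at TPU pods and their interconnect as the enabling piece.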
Here are their head-to-head results with themselves:
Here is the technical report. There is no need to read it; all of this is straightforward. Their safety section says 'we followed our procedure' and offers no additional details on methodology. On safety performance, their tests did not seem to offer much insight; scores were similar to Gemini Pro 1.0.
Mixture...