Gemini is a family of multimodal large language models developed by Google DeepMind, designed to be a successor to LaMDA and PaLM 2. It was announced on 10 May 2023 and launched on 6 December 2023. Key aspects of Gemini include:
- Multimodality: Gemini is designed to process various data types simultaneously, including text, images, audio, video, and computer code. It can handle these different modes in any order, allowing for flexible input and output.
- Model Variants: The first generation of Gemini ("Gemini 1") includes three models: Gemini Ultra (for complex tasks), Gemini Pro (for a wide range of tasks), and Gemini Nano (for on-device tasks). The second generation ("Gemini 1.5") includes Gemini 1.5 Pro and Gemini 1.5 Flash.
- Performance: Gemini Ultra has been reported to outperform other models, including GPT-4, on various industry benchmarks and was the first language model to surpass human experts on the Massive Multitask Language Understanding (MMLU) test. Gemini Pro has been said to outperform GPT-3.5. Gemini 2.0 Flash has outperformed 1.5 Pro on key benchmarks, while also being faster.
- Updates and Evolution:
- Gemma, a family of free and open-source LLMs serving as a lightweight version of Gemini, was released in February 2024.
- Gemini 2.0 Flash Experimental was released in December 2024, with improved speed and performance.
- Gemini 2.0 Flash became the default model in January 2025, with Gemini 1.5 Flash still available.
- Technical Details:
- Agentic Capabilities:
- Training Data: Gemini's dataset is multimodal and multilingual, including web documents, books, code, images, audio, and video data. It was also trained using transcripts of YouTube videos, with lawyers filtering out potentially copyrighted material.
- Reception: Gemini's launch was highly anticipated, with some predicting it would surpass GPT-4. However, some experts have cautioned about interpreting benchmark scores without full insight into the training data. Google has also faced criticism for a demonstrative video of Gemini that was not conducted in real time.
- Safety and Responsibility: Google has emphasised that it is committed to building AI responsibly, with safety and security as key priorities.
Google DeepMind is also developing other related technologies, including Veo for video generation, Imagen for text-to-image generation, and AlphaFold for protein structure prediction.