Demis Hassabis, CEO and Co-Founder of Google DeepMind, has introduced Gemini, Google’s latest and most advanced AI model, in an article highlighting its key features and capabilities.
Gemini is positioned as Google’s largest and most flexible AI model, with the ability to efficiently operate across various platforms, from data centers to mobile devices.
Google outlines three optimised versions of Gemini 1.0: Ultra, Pro, and Nano, each tailored for specific tasks.
Gemini’s state-of-the-art performance is emphasised through rigorous testing and evaluation on a wide range of tasks, from natural image and audio understanding to mathematical reasoning.
Gemini Ultra stands out by exceeding current state-of-the-art results on 30 of 32 widely used academic benchmarks. With a score of 90.0% on the Massive Multitask Language Understanding (MMLU) benchmark, it is the first model to outperform human experts on that test. Google also describes a new benchmark approach to MMLU that lets Gemini use its reasoning capabilities to think more carefully before answering difficult questions, which improves on answering from its first impression alone.
In addition to excelling in text and coding benchmarks, Gemini Ultra achieves a state-of-the-art score of 59.4% on the new Massive Multi-discipline Multimodal Understanding (MMMU) benchmark, which consists of multimodal tasks spanning different domains that require deliberate reasoning.
Gemini’s native multimodality is highlighted in image benchmarks, where it outperforms previous state-of-the-art models without relying on optical character recognition (OCR) systems to extract text from images, demonstrating early signs of complex reasoning abilities.
Gemini’s next-generation capabilities differentiate it from standard multimodal models that involve stitching together separately trained components. Gemini is designed to be natively multimodal, pre-trained on different modalities, and fine-tuned with additional multimodal data, resulting in superior capabilities across various domains.
Google is keen to highlight Gemini’s sophisticated multimodal reasoning capabilities, emphasising its ability to make sense of complex written and visual information. Gemini’s capacity to extract insights from vast amounts of data is presented as a catalyst for new breakthroughs in fields ranging from science to finance.
Gemini 1.0 is trained to recognise and understand text, images, audio, and more simultaneously, making it especially adept at explaining reasoning in complex subjects like math and physics.
The model’s advanced coding abilities are also highlighted, with Gemini Ultra excelling in coding benchmarks, including HumanEval and Natural2Code, making it a leading foundation model for coding worldwide.
The reliability, scalability, and efficiency of Gemini 1.0 are underlined, with training conducted on Google’s AI-optimised infrastructure using Tensor Processing Units (TPUs) v4 and v5e.
The announcement also introduces Cloud TPU v5p, which Google describes as its most powerful TPU system to date, designed to accelerate Gemini’s development and enable faster training of large-scale generative AI models.
Why is this important?
Responsibility and safety are emphasised as core principles in Gemini’s development. Google says Gemini has the most comprehensive safety evaluations of any Google AI model to date, including assessments for bias and toxicity. The company also states that it works with external experts to stress-test its models across a range of issues, and that it uses benchmarks such as Real Toxicity Prompts to diagnose content safety issues during training.
Gemini is gradually rolling out across Google products: Bard now uses a fine-tuned version of Gemini Pro, and Gemini Nano powers features on the Pixel 8 Pro. Developers and enterprise customers will gain access to Gemini Pro via the Gemini API in Google AI Studio or Google Cloud Vertex AI starting from December 13.
Gemini Ultra is still undergoing extensive trust and safety checks, and an early experimentation phase will precede a broader rollout early next year.
Read more details in Google’s Gemini technical report.
Cover image courtesy of Google.