Google has unveiled Gemini 2.0, the latest iteration of its AI models, marking a significant step in the “agentic era” of artificial intelligence.
This model builds on the multimodal capabilities of its predecessor, Gemini 1.0, which organized and processed information across formats like text, video, images, and code. Gemini 2.0 extends those capabilities with native image and audio output, native tool use, and improved reasoning. It is designed to enable a new class of AI agents that can act proactively under user supervision, advancing Google’s vision of a universal assistant.
The experimental Gemini 2.0 Flash model is already accessible to developers and trusted testers via the Gemini API, Google AI Studio, and Vertex AI. It boasts enhancements such as faster response times, multimodal inputs and outputs, and integrated tools for tasks like coding, searching, and executing user-defined functions.
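For a sense of what this looks like in practice, here is a minimal sketch of a Gemini API call that exposes a user-defined function as a tool. It assumes the google-genai Python SDK and the experimental model identifier “gemini-2.0-flash-exp”; the stub function and API key are placeholders, and the exact SDK surface may shift as the experimental release evolves.

```python
# Minimal sketch: calling experimental Gemini 2.0 Flash through the Gemini API
# with a user-defined function exposed as a tool. Assumes the google-genai
# Python SDK (pip install google-genai); the model identifier
# "gemini-2.0-flash-exp" and config shape reflect the experimental release
# and may change.
from google import genai
from google.genai import types


def get_order_status(order_id: str) -> str:
    """Look up the shipping status of an order (stubbed for illustration)."""
    return f"Order {order_id} shipped on 2024-12-11."


client = genai.Client(api_key="YOUR_API_KEY")  # key from Google AI Studio

# Passing a plain Python function as a tool lets the model decide to call it;
# the SDK runs the function and feeds its result back into the generation.
response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents="Where is order A1234?",
    config=types.GenerateContentConfig(tools=[get_order_status]),
)
print(response.text)
```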
A new Multimodal Live API supports real-time audio and video streaming input for interactive applications. Full general availability is set for January 2025.
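The Live API is session-based rather than request-response: a client opens a persistent connection, streams input in, and receives streamed output back. The sketch below illustrates that shape under the same assumptions as above; the live.connect entry point, session methods, and config keys are drawn from the experimental SDK and may change before general availability.

```python
# Rough sketch of a Multimodal Live API session with the experimental
# google-genai Python SDK. Session methods and config keys are assumptions
# based on the experimental release and may change before general availability.
import asyncio

from google import genai


async def main() -> None:
    client = genai.Client(api_key="YOUR_API_KEY")
    # The Live API keeps a persistent, bidirectional connection open, so audio
    # or video frames can stream in while model responses stream back out.
    config = {"response_modalities": ["TEXT"]}  # "AUDIO" is also supported
    async with client.aio.live.connect(
        model="gemini-2.0-flash-exp", config=config
    ) as session:
        await session.send(
            input="Describe what the Live API is for.", end_of_turn=True
        )
        async for response in session.receive():
            if response.text:
                print(response.text, end="")


asyncio.run(main())
```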
Search
Google is integrating Gemini 2.0 into its core products, starting with Search. AI Overviews, which now reach 1 billion users, will use Gemini 2.0’s reasoning capabilities to tackle more complex, multi-step questions, including advanced math, multimodal queries, and coding. A new feature, Deep Research, acts as an advanced research assistant that explores intricate topics and compiles comprehensive reports.
The Gemini 2.0 ecosystem introduces groundbreaking prototypes:
- Project Astra applies multimodal understanding toward a universal AI assistant, now with multilingual dialogue, better memory, and tool use such as Google Maps.
- Project Mariner, a browser-focused prototype, uses Gemini 2.0 to reason across browser screens, completing tasks through an experimental Chrome extension while ensuring user safety with built-in controls.
- Jules, an AI-powered coding agent, integrates with GitHub workflows to assist developers with planning and executing coding tasks.
Google is also applying Gemini 2.0 to gaming and robotics. In collaboration with developers like Supercell, the model navigates virtual game worlds and offers real-time suggestions. In robotics, its spatial reasoning capabilities hold promise for real-world applications.
Why is this important?
Gemini 2.0 represents a leap toward more capable, responsible AI. With innovations spanning multimodal output, agentic prototypes, and product integrations, Google continues to shape the direction of artificial intelligence, and Gemini 2.0 is set to make a significant impact on Search over the next 12 months.