Google makes surprise AI announcement with Gemini 1.5 almost ready

Demis Hassabis, CEO of Google DeepMind, has announced Gemini 1.5 – just one week after the launch of Gemini 1.0.

Gemini 1.5 is heralded by Google as a significant leap forward in performance and efficiency, driven by innovations in research and engineering, particularly in its Mixture-of-Experts (MoE) architecture. This architecture, pioneered by Google, enhances efficiency by dividing the model into smaller expert neural networks, allowing for selective activation of relevant pathways based on input type.
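The MoE idea described above can be sketched in a few lines of code. This is a toy illustration of top-k expert routing, not Gemini's actual architecture: a gating function scores a set of expert networks for each input, and only the top-scoring experts are activated.

```python
# Toy sketch of Mixture-of-Experts routing (illustrative only, not
# Gemini's implementation): a gate scores each expert for the input,
# and only the top_k experts do any work.

def route(gate_scores, top_k=2):
    """Return indices of the top_k highest-scoring experts."""
    ranked = sorted(range(len(gate_scores)),
                    key=lambda i: gate_scores[i], reverse=True)
    return ranked[:top_k]

def moe_forward(x, experts, gate):
    """Run x through only the selected experts and average their outputs."""
    scores = gate(x)
    active = route(scores)
    outputs = [experts[i](x) for i in active]  # inactive experts are skipped
    return sum(outputs) / len(outputs)

# Four pretend "experts", each just scaling the input.
experts = [lambda x, k=k: x * k for k in (1, 2, 3, 4)]
gate = lambda x: [0.1, 0.7, 0.05, 0.9]  # stand-in for learned gating scores

print(moe_forward(10, experts, gate))  # experts 3 and 1 run: (40 + 20) / 2 = 30.0
```

The efficiency gain comes from the skipped experts: only a fraction of the model's parameters are exercised per input, while total capacity stays large.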

The announcement introduces Gemini 1.5 Pro, a mid-size multimodal model optimised for scalability across various tasks. Notably, it introduces a breakthrough experimental feature in long-context understanding, with a standard 128,000 token context window that can be expanded to 1 million tokens for select developers and enterprise customers in a private preview.

The expanded context window allows Gemini 1.5 Pro to process vast amounts of information in a single prompt, enabling complex reasoning across modalities such as text, code, images, audio, and video.

The new model demonstrates impressive capabilities, including analysing a 402-page transcript of the Apollo 11 mission, interpreting silent films such as those of Buster Keaton, and reasoning across lengthy blocks of code.

Performance evaluations indicate that Gemini 1.5 Pro outperforms its predecessor, Gemini 1.0 Pro, on 87% of benchmarks, performing comparably to the larger Gemini 1.0 Ultra model. Notably, it maintains high performance even with the expanded context window, successfully locating specific information within long blocks of text and demonstrating in-context learning abilities.

Early access to Gemini 1.5 Pro is currently offered only to developers and enterprise customers via AI Studio and Vertex AI, with plans to introduce pricing tiers based on context window size.

Testers can explore the 1 million token context window at no cost during the testing phase, with improvements in speed anticipated in the future.

Why is this important?

Through a series of machine learning innovations, Google claims to have increased 1.5 Pro’s context window capacity far beyond the original 32,000 tokens of Gemini 1.0, running up to 1 million tokens in production.

This means 1.5 Pro can process vast amounts of information in one go — including 1 hour of video, 11 hours of audio, codebases with over 30,000 lines of code, or over 700,000 words.
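The capacity figures above imply rough per-unit token costs, which can be turned into a back-of-envelope estimator. This is a sketch using ratios derived from the article's numbers only; real token counts depend on Google's tokenizer.

```python
# Per-unit token costs implied by the article's figures for a
# 1,000,000-token window: ~700,000 words, ~30,000 lines of code,
# ~11 hours of audio, ~1 hour of video. Actual costs vary by tokenizer.
WINDOW = 1_000_000
TOKENS_PER_WORD = WINDOW / 700_000        # ≈ 1.43
TOKENS_PER_CODE_LINE = WINDOW / 30_000    # ≈ 33
TOKENS_PER_AUDIO_HOUR = WINDOW / 11       # ≈ 90,900
TOKENS_PER_VIDEO_HOUR = WINDOW            # 1,000,000

def estimate_tokens(words=0, code_lines=0, audio_hours=0.0, video_hours=0.0):
    """Estimate how many tokens a mixed-modality prompt would consume."""
    return round(
        words * TOKENS_PER_WORD
        + code_lines * TOKENS_PER_CODE_LINE
        + audio_hours * TOKENS_PER_AUDIO_HOUR
        + video_hours * TOKENS_PER_VIDEO_HOUR
    )

def fits(tokens, window=WINDOW):
    """Does the estimated prompt fit in the context window?"""
    return tokens <= window

# Example: half a window of prose plus half a window of code.
total = estimate_tokens(words=350_000, code_lines=15_000)
print(total, fits(total))  # 1000000 True
```

The point of the exercise is that one prompt can mix modalities: half the window spent on a 350,000-word corpus still leaves room for roughly 15,000 lines of code.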

For more details, see the Gemini 1.5 Pro technical report.

Images courtesy of Google


