Google’s Flagship AI Model Gets a Mighty Fast Upgrade
Google’s flagship AI model, Gemini, is receiving a significant upgrade just two months after its initial release. The new version, Gemini Pro 1.5, is more powerful and versatile than its predecessor. Its headline improvement is capacity: the model can now take in massive amounts of text, video, or audio in a single request.
Developed by Google DeepMind, Gemini Pro 1.5 can analyze long documents and media files. In a demonstration, the model analyzed a 402-page PDF of the Apollo 11 communications transcript and identified humorous portions of the text. It also answered questions about specific actions in a Buster Keaton movie, showing that it can reason about video content. These tasks would have been challenging for the previous version of Gemini, which was limited in the length of text or video it could process. Google hopes the upgraded capabilities will inspire developers to build new applications on top of the model.
Gemini Pro 1.5 sets itself apart from other AI models, including OpenAI’s GPT-4, with its ability to ingest and comprehend massive amounts of data. Google claims that it can handle an hour of video, 11 hours of audio, 700,000 words, or 30,000 lines of code in a single input, more than its competitors can. However, Google has not disclosed the technical details behind this feat. One potential application for models that can process this much text is extracting key insights from lengthy Discord discussions that contain thousands of messages.
Notably, Gemini Pro 1.5 is more efficient than its size would suggest, thanks to a technique known as mixture of experts. This technique selectively activates only the parts of the model best suited to a given task, improving performance without a matching increase in computing cost. Despite being smaller than Google’s most powerful offering, Gemini Ultra, Gemini Pro 1.5 matches its capabilities in many tasks, demonstrating the effectiveness of this technique. Demis Hassabis, the CEO of Google DeepMind, suggests that the technique can also be applied to enhance Gemini Ultra in the future.
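Google has not published how Gemini implements mixture of experts, so the following is only a generic illustration of the idea, not Google’s method. It is a minimal sketch in plain Python, assuming a common form of the technique called top-k routing: a small gating step scores every expert for the current input, only the best-scoring few are run, and their outputs are blended. The names `moe_forward` and `make_expert`, the toy "experts" that simply scale their input, and the gate weights are all hypothetical and chosen for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(scale):
    """Hypothetical toy expert: multiplies every input value by `scale`."""
    def expert(x):
        return [scale * value for value in x]
    return expert


def moe_forward(x, gate_weights, experts, top_k=2):
    """Route input x to the top_k highest-scoring experts and blend them.

    gate_weights holds one row per expert; an expert's score is the dot
    product of its row with x. Experts outside the top_k are never called,
    which is where the compute saving comes from.
    """
    scores = [sum(w * v for w, v in zip(row, x)) for row in gate_weights]
    ranked = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)
    chosen = ranked[:top_k]
    blend = softmax([scores[i] for i in chosen])

    output = [0.0] * len(x)
    for weight, index in zip(blend, chosen):
        expert_out = experts[index](x)  # only the chosen experts run
        output = [o + weight * v for o, v in zip(output, expert_out)]
    return output, chosen


if __name__ == "__main__":
    experts = [make_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]
    gate_weights = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [-1.0, 0.0]]
    output, chosen = moe_forward([1.0, 0.0], gate_weights, experts, top_k=2)
    print("experts used:", chosen)
    print("output:", output)
```

In this sketch four experts exist but only two run per input, so the cost of a forward pass grows with `top_k` rather than with the total number of experts. Production systems learn the gate weights during training and apply the routing inside neural network layers, which this toy example does not attempt.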
The upgraded Gemini Pro 1.5 will be made available to developers through AI Studio, a sandbox for testing model capabilities. A limited number of developers will also have access to the upgrade via the API of Google’s Vertex AI cloud platform. Google has not announced a date for general availability.
Google is also launching new tools to assist developers in leveraging Gemini’s capabilities in their applications. These tools include ways to utilize the AI model’s video and audio parsing abilities. Project IDX, Google’s web-based coding tool, will also receive new Gemini-powered features, enabling AI to debug and test code.
The rapid upgrade of Gemini reflects the intense competition among AI developers, driven by the success of OpenAI’s ChatGPT. OpenAI recently announced that ChatGPT would gain the ability to remember useful information over long periods of time, further fueling the AI race. Google, for its part, rebranded its chatbot Bard as Gemini and announced a paid subscription for access to Gemini Ultra.
While the advancements in generative AI are exciting, concerns about the risks associated with this technology persist. To address these concerns, Google claims to have extensively tested Gemini Pro 1.5 and is offering limited access to gather feedback on potential risks. Furthermore, Google has provided researchers at the UK’s AI Safety Institute with access to its most powerful models for testing purposes.
Hassabis, the CEO of Google DeepMind, promises more advancements in the coming months. He describes this pace as a new cadence reminiscent of a startup mentality. As the AI industry continues to evolve rapidly, it remains to be seen what developments lie ahead.