Google Unveils Gemini: A Multimodal AI for Seamless Information Processing
Google has announced the launch of Gemini, its new multimodal AI model that aims to revolutionize information processing. Available globally starting December 6, 2023, Gemini is being touted as Google's most capable and flexible AI model to date. It is being integrated into Google Bard and into the latest Pixel 8 Pro smartphones, marking a major upgrade for both.
Gemini Pro, the fine-tuned version of Gemini, boasts enhanced capabilities in areas such as understanding, summarizing, reasoning, coding, and planning. Users can now test-drive Google Bard with Gemini Pro, with support for additional modalities set to be added in the coming days. As of now, Bard with Gemini Pro is accessible in English in more than 170 countries and territories, with more languages and regions, such as Europe, planned for the near future.
Sundar Pichai, CEO of Google, expressed his excitement about the development, stating that Gemini 1.0 has been optimized for three different sizes: Ultra, Pro, and Nano. These variants demonstrate Gemini's state-of-the-art performance across a range of benchmarks. Pichai emphasized that the launch of Gemini represents a significant scientific and engineering achievement for the company.
Gemini, the result of a collaboration between multiple teams, including Google Research, is designed to be multimodal, meaning it can seamlessly understand and combine different types of information: text, code, audio, images, and video. During a demonstration, Google showcased Gemini's ability to perceive visual input much as a person would, comprehend and evaluate it in real time, and suggest a next course of action.
The Gemini model comes in three variants: Ultra, Pro, and Nano. Ultra is the largest and most capable model, suited to highly complex tasks; Pro scales efficiently across a broad spectrum of tasks; and Nano is designed for on-device tasks. Pixel 8 Pro users will benefit from Gemini Nano integration, enabling features like Summarize in the Recorder app and Smart Reply via Gboard, initially in WhatsApp. Gemini's deployment will extend to various Google products and services, including Search, Ads, Chrome, and Duet AI.
In ongoing experiments, Google is incorporating Gemini into Search to enhance the Search Generative Experience (SGE), reporting a 40% reduction in latency for English searches in the US alongside improvements in quality.
Starting December 13, developers and enterprise customers will be able to access Gemini Pro through the Gemini API in Google AI Studio or Google Cloud Vertex AI. Android developers, specifically, can leverage Gemini Nano via AICore, a new system capability introduced in Android 14 that will first be available on Pixel 8 Pro devices. Gemini Ultra is still undergoing trust and safety checks; select customers, developers, partners, and safety experts will get early access for experimentation and feedback before its broader release next year.
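For developers wondering what access through the Gemini API might look like in practice, here is a minimal sketch using the google-generativeai Python SDK, the client library associated with Google AI Studio. The "gemini-pro" model name, the prompt, and the placeholder API key are illustrative assumptions; consult the current documentation for exact package versions and model identifiers.

```python
# Minimal sketch: calling Gemini Pro through the Gemini API (Google AI Studio).
# Assumes the google-generativeai package (pip install google-generativeai)
# and an API key created in Google AI Studio; "gemini-pro" is the model name
# announced for the December 13 developer rollout.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder; supply your own key

model = genai.GenerativeModel("gemini-pro")
response = model.generate_content(
    "Summarize the key differences between the Gemini Ultra, Pro, and Nano variants."
)
print(response.text)
```

On Vertex AI, the same model is reached through the Vertex AI SDK instead, with authentication handled by a Google Cloud project rather than an AI Studio API key.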
Furthermore, Bard now uses a specifically tuned version of Gemini Pro in English, featuring enhanced reasoning, planning, understanding, and more. Google is also working on Bard Advanced, which will give users early access to its most advanced models and capabilities, starting with Gemini Ultra.
Addressing concerns about hallucinations in AI models, Eli Collins, VP of Product at Google DeepMind, said that while Gemini has made significant strides in factuality, it can still hallucinate. Additional techniques are applied when Gemini is integrated into products like Bard to improve the accuracy of its responses.
Google asserts that Gemini Ultra surpasses current state-of-the-art results on 30 of 32 widely used language model benchmarks. With a score of 90.0%, Gemini Ultra is the first model to outperform human experts on massive multitask language understanding (MMLU), an evaluation spanning subjects such as math, physics, history, law, medicine, and ethics that tests both world knowledge and problem-solving ability. Google also claims Gemini can understand, explain, and generate high-quality code in programming languages like C++, Python, Go, and Java.
In our own testing of Google Bard powered by Gemini, the system still produced the same old replies: when asked about a particular artist, it returned general information but could not list that person's top five songs. The same prompt worked as expected with Microsoft Copilot.
Google’s Gemini represents a significant leap forward in AI technology, offering powerful capabilities for information processing across various modalities. With its integration into Google Bard and subsequent expansion to other products and services, Gemini aims to enhance user experiences and provide seamless access to information and support.