Google today announced the launch of Gemini, a new multimodal large language model developed by its AI division, DeepMind. Gemini aims to compete with OpenAI’s ChatGPT and will serve as the foundation for Google Bard, a chatbot that has struggled to gain recognition in the shadow of its competitor.
Gemini stands out among its AI counterparts because it was designed to be multimodal from the start, meaning it can handle text, audio, and image-based prompts. In a demo video, Gemini identifies objects, infers actions in videos, generates music based on visual prompts, and even assesses children’s homework with a playful personality. However, the video’s description notes that latency has been reduced and Gemini’s outputs have been shortened for brevity.
According to Google CEO Sundar Pichai and DeepMind co-founder and CEO Demis Hassabis, Gemini comes in three versions: Ultra, Pro, and Nano. A fine-tuned version of Gemini Pro powers Google Bard, while the Nano variant will be incorporated into products like the Pixel 8 Pro smartphone. In the coming months, Gemini will also be integrated into Google Search, Ads, and Chrome. Public access to the Ultra version, however, will not be available until 2024.
Gemini’s technical report states that its most powerful version, Ultra, exceeds current state-of-the-art results on 30 of the 32 widely used academic benchmarks in large language model research and development. The improvements may seem modest: Gemini Ultra correctly answers multidisciplinary questions 90% of the time, compared with 86.4% for OpenAI’s GPT-4, the model behind ChatGPT. Even so, it is clear that Gemini poses real competition for ChatGPT.
Despite its impressive capabilities, Google acknowledges that Gemini is not flawless and is susceptible to the industry-wide challenge of hallucinations, where the AI model occasionally generates incorrect or nonsensical responses. To address this, Google subjected Gemini to extensive safety evaluations, including testing its response to problematic inputs and assessing potential biases.
Google plans to gradually integrate Gemini into its suite of products, starting with closed testing phases. If all goes according to plan, the public can expect a Gemini Ultra-powered Bard Advanced release next year. Nevertheless, predicting the outcomes of the ongoing AI arms race remains challenging.
When PopSci asked Bard whether it is powered by Gemini, the chatbot responded that it does not have access to information regarding internal Google projects. It recommended searching for information through official Google channels or contacting someone within the company for more details.
With Gemini’s arrival, Google aims to position itself as a leading player in multimodal language models, offering capabilities that extend beyond those of traditional text-based AI models. As Gemini continues to evolve and is integrated into Google’s products, it is likely to shape the future of AI-powered interactions.
In conclusion, Gemini represents a significant milestone for DeepMind, with its multimodal capabilities and potential to challenge OpenAI’s ChatGPT. While Gemini is not without its flaws, its integration into Google’s suite of products holds promise for delivering enhanced user experiences. With public access to the Ultra version still to come, the ongoing AI arms race continues to captivate both industry experts and curious observers eager to witness the next breakthrough in artificial intelligence.