Google’s Gemini: is the new AI model really better than ChatGPT?
Google DeepMind has introduced Gemini, its latest AI model, which aims to rival OpenAI's ChatGPT. Both are forms of generative AI, which learns patterns from training data in order to generate new content, but ChatGPT is focused specifically on generating text.
Just as ChatGPT is a web app built on an underlying language model, Google's web app, Bard, was built on LaMDA, a model trained on dialogue. Google has now upgraded Bard to run on Gemini.
What sets Gemini apart from earlier models such as LaMDA is that it is multimodal. Unlike ChatGPT, which primarily deals with text, Gemini can handle several types of input and output: text, images, audio, and video. This gives rise to a new acronym, LMM (large multimodal model), not to be confused with LLM (large language model).
In September, OpenAI announced GPT-4V(ision), a model that can work with images, audio, and text to some extent. However, it falls short of being a fully multimodal model in the way Gemini promises to be.
For instance, while ChatGPT-4 can accept audio input and produce spoken output, OpenAI achieves this by converting speech to text with a separate deep learning model called Whisper. Similarly, ChatGPT-4 can produce images, but only by generating a text prompt that is passed to a separate text-to-image model, DALL-E 2, which turns that prompt into a picture.
In contrast, Google has designed Gemini to be natively multimodal: the core model itself directly handles several types of input and can generate output in more than one form.
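To make that architectural difference concrete, here is a minimal Python sketch of the two designs. Everything in it is a hypothetical placeholder rather than a real OpenAI or Google API call; it simply illustrates how data flows through a chain of single-modality models versus a single natively multimodal one.

```python
# A sketch only: every function below is a hypothetical stub returning dummy
# values, used to contrast the two architectures described in the article.

def transcribe_speech(audio: bytes) -> str:
    """Stand-in for a separate speech-to-text model (a Whisper-style component)."""
    return "transcribed user request"

def generate_text(prompt: str) -> str:
    """Stand-in for a text-only language model (a GPT-style component)."""
    return "text reply, including a description of an image to draw"

def text_to_image(prompt: str) -> bytes:
    """Stand-in for a separate text-to-image model (a DALL-E-style component)."""
    return b"<image bytes>"

def chatgpt_style_pipeline(audio: bytes) -> tuple[str, bytes]:
    """Chain of single-modality models: every step hands text to the next one."""
    text_in = transcribe_speech(audio)    # speech -> text
    text_out = generate_text(text_in)     # text -> text (the core language model step)
    image = text_to_image(text_out)       # text prompt -> image
    return text_out, image

def gemini_style_model(audio: bytes, image: bytes, text: str) -> dict:
    """A natively multimodal model, as Google describes Gemini: one core model
    accepts several input types at once, with no hand-offs between separate models."""
    return {"text": "reply grounded in all inputs", "image": b"<image bytes>"}

if __name__ == "__main__":
    reply, picture = chatgpt_style_pipeline(b"<raw audio>")
    combined = gemini_style_model(b"<raw audio>", b"<raw image>", "a question")
```

One consequence of the pipeline design is that the core language model only ever sees text, so information such as tone of voice is lost at each hand-off; a natively multimodal model can, in principle, work with that signal directly.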
Evaluating the models, Google's technical report and early qualitative tests suggest that the currently available version of Gemini, Gemini 1.0 Pro, is not as capable as GPT-4 and is closer in performance to GPT-3.5.
Google has also unveiled a more advanced version of Gemini called Gemini 1.0 Ultra, claiming it surpasses GPT-4 in terms of power. However, independent validation of these results is currently not possible since Ultra has yet to be released.
Furthermore, Google's claims have been somewhat undermined by the fact that its demonstration video was not conducted in real time. It emerged that the model had been given a sequence of still images and text prompts for specific tasks, such as the cup-and-ball trick, rather than responding to live video, which raises questions about how accurately the video portrays Gemini's capabilities.
Despite these issues, Gemini and large multimodal models more generally hold tremendous promise for generative AI. They not only expand what these systems can do but also intensify competition among AI tools. With GPT-4 trained on approximately 500 billion words, the supply of new text training data for language models is nearly exhausted. Multimodal models, however, open up vast, largely untapped reserves of training data in the form of images, audio, and video.
Models like Gemini, trained directly on these diverse data types, are expected to become even more capable over time. For instance, models trained on video could develop robust internal representations of so-called naïve physics: our intuitive understanding of causality, movement, and gravity.
Moreover, the emergence of Google's Gemini gives OpenAI's dominant GPT models a significant competitor. OpenAI is almost certainly developing GPT-5, which will presumably also be multimodal and demonstrate impressive new capabilities, and Google's entry will push the whole field forward.
Looking ahead, we can also expect to see open-source and non-commercial large multimodal models emerge. Google has additionally introduced a lightweight version called Gemini Nano that can run directly on mobile phones; running models on-device promises to reduce the environmental footprint of AI computing and brings benefits for privacy.
In conclusion, Gemini represents an exciting leap forward for generative AI, enabling it to tap into new data sources and expand its capabilities. While challenges and questions remain regarding Gemini’s current performance and the transparency of its demonstrations, the progress made towards multimodal models is undeniable. As the landscape of AI continues to evolve, the emergence of major competitors like Gemini will shape the future of this groundbreaking technology.
Read more: Google’s Gemini AI hints at the next great leap for the technology: analyzing real-time information