A group of researchers has announced the release of MiniGPT-4, an open-source model designed to perform complex vision-language tasks like those demonstrated by GPT-4. MiniGPT-4 can take images as input, a capability GPT-4 was announced with but had not yet made publicly available. The code, demos, and training instructions have been released to the public on GitHub.
MiniGPT-4 is built on Vicuna, an open-source large language model fine-tuned from LLaMA, which serves as its language decoder, paired with the frozen visual encoder from the BLIP-2 vision-language model; the two are aligned through a single trained projection layer. MiniGPT-4 reproduces many of the capabilities demonstrated by GPT-4, including detailed image description generation and website creation from hand-written drafts.
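The alignment idea described above can be illustrated with a minimal sketch: visual features from a frozen encoder are mapped into the language model's embedding space by one learned linear projection. The dimensions and names below are illustrative assumptions, not the real model's sizes.

```python
import numpy as np

rng = np.random.default_rng(0)

num_query_tokens = 32   # query outputs from a Q-Former-style module (assumed)
visual_dim = 768        # width of the frozen visual features (assumed)
llm_dim = 4096          # width of the language model's embeddings (assumed)

# Frozen visual encoder output for one image: (num_query_tokens, visual_dim).
visual_features = rng.standard_normal((num_query_tokens, visual_dim))

# The only trained component in this sketch: a single linear projection.
W = rng.standard_normal((visual_dim, llm_dim)) * 0.02
b = np.zeros(llm_dim)

# Project visual tokens into the LLM embedding space so they can be
# prepended to the text-token embeddings as a soft prompt.
visual_tokens = visual_features @ W + b

print(visual_tokens.shape)  # (32, 4096)
```

Because the encoder and language model stay frozen, only the projection's parameters need to be trained, which is what makes this alignment approach comparatively cheap.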
OpenAI, the company behind GPT-4, has yet to reveal many details about its architecture and training methods, which makes the release of MiniGPT-4 a significant development for AI research. MiniGPT-4 offers a capable, open-source alternative to GPT-4, allowing researchers to explore new ways of incorporating vision into natural language processing.
OpenAI is a San Francisco-based research lab focused on artificial intelligence. Founded in 2015, the company has partnered closely with Microsoft and is best known for developing GPT-3, a state-of-the-art language model, and GPT-4, a multimodal AI system.
Gwern Branwen is an independent writer and researcher whose essays at gwern.net, including widely read analyses of GPT-3 and the scaling of neural networks, have been influential in the AI community. He is not affiliated with OpenAI or with the MiniGPT-4 project.