San Francisco-based company Databricks recently released Dolly 2.0, a large language model (LLM) that upgrades the model it released two weeks earlier. LLMs are the technology underlying chatbots such as ChatGPT. Databricks describes Dolly 2.0 as the first open-source, instruction-following LLM fine-tuned on a freely accessible dataset. It is also free to use, with no need to pay for API access or share data with third parties.
To fine-tune the model, the firm collected more than 15,000 records written by 5,000 of its own employees across 40 countries. According to the CEO, this process cost millions of dollars. Users can examine Dolly 2.0's training data, something that OpenAI's ChatGPT and Google's Bard do not offer.
The CEO also acknowledged that the dataset is not perfect, as it skews male. However, it was purpose-built so that large language models trained on it can demonstrate ChatGPT-like capabilities.
Databricks is a cloud computing platform company specializing in data engineering and data science. It was co-founded in 2013 by a group that included Ali Ghodsi, who is currently the company's CEO. Under his leadership, the company has focused on simplifying the data life cycle and making it more efficient and accessible for those who work in data engineering, machine learning and business analytics. With the release of Dolly 2.0 and its dataset, Databricks demonstrates its ability to stay ahead of the curve in the artificial intelligence field.