Databricks is a San Francisco-based data analytics company founded in 2013; its co-founders include Ali Ghodsi, Ion Stoica, Matei Zaharia, Patrick Wendell and Reynold Xin. The firm offers a cloud-based platform for building, deploying and managing data workloads such as data lakes, data pipelines, streaming and analytics.
Today, Databricks released Dolly 2.0, an open source text-generating AI model that can power apps like chatbots, text summarizers, and basic search engines. Offered under a license that allows independent developers and companies to use it commercially, this second-generation model can help developers build applications on the Databricks platform.
Ali Ghodsi, CEO of Databricks, said the company wanted to support open and transparent large language models (LLMs). Many existing open source models were fine-tuned on datasets containing outputs from OpenAI's models, a practice that conflicts with OpenAI's terms of service. To avoid this, Databricks trained Dolly 2.0 on a dataset of roughly 15,000 instruction-response records written by thousands of Databricks employees.
Dolly 2.0 inherits the limitations of its base model, EleutherAI's Pythia-12B: it generates text only in English, and its output may contain abusive language. Ghodsi acknowledged that it is not intended to be the best model of its kind; rather, it is best suited to tasks such as responding to customer support tickets, extracting information from legal briefs and generating code.
Compared with services such as OpenAI's, which restrict how their models can be used through licensing terms, Dolly 2.0's open source license gives it immediate appeal. That openness also carries risk: Databricks must be mindful of the harms that can follow a public model release, as with Stable Diffusion, which was used to create non-consensual celebrity deepfakes after it was open-sourced.
In 2023, Databricks open-sourced its ChatGPT-like model, Dolly 2.0, allowing independent developers and companies to take advantage of the opportunities it offers. Despite the risks that come with open source models, Dolly 2.0 gives developers and firms a free, accessible way to build AI-powered applications quickly. It remains to be seen how these developments will play out.