Dolly – Free Open Source AI Model in the Style of ChatGPT


On Wednesday, Databricks released Dolly 2.0, which it describes as the first open source, instruction-following large language model (LLM) that is fine-tuned on a human-generated data set and licensed for commercial use. The release gives developers a freely usable foundation for building ChatGPT-style alternatives.

Databricks, an American enterprise software company founded in 2013 by the creators of Apache Spark, says the model lets organizations build and customize their own LLMs without paying for API access or sharing data with third parties.

Dolly 2.0 is a 12-billion-parameter model based on EleutherAI’s Pythia model family and fine-tuned exclusively on databricks-dolly-15k, a data set crowdsourced from Databricks employees. This fine-tuning brings its behavior closer to that of OpenAI’s ChatGPT, which can answer questions and hold realistic conversations.

Databricks began the project in March of this year with Dolly 1.0, whose usefulness was limited because its training data contained ChatGPT outputs, which bound users to OpenAI’s terms of service.

To make a commercially usable LLM possible, the Databricks team then took on the large task of building a new data set: roughly 15,000 human-written demonstrations crowdsourced among the company’s more than 5,000 employees, who were motivated by an internal contest. The tasks covered open Q&A, closed Q&A, summarizing Wikipedia passages, brainstorming, classification, and creative writing.

The data set is released under a Creative Commons license (CC BY-SA 3.0), and the model weights and training code are released under permissive open source licenses, so organizations can use, modify, and extend them commercially. By comparison, OpenAI’s ChatGPT requires paid API access, and Meta’s LLaMA is only partially open source and forbids commercial use.


AI researcher Simon Willison called the launch of Dolly 2.0 a “really big deal” and praised Databricks for openly releasing, under a Creative Commons license, the instruction-tuning data set written by its employees.

Dolly 2.0 could spark a new wave of open source language models free of proprietary restrictions on commercial use. Further refinements may also make it possible to run such fine-tuned models locally on consumer-class machines.

Databricks was founded by the original creators of Apache Spark, an open-source distributed computing platform for processing large data sets. The company provides a web-based platform for developing and running distributed big-data workloads, with support for a variety of languages, libraries, and APIs.

Simon Willison is a software developer, co-creator of the Django web framework, and an independent AI researcher who experiments with open source language models, including Dolly. His comments on the Dolly 2.0 release captured the anticipation around open source language models: “Even if Dolly 2 isn’t good, I expect we’ll see a bunch of new projects using that training data soon. And some of those might produce something really useful.”

The Dolly 2.0 weights are available on Hugging Face, and the databricks-dolly-15k data set can be downloaded free from GitHub. It is an exciting time for large language models, as freely available, open source AI opens up a wide range of new possibilities.
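As a rough illustration of what the crowdsourced data looks like, here is a minimal Python sketch. It assumes databricks-dolly-15k is distributed as JSON Lines (one record per line) with `instruction`, `context`, `response`, and `category` fields, and that categories use labels such as `open_qa` and `summarization`. The two sample records and the helper names `load_records` and `category_counts` are invented for illustration and are not part of Databricks’ release.

```python
import json
from collections import Counter

# Two invented records in the assumed databricks-dolly-15k layout
# (fields: instruction, context, response, category). Illustrative only.
SAMPLE_JSONL = """\
{"instruction": "Name three primary colors.", "context": "", "response": "Red, yellow and blue.", "category": "open_qa"}
{"instruction": "Summarize the passage in one sentence.", "context": "Apache Spark is an open-source engine for large-scale data processing.", "response": "Spark is an open-source big-data engine.", "category": "summarization"}
"""


def load_records(text: str) -> list[dict]:
    """Parse JSON Lines text: one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def category_counts(records: list[dict]) -> Counter:
    """Tally how many records fall into each task category."""
    return Counter(r["category"] for r in records)


if __name__ == "__main__":
    records = load_records(SAMPLE_JSONL)
    print(category_counts(records))
```

To try this on the real data, one could read the downloaded `.jsonl` file and pass its contents to `load_records` in place of the sample; the exact file name and category labels should be checked against the repository before relying on them.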
