Dolly – Free Open Source AI Model in the Style of ChatGPT

On Wednesday, Databricks released Dolly 2.0, which it bills as the first freely available, instruction-following large language model (LLM) licensed for commercial use and fine-tuned on a human-generated data set. It could provide a foundation for developing ChatGPT-style competitors.

Databricks, an American enterprise software company founded in 2013 by the creators of Apache Spark, says the model lets organizations create and customize LLMs without paying for API access or sharing data with third parties.

Dolly 2.0 is a 12-billion-parameter model based on EleutherAI’s Pythia model family and fine-tuned exclusively on databricks-dolly-15k, a data set crowdsourced from Databricks employees. This fine-tuning brings its capabilities closer to those of OpenAI’s ChatGPT, a model that can answer questions and carry on a conversation naturally.

Databricks released Dolly 1.0 in March of this year. That version was limited by its training data, which included ChatGPT outputs and therefore bound users to OpenAI’s terms of service.

The Databricks team then took on the task of creating a new data set that could be used commercially: roughly 15,000 demonstrations of instruction-following behavior, crowdsourced from more than 5,000 employees who were motivated by an internal contest. The data generation tasks included open Q&A, closed Q&A, summarizing information from Wikipedia, brainstorming, classification, and creative writing.
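To make the shape of such a data set concrete, here is a minimal Python sketch of how instruction-following demonstrations might be stored and tallied by task type. The field names (`instruction`, `context`, `response`, `category`), the snake_case category labels, and the two sample records are assumptions made for illustration; they are not taken from Databricks' release and should be checked against the published data set.

```python
from collections import Counter

# Task types named in the article. The snake_case labels are assumed
# for illustration and may not match the labels in the real data set.
CATEGORIES = {
    "open_qa",
    "closed_qa",
    "summarization",
    "brainstorming",
    "classification",
    "creative_writing",
}

# Hypothetical records, written for this sketch only.
records = [
    {
        "instruction": "Why is the sky blue?",
        "context": "",
        "response": "Air molecules scatter blue light more than red light.",
        "category": "open_qa",
    },
    {
        "instruction": "List three uses for a paperclip.",
        "context": "",
        "response": "Holding paper, resetting a router, marking a page.",
        "category": "brainstorming",
    },
]


def count_by_category(rows):
    """Return how many demonstrations fall under each task type."""
    counts = Counter()
    for row in rows:
        if row["category"] not in CATEGORIES:
            raise ValueError(f"unknown category: {row['category']}")
        counts[row["category"]] += 1
    return counts


category_counts = count_by_category(records)
```

A tally like this is one simple way to check that a crowdsourced set covers every task type before it is used for fine-tuning.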

The data set, model weights, and training code have been released under a Creative Commons license, which permits commercial use as well as modification and extension. That sets Dolly 2.0 apart from OpenAI’s ChatGPT, which requires users to pay for API access, and Meta’s LLaMA, which is only partially open source and forbids commercial use.

AI researcher Simon Willison called the launch of Dolly 2.0 a “really big deal” and commended Databricks for the instruction-tuning data set, which was written by more than 5,000 Databricks employees and released openly under a Creative Commons license.

Dolly 2.0 could spark a new wave of open source language models that are free of proprietary limitations and restrictions on commercial use. Further refinements may also allow these fine-tuned language models to run locally on consumer-class machines.

Databricks is a software company founded by the original creators of Apache Spark, an open-source distributed computing platform designed for processing large datasets. It provides a web-based platform for developing and running distributed big data workloads, with support for a variety of languages, libraries, APIs, and other technologies.

Simon Willison is a software developer and AI researcher who experiments with open source language models, including Dolly. His comments on the release of Dolly 2.0 captured the anticipation around open source language models: “Even if Dolly 2 isn’t good, I expect we’ll see a bunch of new projects using that training data soon. And some of those might produce something really useful.”

The Dolly 2.0 weights are available on Hugging Face, and the databricks-dolly-15k data set can be downloaded for free from GitHub. It is an exciting time for large language models, as freely available, open source AI opens up new possibilities.
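For readers who want to try the model, the following Python sketch shows one way the weights might be loaded with the Hugging Face `transformers` library. The repository name `databricks/dolly-v2-12b` and the helper names `pipeline_kwargs` and `generate` are assumptions made for this example; confirm the identifier on the Hugging Face Hub before relying on it. Running `generate` downloads the full checkpoint and needs `transformers`, `torch`, `accelerate`, and a machine with substantial memory.

```python
# Assumed Hugging Face Hub identifier for the 12B checkpoint; verify before use.
MODEL_ID = "databricks/dolly-v2-12b"


def pipeline_kwargs(model_id=MODEL_ID):
    """Build the arguments for transformers.pipeline without downloading anything."""
    return {
        "model": model_id,
        # Allows the repository's own instruction-following pipeline code to run.
        "trust_remote_code": True,
        # Spreads the model across available devices; requires `accelerate`.
        "device_map": "auto",
    }


def generate(instruction):
    """Download the model (tens of gigabytes) and answer one instruction."""
    # Imported lazily so the rest of this sketch works without transformers installed.
    from transformers import pipeline

    generator = pipeline(**pipeline_kwargs())
    result = generator(instruction)
    return result[0]["generated_text"]


# Example call (not executed here because it triggers the download):
# print(generate("Explain the difference between nuclear fission and fusion."))
```

Because everything runs on local hardware, no prompts or outputs leave the machine, which is the property Databricks highlights for organizations that cannot share data with third parties.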
