Dolly – Free Open Source AI Model in the Style of ChatGPT

On Wednesday, Databricks released Dolly 2.0, which it describes as the first open source, instruction-following large language model (LLM) that is licensed for commercial use and fine-tuned on a human-generated data set. The release could provide a foundation for building ChatGPT-style competitors.

Databricks, an American enterprise software company founded in 2013 by the creators of Apache Spark, says Dolly 2.0 lets organizations create and customize their own LLMs without paying for API access or sharing data with third parties.

Dolly 2.0 is a 12-billion-parameter model based on EleutherAI’s Pythia model family and fine-tuned exclusively on databricks-dolly-15k, a data set crowdsourced from Databricks employees. This fine-tuning brings its behavior closer to that of OpenAI’s ChatGPT, which is better at answering questions and holding a conversation than a base model that has not been instruction-tuned.

Databricks released Dolly 1.0 in March of this year. That model was limited by its training data, which included ChatGPT outputs and was therefore subject to OpenAI’s terms of service, ruling out commercial use.

To get around this restriction, the Databricks team created a new data set that could be used commercially: roughly 15,000 demonstrations of instruction-following behavior (hence the name databricks-dolly-15k), crowdsourced from more than 5,000 employees who were motivated by an internal contest. The data generation tasks included open Q&A, closed Q&A, summarizing information from Wikipedia, brainstorming, classification, and creative writing.
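
To make the shape of such a data set concrete, here is a minimal Python sketch of how instruction records of this kind could be parsed and tallied by task type. It is an illustration under stated assumptions, not Databricks' own tooling: the four field names (instruction, context, response, category) and the category labels are assumed to match the published databricks-dolly-15k files, the sample records are invented, and the prompt template in to_prompt is hypothetical rather than the one used to train Dolly.

```python
import json
from collections import Counter

# Invented sample records. The four field names are assumed to match the
# published databricks-dolly-15k schema; the text is made up for illustration.
SAMPLE_RECORDS = [
    {
        "instruction": "Name three uses for a paperclip.",
        "context": "",
        "response": "Holding papers, resetting a router, marking a page.",
        "category": "brainstorming",
    },
    {
        "instruction": "When was Databricks founded?",
        "context": "Databricks was founded in 2013 by the creators of Apache Spark.",
        "response": "Databricks was founded in 2013.",
        "category": "closed_qa",
    },
    {
        "instruction": "Is this review positive or negative? 'Great product.'",
        "context": "",
        "response": "Positive.",
        "category": "classification",
    },
]

# Serialize to JSON Lines (one JSON object per line), a common layout for
# instruction data sets.
SAMPLE_JSONL = "\n".join(json.dumps(r) for r in SAMPLE_RECORDS)


def load_records(jsonl_text):
    """Parse JSON Lines text into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]


def count_by_category(records):
    """Tally how many records belong to each task category."""
    return Counter(record["category"] for record in records)


def to_prompt(record):
    """Build a plain-text prompt; this template is illustrative only."""
    parts = ["Instruction: " + record["instruction"]]
    if record["context"]:
        parts.append("Context: " + record["context"])
    parts.append("Response:")
    return "\n".join(parts)


if __name__ == "__main__":
    records = load_records(SAMPLE_JSONL)
    print(count_by_category(records))
    print(to_prompt(records[1]))
```

Closed Q&A records carry a reference passage in the context field, while open-ended tasks such as brainstorming leave it empty, which is why the sketch only adds a context line when one is present.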

The data set, model weights, and training code were released under a Creative Commons license that permits commercial use, modification, and extension. That sets Dolly 2.0 apart from OpenAI’s ChatGPT, which requires users to pay for API access, and from Meta’s LLaMA, which is only partially open source and forbids commercial use.

AI researcher Simon Willison called the launch of Dolly 2.0 a “really big deal” and commended Databricks for the fine-tuning instruction set, which was built by 5,000 Databricks employees and released openly under a Creative Commons license.

Dolly 2.0 could spark a new wave of open source language models that are free of proprietary limitations and restrictions on commercial use. Further refinements may also allow these fine-tuned language models to run locally on consumer-class machines.
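
A rough back-of-the-envelope calculation shows why a model of this size does not yet fit comfortably on typical consumer hardware. The Python sketch below estimates the memory needed just to hold the weights of a model with about 12 billion parameters at several numeric precisions. The helper name weight_memory_gb and the list of precisions are assumptions chosen for illustration, the parameter count is the rounded figure quoted above, and the estimate ignores activations and other runtime overhead.

```python
def weight_memory_gb(num_params, bytes_per_param):
    """Approximate memory for model weights alone, in gigabytes (10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Rounded parameter count quoted for Dolly 2.0; the exact figure differs slightly.
DOLLY_PARAMS = 12_000_000_000

# Bytes used per parameter at a few common precisions (illustrative choices).
PRECISIONS = [("float32", 4), ("float16", 2), ("int8", 1), ("4-bit", 0.5)]

if __name__ == "__main__":
    for name, nbytes in PRECISIONS:
        print(f"{name}: about {weight_memory_gb(DOLLY_PARAMS, nbytes):.0f} GB")
```

At 16-bit precision the weights alone come to roughly 24 GB, more than most consumer graphics cards offer, which is why lower-precision formats and smaller models matter for local use.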

Databricks is a software company founded by the original creators of Apache Spark, an open source distributed computing platform designed for processing large data sets. It provides a web-based platform for developing and running distributed big data workloads, with support for a variety of languages, libraries, and APIs.

Simon Willison is an open source developer and AI researcher who experiments with open source language models, including Dolly. His comments on the release of Dolly 2.0 point to the wider potential of open source language models: “Even if Dolly 2 isn’t good, I expect we’ll see a bunch of new projects using that training data soon. And some of those might produce something really useful.”

The Dolly 2.0 weights are available on Hugging Face, and the databricks-dolly-15k data set can be downloaded for free from GitHub. It is an exciting time for large language models, with freely available, open source AI opening up many new possibilities.
