On Wednesday, Databricks, a pioneering machine learning firm, released a substantial dataset that can be used to train chatbots for free, in a move to challenge OpenAI’s dominance in the language model market. The company also open-sourced Dolly 2.0, a model fine-tuned exclusively on this new dataset, which was written by Databricks employees, with the aim of making AI more accessible to all.
With 12 billion parameters, Dolly 2.0 is built on the EleutherAI Pythia model family, and the release includes the complete training code, dataset, and model weights, all suitable for commercial use. Databricks CEO Ali Ghodsi noted that the company’s aim is to make it easier for customers to use language models without paying for API access or sharing data with third parties.
The announcement comes two weeks after the launch of Dolly, an LLM trained on ChatGPT data. That earlier version could not be used in commercial applications, because OpenAI’s terms of service prohibit such use in order to protect its own offerings. This limitation drove the company to create its own dataset, databricks-dolly-15k, which can be used commercially and which strengthens Dolly 2.0’s position in the language model industry.
Andy Thurai, VP and principal analyst at Constellation Research, Inc., believes that Databricks is open-sourcing the dataset and Dolly 2.0 to help the machine learning community use the Databricks platform to build customized language models that can rival those offered by Microsoft and Google. The move highlights the company’s continued effort to encourage the use of language models in business.
In an exclusive interview on the Fortt Knox show, Ghodsi expressed his strong conviction that artificial intelligence and machine learning have the potential to tackle any of the pressing challenges facing the world today. Through Dolly 2.0, he seeks to open up to the public a technology that has traditionally been privately controlled.
About Databricks
Founded by the team behind Apache Spark, Databricks is a software company that provides the Databricks Unified Analytics Platform, a data and AI platform for organizations of any size across industries. The platform simplifies data and AI work by optimizing the entire data pipeline, from data ingestion to production, while keeping data secure and compliant.
About Ali Ghodsi
Ali Ghodsi is a co-founder of Databricks, which was established in 2013, and serves as its CEO. Ghodsi holds a Ph.D. from the University of California, Berkeley, and has held various executive roles at industry giants such as Apple, Diffblue, and Sun Microsystems. In 2018, Ghodsi was named to the Forbes 30 Under 30 list for his contributions to enterprise software.
In conclusion, by releasing Dolly 2.0 together with an openly available, AI-ready dataset, Databricks aims to broaden public access to the technology and, in turn, to encourage wider efforts to tackle pressing problems. As Ghodsi emphasizes, Databricks will continue to invest in open-source development and to improve existing models to meet the needs of businesses.