How BeInCrypto Aided AI-Based ChatGPT in Enhancing Its Intellectual Capacity

Recent research by the Washington Post and the Allen Institute for AI found that BeInCrypto was included in a dataset used to train Artificial Intelligence (AI) models like ChatGPT. The dataset in question, C4, consists of text from websites that were “scraped” to build language models. C4, which stands for Colossal Clean Crawled Corpus, has been used to help large language models, such as the one behind ChatGPT, mimic human speech.

The Washington Post used data from web analytics company Similarweb to rank the top 10 million websites in the dataset. The top three contributors turned out to be patents.google.com, wikipedia.org and scribd.com, a subscription-based digital library. News organizations such as the Guardian, New York Times, Forbes, LA Times and Huffington Post also featured heavily among the dataset’s sources.

The researchers also found websites such as Instructables, an online platform for DIY instructions and tutorials. They even found 27 sites identified by the US government as markets for piracy and counterfeiting.

C4 is drawn from a scrape performed in 2019 by the non-profit CommonCrawl and, as such, is free to use and analyze. Despite its popularity for training AI language models, its use has proven contentious in the sectors most at risk from AI, chiefly because AI training does not pay content creators for the use of their data. The issue recently surfaced in a copyright lawsuit filed against the makers of the Midjourney and Stable Diffusion AI image tools for scraping artwork without the artists’ consent.

In conclusion, the Washington Post and the Allen Institute for AI identified BeInCrypto as one of the websites in the C4 dataset used to train AI technology like ChatGPT. C4 is popular for training AI language models and helps them mimic human speech. Nevertheless, its use has become increasingly controversial because content creators are not compensated.
