ChatGPT, the large language model developed by OpenAI, is currently no threat to Wikipedia, according to Jimmy Wales, the co-founder of Wikipedia. Speaking at an OpenUK press event, Wales stated that ChatGPT is simply not accurate enough to generate factual content. While it may produce plausible responses, it often provides completely inaccurate information.
Wikipedia, known for its reliable and verified content, does not view ChatGPT as a suitable tool for writing about either famous or obscure topics. The model tends to hallucinate, producing unreliable information that makes it unsuitable for use on the platform. However, Wales highlighted that large language models (LLMs) like ChatGPT could potentially assist human contributors in the future.
One possible way LLMs could be helpful is by aiding in the time-consuming task of fact-checking. These models could be programmed to read entries and cross-check them against cited references for accuracy and completeness. This could streamline the process for human contributors, providing them with quick suggestions for improvement.
While ChatGPT may not pose a direct threat to Wikipedia, copyright issues surrounding LLMs are a hot topic. Wales estimated that 50% of the information used to train OpenAI’s GPT-4 model came from Wikipedia, whose content is freely licensed for reuse. While he has no problem with this, others object to their own work being used in this way. Comedian Sarah Silverman and others have filed lawsuits against OpenAI and Meta, alleging that the use of their work to train the models infringes their copyright.
On the other hand, US courts have ruled that the output of AI models cannot be copyrighted since it does not come from a human source. This means the output is considered to be in the public domain. Wales believes the case filed by Silverman and others is unlikely to succeed and that the current law against copyrighting AI output will remain in effect. However, he acknowledges that global pressure to change legislation, particularly on the input side, may arise.
Wales expressed concern about the notion that copyright ownership somehow extends to the facts contained within a work. This has never been the case for copyright, and he sees a danger in such an interpretation. Entities like scientific publishers could potentially use this to limit access to scientific papers and push for increased sales. The influx of AI-generated data into the public domain may also have interesting copyright implications.
Regarding AI regulation, Wales believes that the European Union’s approach is misguided. He argues that the EU’s focus on regulating only a handful of big tech giants overlooks the progress made by open-source models, which can often match those developed by larger companies. He views the EU AI Act as overly prescriptive and predicts it will leave Europe behind, allowing the US to dominate the field.
In conclusion, while ChatGPT is not currently a threat to Wikipedia’s factual content, LLMs may offer assistance in the future. Copyright issues surrounding LLMs remain a topic of debate, and pressure to change legislation could arise. Wales also criticized the EU’s approach to AI regulation, highlighting its potential negative effects on European progress in the field.