New Study Raises Concerns About Declining Performance of AI Language Model ChatGPT
ChatGPT, one of the most widely used large language models (LLMs), developed by OpenAI, has come under scrutiny in a recent study by researchers from Stanford University and UC Berkeley. The study suggests that ChatGPT’s performance on some tasks has worsened significantly over time, raising concerns about the capabilities and reliability of AI language models.
The researchers compared the March 2023 and June 2023 versions of the two models behind ChatGPT, GPT-3.5 and GPT-4. They evaluated the models’ ability to solve math problems, answer sensitive questions, generate code, and perform visual reasoning tasks. The findings revealed large shifts in performance over those three months, most strikingly a sharp decline in GPT-4’s accuracy on math problems.
In March, GPT-3.5 solved math problems with an accuracy of 7.4%, which rose to 86.8% in June. GPT-4 moved in the opposite direction: its accuracy dropped dramatically from 97.6% in March to a mere 2.4% in June. The models’ responses to sensitive questions also changed noticeably. In March, both versions gave more detailed explanations, but by June they simply responded with, “Sorry, but I can’t assist with that.”
The study’s authors did not speculate on the reasons behind the decline in performance, but other researchers fear a phenomenon called model collapse. This occurs when newer language models are trained on data generated by previous models, potentially resulting in the models forgetting information or making more errors over time.
Ilia Shumailov, the lead author of another study from the University of Oxford, compares this process to repeatedly printing and scanning the same picture. Each iteration may introduce more noise, making it increasingly challenging to discern any meaningful information. Shumailov suggests that employing human-generated data for training and modifying learning procedures could help alleviate this issue.
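The print-and-scan analogy can be illustrated with a small toy simulation. The sketch below is not the method used in either study; it only assumes that each "generation" can reproduce nothing except what the previous generation produced, which is modeled here as sampling with replacement. The function name `simulate_resampling` and its parameters are hypothetical, chosen for illustration.

```python
import random


def simulate_resampling(n_items=50, generations=30, seed=0):
    """Toy model: every generation learns only from the previous one's output.

    Returns the number of distinct items that survive in each generation.
    """
    rng = random.Random(seed)
    # Generation 0 stands in for human-generated data: every item is distinct.
    data = list(range(n_items))
    distinct_counts = [len(set(data))]
    for _ in range(generations):
        # Assumption: the next generation is drawn, with replacement, from the
        # previous generation's output, so it can never see anything new.
        data = [rng.choice(data) for _ in range(n_items)]
        distinct_counts.append(len(set(data)))
    return distinct_counts


if __name__ == "__main__":
    # Print how many distinct items remain after each generation.
    print(simulate_resampling())
```

Because each generation is drawn only from the one before it, the count of distinct items can never rise, and rare items tend to disappear first. Mixing fresh original data back in at each step, in the spirit of Shumailov's suggestion about human-generated data, would be one way to slow that loss in this toy setting.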
OpenAI, the creator of ChatGPT, has denied claims that its newer versions are becoming less capable. The company maintains that each new iteration is intended to be smarter than its predecessor. However, some users have noticed performance issues, leading to speculation about intentional manipulation to encourage subscriptions to its premium offering, ChatGPT Plus.
The debate over AI’s impact on society continues, with contrasting views on whether AI is a boon or a bane. As AI language models like ChatGPT evolve, it remains crucial to address concerns about declining performance and potential biases ingrained in these models. Balanced solutions that combine human-generated training data with improved learning procedures could be essential to the responsible development of AI technology.