ChatGPT’s Performance Shows Signs of Decline, New Study Finds
ChatGPT, the AI chatbot that gained popularity for its impressive conversational abilities, appears to be declining in performance, according to a recent study by researchers at Stanford and UC Berkeley. The study compared versions of ChatGPT released in March and June 2023, assessing the bot’s competency in tasks such as math, coding, and visual reasoning.
The results painted a concerning picture of ChatGPT’s capabilities over time. In March, the AI bot demonstrated exceptional math skills, correctly answering 488 out of 500 questions about prime numbers, an accuracy rate of 97.6%. By June, accuracy had plummeted to a mere 2.4%, with only 12 of the same 500 questions answered correctly.
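The percentages above follow directly from the raw counts, and the arithmetic is easy to check. Below is a minimal sketch of how answers to such prime-number questions could be scored. The function names (`is_prime`, `score_answers`, `accuracy`) and the yes/no answer format are illustrative assumptions, not the study’s actual evaluation code.

```python
def is_prime(n: int) -> bool:
    """Ground-truth primality check by trial division (fine for small inputs)."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    i = 3
    while i * i <= n:
        if n % i == 0:
            return False
        i += 2
    return True


def score_answers(numbers, answers):
    """Count how many yes/no answers (True = 'is prime') match the ground truth.

    `numbers` and `answers` are hypothetical parallel lists; the study's
    real prompt and answer format may differ.
    """
    return sum(1 for n, a in zip(numbers, answers) if a == is_prime(n))


def accuracy(correct: int, total: int) -> float:
    """Accuracy as a percentage."""
    return 100.0 * correct / total


# Counts reported in the article: 488 of 500 in March, 12 of 500 in June.
print(f"March: {accuracy(488, 500):.1f}%")
print(f"June:  {accuracy(12, 500):.1f}%")
```

Running the two `print` lines reproduces the 97.6% and 2.4% figures from the reported counts.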
Coding ability also declined sharply. The study found that for GPT-4, the most advanced model available through ChatGPT, the share of generated code that was directly executable dropped from 52.0% in March to 10.0% in June. Notably, these results were obtained without code interpreter plugins, suggesting a genuine decline in the bot’s coding proficiency.
On reasoning tasks using visual prompts from the Abstract Reasoning Corpus (ARC) dataset, ChatGPT’s decline was less steep but still noticeable. Some queries that the bot answered correctly in March drew incorrect responses in June, indicating a regression in its reasoning abilities.
The study’s researchers hypothesized that OpenAI, the creator of ChatGPT, may have made optimizations to address the bot’s responses to dangerous questions, which could have unintentionally impacted its usefulness across other tasks. As a result, ChatGPT now tends to provide verbose and indirect responses instead of clear and concise answers, affecting its overall effectiveness.
AI expert Santiago Valderrama, commenting on the study’s findings, noted that ChatGPT’s performance appears to be deteriorating rather than improving. Valderrama even suggested the possibility that OpenAI might have replaced the original ChatGPT architecture with a combination of smaller and more specialized GPT-4 models to reduce costs. While this could potentially accelerate response times, it may come at the expense of the bot’s competency.
Other factors proposed by ChatGPT enthusiasts may also have contributed to the decline, including cost-cutting efforts, the introduction of warnings and disclaimers that could affect the bot’s output, and the lack of broader community feedback.
Although more comprehensive testing is needed, these findings align with users’ frustrations over a perceived decline in ChatGPT’s coherence and eloquence, which were its distinguishing features.
To prevent further deterioration, advocates have suggested utilizing open-source models like Meta’s LLaMA, which allow community debugging and continuous benchmarking to detect regressions early. Transparency, collaboration, and ongoing improvement efforts are crucial to maintaining and enhancing the quality of AI chatbots like ChatGPT.
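The idea of continuous benchmarking can be illustrated with a small sketch: store the scores from an earlier model snapshot, rerun the same benchmark on the current one, and flag any task whose score fell by more than a chosen margin. The function name `find_regressions`, the task labels, and the 5-point threshold are all hypothetical choices for illustration; the scores are the figures quoted earlier in this article.

```python
def find_regressions(baseline, current, max_drop=5.0):
    """Return {task: drop} for tasks whose score fell by more than `max_drop`
    percentage points between two benchmark runs.

    `baseline` and `current` map task names to scores in percent.
    Tasks missing from `current` are skipped rather than flagged.
    """
    regressions = {}
    for task, old_score in baseline.items():
        new_score = current.get(task)
        if new_score is None:
            continue
        drop = old_score - new_score
        if drop > max_drop:
            regressions[task] = drop
    return regressions


# Scores quoted in the article (percent); task names are illustrative labels.
march = {"prime_identification": 97.6, "executable_code": 52.0}
june = {"prime_identification": 2.4, "executable_code": 10.0}

for task, drop in find_regressions(march, june).items():
    print(f"{task}: down {drop:.1f} points")
```

In practice such a check would run automatically against a fixed question set each time a model is updated, so a drop like the ones reported here would surface immediately rather than months later.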
As ChatGPT fans come to terms with its apparent decline, it is worth acknowledging that even AI celebrities are not immune to age-related decline. However, with diligent efforts to address the issues raised in the study, there is hope that ChatGPT can regain its former brilliance and continue to push the boundaries of AI-powered conversations.