ChatGPT, OpenAI’s popular language model, answers more than half of software engineering questions incorrectly, according to a recent study by researchers at Purdue University in the US. The finding raises concerns about the accuracy and reliability of the widely used AI model.
Despite ChatGPT’s widespread use, the quality and usability of its responses to software engineering queries had not been thoroughly investigated. To address this gap, the research team analysed ChatGPT’s replies to 517 questions sourced from Stack Overflow (SO), a popular question-and-answer platform for software developers.
The study found that approximately 52 percent of ChatGPT’s answers contained inaccuracies, while 77 percent were excessively verbose. These findings expose the model’s limitations: in 54 percent of cases, ChatGPT failed to grasp the concepts underlying the questions.
Even when ChatGPT understood the questions, it often failed to demonstrate a deep understanding of problem-solving techniques, leading to a high number of conceptual errors. The researchers also noted that the AI model lacked reasoning abilities, frequently providing solutions, code, or formulas without considering potential outcomes.
While prompt engineering and human-in-the-loop fine-tuning help ChatGPT understand problems to some extent, they fall short of addressing its limitations in reasoning. Understanding the factors behind conceptual errors and rectifying reasoning-related failures is therefore vital to improving ChatGPT’s performance.
The analysis also revealed other quality issues in ChatGPT’s responses, including verbosity and inconsistency, and noted that its answers rarely expressed negative sentiment. Even so, users preferred ChatGPT’s responses in 39.34 percent of cases, drawn by its comprehensive and articulate language style.
The researchers stress the need to carefully check and correct ChatGPT’s answers before relying on them, and emphasize the importance of making users aware of the risks posed by answers that merely seem accurate.
The study highlights the need for continuous improvement in AI language models like ChatGPT, underlining the crucial roles of error rectification, stronger reasoning capabilities, and user awareness in ensuring these models’ reliability and accuracy. As the field advances, addressing the identified limitations can help produce more trustworthy and efficient AI-powered tools.
(Note: This article is based on a study conducted by researchers at Purdue University and does not reflect the views or opinions of OpenAI.)