A recent study found that University of Auckland students outperformed ChatGPT, an artificial intelligence chatbot, in a large-scale evaluation of more than 25,000 exam questions from 186 institutions. The study was a global collaboration that included University of Auckland accounting and finance lecturers Ruth Dimes and David Hay.
Ruth Dimes, who runs the business master’s programme at the university, entered questions from two recent exams in the analysing financial statements course into ChatGPT-3 and rated the accuracy of its responses. ChatGPT did not perform as well as she had anticipated, a finding consistent with the study's overall results. Similarly, Professor of Auditing David Hay tested ChatGPT on exam and test questions from his auditing course and found that, although it performed better in auditing than in financial accounting, it still did not do as well as the students.
In total, 25,817 questions were posed to ChatGPT-3, covering accounting information systems (AIS), auditing, financial accounting, managerial accounting and tax. When the authors counted only fully correct answers, the bot scored an average of 47.4 percent. When partially correct responses were also counted, its score rose to 56.5 percent, still well short of the students’ 76.7 percent. ChatGPT was more successful on AIS and auditing questions than on tax, financial accounting and managerial accounting questions.
The paper, led by Professor David Wood of Brigham Young University in Utah, shows how far AI technology has progressed, despite its current flaws. The authors found that ChatGPT could make up facts, give erroneous explanations and make basic arithmetic errors, such as adding two numbers in a subtraction problem.
According to the study, newer versions of ChatGPT and other AI tools should be tested in a similar way to measure and track how such tools evolve.