In an unprecedented study, researchers have found that an artificial intelligence chatbot, ChatGPT, outperformed human candidates in a mock Obstetrics and Gynecology specialist clinical examination. Over the six-month study, the chatbot scored higher, on average, than the human candidates in empathetic communication, information gathering, and clinical reasoning. The study team selected seven stations, in the form of objective structured clinical examinations (OSCEs), that had been run in the two previous years, and tested the candidates' ability to complete each station within ten minutes.
ChatGPT achieved a higher average score of 77.2%, compared with the human candidates' average of 73.7%. It also completed each station in an average of two minutes and 54 seconds, well within the standard ten minutes allowed. However, the chatbot did not outperform all human candidates in every session. To mitigate bias, the responses of all three candidates were submitted to the examination panel with ChatGPT's identity concealed.
The study team, led by Associate Professor Mahesh Choolani, found that ChatGPT scored particularly well in empathetic communication, generating factually accurate and contextually relevant responses that would take a person of average intelligence more than a decade of clinical training to produce. There were limitations to ChatGPT's performance, however, including limited knowledge of Singaporean ethnicities and local vocabulary, and an inability to handle questions involving ambiguous scenarios.
Despite these limitations, the study team still recommends ChatGPT as a viable resource for guiding medical education. With accurate knowledge and information set to become instantly accessible, the need for future generations of doctors to clearly demonstrate the value of the human touch is now salient.