Two recent studies published in Radiology, a journal of the Radiological Society of North America (RSNA), show that ChatGPT, an AI chatbot developed for language interpretation and response generation, came close to passing a radiology board-style exam in its earlier version and passed it in its latest version. The results demonstrate both the potential and the limitations of the current technology.
Lead author Rajesh Bhayana, MD, FRCPC, an abdominal radiologist and technology lead at University Medical Imaging in Toronto, Canada, said, “The use of large language models like ChatGPT is exploding and only going to increase. Our research provides insight into ChatGPT’s performance in a radiology context, highlighting the incredible potential of large language models, along with the current limitations that make it unreliable.”
ChatGPT was tested on 150 multiple-choice questions designed to match the style, content, and difficulty of the Canadian Royal College and American Board of Radiology exams. The GPT-3.5 model achieved an accuracy of 69%, just short of the 70% passing threshold used by the Royal College. The model performed relatively well on questions requiring lower-order thinking (84%) but struggled with higher-order thinking questions (60%).
In a follow-up study, the researchers tested GPT-4, the successor to GPT-3.5. It achieved an accuracy of 81%, outperforming GPT-3.5 and clearing the 70% passing threshold. GPT-4 also performed much better than GPT-3.5 on higher-order thinking questions (e.g., questions involving description of imaging findings and application of concepts).
The researchers found that, although considerably improved, ChatGPT can still generate incorrect responses, which they termed “hallucinations”. Such errors are particularly dangerous when ChatGPT is the sole source of information. The researchers therefore cautioned that ChatGPT should not be used for quick information recall and should only be used to spark ideas.
ChatGPT was developed by OpenAI, a research laboratory based in California, and was recently declared the fastest-growing consumer application in history. It is increasingly being incorporated into popular search engines that people use to look up medical information.
Rajesh Bhayana is an abdominal radiologist and technology lead at University Medical Imaging Toronto, Toronto General Hospital, in Toronto, Canada. He is the lead author of the two studies published in Radiology, a journal of the Radiological Society of North America (RSNA). His areas of expertise include medical imaging informatics, healthcare innovation, medical education, and research.