Researchers at University Medical Imaging Toronto recently conducted a study to evaluate how accurately ChatGPT, a conversational chatbot powered by AI, could answer questions of the kind featured on radiology board exams. The research, published in the journal Radiology, revealed both strengths and limitations in how ChatGPT handles questions of lower and higher complexity.
ChatGPT correctly answered 104 of the exam's 150 questions (69%). The questions were divided between lower-order questions, which test knowledge recall and basic understanding, and higher-order questions, which require analyzing and applying information. The chatbot answered the lower-order questions with impressive accuracy but struggled with the higher-order ones.
After the release of GPT-4, a newer model powering the chatbot, the researchers retested it against the same questions. The GPT-4-powered version answered 121 of the 150 questions correctly. This time, its success rate on higher-order questions was markedly better, reaching an accuracy of 81%.
Although ChatGPT seems to be improving with each update and showing highly promising results, Rajesh Bhayana, a radiologist at Toronto General Hospital, found that it still produced some illogical or inaccurate answers. This raises concerns for medical professionals about how to ensure patient safety.
Fortunately, continued advances in language models like ChatGPT can help address the risks that come with AI-powered applications. As the technology evolves, one can look forward to more reliable and useful chatbots that can be trusted across various domains.
The company behind the chatbot is OpenAI, an artificial intelligence research and deployment company whose stated mission is to ensure that artificial general intelligence benefits all of humanity. It is one of the world's leading organisations in language technology, with expertise spanning topics from natural language processing to deep learning.
Rajesh Bhayana is a radiologist and technology lead at Toronto General Hospital. As a radiologist, he is responsible for reading and interpreting imaging scans and providing treatment recommendations; as technology lead, he guides colleagues on matters related to technology and machine learning. Bhayana led the recent study evaluating ChatGPT and concluded that further advances in language models could make it possible to release more capable and reliable chatbots.