Artificial intelligence (AI) has been making its way into the medical industry recently, with systems such as ChatGPT being touted as potential solutions for better healthcare. However, a new study published in Urology Practice found that ChatGPT performed poorly on a major specialty self-assessment tool, answering fewer than 30% of questions correctly. Furthermore, the chatbot made errors that pose a risk of spreading medical misinformation.
Christopher M. Deibert, MD, MPH, and colleagues from the University of Nebraska Medical Center evaluated ChatGPT's performance on the Self-Assessment Study Program for Urology, a 150-question practice examination covering the core curriculum of medical knowledge in urology. ChatGPT answered fewer than 30% of both multiple-choice and open-ended questions correctly. For most open-ended questions, its explanations were longer yet frequently redundant and cyclical in nature. Even when given feedback, ChatGPT consistently repeated the original incorrect explanation.
While AI tools are being developed as a potential solution for healthcare, further research is needed to understand their capabilities and limitations across multiple disciplines before they are made generally available for use. In their current state, such AI tools carry a high risk of facilitating medical misinformation among untrained users. Dr. Deibert and colleagues emphasize that AI systems should be thoroughly tested before being applied in clinical settings.