OpenAI's ChatGPT, an AI-powered chatbot, recently failed a gastroenterology self-assessment test administered by the American College of Gastroenterology (ACG), according to a study published in The American Journal of Gastroenterology. The study found that ChatGPT's GPT-3.5 and GPT-4 versions scored 65.1% and 62.4%, respectively, both below the passing grade of 70%. ChatGPT's failure to meet the passing criteria serves as a reminder of the limits of AI.
The researchers considered even the 70% passing benchmark for the ACG's practice test surprisingly low, underscoring the need to improve AI chatbots' accuracy in medical settings. Dr. Trindade, who led the study, believes that AI chatbots used in medicine should meet an accuracy threshold of 95% or higher. To conduct the assessment, the researchers fed each question into ChatGPT and examined the generated response and explanation to evaluate its performance.
Dr. Trindade acknowledges that AI is advancing rapidly in medicine and that optimizing these tools for clinical use is crucial. He stresses that while AI models such as Google's Med-PaLM have succeeded in passing medical exams, ChatGPT's performance on the gastroenterology assessment highlights the limitations of AI models that lack specialized medical knowledge and training.
The study helps evaluate the potential of AI models as medical tools, indicating that they are not yet reliable for clinical use, particularly those without specialized medical information and training. Although the convenience of obtaining quick answers from AI platforms may seem appealing, studies like this one should remind us to set appropriate expectations for the use of AI chatbots in the medical field.