In a recent study, researchers found that while artificial intelligence (AI) has the potential to be an effective tool in medicine, important limitations remain. The study, led by Emmanouil S. Brilakis, MD, PhD, tested ChatGPT 4.0, an AI language model, against interventional cardiology fellows to see whether it could pass a simulated exam for the American College of Cardiology/American Board of Internal Medicine Collaborative Maintenance Pathway.
ChatGPT 4.0 passed a multiple-choice version of the exam with a score of 76.7%, below the fellows' average of 82.2%. When retested just 2 weeks later, however, its score dropped to 65%, raising concerns about the consistency of its performance.
Notably, ChatGPT could not answer questions that required viewing a video, because the model cannot process video input. When those questions were reformatted as text-only multiple-choice items, the AI performed better but still showed limitations.
Although ChatGPT was able to provide explanations for its answers, its underperformance relative to the human fellows suggests that while it may be useful for certain tasks, it is not yet a reliable tool for clinical decision-making.
Overall, the study highlights the potential of AI in medicine but underscores the need for further research and validation to ensure accuracy and reliability before such tools are used in clinical practice.