ChatGPT-4 Falls Short in Pediatric Diagnoses, Highlighting the Role of Human Pediatricians – Study
ChatGPT-4, the version of OpenAI’s chatbot powered by its latest large language model, struggles to diagnose pediatric medical cases, according to a study published in JAMA Pediatrics. The chatbot achieved an accuracy rate of just 17 percent on pediatric cases, well below the already underwhelming 39 percent it managed on general medical cases in a previous analysis. The findings suggest that human pediatricians will continue to play a crucial role in healthcare, where clinical experience remains invaluable.
The study, conducted by researchers at Cohen Children’s Medical Center in New York, highlights the distinct challenges of pediatric diagnosis. Unlike general cases, pediatric cases require clinicians to give greater weight to the patient’s age, and diagnosing infants and young children is especially difficult because they often cannot clearly describe their symptoms.
To assess ChatGPT-4’s performance, the researchers presented the chatbot with 100 pediatric case challenges published in JAMA Pediatrics and NEJM. ChatGPT-4 produced the correct diagnosis in only 17 of the cases; it was plainly wrong in 72 and failed to fully capture the diagnosis in the remaining 11. Notably, 57 percent of the incorrect diagnoses did involve the same organ system as the correct one, yet the chatbot still struggled to recognize known relationships between conditions.
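As a purely illustrative aside, the reported breakdown is easy to sanity-check with a quick tally. The sketch below assumes nothing about the researchers’ actual scoring process; the category labels are hypothetical, not taken from the paper.

```python
from collections import Counter

# Mock the outcome counts reported in the study across 100 case challenges.
# The labels are illustrative, not taken from the paper.
outcomes = ["correct"] * 17 + ["incorrect"] * 72 + ["did not fully capture"] * 11

counts = Counter(outcomes)
total = sum(counts.values())

for label in ("correct", "incorrect", "did not fully capture"):
    n = counts[label]
    print(f"{label}: {n}/{total} ({n / total:.0%})")
# correct: 17/100 (17%)
# incorrect: 72/100 (72%)
# did not fully capture: 11/100 (11%)
```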
One example of ChatGPT-4’s missed connections was its failure to link autism with scurvy (vitamin C deficiency) in one case. Neuropsychiatric conditions such as autism can lead to restricted diets, which in turn can cause vitamin deficiencies, a relationship clinicians are trained to recognize. ChatGPT-4 instead diagnosed the case as a rare autoimmune condition.
Despite these shortcomings, the researchers suggest that the accuracy of AI chatbots like ChatGPT-4 could be improved through more selective training on reliable medical literature, rather than on the unvetted and often inaccurate information found on the internet. They also propose giving the models real-time access to medical data, allowing their accuracy to be continuously refined through tuning.
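To make the idea of selective training concrete, here is a minimal, hypothetical sketch of one ingredient: filtering a training corpus down to vetted medical sources before fine-tuning. Nothing here comes from the study; the source whitelist, document format, and field names are all illustrative assumptions.

```python
# Hypothetical sketch: keep only documents from vetted medical sources
# before fine-tuning, instead of training on unvetted web text.
TRUSTED_SOURCES = {"jama", "nejm", "pubmed"}  # illustrative whitelist

corpus = [
    {"text": "Case report: scurvy in a child with a restrictive diet ...", "source": "jama"},
    {"text": "Forum post recommending unproven home remedies", "source": "web"},
    {"text": "Review: restrictive eating patterns in autism ...", "source": "pubmed"},
]

curated = [doc for doc in corpus if doc["source"] in TRUSTED_SOURCES]
print(f"Kept {len(curated)} of {len(corpus)} documents for fine-tuning")  # Kept 2 of 3
```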
The medical field has been at the forefront of adopting AI-powered technologies, with notable successes in automating administrative tasks and aiding the interpretation of medical images. However, AI’s failure to reliably diagnose complex pediatric cases underscores why human pediatricians remain irreplaceable: weighing factors such as a patient’s age and ability to communicate takes clinical experience that the models lack.
While AI chatbots have the potential to become valuable tools in clinical care, particularly in diagnostics, there is still a long way to go. By addressing the weaknesses identified in ChatGPT-4’s performance, future iterations can narrow the gap between AI-powered tools and the expertise of human healthcare professionals.
In conclusion, the study reinforces the crucial role of human pediatricians in diagnosing pediatric medical cases. AI chatbots like ChatGPT-4 fall short of that standard today, but better training data and real-time access to medical information hold promise for improving their diagnostic capabilities. Many healthcare professionals view the integration of AI into clinical care as inevitable; until it matures, human expertise and clinical judgment remain indispensable to comprehensive, accurate care for pediatric patients.