Limits of AI Highlighted as ChatGPT Struggles with Gastro Exam


In a reminder of the limits of artificial intelligence (AI), OpenAI's ChatGPT has failed a practice test created by the American College of Gastroenterology (ACG). When tested on questions from the ACG's 2021 and 2022 multiple-choice assessments, both the GPT-3.5 and GPT-4 versions of the AI chatbot failed to reach the 70% passing grade.

The tests were conducted by Arvind Trindade, MD, of Northwell Health's Feinstein Institutes for Medical Research in Manhasset, New York, and his colleagues. Questions from the assessment were copied and pasted directly into ChatGPT, which then generated a response and explanation. From these, the authors selected the corresponding answer.

GPT-3.5 and GPT-4 scored 65.1% (296 of 455 questions) and 62.4% (284 of 455 questions), respectively, both falling short of the 70% grade required to pass. The scores were lower than expected, prompting the study's authors to call for a higher standard to be set before such tools are used in medicine.

Recent papers have shown ChatGPT passing other medical assessments, but Dr. Trindade argued that this does not mean the technology is ready for clinical use. He commented that medical professionals should think about how to optimize this technology rather than rely on it clinically as it stands. He also noted that the medical community should hold AI to a much higher bar, such as a 95% accuracy threshold.

Google researchers have developed their own medically trained AI model, Med-PaLM, which achieved 67.6% accuracy and surpassed the common threshold for passing scores. An updated version of this model, known as Med-PaLM 2, even achieved an 85% accuracy and performed at “expert” physician levels.


AI chatbots such as ChatGPT have also been found to beat physicians in answering patient-generated questions. During a blind evaluation, the AI chatbot’s responses were preferred over real-physician answers 75% of the time.

While research into AI performance on medical credentialing tests shows tremendous progress, it is also a reminder that AI is not yet a source of consistently accurate, reliable medical advice. Medical professionals should consider all available forms of information when making decisions and should always prioritize human expertise over artificial intelligence.

