Limits of AI Highlighted as ChatGPT Struggles with Gastro Exam


In a reminder of the limits of artificial intelligence (AI), OpenAI’s ChatGPT has failed to pass a practice test created by the American College of Gastroenterology (ACG). When tested on questions from the ACG’s 2021 and 2022 multiple-choice assessments, both the GPT-3.5 and GPT-4 versions of the AI chatbot failed to reach the 70% passing grade.

The tests were conducted by Arvind Trindade, MD, of Northwell Health’s Feinstein Institutes for Medical Research in Manhasset, New York, and his colleagues. Questions from the assessment were copied and pasted directly into ChatGPT, which then generated a response and explanation. From these, the authors selected the corresponding answer.

The GPT-3.5 and GPT-4 versions scored 65.1% (296 of 455 questions) and 62.4% (284 of 455 questions), respectively. Both scores fell below the 70% grade required to pass the exam. The authors said the scores were lower than expected and called for a higher standard to be set.
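The reported percentages can be checked with a few lines of arithmetic. The following is a minimal sketch in Python, assuming that "passing" means answering at least 70% of the 455 questions correctly; the variable and function names are illustrative and do not come from the study.

```python
import math

TOTAL_QUESTIONS = 455   # questions in the combined 2021 and 2022 assessments
PASSING_PERCENT = 70    # passing grade reported in the article

# Correct-answer counts as reported in the article.
correct_answers = {"GPT-3.5": 296, "GPT-4": 284}


def percent_score(correct, total=TOTAL_QUESTIONS):
    """Return the share of correct answers as a percentage, to one decimal."""
    return round(100 * correct / total, 1)


# Assumption: a pass requires at least 70% of all questions answered correctly.
min_correct_to_pass = math.ceil(PASSING_PERCENT * TOTAL_QUESTIONS / 100)

for model, correct in correct_answers.items():
    shortfall = min_correct_to_pass - correct
    print(f"{model}: {percent_score(correct)}% "
          f"({shortfall} more correct answers needed to reach {PASSING_PERCENT}%)")
```

Under that assumption, a passing score would require 319 correct answers, so GPT-3.5 fell 23 questions short and GPT-4 fell 35 short.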

Recent papers have shown ChatGPT passing other medical assessments, but Dr. Trindade argued that this doesn’t mean it is ready for clinical use. He commented that medical professionals should think about how to optimize the technology rather than rely on it in the clinic. He also noted that the medical community should have much higher standards than, for example, a 95% accuracy threshold.

Google researchers have developed their own medically trained AI model, Med-PaLM, which achieved 67.6% accuracy and surpassed the common threshold for passing scores. An updated version of this model, Med-PaLM 2, achieved 85% accuracy and performed at “expert” physician levels.

AI chatbots such as ChatGPT have also been found to beat physicians in answering patient-generated questions. During a blind evaluation, the AI chatbot’s responses were preferred over real-physician answers 75% of the time.

While this research into AI medical credentialing tests has shown tremendous progress, it is also a reminder that AI is far from providing accurate, reliable advice. Medical professionals should consider all available forms of information when making decisions and should always prioritize human expertise over artificial intelligence.
