New research suggests that including evidence in health-related questions can confuse AI-powered chatbots like ChatGPT, reducing the accuracy of their answers. The scientists are not sure exactly why this happens, but they hypothesize that the added evidence introduces too much noise, degrading the chatbot’s ability to answer accurately.
Large language models such as ChatGPT are trained on massive amounts of text and can generate fluent content in natural language. Their immense popularity poses a potential risk as more people rely on such online tools for essential health information.
A study by researchers from CSIRO and The University of Queensland, Australia, examined how supplying evidence alongside health-related questions affected ChatGPT’s accuracy. When the chatbot was asked the questions alone, it answered correctly 80% of the time; when evidence was included in the prompt, its accuracy dropped to 63%.
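The comparison can be pictured as two prompt conditions posed to the same model: the question on its own, and the question preceded by a passage of evidence. The sketch below is a minimal illustration of that setup, assuming yes/no health questions and a generic ask_chatbot function standing in for whatever model interface the researchers used; the study’s actual prompt templates and evaluation pipeline are not described in this article.

```python
# Illustrative sketch only: prompt formats, question data, and the
# ask_chatbot function are hypothetical placeholders, not the study's
# actual methodology.

def build_prompt(question: str, evidence: str | None = None) -> str:
    """Build a question-only or question-plus-evidence prompt."""
    if evidence is None:
        return f"Question: {question}\nAnswer yes or no."
    return (
        f"Evidence: {evidence}\n"
        f"Question: {question}\n"
        "Answer yes or no."
    )


def evaluate(questions, ask_chatbot) -> dict:
    """Compare accuracy with and without evidence in the prompt.

    `questions` is a list of (question, evidence, correct_answer) tuples;
    `ask_chatbot` is any function that sends a prompt string to the
    chatbot and returns its answer as a string.
    """
    correct_plain = correct_with_evidence = 0
    for question, evidence, gold in questions:
        # Condition 1: question only.
        if ask_chatbot(build_prompt(question)).strip().lower() == gold:
            correct_plain += 1
        # Condition 2: same question with evidence prepended.
        if ask_chatbot(build_prompt(question, evidence)).strip().lower() == gold:
            correct_with_evidence += 1
    n = len(questions)
    return {
        "accuracy_question_only": correct_plain / n,
        "accuracy_with_evidence": correct_with_evidence / n,
    }
```

Under this kind of setup, the reported result corresponds to the question-only condition scoring around 0.80 and the with-evidence condition around 0.63.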
While the exact reason for this drop remains unclear, the researchers stress the need for continued research into using language models to answer health-related queries. Understanding how well these models perform is crucial as more people turn to online tools like ChatGPT for information.
The study, presented at the Empirical Methods in Natural Language Processing (EMNLP) conference in December 2023, highlights the importance of informing the public about the potential risks associated with relying on AI-powered chatbots for health information. Further research is essential to optimize the accuracy of responses provided by these language models.