New Study Shows ChatGPT Comparable to Humans in Providing Healthcare Answers
A recent study by researchers at New York University highlights how difficult it is for people to distinguish responses generated by OpenAI’s chatbot, ChatGPT, from those written by human healthcare providers.
The study aimed to explore the potential of chatbots to aid patient-provider communication and to gauge how much trust people place in their responses.
Participants were presented with a series of patient questions and responses; half of the responses were generated by ChatGPT, and the other half were written by human healthcare providers. The study revealed some notable findings:
On average, participants correctly identified the chatbot responses 65.5% of the time and the provider responses 65.1% of the time, indicating that they found it difficult to tell the two sources apart. Identification accuracy varied by question, suggesting that some topics or levels of complexity were harder to differentiate than others.
Participants generally expressed a moderate level of trust in chatbot responses, with an average score of 3.4 on a 5-point scale. However, trust was lower for more complex healthcare tasks, such as diagnostic and treatment advice, and higher for logistical questions and preventive care.
The findings have significant implications for the use of chatbots in healthcare communication:
Chatbots have the potential to assist in patient-provider communication, particularly for administrative tasks and the management of common chronic diseases. These areas typically involve straightforward information and can benefit from the efficiency and accessibility of chatbot interactions.
However, the study underscores the need for caution and critical judgment when relying on chatbot-generated advice for more complex clinical tasks. Diagnostic and treatment advice in particular should be approached with care, as chatbots may not match the expertise and nuanced judgment of human healthcare providers.
The study also highlights the limitations of current chatbot technology and the potential biases of AI models. Further research is needed to refine chatbot capabilities and to ensure their reliability, accuracy, and suitability for different healthcare tasks, along with ongoing evaluation and validation to improve their performance and trustworthiness in healthcare settings.
These findings serve as a reminder that while chatbots can be valuable tools in healthcare, they are not a substitute for human expertise and should be used judiciously. By combining the strengths of chatbots and human providers, we can effectively harness the power of technology to improve patient care and communication in the healthcare industry.