The medical community has raised concerns about the viability of a “curbside consult” in which doctors would rely on artificial intelligence for real medical advice. Recent findings from Stanford’s Institute for Human-Centered Artificial Intelligence (HAI) have done nothing to assuage those concerns. When 64 clinical scenarios were presented to ChatGPT, it agreed with the known correct answer only 41% of the time – performance far below that of human doctors, even by the most conservative estimates of how often physicians make diagnostic errors.
However, the Stanford team’s findings also offer reason for cautious optimism. Before ChatGPT, OpenAI had released GPT-3.5 to Microsoft, and that model is capable of some impressive feats. When prompted to “act as an AI doctor,” GPT-3.5 agreed with the correct answer only 21% of the time. The Stanford study found that ChatGPT improved on this number significantly, and that 91-93% of GPT-3.5 and GPT-4 responses were judged safe, with the remaining 7-9% containing AI “hallucinations.”
Atropos Health, a company co-founded by Stanford researchers Nigam Shah, Saurabh Gombar, and Brigham Hyde, provides on-demand evidence to clinicians and has aided the research on ChatGPT. Google has also thrown its hat into the ring with its Med-PaLM 2 generative AI tool, which will be tested by Google’s cloud computing customers over the next several months.
The general conclusion, shared by the Stanford study, the New England Journal of Medicine, and a Harvard computer scientist and physician, is that GPT technology holds real promise – it completed the test, after all. For now, though, caution is necessary, and ChatGPT isn’t yet an intelligent choice for real medical advice. It is an important step nonetheless, and its continued development is a key point to watch on the medical front.