ChatGPT Is Little Help for Doctors in Diagnosing Diseases, Study Finds

The research, conducted with 50 physicians last year, found that using ChatGPT did not significantly improve doctors’ diagnostic reasoning.

The Washington Post

November 22, 2024


Can an artificial intelligence chatbot help doctors better diagnose their patients?

Not really, according to new research.

The study, published last month in the journal JAMA Network Open, found that using ChatGPT, a chatbot created by OpenAI, did not significantly improve doctors’ diagnostic reasoning compared with doctors who used only traditional resources. The study also found that ChatGPT on its own performed better than either group of physicians.

The doctors who could use the software got a median score of 76 percent on making a diagnosis and explaining the reasoning behind it, while the group that used only conventional resources had a median score of 74 percent. Run on its own, the software had a median score of roughly 90 percent.

The small study is yet another exploration of the potential for AI’s use in medicine. In recent years, hospitals across the country have been investing in AI tools, hoping to integrate them into their care and research - to the dismay of those who worry that the technology will soon start replacing human doctors.

But the study’s authors emphasized that the finding that ChatGPT on its own could diagnose better than physicians does not mean that AI should be used to make diagnoses without a doctor’s oversight. The significance of the research is limited by the fact that it was simulated - although based on real patient data - rather than performed in a clinical practice setting, said Ethan Goh, a postdoctoral fellow at the Stanford Clinical Excellence Research Center who was a co-first author of the study.

“All the information was prepared in a way that doesn’t mimic real life,” he said.

OpenAI did not immediately respond to a request for comment on the study’s findings Monday evening.

Goh said he and the other researchers became interested in assessing whether ChatGPT could diagnose patients after learning that the software’s abilities had already been tested with multiple-choice questions, including the U.S. Medical Licensing Exam tests that medical students take. They wanted to design a different test for the software, one that was more open-ended.

“You don’t get a patient coming in and they’re like, ‘Hey doctor, A, B, C, D, which one do I have?’ or ‘How are you going to treat me?’” Goh said. “So that was the inspiration.”

The researchers expected to find that doctors who had the help of ChatGPT would perform better, Goh said. But the results surprised them.

“We were all shocked,” Goh said. “There’s a fundamental theorem that AI plus [humans] or computer plus humans should always do better than humans alone.”

The 50 physicians - 26 attendings and 24 residents - who participated in the study last year were given six cases selected from a broader pool of 105 real cases that have been used in medical research for decades. The researchers noted that those cases have never been publicly released, meaning they could not have been included in ChatGPT’s training data.

The doctors were asked to come up with diagnoses in as many of the six cases as they could in an hour. At random, half of the physicians could use the chatbot alongside traditional resources like UpToDate, an online system with clinical information physicians can consult. None of the doctors were given explicit training on using ChatGPT to participate in the study.

The finding that the chatbot doesn’t significantly help doctors make diagnoses is notable because some health systems already offer chatbots for doctors to use in clinical settings, “often with no to minimal training on how to use these tools,” the researchers wrote in the study.

Goh said training, including an explicit curriculum for physicians on how to use AI and instructions on its pitfalls, could help doctors more effectively use chatbots to make diagnoses.

Beyond that, he said another reason the group of doctors using ChatGPT might not have performed better is a bias doctors can have when making a diagnosis. Once physicians have formulated a diagnosis, they may be hesitant to change their minds on it, even in the face of new or conflicting information. That tendency may have prevented them from fully considering ChatGPT’s input while completing cases during the study, Goh said.

He added that those factors would need to be studied to know whether changing them would make a difference in diagnosing.

And, Goh said, after a diagnosis comes a new set of questions for physicians to answer - where they could also potentially use AI’s help in the future.

“What are the correct treatment steps to take?” he said. “What are the tests and such to order that would help you guide the patient towards what to do next?”
