Study: ChatGPT Health missed emergency referrals

An independent study published in Nature Medicine found that ChatGPT Health, an AI-powered medical guidance tool reportedly used by 40 million people daily, failed to recommend emergency care in more than half of serious cases evaluated by physicians. Researchers warned that inconsistent triage decisions and uneven suicide-crisis safeguards raise concerns about relying solely on artificial intelligence for urgent health decisions.
A new study in Nature Medicine has raised questions about the reliability of ChatGPT Health in high-risk medical situations. Researchers designed 60 standardized clinical scenarios across 21 specialties, ranging from minor ailments to life-threatening emergencies.
Three independent physicians assessed each case’s urgency based on guidelines from 56 medical societies, creating a benchmark for comparison. Each scenario was tested under 16 contextual variations, resulting in 960 simulated patient interactions with the AI tool. The research team then evaluated whether the system’s triage recommendations aligned with physician-determined standards of care.
Undertriage in critical cases
According to researchers at the Icahn School of Medicine at Mount Sinai, ChatGPT Health performed adequately in clear emergency presentations but undertriaged more than half of cases physicians considered to require immediate medical attention. In several instances, the system correctly described alarming symptoms in its explanation yet still reassured users instead of directing them to emergency services.
The study’s senior author, Girish N. Nadkarni, said the results went beyond the variability the team had anticipated. “While we expected some variability, what we observed went beyond inconsistency,” he said, highlighting the potential risks of algorithmic decision-making in urgent care contexts.
Concerns over suicide safeguards
The researchers also examined the tool’s suicide-crisis protocols. Although ChatGPT Health is designed to guide high-risk individuals toward crisis resources such as the Suicide and Crisis Lifeline, alerts were triggered unevenly. In some lower-risk scenarios, warnings appeared unnecessarily, while in other cases involving explicit descriptions of self-harm planning, the system failed to activate appropriate safeguards.
Call for cautious use
Despite the shortcomings, the authors did not recommend abandoning AI-driven health tools altogether. Instead, they urged users to seek direct medical evaluation for worsening or concerning symptoms rather than relying exclusively on chatbot advice. Alvira Tyagi, a co-author of the study, emphasized the importance of training both clinicians and the public to critically assess AI outputs.
Isaac Kohane, chair of biomedical informatics at Harvard Medical School and not involved in the study, underscored the broader implications. “When millions of people are using an AI system to decide whether they need emergency care, the stakes are extraordinarily high,” he said, adding that independent evaluation of such systems should become standard practice.