When AI Scores 100% and Still Isn’t Enough for Patient Care

Artificial intelligence is advancing rapidly in medicine, and much of that progress is genuinely exciting.

Recently, a widely used AI medical search platform demonstrated 100% accuracy on USMLE-style licensing exam questions. That result made headlines, and understandably so. Standardized exams test knowledge recall and pattern recognition, areas where large language models can perform exceptionally well.

But medicine is not practiced on multiple-choice exams.

A newly published pilot study asked a more clinically relevant question: How does AI perform when faced with complex, real-world subspecialty scenarios, the kinds of cases physicians actually manage every day?

The answer was sobering. Using a dataset of 100 board-exam-level subspecialty cases, researchers tested two modes of the AI tool:

A rapid standard search
A slower, more comprehensive Deep Consult mode

The results:

Standard search accuracy: 34%
Deep Consult accuracy: 41%

Even with additional processing time, 59% of the AI-generated responses were incorrect or incomplete. In standard search, that figure was 66%.

Just as important, expert reviewers disagreed on the correctness of the AI’s answers in roughly one out of every four cases, highlighting another concern: inconsistent reasoning and limited reliability in complex clinical judgment.

What this tells us about AI in healthcare

AI excels at:

Rapid literature retrieval
Summarizing guidelines
Generating differential diagnoses
Supporting early clinical thinking

These are valuable capabilities, and they absolutely belong in modern medicine. But complex subspecialty care is different.

It requires:

Contextual judgment
Nuanced risk assessment
Pattern recognition informed by lived clinical experience
Accountability for downstream consequences

When accuracy sits between 34% and 41%, that is not decision support. That is clinical uncertainty.

And uncertainty is not something we can outsource when patients are involved.

Why TeleCurbMD is built differently

At TeleCurbMD, we believe technology should amplify physicians, not replace them.

Our platform was designed around a few core realities:

Specialist expertise is scarce
Traditional referrals introduce delay, not answers
Delays worsen outcomes and drive cost
Most specialty questions do not require an in-person visit
eConsults can deliver guidance without slowing care

But critically, we believe that the guidance itself must come from real, board-certified physicians.

Every TeleCurbMD consult is reviewed by a credentialed specialist who:

Practices clinically
Understands nuance and risk
Applies judgment, not just pattern matching
Is accountable for the quality of their recommendations

AI may help route cases, surface literature, or streamline workflows in the future. We are open to that evolution. What we will not do is ask clinicians, or patients, to trust probabilistic answers where expert judgment is required.

The future is collaborative, not binary

This is not an argument against AI. It is an argument for appropriate use.

The future of healthcare is not AI versus physicians. It is AI plus physicians, each doing what they do best.

At TeleCurbMD, our role is clear:

Reduce specialty bottlenecks
Support primary care clinicians with timely, expert guidance
Keep care moving without sacrificing quality
Preserve the human judgment that medicine depends on

When patients need answers, 41% accuracy is not good enough.

They deserve better. And so do the clinicians caring for them.

Specialty expertise. Without the wait.
That is why TeleCurbMD exists.
