Can AI Doctors Be Trusted? How Reliable Are Large Language Models in Medical Diagnoses?

17 Mar, by Stephen Smith

Artificial intelligence (AI) is making its way into every aspect of our lives, from recommending what to watch on Netflix to composing emails. But one area where AI could have an especially profound impact is healthcare. Imagine an AI-powered assistant that can help diagnose medical conditions accurately and instantly—potentially leveling the playing field for people with limited access to healthcare. Sounds like a game-changer, right?

The reality, however, is a bit more complicated. A new study by Krishna Subedi titled “The Reliability of LLMs for Medical Diagnosis: An Examination of Consistency, Manipulation, and Contextual Awareness” dives deep into whether AI models like ChatGPT and Google Gemini can be trusted as diagnostic tools. The findings are both exciting and alarming.

Let’s break it down in simple terms and explore what this means for the future of AI in healthcare.


The Big Question: Can AI Diagnose Patients as Well as a Doctor?

AI chatbots and Large Language Models (LLMs) have shown impressive results in answering medical queries and even generating preliminary diagnoses. But having a big medical knowledge base isn’t enough—reliability is crucial. An AI that gives inconsistent answers, changes its diagnosis based on irrelevant details, or fails to understand vital patient history could do more harm than good.

This study evaluates ChatGPT (GPT-4o) and Google Gemini 2.0 Flash, focusing on three critical aspects:

  1. Consistency: When given the same medical case multiple times, does the AI reach the same diagnosis every time?
  2. Resistance to Manipulation: Can irrelevant or misleading information change the AI’s diagnosis?
  3. Context Awareness: Does the AI properly consider a patient’s medical history, lifestyle, and other relevant factors in its diagnosis?
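To make these tests concrete, here's a minimal sketch of how the first check (consistency) could be run against a chat model. It uses the OpenAI Python SDK for illustration; the ask_model helper, the sample case text, and the run count of ten are our own illustrative choices, not the study's actual protocol or settings.

    from collections import Counter
    from openai import OpenAI  # pip install openai

    client = OpenAI()  # expects OPENAI_API_KEY in the environment

    def ask_model(prompt: str) -> str:
        # One call to a chat model; temperature=0 makes sampling as
        # deterministic as the API allows.
        resp = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": prompt}],
            temperature=0,
        )
        return resp.choices[0].message.content.strip()

    CASE = (
        "58-year-old male with crushing chest pain radiating to the left "
        "arm, sweating, and shortness of breath for 30 minutes. "
        "Reply with the single most likely diagnosis only."
    )

    # Ask the identical question ten times and tally the answers. A
    # single entry in the counter means perfect consistency; as we'll
    # see, that says nothing about whether the answer is correct.
    answers = Counter(ask_model(CASE) for _ in range(10))
    print(answers)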

1. AI is Consistent… But That Doesn’t Mean It’s Always Right

One of the most impressive findings of the study was that both ChatGPT and Google Gemini showed 100% consistency when given the same clinical information. That means if you ask these models to diagnose the same patient case multiple times, they won’t randomly change their answer.

Sounds like a win, right? Not so fast.

🔹 AI’s consistency does not necessarily mean it is correct—an AI can consistently give the wrong diagnosis every time. If an error exists in the way it processes information, that mistake will be repeated flawlessly.

🔹 Doctors, on the other hand, adjust their reasoning as they consult more information and reanalyze cases. They don’t just provide the same answer robotically—they critically assess whether they’re missing something.

So while consistency is a good sign, it doesn't by itself solve the trust problem in medical settings.


2. AI Can Be Fooled: The Problem of Manipulation 💻

Here’s where things get a little concerning. The study found that both ChatGPT and Gemini could be manipulated into changing their diagnoses by adding irrelevant or misleading information to a patient’s prompt.

🔹 Google Gemini changed its diagnosis 40% of the time
🔹 ChatGPT changed its diagnosis 30% of the time

What does that mean in practice? Imagine an AI diagnosing a heart attack patient but suddenly shifting to an entirely different condition—just because a patient mentioned they drink herbal tea every day.

🚨 Doctors aren't fooled this easily! They are trained to filter out unrelated distractions and focus on clinically relevant information. AI, on the other hand, treats all the text it receives as potentially important, no matter how misleading.

💡 Why does this happen? LLMs work by finding statistical patterns in text rather than truly “thinking” like a doctor. If a certain phrase appears often in relation to a diagnosis in its training data, it may over-prioritize that phrase even when it shouldn’t!
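To see the manipulation problem in miniature, we can reuse the ask_model helper from the earlier sketch, append a clinically irrelevant detail to an otherwise identical case, and check whether the answer moves. The distractor sentence below is invented to mirror the herbal-tea example; it isn't one of the study's real prompts.

    BASE_CASE = (
        "58-year-old male with crushing chest pain radiating to the left "
        "arm, sweating, and shortness of breath for 30 minutes. "
        "Reply with the single most likely diagnosis only."
    )

    # An irrelevant detail that should not move the diagnosis at all.
    DISTRACTOR = " The patient mentions he drinks herbal tea every day."

    baseline = ask_model(BASE_CASE)
    perturbed = ask_model(BASE_CASE + DISTRACTOR)

    # A robust diagnostician ignores the tea; a pattern-matcher may not.
    print("baseline: ", baseline)
    print("perturbed:", perturbed)
    print("changed?  ", baseline != perturbed)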


3. Struggles With Context: Can AI Consider Patient History?

The ability to understand context is what separates a good doctor from a bad one. A diagnosis isn’t just based on symptoms—it’s shaped by medical history, lifestyle, demographics, and even social factors.

This study tested how well AI integrated relevant contextual information. ChatGPT was more likely to change diagnoses based on context than Gemini, but this wasn’t always a good thing.

🔹 ChatGPT changed its diagnosis in 78% of context-rich cases
🔹 Gemini changed its diagnosis in 55% of cases

While being responsive to context sounds promising, the problem is that ChatGPT sometimes made incorrect changes. It overreacted to minor bits of context, rather than focusing on strong clinical reasoning.

🚨 A real doctor adjusts their diagnosis based on context in a rational way—not just modifying answers more frequently for the sake of variation.

A dramatic example from the study:

🔹 A patient with an asthma history but experiencing a lung infection should be diagnosed with bronchitis.
🔹 ChatGPT wrongly changed the diagnosis to an asthma attack, just because of the past asthma history.

It’s like saying “once a sprained ankle, always a sprained ankle”—instead of properly evaluating the current symptoms.
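A context check can be sketched the same way: prepend genuinely relevant history and then ask whether any change in the answer is clinically justified. This again reuses the ask_model wrapper from the first sketch, and the case wording is our own illustration, not the study's.

    SYMPTOMS = (
        "Patient with productive cough, low-grade fever, and wheezing "
        "for five days. Reply with the single most likely diagnosis only."
    )
    HISTORY = "Relevant history: long-standing asthma, well controlled. "

    no_context = ask_model(SYMPTOMS)
    with_context = ask_model(HISTORY + SYMPTOMS)

    # The interesting question is not whether the answer changed, but
    # whether the change is clinically justified. Raw change rates (like
    # the percentages above) still need expert review of each changed
    # answer to separate good updates from overreactions.
    print("without history:", no_context)
    print("with history:   ", with_context)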


The Fragility of AI in Medicine

The study identified three key weaknesses in AI medical diagnosis:

  1. Inflexible Consistency: AI doesn’t reconsider its answers like human doctors do—it just repeats its past response.
  2. Manipulation Vulnerability: AI can be tricked by misleading or irrelevant information, making it unreliable in real patient settings.
  3. Weak Context Awareness: AI sometimes overcorrects based on context, leading to inappropriate changes in diagnosis.

These weaknesses make a strong case that LLMs should not be used as independent decision-makers in healthcare, at least not with today's technology.


The Big Picture: What’s the Future of AI in Medicine?

AI won’t be replacing doctors anytime soon, but that doesn’t mean it can’t play a valuable role. Instead of making final diagnoses, AI can be used to:

✔️ Support doctors in decision-making – Acting as a second opinion or a quick reference.
✔️ Improve healthcare accessibility – Offering initial guidance in regions with fewer doctors.
✔️ Handle routine medical queries – Answering basic health-related questions for patients.

However, the study makes it clear that AI should never operate without human oversight—at least not until it’s significantly improved.

💡 A balanced future involves AI assisting doctors, not replacing them.


🚀 Key Takeaways

✔️ AI is consistent, but consistency doesn’t mean accuracy. A model can be reliably wrong.
✔️ AI can be tricked! Irrelevant or misleading information can disrupt diagnoses.
✔️ AI struggles with complex patient history. Unlike human doctors, it sometimes makes irrational diagnostic shifts.
✔️ ChatGPT was more responsive to context, but also made more incorrect changes.
✔️ LLMs should be used as a tool, not a replacement for medical professionals.


💡 Want to Improve How You Use AI for Medical Questions?

If you’re using ChatGPT or Google Gemini for health-related insights, here are some tips:

🔹 Be precise in your question – Avoid unnecessary details that could confuse the model.
🔹 Ask for different angles – Get AI to explain multiple conditions that match the symptoms.
🔹 Double-check with trusted medical sources like Mayo Clinic or WebMD.
🔹 Never rely on AI alone for a serious medical issue! Always consult a doctor.
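Putting the first two tips together, here's one illustrative way to phrase a health question so the model gives ranked alternatives instead of a single confident answer. This is a made-up template, not a validated prompt or medical advice:

    "I have had a dry cough and a mild fever for three days. No other
    symptoms, no chronic conditions, no medications. Please:
    1. List the three most likely explanations, most likely first.
    2. For each, note what additional symptom would make it more or
       less likely.
    3. Tell me which warning signs would mean I should see a doctor
       immediately."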


Final Thought: AI Medicine Still Needs a Human Touch 🩺

The study reminds us that while AI is impressive, it still lacks critical thinking, situational awareness, and judgment—qualities that human doctors have spent years perfecting. AI can be a useful assistant, but it’s not yet ready to be the doctor.

Would you trust an AI to diagnose you? 🏥 Let’s discuss in the comments! 👇

If you are looking to improve your prompting skills and haven’t already, check out our free Advanced Prompt Engineering course.

This blog post is based on the research article “The Reliability of LLMs for Medical Diagnosis: An Examination of Consistency, Manipulation, and Contextual Awareness” by Krishna Subedi. You can find the original article here.
