Can AI Fact-Check Political Claims? A Deep Dive into the Abilities and Limits of Generative AI

Introduction
Misinformation is everywhere, especially in the political arena. From misleading social media posts to outright fabrications, false claims can influence public opinion, shape policy decisions, and even sway elections. Fact-checkers work hard to combat this, but their job is time-consuming and resource-intensive.
Enter generative AI, specifically large language models (LLMs) like ChatGPT-4, Claude 3.5 Sonnet, and Google Gemini. These powerful AI systems process text, summarize information, and even attempt to verify claims. But can they actually replace or assist human fact-checkers effectively?
A new study by Kuznetsova et al. systematically tested the fact-checking abilities of five popular LLMs. The results were mixed—LLMs show promise but also have significant limitations. Let’s break down their findings and see what this means for the future of political fact-checking.
How the Study Tested AI’s Fact-Checking Skills
To test whether AI can reliably verify political statements, the researchers examined five major LLMs:
- ChatGPT-4 (by OpenAI)
- Llama 3 (70B and 405B parameter versions, by Meta)
- Claude 3.5 Sonnet (by Anthropic)
- Google Gemini
The Dataset
The study used 16,513 political statements, all previously fact-checked by professional journalists at organizations such as PolitiFact and Snopes. Each statement was labeled True, False, or Mixed (partially accurate).
Each LLM was given the same political claims and prompted to classify each one into one of these categories. The study then compared the models' verdicts with the journalists' labels to see how well each model performed.
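To make the setup concrete, here is a minimal sketch of what prompt-based veracity classification might look like in code. It uses the OpenAI Python client as an example; the prompt wording, label parsing, and temperature setting are illustrative assumptions, not the study's actual protocol.

```python
# Illustrative sketch only: not the study's actual prompt or settings.
# Requires the OpenAI Python client (pip install openai) and an
# OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()

LABELS = {"true", "false", "mixed"}

def classify_claim(claim: str, model: str = "gpt-4") -> str:
    """Ask the model to assign one of the study's three veracity labels."""
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # deterministic output keeps verdicts comparable
        messages=[{
            "role": "user",
            "content": (
                "Classify the following political statement as True, False, "
                f"or Mixed. Answer with a single word.\n\nStatement: {claim}"
            ),
        }],
    )
    verdict = response.choices[0].message.content.strip().lower()
    return verdict if verdict in LABELS else "unparsed"

# The study's comparison then boils down to checking these verdicts
# against the labels assigned by professional fact-checkers.
print(classify_claim("The Eiffel Tower is located in Berlin."))  # expect "false"
```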
The Good, The Bad, and the Unexpected
So, how did the AI models do? Here’s a breakdown:
1️⃣ AI is pretty good at spotting false claims
One of the more promising findings was that LLMs were better at identifying false statements than at recognizing true ones. This was especially true for sensitive topics like:
- COVID-19 misinformation
- U.S. political controversies
- Social issues
The researchers suggest that this could be due to built-in guardrails—pre-programmed safeguards intended to prevent AI from spreading misinformation about these topics.
2️⃣ AI struggles with true and mixed claims
While LLMs flagged false statements effectively, they struggled with true ones: factually correct claims were often mislabeled as “mixed” or even “false”.
Possible reason? LLMs are trained on vast amounts of internet data, in which misinformation far outnumbers verified fact-checks. That imbalance could produce overcautious models that doubt legitimate information.
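This asymmetry is, in evaluation terms, a per-label recall gap: the models recover “false” statements more reliably than “true” ones. Here is a minimal sketch of how that breakdown can be computed with scikit-learn; the toy labels below are invented purely for illustration.

```python
# Toy illustration of a per-label breakdown; the lists are invented,
# not data from the study. Requires scikit-learn (pip install scikit-learn).
from sklearn.metrics import classification_report

human_labels   = ["false", "true", "mixed", "false", "true", "false"]
model_verdicts = ["false", "mixed", "mixed", "false", "false", "false"]

# Per-label recall answers: of all statements the journalists called X,
# how many did the model also call X? The study's pattern would show
# higher recall for "false" than for "true".
print(classification_report(
    human_labels, model_verdicts,
    labels=["true", "false", "mixed"], zero_division=0,
))
```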
3️⃣ Different AIs, different results
Not all AI models performed the same:
- ChatGPT-4 and Google Gemini were the most accurate
- Llama 3 (70B & 405B) had lower accuracy
- Claude 3.5 Sonnet was better at evaluating mixed claims but worse at distinguishing true vs. false
This means that choosing which AI model to use for fact-checking matters! Someone using Llama 3 might get a different result than someone using ChatGPT-4 on the same claim.
4️⃣ Topic matters—a lot
The study found that AIs had different accuracy levels depending on the topic of the statement.
- ✅ Best accuracy: COVID-19, U.S. elections, and American political controversies
- 🚫 Worst accuracy: U.S. economic and fiscal policies
Why? It could be due to the amount and quality of training data on these topics. Economic claims, for example, often involve complex statistics, which AI may misinterpret.
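If you wanted to replicate this kind of topic-level breakdown on your own data, it reduces to grouping per-statement correctness by topic. The pandas sketch below is hypothetical; the column names and toy rows are assumptions, not the study's dataset.

```python
# Hypothetical topic-level accuracy breakdown; column names and rows
# are illustrative, not the study's data. Requires pandas.
import pandas as pd

df = pd.DataFrame({
    "topic":         ["covid", "covid", "economy", "economy"],
    "human_label":   ["false", "true",  "false",   "mixed"],
    "model_verdict": ["false", "true",  "mixed",   "false"],
})

# Share of statements where the model's verdict matches the human label,
# computed separately for each topic.
accuracy_by_topic = (
    (df["model_verdict"] == df["human_label"])
    .groupby(df["topic"])
    .mean()
    .sort_values(ascending=False)
)
print(accuracy_by_topic)  # on this toy data: covid 1.0, economy 0.0
```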
What This Means for the Future of AI in Fact-Checking
This study highlights both the promise and the limits of AI fact-checking. Here’s what it tells us about where things are heading:
➤ Can AI replace human fact-checkers? Not yet.
If AI models misidentify true claims as false or struggle with certain topics, they can’t be fully trusted to replace human journalists. However, they could assist professionals by identifying suspicious claims more quickly.
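In practice, that assist role could be as simple as a triage loop: let the model pre-sort incoming claims and push anything it does not label “true” to the front of a human reviewer's queue. The sketch below is a hypothetical pattern (reusing the classify_claim() helper from the earlier sketch), not a workflow proposed by the study.

```python
# Hypothetical triage pattern: model verdicts prioritize human review,
# they are never published as final rulings. Reuses classify_claim()
# from the earlier sketch.

def triage(claims: list[str]) -> list[tuple[str, str]]:
    """Return (claim, verdict) pairs with flagged claims listed first."""
    results = [(claim, classify_claim(claim)) for claim in claims]
    # Anything not labeled "true" sorts to the front of the review queue.
    return sorted(results, key=lambda pair: pair[1] == "true")

for claim, verdict in triage([
    "Statement A about election results.",
    "Statement B about vaccine safety.",
]):
    print(f"[{verdict}] {claim}")
```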
➤ AI guardrails can help—but they must be fine-tuned
The better performance on false statements about COVID-19 suggests that AI models can be improved with specific safeguards. However, setting up such guardrails demands careful tuning—otherwise, AI might overcorrect and wrongly flag true information.
➤ Model choice matters
Different AI models perform differently on different types of claims. This means policymakers, journalists, and tech platforms need to choose the right AI tool for the job rather than assuming all LLMs perform equally.
➤ AI will get better—but must be monitored
As AI evolves, improvements in training data, fine-tuning, and transparency will make it more reliable. However, without careful oversight, AI-generated fact-checking could still spread errors.
Key Takeaways
✔ AI fact-checking is promising, but not perfect. LLMs perform best at identifying false claims but often struggle with true and mixed statements.
✔ Choosing the right AI model makes a difference. ChatGPT-4 and Google Gemini generally performed better than Llama 3.
✔ Fact-checking accuracy varies by topic. Certain issues, like COVID-19 and American politics, were checked more accurately than economic policy claims.
✔ AI can assist human fact-checkers, but not replace them. While AI can speed up misinformation detection, human oversight is still crucial.
✔ Perfecting AI fact-checking requires better guardrails. Improvements in training data, topic-specific fact-checking, and bias reduction will be key to making AI a better fact-checking tool.
Final Thoughts
This study gives a realistic snapshot of AI’s abilities in political fact-checking. While generative AI is advancing rapidly, it’s not yet foolproof—it struggles with true information, has blind spots on certain topics, and varies across models.
If you’re relying on AI for fact-checking, whether as a researcher, journalist, or everyday internet user, remember:
🤖 Not all AI models are created equal – Do research on which ones work best for your needs.
📰 AI fact-checking should complement, not replace, human verification – Always cross-check AI-generated results with trusted sources.
⚠️ AI can still make mistakes – Be mindful of potential misclassifications, especially on important political issues.
As AI continues to evolve, fine-tuning its fact-checking capabilities will be an ongoing challenge—but one with huge potential benefits. The next time you come across a suspicious claim online, will you trust AI to fact-check it? Let the debate begin. 🚀
If you are looking to improve your prompting skills and haven’t already, check out our free Advanced Prompt Engineering course.
This blog post is based on the research article “Fact-checking with Generative AI: A Systematic Cross-Topic Examination of LLMs Capacity to Detect Veracity of Political Information” by Elizaveta Kuznetsova, Ilaria Vitulano, Mykola Makhortykh, Martha Stolze, Tomas Nagy, and Victoria Vziatysheva. You can find the original article here.