
I looked into the source behind "ChatGPT lies 27% of the time"

Ikesan

Where It Started

I saw a post like this on X:

It turns out there’s a shocking fact that ChatGPT lies 27% of the time, along with a useful fix. According to the latest research from Johns Hopkins University, adding just two words can dramatically reduce hallucinations.

The claim was that simply adding the two words “According to” to a prompt reduces hallucinations by 20%. That sounded interesting, so I tracked down the supposed source.

The Actual Research

There really is a Johns Hopkins paper.

Paper: "According to ...": Prompting Language Models Improves Quoting from Pre-Training Data
arXiv: https://arxiv.org/abs/2305.13252
Posted: May 2023 (last revised in February 2024)

The authors are Orion Weller, Marc Marone, Nathaniel Weir, Dawn Lawrie, Daniel Khashabi, and Benjamin Van Durme, all from Johns Hopkins University.

What the paper actually studies is whether prompts of the form According to [source] make an LLM's answers more closely grounded in the cited source, measured with a metric called QUIP-Score (roughly, how much of the output overlaps verbatim with the pre-training data).
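To make the metric concrete: QUIP-Score is essentially the fraction of the output's character n-grams that also appear in the pre-training corpus. The paper computes this with efficient membership sketches over huge corpora; the toy version below, which I wrote for illustration, just checks n-grams against a source string held in memory:

```python
def quip_like_score(output: str, corpus: str, n: int = 5) -> float:
    """Toy grounding metric: fraction of the output's character
    n-grams that appear verbatim in the corpus. Illustration only;
    the real QUIP-Score uses membership sketches over pre-training
    data, not a substring scan."""
    grams = [output[i:i + n] for i in range(len(output) - n + 1)]
    if not grams:
        return 0.0
    hits = sum(1 for g in grams if g in corpus)
    return hits / len(grams)
```

A higher score means more of the answer is quoted verbatim from the source; the paper's finding is that "According to [source]" prompts push this number up.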

What Does Not Add Up

There Is No Source for the “27% Lies” Number

This paper is about improving source-grounded responses with According to. It is not a study that measured ChatGPT and concluded that “it lies 27% of the time.”

Many hallucination studies report false-generation rates somewhere around 10% to 39%, but I could not find any primary source showing Johns Hopkins published the specific 27% figure.

It Is Not “Latest Research”

The paper was posted in May 2023, so calling it the “latest” research in 2026 is a stretch.

Multiple Studies Are Being Mixed Together

The original post also mentioned “36% improvement with Step-Back Prompting,” but that is a different study from Google DeepMind in 2023. It has nothing to do with the Johns Hopkins paper.

Does According to Actually Help?

Cases Where It Seems Useful

  1. When asking for summaries or explanations

    • According to the React 19 release notes, explain the new features.
    • Useful when you want to constrain the answer to what is in the source
  2. For basic knowledge in specialized domains

    • According to WHO guidelines, what are the recommended daily sodium intake limits?
  3. For comparison and organization

    • According to the official documentation of Python and JavaScript, compare their async/await implementations.
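All three patterns boil down to prepending the same grounding phrase, so if you want to experiment systematically, a tiny helper is enough. The function name here is my own, not anything from the paper:

```python
def according_to(source: str, question: str) -> str:
    # Prepend the grounding phrase from the paper's prompt pattern.
    # `source` is whatever document you want the model to stay close to.
    return f"According to {source}, {question}"
```

For example, `according_to("WHO guidelines", "what are the recommended daily sodium intake limits?")` reproduces the second pattern above.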

Limits

  • Even if you write According to, the model is not necessarily reading the source in real time
  • It can still produce text that merely sounds citation-like
  • For important information, you still have to verify it yourself

An Older Anti-Hallucination Prompt That Used to Circulate

Before According to became popular, prompts like this were common:

As an expert, provide only accurate and reliable information based on facts. When citing sources, always specify highly reliable primary sources such as peer-reviewed papers, public institutions, specialist books, or authoritative reporting. If sufficient information does not exist, clearly state that “there is no reliable information currently available,” and never fill gaps with guesses or speculation.

The goal is the same: reduce unsupported answers. But the effect is limited. LLMs can still make things up even when explicitly told not to.

What I Do Personally

I mostly use LLMs for programming, so at least I can test whether the code actually works. Even then, Gemini sometimes gives me shaky information, so I often cross-check with Claude or ChatGPT and compare the answers.
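That cross-checking step is easy to automate with whatever API clients you already use. A minimal sketch, where `models` maps a label to a hypothetical callable (the real client code depends on your provider):

```python
from typing import Callable

def cross_check(question: str,
                models: dict[str, Callable[[str], str]]) -> dict[str, str]:
    """Ask the same question of several models and collect the
    answers side by side for manual comparison. Each value in
    `models` is any prompt -> text callable; wire in real API
    clients yourself."""
    return {name: ask(question) for name, ask in models.items()}
```

Where the answers disagree is exactly where I go verify against a primary source.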

In the end, it is better to question the source first, whether the information comes from the internet, from a human, or from an LLM. That is just basic information literacy.