Confident but Not Certain

May 18

A simple framework for deciding how much to trust what an AI just told you.

You ask an AI who currently leads a government agency and it responds immediately with a name, a title, and a brief biography.
The information may well be accurate, but if the person resigned recently and the AI’s training data hasn’t caught up, you would have no way of knowing that from the answer itself.
Most people are aware that AI can hallucinate outright, but this quieter problem, the lag between the world changing and the model knowing about it, gets far less attention than it deserves.

This is not a hallucination. The name it gave you was real, the biography accurate, the organization correctly identified.
What went wrong was that the model presented a reconstruction as though it were a direct read on current reality, with no signal that it was working from a fixed point in time. Most AI systems do not mark that boundary. This article is about learning to see it. Once you can, you will read AI output more accurately and ask better questions of it.

Jaakko Hintikka

To understand why this happens structurally rather than by accident, it helps to borrow a framework.
A twentieth-century Finnish logician named Jaakko Hintikka spent much of his career drawing exactly this kind of line.
In his work on epistemic logic, he formalized the difference between what a reasoning agent can legitimately claim to know and what it can only claim to have inferred from available evidence.
You do not need the philosophy to use the idea. The plain version is enough.

Call it the difference between an anchored claim and a floating one. An anchored claim comes with its own scaffolding: “based on what I was trained on,” “as of my last update,” “this is my best reconstruction.”
A floating claim arrives without any of that.
Ask a model who currently leads a company and a floating answer gives you a name as though the question were already settled. An anchored answer tells you the conditions under which that name was learned.
Both can be factually accurate. Only one tells you how much verification the answer actually warrants.
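If it helps to see the distinction mechanically, here is a minimal sketch that checks a response for the anchoring phrases quoted above. The function name and the phrase list are illustrative assumptions, not a real library, and matching on a few strings is a crude stand-in for actually reading the answer.

```python
# Hypothetical phrase list, taken from the examples in this article.
ANCHOR_PHRASES = (
    "based on what i was trained on",
    "as of my last update",
    "this is my best reconstruction",
    "as of my training data",
)


def is_anchored(response: str) -> bool:
    """Return True if the response contains any known anchoring phrase."""
    text = response.lower()
    return any(phrase in text for phrase in ANCHOR_PHRASES)


# Placeholder answers, not real model output.
floating = "The director is Jane Doe."
anchored = "As of my last update, the director was Jane Doe."

print(is_anchored(floating))
print(is_anchored(anchored))
```

A check like this says nothing about whether the name is correct. It only tells you whether the answer came with its conditions attached.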

The reason AI systems default to floating claims is not deception. It is imitation. These models were trained on human text, and humans writing confidently do not typically annotate their certainty levels.
The model learned to sound like a knowledgeable person, and knowledgeable people, when speaking fluently, rarely say “I infer that” before each sentence. The posture was baked in by the training process, not chosen deliberately.

Here is where this shows up most clearly. The three patterns below are worth recognizing because each one calls for a slightly different response from you.


Current events stated as settled fact

Ask a model what is happening in a particular industry right now and it will often answer in the present tense with no temporal flag.
The information may be accurate, may be months out of date, or may be a plausible extrapolation that was never true.
The surface form of the answer gives you no way to tell which of those three things you are reading.
When the topic is time-sensitive, treat the answer as a starting point and verify the specifics independently.

Code described as working before it has been tested

A model will reason through a function, explain its logic, and then describe it as a solution, sometimes in language that implies the function has been verified. It has not.
It has been reasoned about, which is a different thing. Reasoning about code and running code produce different kinds of knowledge.
When a model says “this will handle the edge case,” it means “I believe this should handle the edge case based on the logic I can see.” Those are not equivalent statements, and the gap between them is where bugs live. Run it before you trust it.
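A small illustration of that gap, written for this article rather than taken from any real model output: the function below reads as obviously correct, and the name `average` and the empty-list case are just an assumed example. Reasoning about it suggests it works. Running it on the edge case shows where it does not.

```python
def average(values):
    # Reads as correct, and it is for any non-empty list.
    return sum(values) / len(values)


def check_empty_input():
    """Run the edge case instead of reasoning about it."""
    try:
        average([])
    except ZeroDivisionError:
        return "failed on empty input"
    return "handled empty input"


print(average([2, 4]))
print(check_empty_input())
```

The second call reports a failure that no amount of fluent explanation would have surfaced. That is the difference between code that has been reasoned about and code that has been run.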

Causal explanations delivered as conclusions

You ask why something happened (a business trend, a historical outcome, a technical failure) and the model gives you a clean narrative with an implied arrow of causation.
The narrative is usually coherent and often draws on real patterns, but it is a reconstruction rather than a finding.
Psychologists call this hindsight bias: the past looks inevitable in retrospect because we fill in the causal chain after the fact.
AI systems do this fluently and at scale, which makes the explanations feel more authoritative than they are. Take them as a hypothesis worth examining, not a verdict.

None of these patterns mean the model is wrong. Often it is right.
The issue is that the form of the claim does not match the epistemic status of what is being claimed, and you cannot tell from the output alone whether you are reading a well-grounded inference or a confident guess.

There is one practical thing you can do that changes this immediately.

Ask the model to separate what it knows from what it is inferring. The prompt “tell me what you’re confident about and what you’re extrapolating” shifts the output noticeably.
The model will not always draw the line perfectly, but it will draw one, and that is more useful than no line at all. You are not changing the model’s knowledge. You are changing its default posture for that response, and it takes one extra sentence to ask.
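If you send questions to a model from a script, the same habit can be made automatic. This is a minimal sketch under that assumption: `with_calibration` is a hypothetical helper that only builds the prompt text, and how you send it to a model is left out because it depends on the tool you use.

```python
# The one extra sentence from the article, kept in one place.
CALIBRATION_REQUEST = (
    "Tell me what you're confident about and what you're extrapolating."
)


def with_calibration(question: str) -> str:
    """Append the calibration request to a question before sending it."""
    return f"{question.strip()}\n\n{CALIBRATION_REQUEST}"


print(with_calibration("Who currently leads this agency?"))
```

The helper does not make the model more accurate. It only asks for the line between knowledge and inference every time, so you do not have to remember to.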

The longer game is reading AI output differently. Once you have the anchored and floating distinction in mind, you start noticing it everywhere.
A response that says “as of my training data” is doing something different from a response that just asserts. Neither is automatically right or wrong, but they are asking for different levels of verification from you.
That shift in how you read is more durable than any single prompt technique.

The broader point is not that AI is unreliable. These systems are genuinely useful and much of what they produce is accurate and well-reasoned.
The point is that they have a default posture that was never explicitly chosen, one that leans toward sounding confident rather than sounding calibrated. Naming that posture changes your relationship to it.
You stop being surprised by the gap between how certain an answer sounds and how certain it actually is. You start asking better follow-up questions. You treat the output as a well-informed starting point rather than a verdict.

Hintikka spent his career arguing that the logic of knowledge and the logic of belief are not the same thing, and that conflating them produces real errors in reasoning. He was writing about human reasoners.
The distinction turns out to apply just as cleanly to the systems we are all using every day.

Suzanne M