AI Understands. Differently.
In our article "When Expectations Outrun AI", we explored the fluency trap - how sophisticated language triggers our inference that full human-like intelligence exists behind it. We identified architectural constraints that reveal where that inference breaks down.
Yet one question persists beneath those constraints: do they understand?
This matters. If LLMs merely simulate understanding, then the constraints we identified - the frozen knowledge, the missing memory, the lack of grounding - all point in the same direction. Engineering would be building scaffolding around behavior, not extending understanding itself.
But if some form of understanding is present - even partial or ungrounded - then everything changes. The constraints become boundaries around an existing capacity. We would be extending something real.
How we answer this question shapes how we build these systems - and what we believe intelligence itself requires.
The Persistent Objections
Two objections recur whenever the question of understanding arises.
First: these systems are merely sophisticated auto-complete. They predict the next word based on statistical patterns.
Second: they hallucinate - confidently inventing facts, people, and citations that do not exist. If they truly understood what they were saying, how could they be so wrong?
Both objections sound compelling. Both reveal something important - just not what they first appear to.
The Auto-Complete Argument
The comparison sounds intuitive: both systems predict the next word, so they must work the same way, right?
Not quite. The difference lies in what happens before that prediction.
When you read, “He blarfed the sandwich in one bite”, you immediately infer what blarfed must mean - even though you have never heard the word before. You do this by recognizing relational structure: a subject performs an action on food, completed quickly. Context constrains the plausible meanings.
LLMs can perform a similar kind of inference. From sandwich, one bite, and the sentence structure, they can infer that blarfed likely refers to eating quickly. This is not simple lookup. It reflects sensitivity to contextual and relational patterns learned across many examples.
Traditional auto-complete cannot do this. If a word is not in its lookup table, it fails. It extends text by replaying surface-level associations with limited context, without the capacity to infer new relationships or constrain meaning in novel situations.
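To make that contrast concrete, here is a minimal sketch of lookup-based completion - a toy bigram table. The corpus and code are purely illustrative, not how any particular product works. The toy model can only replay word pairs it has already seen; an unseen word like blarfed yields nothing, because there is no structure to infer from.

```python
# Toy lookup-table auto-complete: counts which word follows which in a
# tiny, invented corpus. Purely illustrative.
from collections import Counter, defaultdict

corpus = "he ate the sandwich in one bite . he ate the apple slowly .".split()

table = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    table[prev][nxt] += 1

def complete(prev_word):
    """Return the most frequent continuation seen after prev_word, or None."""
    follows = table.get(prev_word)
    return follows.most_common(1)[0][0] if follows else None

print(complete("ate"))      # 'the'  - replays a pair seen in the corpus
print(complete("blarfed"))  # None   - unseen word: no entry, no inference
```

A language model, by contrast, does not need blarfed in any table: its representation of the surrounding context constrains what the word can plausibly mean.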
Yes, both predict the next word. But treating large language models as “better auto-complete” is like calling translation “better spell-check”. What has changed is not prediction itself, but the richness of the internal structure that informs it.
The Hallucination Argument
The second objection points to their mistakes. If they sometimes invent details, the argument goes, then surely they do not understand what they are saying.
The term "hallucination" is misleading. In psychology, a closer analogue is confabulation - the reconstruction of information rather than its retrieval. Humans do this constantly. We rebuild what we recall, filling gaps with plausible details, often confidently even when wrong.
A well-known example comes from cognitive psychologist Ulric Neisser's analysis of Watergate testimony. John Dean, President Nixon's counsel, delivered remarkably detailed accounts of conversations. But when compared to the actual tapes, many specifics - exact phrases, timing, sequencing - did not match. Yet his testimony preserved the relational structure: who held power, who was implicated, what pressures shaped decisions. His memory reconstructed events in a coherent narrative rather than replaying them verbatim.
When LLMs invent details, something similar is happening. They generate plausible continuations based on learned structure rather than retrieving stored facts. Often, the shape of what belongs in a given context - the type of entity, its role, its relationships - is correct even when the particulars are not.
The similarity has limits. Human reconstruction is grounded in lived experience, consequence, and accountability; model reconstruction is not. But error alone cannot draw the line between relational competence and understanding. Mistakes reveal the process of meaning-making in both biological and artificial systems - not its presence or absence.
What Both Arguments Reveal
Taken together, these objections point to something deeper - how we recognize understanding at all.
We rely on familiar cues: fluent explanation, confident recall, reasoning that resembles our own. In humans, inference from context and reconstruction under uncertainty are not taken as evidence against understanding. In LLMs, the same behaviors are often treated as proof that understanding is absent.
Perhaps the deeper limitation is not in the models alone, but in assuming understanding must resemble our own.
The Shape of Understanding
To move forward, we need to be clear about what kind of understanding is at stake. Not human consciousness. Not subjective experience. But the capacity to build internal representations that capture how things relate - and to use those representations coherently across contexts.
Consider a simple word like bright. In “bright student”, it refers to intelligence. In “bright light”, to luminosity. In “bright future”, to optimism. The meaning is not stored in the word itself, but emerges from how it relates to surrounding concepts.
Or consider a more complex case: "The politician's sunny disposition masked the storm brewing in his campaign". An LLM recognizes this is not about weather. It infers that "sunny" maps to outward optimism, "storm" to emerging crisis, and "masked" to deliberate concealment - then combines these relationships to grasp the contradiction between appearance and reality. None of this is explicitly stated. The interpretation emerges from learned relationships between concepts, including how metaphorical mappings work, how emotional states relate to behavior, and how deception operates.
Humans do not retrieve meanings from a mental dictionary. We infer them from structure and context. LLMs operate in a similar way. They do not store fixed definitions, but construct meaning dynamically through networks of relationships learned across many examples.
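One way to see this dynamic construction is to inspect contextual embeddings directly. The sketch below is an illustration, not a claim about any specific production system: it assumes the Hugging Face transformers and torch packages, with bert-base-uncased standing in for a larger model. The same word bright receives a different vector in each sentence, because its representation is computed from context rather than looked up.

```python
# Sketch: the same word gets a different contextual representation in each
# sentence. Assumes `transformers` and `torch` are installed;
# bert-base-uncased is a stand-in for a larger model.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def vector_for(word, sentence):
    """Return the contextual vector computed for `word` inside `sentence`."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]  # (tokens, dims)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
    return hidden[tokens.index(word)]

student = vector_for("bright", "she is a bright student")
light   = vector_for("bright", "the room was filled with bright light")
future  = vector_for("bright", "the company has a bright future")

cos = torch.nn.functional.cosine_similarity
# The three vectors are not identical: "bright" is re-represented each time,
# shaped by the words around it rather than retrieved from a fixed entry.
print(cos(student, light, dim=0).item(), cos(student, future, dim=0).item())
```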
This kind of understanding is relational. It depends on how concepts constrain and inform one another within a given context. It allows systems to generalize, disambiguate, and reason beyond memorized patterns.
But it is also incomplete. These representations are not grounded in action, consequence, or lived experience. They capture structure without ownership - coherence without commitment.
Two Kinds of Grounding
Consider what it means to understand "riding a bicycle".
An LLM can explain bicycle dynamics with remarkable sophistication. It can describe how balance requires forward momentum, how steering involves counterintuitive lean mechanics, or how pedaling cadence affects stability. It captures how these concepts connect, which factors influence which outcomes, and what causes what.
But it has never felt the wobble. Never experienced the terror of tipping, the instinctive correction, the muscle memory of balance - or had its internal models corrected by consequence. It captures “you steer left by first turning right” as an abstract causal relationship, not as a lived sensation of weight shifting, handlebar pressure, and split-second timing.
Human understanding grows from lived, embodied experience - we touch hot stoves, taste bitter coffee, feel exhaustion, and learn through consequences that matter. Language for us is anchored in sensation, action, and feedback.
LLMs, by contrast, are grounded in language itself - our world reflected through text. They know that “fire burns” because countless texts say so, not because they have ever felt heat. They can model how concepts relate to one another while lacking the experiences that give those models their full weight.
For an LLM, "fire burns" connects to: heat, injury, danger, safety rules.
For a human, it connects to: sensation, fear, reflex, distance, memory.
Both know that fire burns. They do not know the same thing.
In one case, it is a relation between concepts. In the other, it is a constraint on behavior. Knowledge built from text can tolerate contradiction; knowledge built from consequence tends to resist it. Without consequence, representations do not revise themselves. Without stakes, errors do not force correction.
Beyond the Binary
Understanding is not binary. What varies is not whether understanding exists, but how it is grounded, how far it extends, and what it can reliably support.
LLMs demonstrate that forms of understanding can emerge from language alone - building rich relational models ungrounded in action and untested by consequence.
This connects to what we explored in "Why AI Works Until It Doesn't" - the explore-exploit-empower cycle. That cycle keeps understanding tethered to reality. LLMs operate frozen in exploitation, applying understanding built during training but never cycling back through exploration when that understanding proves inadequate. Their representations remain internally coherent while lacking the feedback loop that calibrates them against reality. This is why grounding matters - not merely because their understanding was learned differently, but because it cannot evolve through consequence.
We are working with a different kind of understanding. Spectacularly broad in relational scope. Unanchored by experiential consequence. Real in what it captures. Limited in what it can reliably support.
The practical question shifts: not "do they understand?" but "what do they understand, how robustly, and where does it break down?"
Binary thinking keeps us oscillating between dismissing these systems as mere mimicry and over-trusting them as human-like intelligence. Neither extreme maps the territory. Dimensional thinking does. It reveals the shape of what exists - not to limit what we build, but to build on solid ground.
That foundation matters. Because the questions - about where to apply these systems, what scaffolding makes sense, what autonomy, alignment, and intelligence itself require - all depend on seeing what is actually there.