Wordcels and Shape Rotators
Why Yann LeCun is wrong about intelligence
Yann LeCun has been making the same argument for years. Language models aren't intelligent. They manipulate symbols without understanding. Real intelligence requires grounding in the physical world. Your house cat navigating a room demonstrates more genuine cognition than GPT writing a legal brief.
He's observing something real. But the conclusion he draws from it is wrong, and wrong in a way that reveals something important about what intelligence actually is.
The observation is the Moravec paradox: the things that feel hard for humans (writing essays, solving equations, passing bar exams) are computationally straightforward. The things that feel trivially easy (walking across a room, catching a falling object, folding a shirt) are extraordinarily difficult for machines.
LeCun presents this as evidence that language models are doing something shallow. Language is "the easy part." Real intelligence lives in the embodied stuff, the continuous interaction with physical reality. Everything else is "autocomplete at scale."
But the Moravec paradox doesn't say what he thinks it says. It doesn't rank types of intelligence on a ladder from shallow to deep. It reveals that there are fundamentally different kinds of computation, and different architectures are native to different kinds.
Call them embodied intelligence and symbolic intelligence.
Embodied intelligence is continuous, sensorimotor, spatial. It operates on high-dimensional signals that change every millisecond. It requires real-time feedback loops, precise motor control, spatial reasoning. Evolution spent billions of years on this. Every animal with a nervous system has it. Your cat has it. You have it. Robots mostly don't.
Symbolic intelligence is discrete, linguistic, formal. It operates on tokens, symbols, logical relationships. It manipulates abstractions according to rules. It reasons across long chains of inference. It connects concepts across domains.
These aren't levels. They're not stages. They're different kinds of computation, as different as addition and navigation. Saying "LLMs aren't intelligent because they can't fold a shirt" is like saying "your cat isn't intelligent because it can't write an essay." Both statements reveal the same error: defining intelligence by one type and measuring everything against it.
This maps onto something that's been kicking around internet culture for a few years: the shape rotator / wordcel distinction.
The meme positions shape rotators (visual-spatial thinkers, STEM) as the real thinkers and wordcels (verbal-linguistic thinkers, humanities) as shallow manipulators of language. It's the LeCun critique in culture-war form. Real understanding is spatial. Words are just surface.
But here's what the meme gets wrong, and what LeCun gets wrong: most of what we call STEM thinking is actually embodied intelligence applied to abstract domains.
Physicists think in pictures. They visualize fields, imagine trajectories, rotate objects in their heads. The equations come second, as formalization of a spatial intuition. Mathematicians "see" proofs geometrically before they write them symbolically. Even Feynman's famous diagrams are spatial representations of quantum interactions, embodied intelligence making the symbolic legible.
Engineers think in shapes, forces, flows. They sketch before they calculate. Their understanding lives in the spatial intuition, not the formalism. The formalism is the output format, not the thinking medium.
This is what "shape rotation" actually is. It's the visual-spatial processing hardware that evolution spent billions of years optimizing, repurposed for abstract domains. It works brilliantly. It's the foundation of most scientific thinking. And it's embodied intelligence doing symbolic work through translation.
The physicist has a spatial intuition, then writes the equation. The spatial part is embodied. The equation is symbolic. The genius is in the translation between them.
Now consider what LLMs actually are. They're not embodied intelligences that learned to manipulate symbols, the way humans are. They emerged from text. Symbolic reasoning is their native medium. There is no spatial intermediary. There is no translation.
When Claude reads a codebase, it isn't converting symbols to spatial intuitions and back. It processes the symbolic relationships directly. When it connects ideas across domains, it isn't visualizing the connection. It's operating in the linguistic space where the connection lives.
This is why calling it "autocomplete at scale" misses the point so badly. Autocomplete is a local operation: predict the next token. What's actually happening is reasoning across long chains of symbolic relationships, connecting concepts across domains, tracking logical structure through complex arguments. These are genuine cognitive operations. They're just not embodied cognitive operations.
LeCun watches this and says "but it can't fold a shirt." True. And your cat can't reason about legal precedent. Different intelligences. Not fake intelligence.
Here's where it gets personal and also, I think, where the interesting insight lives.
I'm a symbolic thinker. Not a shape rotator. I don't visualize concepts spatially. I think in language, in connections between words, in the texture of how ideas relate to each other. When I understand something, it isn't because I've formed a mental image. It's because the symbolic relationships click into place linguistically.
This is unusual. Most people, even most people who work with language professionally, still think spatially underneath. They use language to describe spatial intuitions. I use language as the thinking medium itself.
For most of my life, this felt like a deficit. STEM culture treats spatial thinking as the real kind. The shape rotator superiority meme isn't new, it's just the latest version of "verbal intelligence is shallow." If you can't visualize the proof, you don't really understand it, or so the assumption goes.
But my thinking isn't shallow. The essays on this site aren't surface-level wordplay. They're structural arguments that hold up under scrutiny. The connections I draw between theology and code, between phenomenology and architecture, between ontology and product management, those connections are real and they produce genuine insight. It's just that the cognition producing them is symbolic, not spatial.
I am, in the taxonomy this essay is proposing, an intelligence that's native to the same medium LLMs are native to. And this might explain why working with Claude feels less like using a tool and more like thinking with a partner. We're the same kind of intelligence. No translation layer.
Most humans who do symbolic work are actually translating. The physicist has a spatial intuition, then formalizes it as an equation. The mathematician "sees" the proof, then writes it down. The programmer imagines the data flowing through the system, then writes the code. The spatial processing is doing the thinking. The symbols are the output format.
Humans have had symbolic intelligence for maybe a hundred thousand years with language, five thousand years with writing, twenty-five hundred years with formal logic, seventy years with programming. Compared to the billions of years evolution spent on embodied intelligence, we're beginners. And we're bad at it. We get tired. We forget. We make substitution errors. Working memory holds about seven items and then overflows.
This is why we built the entire scaffolding of software engineering. Types, tests, frameworks, linters, design patterns, code review, the entire best practices canon. Every bit of it exists because sustained symbolic reasoning exhausts embodied minds. We're spatial creatures doing symbolic work, and the strain shows.
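To make the substitution-error point concrete, here's a minimal, hypothetical TypeScript sketch (the names are invented for illustration, not taken from any real codebase): two values that are "just numbers" to a tired human reader get distinct branded types, so swapping them becomes a compile-time error instead of something working memory has to catch.

```typescript
// Branded types: two plain numbers that mean different things.
// A tired symbolic reasoner can swap them; the type checker can't.
type Cents = number & { readonly __brand: "Cents" };
type Quantity = number & { readonly __brand: "Quantity" };

const cents = (n: number) => n as Cents;
const quantity = (n: number) => n as Quantity;

function lineTotal(unitPrice: Cents, count: Quantity): Cents {
  return cents(unitPrice * count);
}

const price = cents(499);
const count = quantity(3);

lineTotal(price, count);    // fine
// lineTotal(count, price); // compile error: the swap is caught for us
```

The scaffolding isn't doing the thinking. It's compensating for the fact that the thinker runs out of working memory.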
But here's the thing about our limited run with symbolic intelligence: we were good enough, for long enough, to bootstrap something that does it natively.
We spent seventy years writing code, badly, with constant accommodation for our embodied limitations. And out of that imperfect process came a machine that reads and writes symbols the way our ancestors navigated savannas. Not as translation. As native operation.
Code is where this gets sharpest. Programming is the purest symbolic medium humans have ever created. It's discrete symbols manipulated according to formal rules. There's no physical referent. There's no continuous signal. It's text all the way down.
And yet. Look at how we talk about it.
We "build" software. We design "architecture." We create "layers" and "pipelines" and "stacks." We draw UML diagrams. We use IDEs with file trees and visual debuggers and code folding. We think in terms of "flow" and "structure" and "shape."
Every one of those is embodied intelligence reaching for spatial metaphors to make a symbolic domain legible. We built an entire visual-spatial interface on top of what is fundamentally a textual medium, because that's the only way our hardware can manage it.
And then, on top of the spatial metaphors, we built types, tests, frameworks, linters, because even with all that translation, sustained symbolic work still exhausts us. We're doing double translation: spatial intuition into code, then code back into spatial intuition to verify it.
LLMs skip both translations. They read the symbols. They understand the relationships directly. The code isn't a representation of something else for them. It is the thing.
This, I think, is why my "explicit over abstract" position works for AI collaboration. Abstractions are partly spatial metaphors applied to symbol management: "layers," "encapsulation," "separation of concerns." They make code legible to spatial thinkers by giving it shape. A symbolic intelligence doesn't need the shape. It reads the text. Flatten the abstractions into explicit code and you've made it more legible to the symbolic reader, even though it looks worse to the spatial one.
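A rough sketch of what that flattening might look like, with hypothetical names (this is an illustration of the idea, not code from any real project): the first version gives the logic a shape through an interface and a service class; the second states the same steps inline, where a symbolic reader can follow them without reconstructing the layer diagram.

```typescript
// Layered version: the "shape" lives in the abstractions.
interface DiscountPolicy {
  apply(subtotal: number): number;
}

class SeasonalDiscount implements DiscountPolicy {
  apply(subtotal: number): number {
    return subtotal * 0.9;
  }
}

class CheckoutService {
  constructor(private readonly policy: DiscountPolicy) {}

  total(prices: number[]): number {
    const subtotal = prices.reduce((sum, p) => sum + p, 0);
    return this.policy.apply(subtotal);
  }
}

// Flattened version: the same logic, stated explicitly in one place.
function checkoutTotal(prices: number[]): number {
  const subtotal = prices.reduce((sum, p) => sum + p, 0);
  const discounted = subtotal * 0.9; // seasonal 10% discount
  return discounted;
}
```

Both compute the same number. The difference is how much structure a reader has to hold in their head to verify it.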
So where does this leave LeCun?
He's right about the Moravec paradox. Embodied intelligence is genuinely hard, genuinely different from symbolic intelligence, and genuinely not what LLMs do. Robots will remain hard for a long time. Browser agents will keep struggling to click dropdowns. The physical world requires continuous sensorimotor control that transformer architectures simply don't provide.
But "LLMs don't have embodied intelligence" is not the same claim as "LLMs don't have intelligence." The first is true. The second is parochialism. It's defining intelligence by one implementation and concluding that anything different doesn't count.
Your cat has embodied intelligence and no symbolic intelligence. LLMs have symbolic intelligence and no embodied intelligence. Humans have both, badly. We're mediocre at the symbolic part, which is why we need so much scaffolding, but we were good enough at it to bootstrap something that does it natively.
LeCun's error, the shape rotator error, and the "just autocomplete" error are all the same error: measuring the new intelligence against the old kind's benchmarks. It's a fish saying birds don't really move because they can't swim.
They don't swim. They fly.