← Patrick White

CAD Was Always Code

Why Opus being "SOTA at agentic CAD" isn't really about CAD

The tweet landed with a shrug and a video of Claude modeling parts in Onshape by itself. A few people said huh, interesting. Most scrolled past.

But if you've been paying attention to where AI is fast and where AI is slow, CAD should be surprising. CAD looks like exactly the kind of work AI shouldn't be good at yet. You rotate a part. You drag a face. You eyeball a dimension. You pick an edge out of a mess of overlapping geometry. It's visual. It's spatial. It's the kind of thing a child learns with their hands before they learn with words.

In Browser Use Is a Robotics Problem, I argued that operating a GUI is a control problem, not an intelligence problem — because clicking a button at specific coordinates in a changing environment is continuous sensorimotor work, which is the one thing transformers structurally can't do. If that's true for clicking a dropdown, shouldn't it be doubly true for extruding a face in 3D space?

It isn't. And the reason it isn't reveals something sharper about text distance than that essay said.

* * *

Here's what a parametric CAD model actually is: a program.

When you model a bracket in Onshape, you're not pushing clay around. You're writing an ordered sequence of operations:

  1. Sketch a rectangle on the XY plane (width=50mm, height=30mm)
  2. Extrude that sketch 10mm upward
  3. Shell the result with 2mm walls
  4. Fillet the top edges at 3mm radius
  5. Pattern four holes on the top face, 15mm spacing

That list — the feature tree — is the source of truth. The 3D shape you see is the compiled output. Change width from 50 to 80 and the entire tree re-evaluates. New walls, new fillets, the holes re-space themselves automatically because they were defined symbolically ("15mm spacing") not positionally ("at x=15, 30, 45, 60"). The model isn't a shape. It's a function of its parameters.
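The re-evaluation story can be sketched in a few lines. This is a toy Python model, not Onshape's API: the point is that the bracket is literally a function of its parameters, and the hole positions are a rule derived from the inputs rather than stored coordinates.

```python
def bracket(width=50, height=30, depth=10, wall=2, fillet_r=3, holes=4):
    """Toy feature tree: each step consumes the previous step's output.

    The returned dict is the 'compiled' model; this function is the source.
    """
    sketch = {"w": width, "h": height}          # 1. sketch rectangle
    solid = {**sketch, "d": depth}              # 2. extrude
    shelled = {**solid, "wall": wall}           # 3. shell
    filleted = {**shelled, "fillet": fillet_r}  # 4. fillet
    # 5. pattern: spacing is derived symbolically from width,
    #    never stored as fixed x-coordinates
    spacing = width / (holes + 1)
    hole_xs = [spacing * (i + 1) for i in range(holes)]
    return {**filleted, "holes": hole_xs}

narrow = bracket(width=50)   # holes at [10.0, 20.0, 30.0, 40.0]
wide = bracket(width=80)     # holes at [16.0, 32.0, 48.0, 64.0]
```

Widen the part and the holes re-space themselves, because they were never positions in the first place, only a rule awaiting inputs.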

Under the tree there's a geometric kernel doing the actual B-rep math. But the kernel is a library. You never touch it. Your work — the thing a CAD engineer is paid to do — lives entirely in the symbolic layer above it.

FeatureScript makes this explicit. It's Onshape's native programming language for authoring features, and every built-in operation — extrude, fillet, pattern, every single menu item — is written in FeatureScript under the hood. When you click Extrude in the toolbar, Onshape writes a FeatureScript call. The GUI is a typing aid. The code was always there.

CAD isn't a visual task that got a code interface bolted on. It's a symbolic task that got a visual interface bolted on, and everyone mistook the interface for the discipline.

* * *

This breaks the text-distance framework — or more precisely, it sharpens it.

The original claim was: the speed at which AI transforms a domain is a function of how close that domain already is to text. Code is pure text, so coding went first. Surgery is pure embodiment, so surgery goes last. Browser use looks close to text because it's digital, but it's actually far, because operating an interface is embodied.

What that framing missed is that apparent text distance and real text distance aren't the same thing. Some fields are wearing a visual costume over a symbolic substrate. CAD looked far from text only because the tooling made it look far. The substrate was always text-shaped.

A domain's apparent text distance isn't fixed. It's partly real (surgery) and partly imposed by tooling (CAD). The GUI era accidentally inflated the distance of fields that were symbolic underneath.

* * *

Blender is the counterexample that proves the rule.

Blender is two programs in one. The default mode — mesh editing, sculpting — is genuinely embodied. You push vertices around. You extrude faces by grabbing them. The artifact is the polygon mesh. There's no program producing it. That part is a robotics problem, and AI is bad at it for the same structural reason AI is bad at clicking dropdowns. There's no symbolic layer to operate on. The sculpture is the sculpting.

But Blender also has Geometry Nodes and the Python API.

Geometry Nodes is a visual graph editor where you wire deterministic functions into a DAG. Each node takes geometry plus parameters and returns new geometry. Distribute points on faces. Instance a mesh on each point. Rotate by noise. The mesh is the compiled output of the graph. It's structurally identical to FeatureScript. It happens to be drawn as boxes-and-wires rather than as text, but the substrate is symbolic — pure functions composed into a pipeline.
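Structurally, a node graph is nothing more than pure functions composed into a pipeline. A minimal Python sketch of that idea, with hypothetical node names standing in for Blender's actual nodes (this is not the bpy API):

```python
import random

def distribute_points(mesh, count, seed=0):
    """Scatter `count` points across the mesh bounds (stand-in for a node)."""
    rng = random.Random(seed)
    (x0, x1), (y0, y1) = mesh["bounds"]
    return [(rng.uniform(x0, x1), rng.uniform(y0, y1)) for _ in range(count)]

def instance_on_points(points, template):
    """One copy of `template` per point, placed at the point's position."""
    return [{"mesh": template, "at": p} for p in points]

def rotate_by_noise(instances, seed=0):
    """Give each instance a pseudo-random rotation."""
    rng = random.Random(seed)
    return [{**inst, "rot": rng.uniform(0, 360)} for inst in instances]

# The 'graph': terrain -> points -> instances -> rotated instances.
terrain = {"bounds": ((0, 100), (0, 100))}
rocks = rotate_by_noise(instance_on_points(distribute_points(terrain, 10_000), "rock"))
```

The output is fully determined by the graph plus its parameters; change the count or the seed and everything downstream recomputes, exactly like the feature tree.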

When Blender pros need to scatter ten thousand rocks on a terrain, simulate hair, generate a procedural city, or do anything with repetition and variation, they reach for Geometry Nodes. Not because the GUI is faster there. Because those tasks are symbolic, and mesh sculpting was always the wrong fit for them.

So Blender is an honest hybrid. Different tasks have different modalities, and within a single domain they can coexist. The parts that are embodied are embodied. The parts that are symbolic have a symbolic interface. You can see the seam running down the middle of the application.

This is actually the useful way to think about most creative software: not as a tool but as a stack of modalities, some genuinely embodied and load-bearing, others symbolic work pretending to be embodied because that's how we built GUIs in the 1990s.

* * *

Once you see the seam, you see it everywhere.

Electronic circuits. Schematic capture looks spatial — you drag component symbols and draw wires. But the artifact is a netlist: text. Verilog and VHDL for digital, SPICE for analog. KiCad has full Python scripting. The schematic is a visualization of the netlist; the netlist is canonical.

Music composition. Scores look spatial. LilyPond lets you write an entire symphony as text. MIDI and MusicXML are the canonical interchange formats. Ableton's session view is a grid on top of an event tree.

Chemistry. Drawing molecules in ChemDraw is spatial. SMILES notation writes them as strings — CC(=O)O is acetic acid. Computational chemistry runs on symbolic representations all the way down.
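That acetic acid string is easy to sanity-check with a few lines of string handling. A toy heavy-atom count over simple SMILES (single-letter organic-subset atoms only, no rings or charges; real work would use a library like RDKit):

```python
def heavy_atoms(smiles):
    """Count heavy (non-hydrogen) atoms in a simple SMILES string.

    Toy parser: uppercase letters are atoms; bonds, parens, and
    hydrogens are implicit in the notation.
    """
    return sum(1 for ch in smiles if ch.isalpha() and ch.isupper())

print(heavy_atoms("CC(=O)O"))  # 4: two carbons, two oxygens
```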

Shader graphs. Unity and Unreal's shader editors are node-based UIs that compile to HLSL or GLSL text. The graph is a presentation layer.

UI and frontend design. Figma is spatial. But a UI is a tree of components with props. React is canonical. We've watched AI eat UI design faster via Tailwind and Claude Code than any Figma plugin ever managed.

Architecture. Revit walls aren't geometry — they're parametric wall-objects with layers and materials. Grasshopper, which rides on top of Rhino, is basically Geometry Nodes for architecture, used extensively by parametric firms like Zaha Hadid Architects.

Game levels. Unity and Unreal scenes are trees of GameObjects and Actors with components. Every level can be built in code. Designers drag-drop because the feedback loop is spatial, but the artifact is symbolic.

Databases. ER diagrams are spatial. SQL DDL is canonical.

Diagrams. Mermaid and Graphviz turn text into graphs. The graph is a render.

Roblox. The entire world is a tree of typed Instance objects manipulated by Luau. You can spawn a city of five hundred houses from thirty lines of code. The biggest Roblox games are built with enormous amounts of procedural composition.

Node editors in general. Houdini, TouchDesigner, Max/MSP, Unreal's Blueprints. Visual skins over computation DAGs.
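The Roblox claim generalizes to any scene tree. A hypothetical Python analog of a typed instance tree (Roblox's real API is Luau; the class and property names here are illustrative) makes the "city from a loop" point concrete:

```python
class Instance:
    """Toy analog of a typed scene-tree node, illustrative only."""
    def __init__(self, class_name, parent=None, **props):
        self.class_name = class_name
        self.props = props
        self.children = []
        if parent is not None:
            parent.children.append(self)

workspace = Instance("Workspace")
for i in range(500):
    # Each house is a subtree: a model containing a body and a roof part,
    # laid out on a 25-wide grid by arithmetic, not by dragging.
    house = Instance("Model", parent=workspace, name=f"House{i}")
    Instance("Part", parent=house, size=(8, 6, 8),
             position=((i % 25) * 10, 0, (i // 25) * 10))
    Instance("Part", parent=house, size=(8, 2, 8),
             position=((i % 25) * 10, 4, (i // 25) * 10))
```

Five hundred houses, each a subtree of typed instances, in a dozen lines. The world was a data structure all along.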

The pattern is hard to miss once it clicks:

Any tool that lets you see and click on something whose underlying model is a data structure is an overlay: surveying instruments laid over land that was already mapped. The symbolic substrate already exists. The GUI is a UI choice, not the discipline.

Most of what we call "design software" turns out to be this. The design isn't the clicking. The design is the structure. The structure is symbolic. The clicking was how we got the structure into the computer before we had better tools.

* * *

This makes a sharper prediction than the original text-distance essay did.

In that piece I said every domain will eventually get AI-native interfaces, because building those interfaces is itself a coding problem — and AI is best at coding. True but incomplete. The corollary is: some domains don't need new interfaces at all. The symbolic layer is already there, buried under the GUI. They just need it exposed.

Onshape did this with FeatureScript. The result was a model that suddenly looks SOTA at agentic CAD, overnight. Not because the models got smarter. Because the field finally let the model talk to the thing the field was actually about. It stopped making the geometer walk the fields with measuring rope when the geometry was sitting right there.

Every domain on the list above is waiting for the same move. The AI-native move for Figma isn't "let an agent click around" — it's expose the component tree as code and let the model edit the tree directly. The move for Revit isn't a vision-model Revit operator — it's Grasshopper, plus a model that can read and write .gh files. The move for ChemDraw isn't better screen parsing — the domain already speaks SMILES.

When a domain has a hidden symbolic substrate, the AI-native move isn't building a new interface. It's deleting the old one.

* * *

Back to where we started.

Browser use is a robotics problem because operating the DOM really is continuous sensorimotor control. There's no symbolic substrate hiding underneath the click. A button click is a click. The form can't be edited any other way. That's a genuine control task, and it will be slow for the same reason robotics is slow.

CAD looked like the same kind of task. It isn't. The symbolic substrate was always there. The GUI was the accident.

This is the thing to look for, domain by domain: is the task symbolic under the interface, or embodied all the way down?

CAD turned out to be the first category wearing the second's clothes.

A lot more of the world is in that category than anyone expected.