Architecture Still Matters

A narrative has been forming over the past year that goes roughly like this: AI coding tools are absorbing the craft of software engineering, so the parts of the craft that used to matter (architecture, naming, boundaries, principles) matter less now. The model will figure it out. The teams that win are the ones that move fastest, not the ones that follow good practices. Just ship.

On the codebases where I've actually used these tools, the opposite is happening. The architectural decisions I used to defend on grounds of human ergonomics ("future maintainers will thank us") have turned out to be the ones that decide how well these AI tools perform. A lot of what gets attributed to "the model isn't good enough yet" is really the codebase asking too much of it.

Where architecture meets the model

The examples below are in TypeScript, but the principles apply to any language with a reasonable module system.

What the model can see

The obvious one first. Context windows are finite. A 200k to 1 million token window sounds like it makes this a non-issue, but even within a large window, models tend to reason better over shorter, focused context than over long, loosely relevant context. A focused prompt with the right 5,000 tokens will usually outperform a dump of 50,000 tokens that happens to contain the same information; the context window is the ceiling, not the target.

The less obvious part: agents like Claude Code and Cursor don't read your repo, they search it. A task typically starts with a retrieval step that pulls files based on your task description, file names, and whatever conventions the tool can infer from the codebase. That retrieval is only as good as the signals you've given it. Vague file names, related code scattered across unrelated directories, inconsistent naming between similar concepts: each of these makes the agent load more files before it finds the relevant code, and the context budget can run out before it does.

The practical consequence isn't that you necessarily need smaller files. It's that you need discoverable ones. A 2,000-line module with a clear name and a single job can be fine; the agent loads it when it needs to, and the other 99% of the time it doesn't. A 400-line module called utils.ts that collects seven unrelated helpers is much worse if it gets retrieved constantly for the wrong reasons, and its contents confuse the model about what the surrounding system does. File size matters less than whether each module has one clear responsibility.

What the model can reason about

Once the model has loaded context, it has to predict what impact a change in one place will have on the rest of the system. This is where unclear boundaries start to hurt. If a module's effects leak beyond its public surface (because other modules reach into its internals, or because effects propagate through transitive imports in ways that aren't obvious locally), the agent needs all of the affected files loaded to reason correctly. When it doesn't have them, it writes a confident-looking edit that breaks something in a file it never loaded.

An architectural property that helps mitigate this is explicit, narrow boundaries between modules. A module's public surface should be small, obvious, and stable; internal changes should not cross that surface. When boundaries are explicit, the agent can reason locally: the set of files it needs to load is bounded, and the set it can safely ignore is everything else. When they're implicit (shared mutable state, reaching into other modules' internals, cross-cutting utilities that everything depends on), the scope of a change's effects is the entire repo, and no amount of context window solves that.

One concrete example: TypeScript doesn't have first-class module privacy. Anything a file exports is importable from anywhere else in the codebase. You can create code conventions to establish public/private splits, but the language won't stop a distant module from reaching into a service's internals, and once someone does, the boundary has quietly leaked.

One convention is to give each service a single designated module (often called something like api.ts) that re-exports exactly what external code is allowed to use. Everything else in the service is internal. This is like a barrel export with a rule attached. A plain barrel aggregates for convenience, but anyone can still reach around it. What makes this a boundary is the convention that external code only import from the barrel.
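
A minimal sketch of what such a designated module might look like, collapsed into one listing with comments marking where the file boundaries would be. The billing service, the file paths, and every identifier here are hypothetical, chosen only to illustrate the shape.

```typescript
// Sketch only: in a real repo each section below would be its own file.
// All names (billing, InvoiceRecord, computeTotal, ...) are hypothetical.

// ---- services/billing/internal/totals.ts (internal, free to change) ----
interface InvoiceRecord {
  id: string;
  amountsInCents: number[];
  legacyTaxCode?: string; // internal detail that callers should never see
}

function computeTotal(record: InvoiceRecord): number {
  return record.amountsInCents.reduce((sum, amount) => sum + amount, 0);
}

// ---- services/billing/api.ts (the only file external code imports) ----
// The public shape is declared here, separately from the internal record,
// so renaming or reshaping InvoiceRecord does not change what callers see.
export interface InvoiceSummary {
  id: string;
  totalInCents: number;
}

export function summariseInvoice(id: string, amountsInCents: number[]): InvoiceSummary {
  const record: InvoiceRecord = { id, amountsInCents };
  return { id: record.id, totalInCents: computeTotal(record) };
}
```

External code would import summariseInvoice and InvoiceSummary from api.ts and nothing else; InvoiceRecord and computeTotal stay free to change.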

Sometimes api.ts contains thin wrappers over internal functions, which reads as duplication but is doing real work: it owns the shape of what's public, so internal refactors can't silently become breaking changes. Enforcement is partly by convention and partly by the rules we give to coding agents. We adopted this convention because TypeScript's boundaries leak, but it also happens to make services legible to an agent: one file, full public surface.
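
The convention can also be checked mechanically. The sketch below is not the enforcement described above, just one hypothetical way to flag leaks. It assumes services live under services/<name>/, that cross-service imports use non-relative paths such as "services/billing/api", and that the public file is named exactly api. It takes file contents as a plain map so it needs no file system access, and the regex-based import scan is a deliberate simplification; a lint rule such as ESLint's no-restricted-imports may be a better fit in practice.

```typescript
// Hypothetical layout: services/<service>/api is the only public module.
interface Violation {
  file: string;
  importPath: string;
}

// Name of the service a file path belongs to, or null if it is outside any service.
function serviceOf(path: string): string | null {
  const match = /(?:^|\/)services\/([^/]+)\//.exec(path);
  return match ? match[1] : null;
}

// files maps a file path to its source text.
function findBoundaryViolations(files: Record<string, string>): Violation[] {
  const violations: Violation[] = [];
  for (const file of Object.keys(files)) {
    const importer = serviceOf(file);
    // Captures the specifier in `import ... from "x"` and `export ... from "x"`.
    // A fresh regex per file keeps lastIndex from carrying over.
    const importPattern = /(?:import|export)\s[^;]*?from\s+["']([^"']+)["']/g;
    let match: RegExpExecArray | null;
    while ((match = importPattern.exec(files[file])) !== null) {
      const importPath = match[1];
      const target = /(?:^|\/)services\/([^/]+)\/(.+)$/.exec(importPath);
      if (!target) continue; // relative import or not a service path: out of scope here
      const [, targetService, rest] = target;
      const sameService = targetService === importer;
      const publicSurface = rest === "api";
      if (!sameService && !publicSurface) {
        violations.push({ file, importPath });
      }
    }
  }
  return violations;
}
```

Run over a repo's sources, an empty result would mean no file reaches past another service's api module, within the limits of the assumptions above (relative imports that cross services are not detected).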

What the model expects

Models have seen millions of repositories during training. They have good prior knowledge about what typical TypeScript projects look like: where routes live, how services are structured, how tests are named, which patterns handle which kinds of problems. Those priors do a lot of work on every request. When your codebase matches them, the model generates well-placed code that fits the surrounding style with almost no prompting.

Using conventional code patterns has always been valuable for this reason. A diverse team with varied experience levels can read, change, and trust code that follows widely-known patterns, in a way they can't with code that's been highly customised. AI tooling is a new reader in this class, not the reason the principle exists. It just makes the return on convention higher than it used to be.

Classical design patterns such as Observer, Factory, Builder and Strategy are in every training corpus. On seeing them, a model recognises them and knows how to work with them. The problem arises when creating novel bespoke patterns: inventing your own abstractions and naming them after internal concepts, or stacking a framework-within-a-framework, when a standard pattern would have worked. But even standard patterns can have a cost when they stack. Each one adds a layer of indirection: Observer means the flow isn't here, it's wherever listeners subscribed; Factory means the type you get back isn't what this line says. One layer is fine; three on a single interaction gives a reader, human or model, three searches to do before they understand what actually runs.
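
To make the cost of stacking concrete, here is a small sketch with just two layers, an Observer and a Factory, on one interaction. Every name in it is hypothetical. The point is the last line: nothing at the call site says what will run.

```typescript
// All names here are hypothetical, for illustration only.
type Listener = (invoiceId: string) => void;

// Observer: publish() says nothing about what runs; that depends on who subscribed.
class InvoiceEvents {
  private listeners: Listener[] = [];
  subscribe(listener: Listener): void {
    this.listeners.push(listener);
  }
  publish(invoiceId: string): void {
    this.listeners.forEach((listener) => listener(invoiceId));
  }
}

interface Notifier {
  notify(invoiceId: string): string;
}
class EmailNotifier implements Notifier {
  notify(invoiceId: string): string {
    return `email:${invoiceId}`;
  }
}
class SmsNotifier implements Notifier {
  notify(invoiceId: string): string {
    return `sms:${invoiceId}`;
  }
}

// Factory: the call site sees only `Notifier`; the concrete class is chosen here.
function createNotifier(channel: "email" | "sms"): Notifier {
  return channel === "email" ? new EmailNotifier() : new SmsNotifier();
}

const sent: string[] = [];
const events = new InvoiceEvents();
const notifier = createNotifier("sms");
events.subscribe((invoiceId) => {
  sent.push(notifier.notify(invoiceId));
});

// To know what this one line does, a reader has to find the subscription
// above and then resolve which Notifier the factory returned.
events.publish("inv_42");
```

In a single file that is two short hops. Spread across modules, each hop becomes a search, and each search spends context.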

Novel abstractions should only exist when a standard pattern can't do the job. Most of the time, one can.

What the model (and you) can check

The last mechanism is the feedback loop. The model writes code; something needs to tell it and you whether the code is right. The faster and more automated that something is, the more the agent can iterate without your attention, and the more confidence you can have in the result.

This is where types earn their keep in a way they didn't before. A function signature like:

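// LineItem and Invoice are assumed to be defined elsewhere in the codebase.
interface InvoiceInput {
  customerId: string;
  lineItems: LineItem[];
  dueDate: Date;
}

// Signature only; the implementation is elided.
declare function createInvoice(input: InvoiceInput): Promise<Invoice>;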

is a contract the model can read, obey, and verify against. If it tries to pass the wrong shape, the compiler tells it immediately. The feedback is local, fast, and unambiguous. Compare this to a codebase where half the functions take any or loosely typed objects assembled from several sources. There, the model has no way to check whether it has the shape right short of running the code, and you have no way to verify the model's work short of reading every line carefully.
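
A small sketch of that contrast, using a synchronous total calculation rather than createInvoice itself. The LineItem fields and both function names are assumptions made for the example. The same mistaken call compiles against the loose version and only fails when run, while the typed version rejects it at compile time (marked here with @ts-expect-error so the listing still compiles).

```typescript
// Hypothetical shapes; LineItem's fields are assumptions, not taken from the text.
interface LineItem {
  description: string;
  amountInCents: number;
}
interface InvoiceInput {
  customerId: string;
  lineItems: LineItem[];
  dueDate: Date;
}

// Loosely typed: any object is accepted, so a wrong shape only fails at run time.
function totalLoose(input: any): number {
  return input.lineItems.reduce((sum: number, item: any) => sum + item.amountInCents, 0);
}

// Typed: the same mistake is rejected by the compiler before anything runs.
function totalTyped(input: InvoiceInput): number {
  return input.lineItems.reduce((sum, item) => sum + item.amountInCents, 0);
}

// A plausible mistake: right idea, wrong property names.
const wrongShape = { customer: "cus_1", items: [{ amountInCents: 500 }] };

// Compiles without complaint, then throws because lineItems is undefined.
let looseFailedAtRuntime = false;
try {
  totalLoose(wrongShape);
} catch {
  looseFailedAtRuntime = true;
}

// @ts-expect-error: wrongShape is missing customerId, lineItems and dueDate.
const typedCall = () => totalTyped(wrongShape);
```

The loose version gives the agent nothing to check against until the code runs; the typed version turns the same mistake into an error on the line that caused it.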

Tests do the same job from the other direction. A test suite the agent can run gives it a feedback loop that doesn't depend on you. Types catch shape errors; tests catch behaviour errors; together they turn the review cycle from "human reads every line" into "human reviews what the checks can't judge." Without them, you're still reviewing every line by hand, and the gap between AI-assisted and unassisted work narrows sharply.
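
As a sketch of the behaviour side, consider a hypothetical isOverdue helper. Flipping the comparison, or using >= instead of >, would still type-check, so only a test can say which behaviour is intended. No test framework is assumed here; in a real project these checks would live in whatever runner the repo already uses.

```typescript
// Hypothetical function under test. Any comparison operator type-checks here,
// so the compiler cannot tell a correct implementation from a wrong one.
function isOverdue(dueDate: Date, now: Date): boolean {
  return now.getTime() > dueDate.getTime();
}

// Minimal framework-free check; throws so that a failing run exits non-zero.
function check(name: string, condition: boolean): void {
  if (!condition) throw new Error(`FAILED: ${name}`);
}

const due = new Date("2024-03-01T00:00:00Z");
check("not overdue the day before", !isOverdue(due, new Date("2024-02-29T00:00:00Z")));
check("not overdue at the exact due instant", !isOverdue(due, due));
check("overdue one millisecond later", isOverdue(due, new Date(due.getTime() + 1)));
```

The boundary cases are the useful ones: they pin down a decision the types leave open, and an agent that changes the comparison finds out within one test run.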

What this all adds up to

Read as a list of failure modes, the four mechanisms are daunting. Read as properties to optimise for, they're a reasonable description of good engineering practice: discoverable structure, narrow boundaries, conventional patterns and strong contracts. These design principles were always important, and they are what give your coding agents the best chance to understand and work on your code.

A codebase optimised for LLMs and a codebase optimised for human readers turn out to be the same codebase. The agent is, functionally, a new reader on every request, and it wants what any new reader wants: to orient quickly, find the relevant code easily, and trust that a change's effects stay within its boundaries. If you were already nurturing a codebase that a new engineer could be productive in during their first week, you're already most of the way to one that works well with AI tooling.

Good code architecture and design always mattered, and they matter even more now if you want good outcomes from coding agents.