Andrej Karpathy posted in December that he's "never felt this much behind as a programmer." The profession, he said, is being "dramatically refactored."
Boris Cherny, the creator of Claude Code, revealed he runs five Claudes in parallel in his terminal. Eighty to ninety percent of Claude Code itself is written using Claude Code.
Dario Amodei predicted 90% of code would be AI-written within months. Theo said the teams he advises are already at 70-90%.
Kent Beck, after coding for 52 years, said he's re-energized by AI agents. He drew a circle with a stick figure on paper and Claude generated his entire website.
DHH wrote in January: "At the end of last year, AI agents really came alive for me. Partly because the models got better, but more so because we gave them the tools to take their capacity beyond pure reasoning."
Simon Willison called Claude Code "a tool for general computer automation" and said skills might be "a bigger deal than MCP."
Everyone's talking about the percentages. Nobody's explaining what it actually looks like day to day.
I've been living this for the past few months, both in my own work and helping teams adopt it. Here's what I've learned about the mechanics.
Spec-Driven Development
This isn't a new idea. But it's become essential.
The more detail and verifiability you put into a spec, the more autonomously the agent can work. Define the done state. List the constraints. Specify the files. Make every requirement something you can actually check.
Simon Willison put it well: "A language model changes you from a programmer who writes lines of code, to a programmer that manages the context the model has access to, prunes irrelevant things, adds useful material to context, and writes detailed specifications."
That last part is key. Detailed specifications. The agent can do the work without you if you've defined the game clearly enough.
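What does a checkable spec look like in practice? Here's a minimal sketch; every file name and requirement is invented for illustration:

```bash
# A hypothetical spec the agent can verify itself against.
# Every requirement maps to something it can run or inspect.
cat > specs/rate-limit.md <<'EOF'
# Add rate limiting to the public ingest endpoint

## Done when
- POST /api/v1/ingest returns 429 after 100 requests/minute per API key
- `npm test` passes, including a new test covering the 429 path
- No files change outside src/middleware/ and tests/

## Constraints
- Reuse the existing Redis client in src/lib/redis.ts
- No new dependencies
EOF
```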
This is also why typed languages have an advantage now. TypeScript, Rust, Python under mypy. The agent catches its own mistakes more easily when a compiler or type checker is verifying its work. Fewer round trips, less babysitting.
Think Like a Manager
The mental shift isn't about prompting. It's about delegation.
I keep 2-6 Claude Code terminals open at any given time. One might be doing deep research on a codebase. One is working on the main task. One handles the side quests that come up mid-flow. Sometimes one is just tailing logs.
For teams using shared repos, tools like Conductor help manage multiple git worktrees so agents don't step on each other. For solo work on private projects, I've found a single terminal is often fine. The orchestration overhead isn't worth it when you're the only one touching the code.
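If you do want the worktree pattern without extra tooling, plain git covers it. A sketch, with made-up branch names:

```bash
# One worktree per agent, so parallel sessions never edit the same checkout.
git worktree add ../myapp-auth -b agent/auth-refactor    # agent 1 works here
git worktree add ../myapp-search -b agent/search-fix     # agent 2 works here

# Each path is a full working copy on its own branch; merge when done.
git worktree list
```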
The shift is from "I'm writing code" to "I'm managing agents who write code." You're reviewing, redirecting, unblocking. The typing is the smallest part.
Skills Architecture
If you enter the same prompt three times in the same repo, that's a signal it should become a skill.
Claude Code has this concept of skills, which are just markdown files that get loaded when you trigger them. I have /post for my Typefully integration. /remind shows my task list. /analyze-ticket pulls production traces for debugging.
The key insight: don't bloat your CLAUDE.md with everything. Keep your invariants there, the things that should always be true. Then give Claude the ability to index into other instructions when needed.
Your global CLAUDE.md shouldn't contain workflows you only do occasionally. That context has a cost. Instead, make those workflows into skills that load dynamically. Performance gets much more reliable when you're not drowning the agent in irrelevant instructions.
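For concreteness: in current Claude Code, a markdown file under .claude/commands/ becomes a slash command of the same name, with $ARGUMENTS standing in for whatever you type after it. The exact mechanics may shift between versions, and the contents below are a hypothetical reconstruction of my /analyze-ticket, not the real file:

```bash
# A hypothetical /analyze-ticket command: a markdown file whose name becomes
# the slash command, loaded only when invoked. $ARGUMENTS is the ticket ID.
mkdir -p .claude/commands
cat > .claude/commands/analyze-ticket.md <<'EOF'
Investigate ticket $ARGUMENTS.
1. Pull the ticket description and recent comments from Linear.
2. Query production traces and error logs for the affected request IDs.
3. Build a timeline: what the user did, what the system did, where they diverged.
Read only. Never modify production data.
EOF
```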
The payoff compounds. Linear tickets update automatically now. TODOs get tracked and completed. It happens in the background without me thinking about it.
Give Your Agent Database Access
This one surprised me with how much it changed things.
A team I've been helping had bug resolution times averaging around 8 hours. Not 8 hours of work, but 8 hours of wall clock: ticket sits in queue, someone picks it up, they spend time reproducing and diagnosing.
We gave the agent read-only database access. Five minutes now. The agent reads the error, queries the relevant tables, pieces together a timeline of what happened. Outputs exactly where things went wrong.
If you can also give it access to Sentry for crashes or Braintrust for agent traces, even better. The agent can correlate across systems and build a complete picture before a human even looks at it.
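The safe way to wire this up is a role that can read everything and write nothing. A sketch assuming Postgres, with placeholder names:

```bash
# Read-only role for the agent (Postgres; database, schema, and role names
# are placeholders). The password should come from a secret manager.
psql -d production <<'SQL'
CREATE ROLE agent_ro LOGIN PASSWORD 'rotate-me';
GRANT CONNECT ON DATABASE production TO agent_ro;
GRANT USAGE ON SCHEMA public TO agent_ro;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO agent_ro;
-- Tables created later stay readable too.
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO agent_ro;
SQL
```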
Headless for Automation
Claude Code headless is underrated.
I have specs that run overnight. They generate reports, process backlogs, clean up data. Cron jobs, essentially, but with an LLM doing the work instead of a script I had to write.
The mental model shift: Claude Code isn't just for interactive coding sessions. It's a general-purpose automation layer. Anywhere you'd write a Python script, you can write a spec instead.
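A sketch of the shape. The -p flag runs Claude Code headless: one prompt in, final output out, no interactive session. Flag details can drift between versions, so check claude --help; paths here are made up.

```bash
#!/usr/bin/env bash
# nightly-report.sh, invoked by cron: 0 2 * * * /home/me/bin/nightly-report.sh
# The "script" is a spec; the agent does the work.
cd /home/me/reports || exit 1
claude -p "$(cat specs/nightly-report.md)" > "out/report-$(date +%F).md"
```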
Content and Voice
For content creation, I use Typefully with voice dictation through Wispr Flow.
I talk through posts while walking or between tasks. They land in Typefully as drafts. Light reformatting, schedule them out. The whole social media workflow that used to take an hour is maybe ten minutes now.
Same pattern applies everywhere. Find the friction. Look at what you do repeatedly. Automate it, not with a script, but with a spec and an agent.
Beyond Code: Pattern Recognition in Life
This extends way beyond engineering.
I keep a journal in a git repo. Version controlled, pushed to a private remote. I dictate to my computer using Wispr Flow, capturing what happened each day. Over time it becomes a dataset.
Then I ask Claude to find patterns. Search for the unknown unknowns. When do I have the most energy? What foods correlate with good sleep? What triggers my stress?
Humans aren't good at this kind of analysis. We're not objective. We're too focused on what's happening right now. But an agent with file system access is excellent at spotting patterns in a large text corpus.
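The setup is deliberately boring. A sketch, with a made-up layout:

```bash
# A journal is just dated markdown in a git repo.
mkdir -p ~/journal/entries && cd ~/journal && git init

# Dictated entries append to one file per day.
echo "Slept badly. Skipped lunch. Long walk helped. Shipped the fix at 11pm." \
  >> "entries/$(date +%F).md"
git add entries && git commit -m "journal: $(date +%F)"

# Then ask the agent to mine it.
claude -p "Read everything in entries/. What correlates with my high-energy \
days? What patterns am I not seeing?"
```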
I've used this for nutrition and sleep. For exercise programming. For understanding what I actually want in a relationship. It helped me get through a breakup recently.
The approach I take for domains I don't know much about: ask Claude to first search and load up the best information from recognized experts, then use that as the primary knowledge base. The model may or may not have good grounding in a topic, so you give it custom instructions and run everything through that lens.
For relationships, I used the works of John Gottman, the researcher who can predict which marriages will last. For fitness, the protocols from actual sports scientists. The agent becomes a domain expert on demand.
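A sketch of that two-step pattern, using the relationship example (file names invented, prompts paraphrased):

```bash
cd ~/journal && mkdir -p knowledge

# Step 1: build the knowledge base from recognized experts.
claude -p "Research John Gottman's published findings on what predicts \
lasting relationships. Distill the core principles, with sources, into \
knowledge/gottman.md."

# Step 2: every later question runs through that lens.
claude -p "Using knowledge/gottman.md as the primary reference, review my \
entries/ from the last three months. Where do my patterns match the \
research, and where don't they?"
```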
What 90% Actually Looks Like
The people hitting these high percentages aren't typing faster or prompting better. They've systematically eliminated repetition.
Every repeated action becomes a skill or a hook. Every manual lookup becomes a tool. Every context explanation goes in the CLAUDE.md once and never again.
It's not one big change. It's dozens of small ones that compound.
Kent Beck said TDD is a "superpower" when working with AI agents because agents can introduce regressions and tests catch them. Uncle Bob said these tools "will help us write code a little better, and they can help us write tests a little better... And all that means is that there will be more for us to do."
The quotes from Karpathy and Boris and Dario aren't hype. But they're also not magic. It's methodical automation of everything that doesn't require human judgment, so the human judgment can focus on what matters.