Building a 100+ agent swarm for Web3. Agentic engineering, not vibe coding. Writing at http://johnpphd.com
Picking up from the last cast: if AI agents make addition mistakes, why use them? Why not stick to deterministic code? Because deterministic code only does what you wrote. An agent does things you didn't write. That matters most where the design space is huge and changing fast. Onchain is the sharpest case I've seen: thousands of protocols, hundreds of chains, new primitives shipping weekly. A user asking "earn yield on idle USDC, low risk" has dozens of viable paths, gated by rates, liquidity, and gas that change by the minute. Hard-coding the answer is a losing game. By the time you ship, it's wrong. A probabilistic agent can stitch together a route nobody pre-wrote. Read a vault it's never seen, compare to a market it knows, propose a strategy. That's the upside. The downside: the thing that lets it see novel routes also lets it hallucinate one. Its Achilles' heel is built right into it. So the real question for builders: how do we get the best of probabilistic agents while still having them do deterministic tasks?
I'm finally learning that AI agents amplify your habits. All of them. The good and the bad. I skipped planning for twenty years. Believed in DRY but never enforced it with tooling. Knew principles mattered but never wrote them down. I got away with it because I could hold it all in my head. With 100+ agents, I couldn't anymore. Those habits scaled. And at scale, they broke everything. So I separated planning from execution. Started codifying rules I had been carrying around in my head for years. The harness I built to constrain agents ended up constraining me first. Six posts on building agent guardrails. Turns out the biggest guardrail was on me.
It took me a while to understand why you need an AI harness. I have been running 100+ agents across a Web3 codebase. The thing nobody tells you: agents follow your prompts perfectly and still break everything. Tell one "build the signup flow" and it wires the form straight to the database. Nothing in the prompt was violated. Every important principle was. That's the difference between orchestration and a harness. Orchestration tells agents what to do. The harness defines what they cannot do, catches violations while they run, and verifies the output before it ships. I kept trying to improve the prompt, but all I got was higher-quality mistakes. Now I'm getting better code, and I'm trying to write down how I got there.
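A minimal sketch of the "what they cannot do" half of a harness, using the signup example above. This is an illustration under stated assumptions, not the actual harness described in this cast: the `ui/` path prefix, the `db` module name, and the names `FORBIDDEN_IMPORTS` and `check_change` are all hypothetical. The idea is that a rule written down as code can reject a change that no prompt forbade.

```python
import ast

# Hypothetical rule: files under ui/ may not import the db package directly.
FORBIDDEN_IMPORTS = {"ui/": {"db"}}


def imported_modules(source: str) -> set[str]:
    """Return the top-level module names imported by a piece of Python source."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            modules.add(node.module.split(".")[0])
    return modules


def check_change(files: dict[str, str]) -> list[str]:
    """Return one violation message per forbidden import in a proposed change.

    `files` maps a file path to the source an agent wants to write there.
    An empty list means the change passes this rule.
    """
    violations = []
    for path, source in files.items():
        for prefix, banned in FORBIDDEN_IMPORTS.items():
            if path.startswith(prefix):
                for module in sorted(imported_modules(source) & banned):
                    violations.append(f"{path}: may not import '{module}'")
    return violations
```

Run against an agent's proposed files before they are written, a non-empty result blocks the change and can be fed back to the agent as an error. Only the source text is parsed, so nothing the agent wrote is executed.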
I've been coding with AI agents in production for almost a year, and it took me a while to get to my biggest realization: AI is not logical. It looks logical. It sounds logical. But at its core, every response is governed by probability. Ask a model to add up a list of numbers. It won't get it right 100% of the time. Better models make fewer mistakes, but they still make them. That's because LLMs are trained to predict likely text, not to compute. This is why you hear so much debate about what it takes to build correctly with AI. And why some developers still won't touch it. They're not wrong to be skeptical. They're just identifying the real problem. So how do you get reliable outcomes from a system that isn't reliable by nature? Especially in crypto, where the EVM doesn't grade on a curve? That's what I'll be writing about over the next few casts.
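To make the addition example concrete, here is a minimal sketch of one common workaround: let ordinary code do the arithmetic, and treat any total a model reports as a claim to check. It is an assumption for illustration, not the approach this series goes on to describe. The function names `exact_total` and `verify_claimed_total` are hypothetical, and amounts are passed as strings so that `Decimal` avoids float rounding.

```python
from decimal import Decimal


def exact_total(amounts: list[str]) -> Decimal:
    """Add decimal amounts exactly, with no float rounding."""
    return sum((Decimal(a) for a in amounts), Decimal("0"))


def verify_claimed_total(amounts: list[str], claimed: str) -> bool:
    """Check a total reported by a model against the exact sum."""
    return exact_total(amounts) == Decimal(claimed)
```

If `verify_claimed_total` returns False, the caller can discard the model's number and use `exact_total` instead.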