[Cover image: three open notebooks with handwritten code notes on a dim London desk, amber lamp light, rain-streaked window]

Claude Code vs Codex vs Cursor: Honest Review After 6 Months

Six months ago I made a decision I still think about. I told the Seahawk team we were going to properly commit to AI coding assistants — not dabble, not cherry-pick the easy wins, but actually route real client work through these tools and measure what happened. That meant billing hours, live repos, and production deployments. Not toy projects. Not "build me a to-do app" demos.

Twelve thousand sites across nine years gives you a fairly calibrated nose for what's hype and what's a genuine shift. And honestly? This space is both at once, which is what makes writing about it so annoying.

So here it is — six months with Claude Code, OpenAI Codex (via the API and the newer Codex CLI), and Cursor. No rankings, no winners declared before we've even started. Just what I found.

---

Why I Ran All Three Simultaneously

The temptation is to pick one and go deep. I almost did that. Back in January I was ready to just standardise on Cursor because the VS Code integration felt like the path of least resistance. Then a client — a SaaS founder in Manchester building an internal logistics dashboard — handed me a Python-heavy backend that was genuinely opaque, and Cursor's suggestions kept missing context that lived three files away.

That's when I decided the only honest evaluation method was to run the same categories of task through each tool in parallel. Not the same exact prompt, because that's artificial, but the same type of work: refactoring legacy PHP, writing new React components from Figma specs, debugging intermittent API errors, and generating test coverage for existing functions.

The results surprised me, and not in the directions I'd have guessed.

---

Claude Code: Frighteningly Good at Context, Slower Than I'd Like

Let me be direct. Claude Code is the most thoughtful of the three. That word sounds vague, so let me make it concrete.

When I fed it a 400-line WordPress plugin I'd written in 2021 — back when I was doing things I now consider embarrassing, like storing options directly from $_POST without sanitisation — it didn't just fix the obvious issues. It flagged the architectural pattern, explained why the approach was fragile, and offered a refactored version that preserved the exact behaviour while fixing the security gaps. Cursor did half of that. Codex basically gave me a cleaner version of the same bad pattern.
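The original was PHP, and I won't reproduce client code here, but the class of fix translates to any stack. Here's a minimal sketch of the before-and-after in a TypeScript/Express handler — the route, schema, and field names are invented for illustration:

    import express from "express";
    import { z } from "zod";

    const app = express();
    app.use(express.json());

    // The 2021-era pattern: trust the request body wholesale and
    // persist whatever the client sends.
    app.post("/options/unsafe", (req, res) => {
      saveOptions(req.body); // no validation, no sanitisation
      res.sendStatus(204);
    });

    // The refactored pattern: declare the shape you accept, validate,
    // and persist only the fields that pass.
    const OptionsSchema = z.object({
      siteTitle: z.string().max(200),
      itemsPerPage: z.number().int().min(1).max(100),
    });

    app.post("/options", (req, res) => {
      const parsed = OptionsSchema.safeParse(req.body);
      if (!parsed.success) {
        return res.status(400).json({ errors: parsed.error.issues });
      }
      saveOptions(parsed.data); // whitelisted, validated fields only
      res.sendStatus(204);
    });

    function saveOptions(opts: unknown): void {
      // stand-in for the real options store
    }

The point isn't the library choice; it's that the refactored version makes the accepted shape explicit, which is exactly the kind of structural observation Claude Code volunteered unprompted.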

Where It Wins

The long-context reasoning is real. You can paste in a full component tree, describe a bug three layers deep, and Claude Code will track the thread without losing it. For agency work where you're regularly inheriting other people's chaos, that's not a small thing.

It also writes explanations well. When a junior on my team doesn't understand why a refactor works a certain way, Claude Code's output tends to teach. That has actual value when you're trying to level up a small team.

Where It Frustrates

Speed. The responses are slower than Cursor's in-editor autocomplete, which isn't a fair comparison — they're different interaction models — but when you're in flow, waiting three to five seconds for a reply breaks something.

Pricing is also a real conversation. At heavy usage, the API costs add up faster than you'd expect. I ran about £340 worth of Claude API calls in February alone, across client projects. That's not ruinous, but it needs to go on the invoice somewhere.

---

OpenAI Codex: The One Everyone Forgets About

Here's the thing about Codex — people talk about it less now that ChatGPT and GPT-4o get all the oxygen, but the Codex CLI that OpenAI shipped in 2025 is genuinely interesting for terminal-native workflows.

I used it heavily on a project for a fintech client (can't name them, NDA, standard stuff) where the entire codebase lived in a monorepo and we were doing a lot of work in the terminal rather than an editor. Being able to run codex inline with shell context, have it read files directly, and execute commands in a sandboxed environment felt different from the chat-style interaction of the other tools.

Where Codex Shines

Automation tasks. Bash scripting. Writing GitHub Actions workflows. Generating boilerplate that follows a strict pattern. For that fintech project, I had Codex generate roughly 60% of the CI/CD pipeline YAML, and it was clean enough that I only made minor edits.

It's also the most literal of the three. If you give it a precise spec, it follows it. No editorialising, no "here's a better approach" — it just does the thing. Sometimes that's exactly what you want.

Where It Falls Short

The flip side of literal is brittle. Vague prompts produce vague code. And unlike Claude Code, it doesn't reliably catch the thing you should have asked about but didn't. I had a situation in March where Codex generated a perfectly functional database migration script that would have caused a silent data loss issue on a Postgres 14 database because of how it handled DEFAULT values on existing columns. It did exactly what I asked. It just didn't tell me the thing I needed to know.
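I can't share the actual script, so here's a hypothetical reconstruction of the class of mistake rather than the incident itself (node-postgres style, table and column names invented). A migration that drops and re-adds a column to change its DEFAULT reads as routine and destroys data silently:

    import { Client } from "pg";

    // Looks like a routine "change the default" migration...
    async function badMigration(client: Client): Promise<void> {
      await client.query(`ALTER TABLE orders DROP COLUMN status`);
      await client.query(
        `ALTER TABLE orders ADD COLUMN status text DEFAULT 'pending'`
      );
      // ...but DROP + ADD discards every existing value. Each old row
      // now silently reads 'pending'. No error, nothing in the logs.
    }

    // The non-destructive form: SET DEFAULT affects future inserts
    // only and leaves existing rows untouched.
    async function safeMigration(client: Client): Promise<void> {
      await client.query(
        `ALTER TABLE orders ALTER COLUMN status SET DEFAULT 'pending'`
      );
    }

Both versions run without complaint. Only one of them keeps your data.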

That's a meaningful difference in trust.

---

Cursor: The One I Actually Use Every Day

I'll be honest — Cursor is the tool I open first. Not because it's the "best" in some abstract sense, but because it lives where I work. The VS Code foundation means zero context-switching. My extensions are there. My keybindings are there. The colour theme I've been using since 2019 (One Dark Pro, if you're wondering) is there.

The In-Editor Experience

Cursor's Tab completion is genuinely eerie when it's working well. There were stretches last month where I'd start a function, hit Tab twice, and the entire implementation was exactly what I would have written. Not similar — exactly. That happens maybe 30% of the time. The other 70% it's useful but not magical. Which is still a good ratio.

The Cmd+K inline editing and the chat panel in the sidebar cover different workflows, and I appreciate that Cursor doesn't force you into one mode. Sometimes I want to have a conversation about the code. Sometimes I just want to fix this one line. The tool lets me do both without friction.

Where It Disappoints

Long-context tasks are where Cursor starts to wobble. I gave it a codebase with about 85,000 lines of code — a large WooCommerce build for a UK retailer — and asked it to trace how a custom shipping calculation was affecting cart totals across three different plugin interactions. It got confused. Gave me confident-sounding answers that were wrong about which file was doing what.

Claude Code handled the same task better. Took longer. But got it right.

There's also the question of the underlying model. Cursor lets you choose between Claude, GPT-4o, and others, which is useful — but the default "Cursor Tab" model for autocomplete is its own trained model, and it's not always clear what you're getting or why it made a particular suggestion. Some opacity there that I'd rather not have on client work.

---

Head-to-Head: The Task Breakdown

After six months, here's how I'd roughly score each tool across the task types I actually care about:

Refactoring legacy code (PHP, older JS):

  • Claude Code: best. Catches things you didn't ask about.
  • Cursor: good. Faster, slightly less thorough.
  • Codex: fine if your prompt is precise.

Writing new components from scratch:

  • Cursor: best. The in-editor flow is faster.
  • Claude Code: strong, slightly slower.
  • Codex: solid for boilerplate.

Debugging intermittent or logic errors:

  • Claude Code: best. The reasoning chain is visible and usually correct.
  • Cursor: decent for obvious bugs.
  • Codex: weakest here. Too literal when you need nuance.

DevOps / scripting / automation:

  • Codex CLI: best for terminal-first work.
  • Claude Code: strong.
  • Cursor: not the right tool for this.

Team legibility (code a junior can understand):

  • Claude Code: best by some distance.
  • Cursor: varies by model.
  • Codex: terse.

---

The Cost Reality Nobody Talks About Honestly

Running three tools for six months costs actual money. Here's roughly what I spent:

  1. Cursor Pro — $20/month. The fast requests cap (500/month on the standard tier) gets hit surprisingly quickly on heavy days.
  2. Claude API (for Claude Code) — varied between £180 and £340/month depending on project intensity.
  3. OpenAI API (for Codex CLI) — around £90–£120/month at my usage level.

That's somewhere between £300 and £500 per month in tooling. For a solo freelancer, that's a real line item. For an agency billing client work, it's more easily absorbed — but you have to actually track it and account for it, which a surprising number of people don't.

The honest ROI calculation for me: I estimate these tools save me 10–15 hours per month of billable-equivalent time. At my rate, that's worth considerably more than £500. But the maths only works if you're disciplined about what you use the time for. If you just use the saved time to scroll Hacker News, the ROI is zero.
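To put hypothetical numbers on it: at £80/hour (substitute your own rate), even the low end of that estimate, 10 hours, is £800 of recovered capacity against roughly £500 of tooling. The margin is real, but not so wide that sloppy usage can't erase it.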

There's decent third-party analysis on AI developer tool pricing models over at the Pragmatic Engineer if you want to go deeper on the economics.

---

What I've Changed About How I Work

A few concrete things that shifted after this experiment:

  • I stopped treating these tools as autocomplete engines and started treating them as a first-pass reviewer. Write the code. Then ask the tool what I missed.
  • I use Claude Code for anything I'm uncertain about and Cursor for anything I'm confident about but just want to go faster on.
  • I've started writing better prompts by treating them like tickets: context, constraints, expected output (an example follows this list). Simon Willison's writing on prompting changed how I think about this.
  • I review every single piece of AI-generated code before it goes into a PR. Not because I don't trust the tools, but because the one time I didn't — a Cursor suggestion in November that introduced a subtle race condition in a Node.js handler, sketched further below — cost me two hours of debugging.
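What does a ticket-shaped prompt actually look like? Something like this (the project details are invented, but the shape is what matters):

    Context: WooCommerce 8.x store, custom shipping plugin in
      includes/class-shipping-rates.php, PHP 8.1.
    Task: the flat-rate fallback fires even when a shipping zone
      matches. Find where the fallback is selected and fix the
      precedence.
    Constraints: don't change public method signatures; keep the
      existing filter hooks; no new dependencies.
    Expected output: a diff against class-shipping-rates.php plus a
      one-paragraph explanation of the root cause.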

The review point matters most. These tools are fast and often right. They are not always right. The professional obligation to review doesn't go away.
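For the curious, the November bug was the classic check-then-act shape: read shared state, await something, then write, with a second request slipping through the gap. This isn't the actual handler, just a minimal TypeScript sketch of the pattern and the usual fix:

    // Racy: two overlapping requests can both observe
    // cachedToken === null before either assignment runs, so
    // fetchNewToken() fires twice and the writes clobber each other.
    let cachedToken: string | null = null;

    async function getTokenRacy(): Promise<string> {
      if (cachedToken === null) {
        cachedToken = await fetchNewToken();
      }
      return cachedToken;
    }

    // Fix: memoise the in-flight promise so concurrent callers share
    // one fetch instead of racing past the null check.
    let tokenPromise: Promise<string> | null = null;

    function getToken(): Promise<string> {
      if (tokenPromise === null) {
        tokenPromise = fetchNewToken();
      }
      return tokenPromise;
    }

    // Stub so the sketch stands alone.
    async function fetchNewToken(): Promise<string> {
      return "token-" + Date.now();
    }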

---

FAQ

Which tool is best for a freelancer just starting with AI coding tools?

Cursor, without hesitation. The $20/month price point is reasonable, the VS Code integration means no learning curve on the environment, and the quality is high enough that you'll see genuine productivity gains in the first week. Start there. Branch out later.

Can I use Claude Code without being a heavy API user?

Yes, though the economics shift. If you use it through Claude.ai's Pro plan ($20/month) rather than the raw API, you get access to Claude Code with a usage cap. That's a more predictable cost. The API route gives you more control but requires you to track spend carefully.

Is Codex still worth using in 2025 given how much attention GPT-4o gets?

For terminal-native and automation-heavy workflows, yes. It's underrated for scripting and CI/CD work specifically. If your work is primarily in an editor, you can skip it. But if you spend real time in the terminal — and a lot of backend devs do — the Codex CLI deserves a look.

Do these tools actually understand large codebases?

Partially. Claude Code handles large context windows better than the others right now — Anthropic publishes their context window specs if you want the technical detail. But "understanding" is generous. They reason well within what they can see. The discipline of keeping your codebase readable and well-documented matters more with AI tools than without them, not less.

Will AI coding tools replace developers?

Not the ones I know. What they replace is the low-attention work — boilerplate, obvious refactors, repetitive pattern application. What they don't replace is knowing why you're building a thing, whether the architecture makes sense, and what the client actually needs versus what they asked for. That judgment gap is where the job still lives.

---

Six months in, my opinion is probably not what you expected: I don't think there's a winner. There's a right tool depending on what you're doing in a given hour. The developers who'll get the most out of this era are the ones who stay curious about the tooling, keep their critical thinking on, and don't outsource the judgment — just the grunt work.

That's always been true. It's just more obvious now.
