BUILDING SOFTWARE WITH CLAUDE
The brainstorm-to-spec workflow, the toolchain, and the decisions AI cannot make. From shipping HostList.io, gautamkhorana.com, and Seahawk client products.
Why this guide exists
Building software in 2026 with Claude as the primary engineering collaborator changes the cost curve and the team shape, not the underlying craft. The judgement-heavy parts of software (what to build, why, how it should behave under failure) still require human attention. The execution parts (writing the code, refactoring, generating boilerplate, running migrations) compress dramatically. Knowing which is which is the entire skill in 2026.
This guide is the workflow I actually use to ship custom products at Seahawk Media and on personal projects like HostList.io, gautamkhorana.com, and Deluxe Astrology. Not a Claude tutorial, not a prompt-engineering manual. The process discipline that produces shippable software with Claude as the load-bearing engineer.
The brainstorm-to-spec workflow
Stage 1: structured brainstorm
Before any code is written I run a structured brainstorm with Claude. The prompt: describe the problem in plain language, ask Claude to surface five questions whose answers would change the design, answer them, then ask Claude to summarise the resulting product spec in 200 words. This stage takes 30 to 60 minutes and produces a written spec that the rest of the work flows from.
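In code, the Stage 1 kickoff looks roughly like this. A minimal sketch using the Anthropic TypeScript SDK; the model string and the prompt wording are placeholders, not the exact production prompt.

```ts
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

const brainstorm = `
I want to build: <plain-language problem description>.

First, ask me the five questions whose answers would most change the design.
After I answer, summarise the resulting product spec in 200 words.
`.trim();

const response = await client.messages.create({
  model: "claude-sonnet-4-5", // placeholder: use whichever Sonnet release is current
  max_tokens: 1024,
  messages: [{ role: "user", content: brainstorm }],
});

console.log(response.content);
```

The five-questions step is the point of the stage: it forces the design-changing unknowns into the open before any spec text is written.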
Stage 2: technical decisions
With the spec in hand, I ask Claude to identify the architecture choices that have the highest leverage downstream. Database shape, API surface, rendering strategy, deployment model. For each, Claude proposes two or three options and the trade-offs. I pick. The decisions are written down in the same document so the build phase has a single source of truth.
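It helps to give that decision log a fixed shape so the build phase can read it mechanically. A sketch of one possible format; the fields and the example decision are illustrative, not a required schema.

```ts
// One way to keep Stage 2 decisions machine-readable alongside the spec.
interface ArchitectureDecision {
  area: "database" | "api" | "rendering" | "deployment";
  optionsConsidered: string[]; // the two or three options Claude proposed
  chosen: string;              // the human pick
  rationale: string;           // why, written down once, read many times
}

const decisions: ArchitectureDecision[] = [
  {
    area: "rendering",
    optionsConsidered: ["SSR on every request", "static generation + ISR"],
    chosen: "static generation + ISR",
    rationale: "Tens of thousands of programmatic pages; build-time cost beats per-request cost.",
  },
];
```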
Stage 3: spec-driven implementation
Code generation is the last stage, and it is fastest because the spec is already complete. Claude writes the schema, the queries, the components, the routes, the tests, in roughly that order. I review every commit. Most reviews surface a small refactor or a missing edge case; full rewrites are rare when the spec was clear.
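A sketch of what schema-first order looks like in a Supabase-backed project; the `posts` table, its columns, and the environment variable names are hypothetical.

```ts
import { createClient } from "@supabase/supabase-js";

// 1. The schema lands first (as SQL in a migration); the row type mirrors it.
type Post = { id: string; slug: string; title: string; published_at: string };

const supabase = createClient(
  process.env.SUPABASE_URL!,
  process.env.SUPABASE_ANON_KEY!
);

// 2. Queries are written against the settled schema, not the other way round.
const { data, error } = await supabase
  .from("posts")
  .select("id, slug, title, published_at")
  .order("published_at", { ascending: false })
  .limit(10);

if (error) throw error;
const posts = data as Post[];
```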
What Claude is great at and what it is not
Great at
Greenfield code in well-known patterns: REST APIs, CRUD admin panels, auth flows, blog engines, marketing sites. Refactoring existing code where the target shape is clearly stated. Generating tests for code that has clear inputs and outputs. Writing migration scripts when the schema diff is unambiguous. Drafting documentation. Debugging code where the symptom is reproducible and the trace is in context.
Less great at
Architectural decisions in unfamiliar domains. Integration code where the third-party API is poorly documented or recently changed. Performance optimisation where the bottleneck is non-obvious. Code generation in obscure languages or frameworks where training data is thin. Anything where the requirements are ambiguous and the LLM will fill in plausible-sounding defaults that are wrong for your specific case.
The judgement gap
Claude is consistently better than the median engineer at writing the next line of code. Claude is consistently worse than a senior engineer at deciding whether the next line of code should exist. The senior judgement layer is what you bring. The execution speed is what Claude brings. The combination beats either alone.
The toolchain I actually use
Claude Code as the primary surface
Claude Code is the IDE for AI-assisted development. Project context loaded once, terminal access, file system access, MCP tool integration. The single highest-leverage tool I added to my stack in 2025. Most engineering work now happens in Claude Code rather than directly in VS Code or Cursor.
Cursor for direct in-editor edits
Cursor is still the best editor experience for working alongside AI on a single file. Tab completion, inline edits, side-by-side diff. I use Cursor when the work is concentrated on one or two files; I switch to Claude Code when the work spans the project.
Claude Sonnet via API for batch work
When I need to generate or rewrite hundreds of pages programmatically (the auto-blog pipeline, content humanisation, schema generation), I call the Claude API directly with Sonnet. Lower latency than the chat surface, predictable cost, scriptable. The right tool for content pipelines specifically.
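A minimal sketch of that batch loop, again using the Anthropic TypeScript SDK; `loadPages` and `savePage` are hypothetical stand-ins for the real Supabase reads and writes, and the model string is a placeholder.

```ts
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

// Hypothetical stand-ins for the real Supabase reads and writes:
declare function loadPages(): Promise<{ id: string; body: string }[]>;
declare function savePage(id: string, body: string): Promise<void>;

async function rewrite(body: string): Promise<string> {
  const res = await client.messages.create({
    model: "claude-sonnet-4-5", // placeholder model string
    max_tokens: 2048,
    messages: [{ role: "user", content: `Rewrite for clarity; keep every fact:\n\n${body}` }],
  });
  // Join the text blocks in the response into one string
  return res.content.map((block) => (block.type === "text" ? block.text : "")).join("");
}

for (const page of await loadPages()) {
  await savePage(page.id, await rewrite(page.body));
}
```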
OpenAI GPT-4o for prompt engineering
My discovery in late 2025 was that prompts for Kimi, Minimax, or any other agent are best written by Claude or GPT, not by hand. I describe the goal, ask GPT-4o to write the prompt, then run that prompt against the executor. The output quality is materially better than hand-prompted equivalents.
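The loop, sketched with the OpenAI TypeScript SDK; `runAgent` is a hypothetical stand-in for whichever executor (Kimi, Minimax) receives the generated prompt.

```ts
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Step 1: ask GPT-4o to write the prompt.
async function writePromptFor(goal: string): Promise<string> {
  const res = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: [
      { role: "system", content: "You write precise, self-contained prompts for other AI agents." },
      { role: "user", content: `Write the best possible prompt for this goal:\n${goal}` },
    ],
  });
  return res.choices[0].message.content ?? "";
}

// Step 2: run the generated prompt against the executor (hypothetical):
declare function runAgent(prompt: string): Promise<string>;

const prompt = await writePromptFor("Research the top 20 managed WordPress hosts and compare pricing.");
const output = await runAgent(prompt);
```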
Kimi Researcher and Minimax Agent
Deep research and full-app design mockups, respectively. The /blog/kimi-minimax-deep-research-design-mockups/ post covers the full workflow. Both are load-bearing tools at Seahawk for client research and rapid prototyping.
Repository discipline that survives AI-assisted development
Small commits, descriptive messages
AI-assisted development tends to produce large diffs because it is easy to ship 800 lines of generated code in one commit. Discipline yourself to commit smaller. A commit message that says what changed and why is the only artefact future-you will have when debugging a regression. Make it readable.
Tests as the contract
Claude can write code that compiles and runs but quietly violates a constraint that was implicit in the spec. Tests that encode those constraints as executable contracts catch the violations. The test-first discipline matters more in the AI-assisted era than it did when humans wrote every line.
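A sketch of what that looks like with Node's built-in test runner; `slugify` and its constraints are an invented example, not code from any of the projects above.

```ts
import test from "node:test";
import assert from "node:assert/strict";

// Example implementation under test (a stand-in for generated code):
function slugify(title: string): string {
  return title
    .normalize("NFKD")
    .replace(/[\u0300-\u036f]/g, "") // strip diacritics
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Spec constraints encoded as executable contracts:
test("slugs are lowercase ASCII with hyphen separators", () => {
  const slug = slugify("Déjà Vu: 2026 Edition!");
  assert.match(slug, /^[a-z0-9]+(-[a-z0-9]+)*$/);
});

test("slugify is deterministic", () => {
  assert.equal(slugify("Hello World"), slugify("Hello World"));
});
```

The constraint ("slugs are lowercase ASCII") is exactly the kind of implicit spec detail that generated code can quietly drop; once it is a test, the drop is loud.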
Code review is human-only for now
I do not let Claude approve its own pull requests. The review gate is the most important quality control left to humans, and outsourcing it to the same model that wrote the code defeats the purpose. The Anthropic SDK and Claude Code workflows make AI-assisted review easy, but the final approval is mine.
Versioned dependencies and lockfiles
Pin every dependency. Use lockfiles. Run npm audit on every build. The supply-chain attack surface is real and AI-assisted development tends to add more dependencies than human-written equivalents because adding a package is fast. Lockfile discipline keeps the surface auditable.
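A small CI check along those lines, sketched in TypeScript for Node; it assumes it runs from the repository root and fails the build on any ^ or ~ range.

```ts
import { readFileSync } from "node:fs";

const pkg = JSON.parse(readFileSync("package.json", "utf8"));
const deps: Record<string, string> = {
  ...pkg.dependencies,
  ...pkg.devDependencies,
};

// Flag any version that is a range rather than an exact pin.
const unpinned = Object.entries(deps).filter(([, version]) => /^[\^~]/.test(version));

if (unpinned.length > 0) {
  console.error("Unpinned dependencies:", unpinned.map(([name]) => name).join(", "));
  process.exit(1);
}
```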
The product decisions that AI cannot make
Five categories of decision where Claude is the wrong tool:
What to build at all. The product decision is judgement-heavy, depends on user context Claude does not have, and is the most consequential decision in any product. Founders and PMs own this; AI assists at most.
What to launch and what to cut. Scope decisions during a build always come down to constraints the model does not see. Time pressure, team morale, partner relationships, marketing positioning. Decide as a human, document the reasoning, then ask Claude to execute the decided scope.
Failure mode design. How the system should behave when things go wrong is rarely specified up front and is where most production incidents originate. Spend disproportionate time on failure modes; they will not emerge naturally from happy-path code generation.
The ethics layer. What you build has implications. Privacy, data residency, accessibility, environmental cost. These are human decisions, not optimisation outputs. Ask the question explicitly at design time.
Long-term maintainability. Future-self reading future-Claude reading present-Claude's code is the deepest version of technical debt. Optimise the present code to be readable by humans first, AI second.
Specific projects I have shipped this way
Concrete examples from the last twelve months at Seahawk and personally:
HostList.io (28,000 programmatic pages on Next.js + Supabase): scaffolded with Claude Code over five days, content pipeline written by Claude with Tavily research integration, hero generation via FAL, schema markup auto-generated. Total engineering time was roughly one third of what I would have estimated three years ago.
gautamkhorana.com (this site, on Astro + Supabase): rebuilt over a 24-hour weekend with Claude Code as the primary collaborator. Site editor surface, blog system, schema layer, i18n, all generated, reviewed, refactored. The cost of building a personal site at this quality dropped to a weekend of focused time.
Deluxe Astrology auto-blog and translation pipeline: the Tavily-research-to-Claude-draft-to-Winston-check-to-FAL-hero-to-Supabase-publish loop runs daily across 30 languages on 91,000+ pages (sketched at the orchestration level after this list). The pipeline itself is roughly 800 lines of code that Claude wrote across about 12 hours of focused work; the same build would have taken a week with a human-only team.
Internal admin dashboards at Seahawk: Minimax Agent generates the first prototype from a six-line description, Claude refines it, the result is shippable in an hour. Time from "we need an internal tool" to "we are using the internal tool" dropped from days to under a working day.
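For the Deluxe Astrology loop referenced above, here is the orchestration-level sketch. Every helper is a hypothetical stand-in for the real Tavily, Claude, Winston, FAL, and Supabase calls; the production pipeline is roughly 800 lines, not this.

```ts
// Hypothetical stand-ins for the real third-party integrations:
declare function tavilyResearch(topic: string): Promise<string>;
declare function claudeDraft(research: string, lang: string): Promise<string>;
declare function winstonCheck(draft: string): Promise<boolean>;
declare function falHeroImage(title: string): Promise<string>; // returns image URL
declare function publishToSupabase(lang: string, draft: string, heroUrl: string): Promise<void>;

async function runDaily(topic: string, languages: string[]) {
  const research = await tavilyResearch(topic);
  for (const lang of languages) {
    const draft = await claudeDraft(research, lang);
    if (!(await winstonCheck(draft))) continue; // fails the humanisation gate, skip publish
    const hero = await falHeroImage(topic);
    await publishToSupabase(lang, draft, hero);
  }
}
```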
The bottom line
Building custom software in 2026 is a craft that has compressed by an order of magnitude on execution speed and stayed roughly constant on judgement complexity. The teams that adapt are the ones that bring more judgement (better specs, better architectural choices, better failure-mode design) and outsource more execution to AI. The teams that do not adapt produce software at competitive speed but lose on quality, or vice versa.
You do not need to use every tool in this guide. You do need to know which decisions are yours and which can be delegated. The skill is the boundary, not the toolchain.
If you want help shipping a custom software project at this cost curve, we run product builds at Seahawk Media. The conversation is free; the project pricing reflects the AI-assisted cost reality, not the 2018 hourly cost reality.
WHEN YOU ARE READY TO TALK
If you are mid-build on something this guide touches and want a second pair of eyes, the fastest path is a 30-minute call.
BOOK YOUR 30-MIN CALL