SEO Automation Stack 2026: Built to Survive a Cron

The DataForSEO blog has a useful piece on combining their endpoints with NLP APIs to automate SEO content workflows. The API surface they describe is real. The version that survives a Monday 09:00 UTC cron (caching, rate limits, deploy gates, error replay, cost ceilings) looks meaningfully different from the demo.

This site runs the production version. The /admin/llm-mentions/ dashboard pulls real DataForSEO ai_optimization payloads on a weekly cron. The /tools/ai-citation-checker/ exposes a free DataForSEO SERP + AI Overview check to public traffic with IP rate-limiting. The 47-URL Lighthouse sweep that runs every Monday into seo_pages costs $0.30 and replaces a $40/mo PageSpeed Insights subscription. The whole pipeline was built in Claude Code over four working sessions.

Here is what that stack actually looks like, and where you decide what to automate versus what to keep human.

The five DataForSEO endpoints worth chaining

Most SEO automation that holds up in production uses some combination of these five. Not all at once: you pick by use case.

/v3/serp/google/organic/live/advanced: current top-10 organic results for a keyword, plus AI Overview detection. The backbone of any citation tracker.
/v3/on_page/lighthouse/live/json: full Lighthouse audit (Perf / A11y / BP / SEO + Core Web Vitals). Costs ~$0.005 per URL. Replaces PageSpeed Insights for any sweep above ~20 URLs/month.
/v3/on_page/instant_pages: page-level on-page analysis (title length, description, schema types, internal/external link counts, deprecated tags, content consistency scores). $0.001/URL. The cheapest single endpoint for content health.
/v3/ai_optimization/{engine}/llm_responses/live: query ChatGPT / Perplexity / Gemini directly with web search enabled and get back the response text plus the citation list. ~$0.03/run. The endpoint that powers actual AI-search visibility tracking.
/v3/dataforseo_labs/google/keyword_overview/live: search volume, KD, intent classification, monthly trend. The keyword-data backbone. Cheap at $0.001 per keyword.
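
Calling any of these endpoints follows the same shape. A minimal sketch, assuming HTTP Basic auth and a JSON array of task objects as the POST body. buildDataForSeoRequest is a hypothetical helper, and the credentials, keyword, and location code are placeholders:

```javascript
// Hypothetical helper: builds the URL and fetch() options for a DataForSEO POST.
// Assumes Basic auth (login:password) and a JSON array of task objects as the body.
const API_BASE = "https://api.dataforseo.com";

function buildDataForSeoRequest(path, tasks, login, password) {
  const token = Buffer.from(`${login}:${password}`).toString("base64");
  return {
    url: `${API_BASE}${path}`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Basic ${token}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(tasks),
    },
  };
}

// Placeholder task: the keyword and location/language codes are illustrative only.
const serpRequest = buildDataForSeoRequest(
  "/v3/serp/google/organic/live/advanced",
  [{ keyword: "seo automation", location_code: 2840, language_code: "en" }],
  "login@example.com",
  "api-password"
);
```

The returned pair can be passed straight to fetch(serpRequest.url, serpRequest.init). Keeping the builder separate from the network call makes it easy to check in a smoke test without spending API credit.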

The Claude Code part: what the model actually does

Claude Code's value in this stack is not the API call itself. The call is a fetch with auth headers; any junior engineer can write it. The value sits in four places.

1. Schema parsing

DataForSEO responses are deeply nested. /v3/on_page/lighthouse/live/json returns categories at tasks[0].result[0].categories, not at items[0].categories where you would guess. /v3/ai_optimization/.../llm_responses/live returns text inside result.items[].sections[].text, two levels deeper than typical AI APIs. Gemini wraps every citation URL in vertexaisearch.cloud.google.com/grounding-api-redirect/<token>, so the real source domain is in the citation title, not the URL field. Claude Code reads the actual response shape in a smoke test, debugs the mismatch, and fixes the parser. The skill 'verification-before-completion' refuses to declare done until a real response is parsed correctly.
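
The two parsing traps above can be sketched as small defensive helpers. This is an illustration, not the site's actual parser: the function names are hypothetical, and the citation field names url and title are assumptions about the response object.

```javascript
// Lighthouse categories sit under tasks[0].result[0], per the path described above.
function extractLighthouseCategories(payload) {
  const categories = payload?.tasks?.[0]?.result?.[0]?.categories;
  if (!categories) {
    throw new Error("Unexpected Lighthouse shape: tasks[0].result[0].categories missing");
  }
  return categories;
}

// Gemini citations carry a redirect URL, so fall back to the title for the source domain.
// The field names `url` and `title` are assumptions about the citation object.
const GEMINI_REDIRECT_HOST = "vertexaisearch.cloud.google.com";

function citationDomain(citation) {
  const host = new URL(citation.url).hostname;
  if (host !== GEMINI_REDIRECT_HOST) return host.replace(/^www\./, "");
  return String(citation.title ?? "").trim().toLowerCase();
}
```

Failing loudly on a missing path is deliberate: a parser that silently returns undefined writes empty rows, and the gap only shows up on the dashboard a week later.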

2. Cron + cache discipline

DataForSEO costs add up. The LLM mentions tracker here costs $0.03 per run × 30 prompts × 3 engines × 4 runs/month, or roughly $10-12/month. That works because it is weekly. Run it hourly and the cost is 168x. The pattern that survives: cron-driven writes to Supabase, public reads from cache, manual force-refresh behind a JWT gate. The Claude Code hook layer enforces this: a pre-commit hook rejects code that calls a paid endpoint without rate-limit + cache headers.
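
A quick way to sanity-check the cadence arithmetic. This sketch assumes each prompt is sent to three engines (ChatGPT, Perplexity, Gemini), which is what brings the weekly figure into the $10-12 range. monthlyTrackerCost and its parameter names are illustrative.

```javascript
// Monthly spend = cost per run × prompts × engines × runs per month.
// `engines` is an assumption: three engines (ChatGPT, Perplexity, Gemini) per prompt.
function monthlyTrackerCost({ costPerRun, prompts, engines, runsPerMonth }) {
  return costPerRun * prompts * engines * runsPerMonth;
}

const weekly = monthlyTrackerCost({ costPerRun: 0.03, prompts: 30, engines: 3, runsPerMonth: 4 });
const HOURS_PER_WEEK = 24 * 7; // 168 hourly runs for every weekly run
const hourly = weekly * HOURS_PER_WEEK;

console.log(`weekly cadence: ~$${weekly.toFixed(2)}/month`);
console.log(`hourly cadence: ~$${hourly.toFixed(2)}/month`);
```

Under these assumptions the weekly cadence comes to about $10.80 a month and the hourly cadence to roughly $1,814, which is why the schedule is the first cost control and the cache is the second.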

3. Deploy gates

A scheduled function only fires on production main. So a tracker that writes to seo_llm_runs every Monday needs to be merged before Monday. Claude Code's session-handoff discipline catches this. Yesterday's HANDOFF doc literally says 'The Monday 09:00 UTC cron only fires once this lands on production Netlify'. The merge gate is documented and enforced before the work is closed.
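
One way to make the merge gate checkable is to compute the next scheduled run and compare it with the expected merge time. A sketch, assuming the Monday 09:00 UTC schedule described above (cron expression "0 9 * * 1"). nextMondayRun and landsBeforeNextRun are hypothetical helpers, not part of any Netlify API.

```javascript
// Next Monday 09:00 UTC strictly after `now` (cron expression "0 9 * * 1").
function nextMondayRun(now) {
  const run = new Date(
    Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate(), 9, 0, 0)
  );
  const daysUntilMonday = (1 - run.getUTCDay() + 7) % 7; // getUTCDay(): Monday is 1
  run.setUTCDate(run.getUTCDate() + daysUntilMonday);
  if (run <= now) run.setUTCDate(run.getUTCDate() + 7);
  return run;
}

// True when a merge expected at `mergeAt` reaches production before the next run.
function landsBeforeNextRun(mergeAt, now) {
  return mergeAt < nextMondayRun(now);
}
```

A handoff script could call landsBeforeNextRun with the planned merge time and flag the work as still open when it returns false.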

4. Error replay

DataForSEO returns HTTP 200 with status_code: 40402 inside the JSON body when a task fails. A naïve fetch().then(r => r.json()) thinks it succeeded. Production code reads the inner status_code, retries on 4xx, and logs the raw task wrapper to a debug table for forensics. Claude Code's systematic-debugging skill enforces hypothesis-driven debugging when this trips: write the hypothesis, design the minimal test, narrow further. The smoke script (scripts/smoke-llm-mentions.mjs on this site) was the artefact that surfaced the Gemini URL-wrapping bug before the full cron burned $5 of API spend.
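
The inner status check can be sketched as follows. This assumes 20000 is the success code at both the envelope and the task level. DataForSeoTaskError and unwrapTasks are hypothetical names, and the retry and logging policy is left to the caller.

```javascript
// Assumption: 20000 is DataForSEO's success code at both the envelope and task level.
const DFS_OK = 20000;

class DataForSeoTaskError extends Error {
  constructor(code, message, rawTask) {
    super(`DataForSEO task failed: ${code} ${message}`);
    this.code = code;
    this.rawTask = rawTask; // keep the raw wrapper so it can be logged for forensics
  }
}

// Throws on any non-success code instead of trusting the HTTP 200.
function unwrapTasks(body) {
  if (body.status_code !== DFS_OK) {
    throw new DataForSeoTaskError(body.status_code, body.status_message, body);
  }
  for (const task of body.tasks ?? []) {
    if (task.status_code !== DFS_OK) {
      throw new DataForSeoTaskError(task.status_code, task.status_message, task);
    }
  }
  return body.tasks ?? [];
}
```

A caller would run unwrapTasks(await res.json()), catch DataForSeoTaskError, decide from error.code whether a retry makes sense, and write error.rawTask to the debug table.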

The five-step content workflow, in production form

The DataForSEO article describes a content generation chain: SERP → AI summary → sub-topics → text generation. Here is what the same chain looks like with the production discipline applied.

Step 1. SERP fetch with cache

POST /v3/serp/google/organic/live/advanced for the target keyword. Cache the top-10 results in Supabase with a 7-day TTL. Every subsequent brief on the same keyword reads from cache. Saves 95% of API spend on the long tail (most queries cluster around the same keywords).
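
A read-through cache with a 7-day TTL might look like the sketch below. The store is any Map-like object standing in for the Supabase table, and fetchSerp is a hypothetical async function that makes the paid call. Neither reflects the site's actual schema.

```javascript
const SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000;

// Normalise the keyword so "SEO Tools " and "seo tools" share one cache row.
function cacheKey(keyword) {
  return keyword.trim().toLowerCase().replace(/\s+/g, " ");
}

function isFresh(entry, now, ttlMs = SEVEN_DAYS_MS) {
  return Boolean(entry) && now - entry.fetchedAt < ttlMs;
}

// `store` is any Map-like object (a Supabase table in production);
// `fetchSerp` is a hypothetical async function that makes the paid call.
async function getSerp(keyword, store, fetchSerp, now = Date.now()) {
  const key = cacheKey(keyword);
  const cached = store.get(key);
  if (isFresh(cached, now)) return { results: cached.results, fromCache: true };
  const results = await fetchSerp(key);
  store.set(key, { results, fetchedAt: now });
  return { results, fromCache: false };
}
```

Normalising the key matters as much as the TTL: without it, trivial variations in casing or spacing each trigger their own paid call.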

Step 2. AI summary with prompt caching

Pipe the top-10 result URLs into Anthropic Claude (Sonnet 4.6, prompt-cached on the system message). The summary asks 'what are the common themes across these results? What entities are universally cited?' Output: a 400-word common-pattern brief. Anthropic prompt caching cuts the cost 75% on repeat keywords.
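
The cached-system-message pattern can be sketched as a request body for the Anthropic Messages API. The prompt wording, max_tokens value, and buildSummaryRequest name are illustrative. The model ID is passed in by the caller, not assumed here.

```javascript
// Builds a Messages API request body with the system prompt marked cacheable.
// `model` is supplied by the caller; no model ID is assumed here.
const SUMMARY_SYSTEM_PROMPT =
  "You summarise search results. Identify the common themes across the results " +
  "and the entities that are universally cited. Answer in about 400 words.";

function buildSummaryRequest(model, keyword, urls) {
  return {
    model,
    max_tokens: 1024,
    system: [
      {
        type: "text",
        text: SUMMARY_SYSTEM_PROMPT,
        cache_control: { type: "ephemeral" }, // cache the static prefix across calls
      },
    ],
    messages: [
      {
        role: "user",
        content: `Keyword: ${keyword}\nTop results:\n${urls.join("\n")}`,
      },
    ],
  };
}
```

Only the static system block is marked cacheable, while the per-keyword content stays in the user message. Anthropic caches a prefix only above a minimum token length, so a real system prompt would need to be considerably longer than this placeholder for the cache to apply.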

Step 3. Entity coverage map

POST /v3/content_analysis/search/live with the target keyword. It returns the brand/product mentions across top-ranking content with sentiment scores. Output: a list of entities your draft must mention to look complete. Skip this step if the keyword is generic; it is only worth it for buyer-shape queries (commercial intent).
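
Once the entity list exists, checking a draft against it is a small function. A sketch, assuming the entities have already been reduced to plain strings. missingEntities is a hypothetical helper, and real matching would likely need aliases and plural forms.

```javascript
// Escape regex metacharacters so entity names like "C++" match literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Returns the entities from the coverage map that the draft never mentions.
// Matching is a case-insensitive whole-phrase check bounded by non-word characters.
function missingEntities(draft, entities) {
  return entities.filter((entity) => {
    const pattern = new RegExp(`(^|\\W)${escapeRegExp(entity)}(\\W|$)`, "i");
    return !pattern.test(draft);
  });
}
```

The whole-phrase boundary keeps a short entity from being counted as covered just because it appears inside a longer word.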

Step 4. Sub-topic structure

Use Anthropic to convert the SERP summary + entity map into a proposed H2/H3 structure. Six to nine H2s, each with three to five candidate H3s. Output: a Markdown outline ready for human review. A human gates this step; there is no full automation. The skill 'requesting-code-review' applies to content too.
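
Before the outline reaches the human reviewer, the shape constraint can be checked mechanically. A sketch, assuming the outline uses ATX-style Markdown headings ("## " and "### "). validateOutline is a hypothetical helper.

```javascript
// Checks a Markdown outline against the 6-9 H2 / 3-5 H3 shape described above.
function validateOutline(markdown) {
  const sections = [];
  for (const line of markdown.split("\n")) {
    if (/^## /.test(line)) {
      sections.push({ title: line.slice(3).trim(), h3Count: 0 });
    } else if (/^### /.test(line) && sections.length > 0) {
      sections[sections.length - 1].h3Count += 1;
    }
  }
  const problems = [];
  if (sections.length < 6 || sections.length > 9) {
    problems.push(`expected 6-9 H2s, found ${sections.length}`);
  }
  for (const { title, h3Count } of sections) {
    if (h3Count < 3 || h3Count > 5) {
      problems.push(`"${title}" has ${h3Count} H3s, expected 3-5`);
    }
  }
  return { ok: problems.length === 0, problems };
}
```

This only filters out malformed outlines so the reviewer's time goes on substance. It does not replace the human gate.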

Step 5. Draft generation, RDA Humaniser, ingest

Claude (Sonnet 4.6 for bulk, Opus 4.7 for flagship) writes the draft against the approved outline. The output runs through the RDA Humaniser pass, which strips AI tells, breaks up uniform sentence rhythm, and varies clause length. The draft is then ingested into Supabase as status='ready'. A second human review approves before status flips to 'published'.
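
The status gate can be expressed as two small transitions. A sketch only: the approvedBy field and the function names are assumptions, not the site's actual Supabase schema.

```javascript
// Hypothetical status flow: a humanised draft becomes 'ready', then a named
// human approver is required before it can become 'published'.
function markReady(post) {
  return { ...post, status: "ready" };
}

function publish(post, approvedBy) {
  if (post.status !== "ready") {
    throw new Error(`cannot publish from status '${post.status}'`);
  }
  if (!approvedBy) {
    throw new Error("publishing requires a human approver");
  }
  return { ...post, status: "published", approvedBy };
}
```

Putting the approver check in the transition itself means no automated step can flip a post to 'published' by writing the status directly through this code path.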

Real numbers from the production pipeline on this site

The /tools/ai-citation-checker/ uses a simplified version of this same chain. Live numbers from May 2026:

DataForSEO SERP call: 5 queries × $0.001 = $0.005 per citation check
Anthropic Claude summarisation: ~$0.02 per check (Sonnet 4.6, prompt-cached)
Effective cost per public tool use: ~$0.03
Rate limit: 3 fresh checks per IP per hour; cached results unlimited
Daily worst-case: ~$15 at full saturation across all IPs
Actual daily spend after caching: ~$0.80 (cached queries are 90%+ of traffic on popular brand searches)
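
The per-IP limit above can be sketched as an in-memory sliding window. This is an illustration with hypothetical names. A deployment running more than one instance would need a shared store in place of the Map.

```javascript
const WINDOW_MS = 60 * 60 * 1000; // one hour
const MAX_FRESH_CHECKS = 3;

// In-memory sliding window keyed by IP; a shared store would be needed across instances.
const recentChecks = new Map();

// Returns true and records the attempt when the IP is under its hourly limit.
function allowFreshCheck(ip, now = Date.now()) {
  const recent = (recentChecks.get(ip) ?? []).filter((t) => now - t < WINDOW_MS);
  if (recent.length >= MAX_FRESH_CHECKS) {
    recentChecks.set(ip, recent);
    return false;
  }
  recent.push(now);
  recentChecks.set(ip, recent);
  return true;
}
```

Only fresh checks pass through this gate. Cached results are served without calling it, which is what keeps them unlimited.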

What to automate, what to keep human

Automate

SERP fetching, caching, monitoring
Lighthouse + page-health audits at scale
LLM citation tracking across ChatGPT, Perplexity, Gemini
Schema + meta + hreflang validation
Content brief scaffolding (outline + entity coverage map)
First-pass draft generation for high-volume long tail (e.g. status code pages, location pages)

Keep human

Outline approval before draft generation: automation without this gate produces generic content at scale
Final review before publish: every flagship post
Topical authority strategy: what cluster to build, what to skip
Discovery and scoping calls: the buyer signals you read in a 30-minute call do not transfer to automated workflows
Penalty diagnosis and recovery: Google Search Console anomalies need a human pattern-matching against past sites

Where the DataForSEO article stops, and where production starts

The DataForSEO article shows the API surface and a happy-path demo. The production layer is everything that sits between that demo and a site that actually ranks: the cache strategy, the cron schedule, the JWT gate on force-refresh, the hooks that block direct prod writes, the smoke tests that verify response shape before burning API spend, the Supabase write retries, the deploy gates, the error replay. None of those ship in the demo because they are infrastructure problems, not API problems.

If you want the API surface, the DataForSEO docs are well written. If you want the production layer wired up against your codebase, you are looking at roughly four to six weeks of work for a single-domain setup, longer if you want multi-domain or multi-locale support. Claude Code shortens that to two to three weeks at a senior rate. That is the engagement on offer.

The engagement

Three project shapes are available.

Build feature: one specific automation surface (citation tracker, content brief generator, Lighthouse sweep), 1-2 weeks, £5k-£15k.
Build product surface: marketing site + admin dashboard + content pipeline + automation hooks, 4-8 weeks, £15k-£45k.
Build platform: full SEO automation stack with multi-domain, multi-locale, custom integrations, 12-20 weeks, £45k-£150k.

Discovery is one paid week. Output is a written technical spec, a fixed-price quote, and a working smoke test that proves the DataForSEO + Anthropic combination runs against your specific codebase before the full build. Book a 30-minute call to start.

Frequently asked questions

What is the DataForSEO and Claude Code SEO stack?

It pairs DataForSEO's API for SERP, keyword, and on-page data with Claude Code, which orchestrates and reasons over that data, to form an automated SEO pipeline. The production version adds caching, rate limits, and deploy gates so it survives a real scheduled cron, not just a demo.

Which DataForSEO endpoints are worth automating?

The piece chains five: SERP, Lighthouse, instant on-page analysis, LLM responses, and keyword overview. Together they cover a content and citation workflow. The value is in chaining them with caching so identical queries do not double-charge, rather than calling each in isolation.

What does Claude Code do in the pipeline?

It handles the reasoning and orchestration: deciding what to fetch, interpreting the DataForSEO results, and turning them into briefs or actions. The model does the judgment work; DataForSEO supplies the data. Humans stay in the loop for the calls that need real editorial decisions.

What should stay human in SEO automation?

Final editorial decisions, strategy, and anything where a wrong call is expensive. Automate the data gathering, the repetitive analysis, and the first-draft synthesis. Keep a human on the judgment: which topics to target, what to publish, and whether the output is actually correct.
