
Technical SEO that survives AI Overviews, headless rebuilds, and your next algorithm update.

Crawl, index, schema, Core Web Vitals, multi-locale, AI search citability — built into the codebase, not bolted on after launch. WordPress, headless WordPress, Next.js, Astro, Nuxt, and the CMS layer behind them.

12,000+ sites shipped · WordPress, headless, Next.js, Astro, Nuxt · Build-time SEO linter on every project · Cited-in-AI tracking included

WHAT IS TECHNICAL SEO IN 2026

Technical SEO is the layer of work that decides whether search engines and AI assistants can crawl, render, and trust your pages — before content or backlinks have any effect. It covers HTTP behaviour, HTML semantics, structured data, hreflang, canonicalisation, Core Web Vitals, and JavaScript rendering. Get it wrong and the rest of your investment is dead weight.

In 2026 the definition has stretched. Two changes pushed it. First, AI Overviews and ChatGPT-powered search now sit between most users and the underlying pages, which means citation-readiness matters as much as ranking. Second, the typical site is no longer a WordPress install — it is some flavour of headless front-end pulling content from Sanity, Strapi, Payload, Storyblok, Contentful, or a headless WordPress backend. Each of those stacks introduces its own technical SEO failure modes that the old playbook does not cover.

My working definition for clients in 2026: technical SEO is everything that a Lighthouse run, a Search Console crawl report, an AI Overview citation check, and a schema validator notice — across every locale, every template, every render path. If any of those four signals is failing on a representative URL, the engagement starts there.

WHY IS TECHNICAL SEO DIFFERENT ON HEADLESS AND JAMSTACK SITES

On a headless or Jamstack site, your front-end framework owns rendering and your CMS owns content — and SEO can fall through the gap between them. The classic WordPress + Yoast model assumed one server, one renderer, one rules engine. Pull content out of WordPress into Next.js or Astro and you inherit none of that automatically.

Headless WordPress paired with Next.js or Astro

The most common configuration we see in 2026: WP REST or WPGraphQL exposing posts and pages, Next.js App Router or Astro pulling them at build time or via ISR. The wins are real — pages ship with no plugin bloat, Core Web Vitals are easy to keep green, and editors keep a familiar admin. The pitfalls are also real. Yoast canonicals and meta descriptions need to be transported across the API boundary; redirects defined in WordPress need to land in vercel.json or Netlify _redirects; sitemaps usually need to be regenerated on the front-end build, not the WordPress side; and search console verification needs to happen on the public origin, not the wp-admin domain.
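
A minimal sketch of that transport step, assuming WPGraphQL with a Yoast-style seo field exposed on posts; the endpoint, field names, and public origin below are placeholders rather than a fixed contract:

    // app/[slug]/page.tsx -- illustrative only; the seo field assumes a WPGraphQL
    // Yoast extension, so adjust field names to whatever your API actually returns.
    import type { Metadata } from "next";

    const WP_GRAPHQL = "https://cms.example.com/graphql"; // hypothetical CMS origin

    async function fetchPostSeo(slug: string) {
      const res = await fetch(WP_GRAPHQL, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({
          query: `query ($slug: ID!) {
            post(id: $slug, idType: SLUG) {
              title
              seo { title metaDesc canonical }
            }
          }`,
          variables: { slug },
        }),
        next: { revalidate: 3600 }, // refresh metadata via ISR
      });
      const { data } = await res.json();
      return data.post;
    }

    export async function generateMetadata(
      { params }: { params: { slug: string } }
    ): Promise<Metadata> {
      const post = await fetchPostSeo(params.slug);
      return {
        title: post.seo?.title ?? post.title,
        description: post.seo?.metaDesc,
        // Canonical is re-emitted on the public origin, never the wp-admin domain.
        alternates: { canonical: `https://www.example.com/${params.slug}` },
      };
    }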

Pure modern stacks

A Next.js + Sanity build, an Astro + Storyblok build, a Nuxt + Strapi build, a Next.js + Payload build — all skip WordPress entirely. The technical SEO work shifts to making sure your CMS schema models the SEO fields the front-end needs (canonical override, redirect map, hreflang group, schema-extension JSON), and making sure on-demand revalidation triggers when content changes. ISR cache poisoning is the silent killer — a page is served stale for hours after publish because revalidatePath was never wired up.
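
A sketch of that revalidation wiring, assuming a Next.js App Router project and a CMS webhook that posts the changed slug; the secret header name and payload shape are assumptions to match to your CMS:

    // app/api/revalidate/route.ts -- publish webhook handler (illustrative).
    import { revalidatePath } from "next/cache";
    import { NextRequest, NextResponse } from "next/server";

    export async function POST(req: NextRequest) {
      // Reject calls that do not carry the shared secret set in the CMS webhook.
      if (req.headers.get("x-webhook-secret") !== process.env.REVALIDATE_SECRET) {
        return NextResponse.json({ ok: false }, { status: 401 });
      }
      const { slug } = await req.json();   // e.g. { "slug": "pricing" }
      revalidatePath(`/${slug}`);          // purge the stale ISR entry for the page
      revalidatePath("/sitemap.xml");      // keep the sitemap in step with content
      return NextResponse.json({ ok: true, revalidated: slug });
    }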

What actually breaks first

Across about 200 headless audits we have run, the single most common production-breaking issue is canonical conflict — the front-end emitting one canonical URL while the CMS metadata or sitemap emits another. Google picks one, almost always not the one you wanted, and the wrong URL ends up in the index. We catch this with a build-time linter that compares canonical-emitted-from-template against canonical-stored-in-CMS for a sample of pages and fails the build on mismatch.
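
A trimmed sketch of that check, assuming a static output directory and a hypothetical CMS endpoint that reports the stored canonical for a route:

    // scripts/check-canonicals.ts -- canonical-conflict check (illustrative).
    import { readFile } from "node:fs/promises";

    // A small, representative sample; a real run would pick templates per page type.
    const SAMPLE = ["index.html", "pricing/index.html", "blog/post-a/index.html"];

    async function canonicalFromCms(path: string): Promise<string> {
      // Hypothetical endpoint: ask the CMS what canonical it stores for this route.
      const res = await fetch(`https://cms.example.com/api/seo?path=/${path}`);
      return (await res.json()).canonical;
    }

    function canonicalFromHtml(html: string): string | null {
      // Naive match; assumes rel comes before href in the emitted link tag.
      const m = html.match(/<link[^>]+rel="canonical"[^>]+href="([^"]+)"/i);
      return m ? m[1] : null;
    }

    let failed = false;
    for (const file of SAMPLE) {
      const html = await readFile(`./dist/${file}`, "utf8");
      const emitted = canonicalFromHtml(html);
      const stored = await canonicalFromCms(file);
      if (emitted !== stored) {
        console.error(`Canonical mismatch on ${file}: template=${emitted} cms=${stored}`);
        failed = true;
      }
    }
    if (failed) process.exit(1); // fail the build so the mismatch never deploys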

HOW IS WORDPRESS TECHNICAL SEO DIFFERENT FROM HEADLESS

WordPress hands you the most plugin firepower and the most ways to shoot yourself. The technical SEO playbook on a classic WordPress site is mostly about subtraction — removing duplicate canonicals, killing schema fights between Yoast and RankMath and a third plugin nobody remembers installing, taming page builders that ship 2 MB of CSS, and clearing redirect chains that have grown for eight years.

The default stack we recommend in 2026: a single SEO plugin (RankMath or Yoast, never both), a managed host with proper edge caching (Cloudways, Kinsta, WP Engine), Bricks Builder (or native Gutenberg with Kadence or Blocksy) for new builds where performance matters, a redirect manager that exports to a portable format, and Perfmatters or FlyingPress for asset cleanup. The audit work is finding which of those decisions was made differently on your site and unwinding the consequences.

On the headless side, the work moves from subtraction to construction. There is no Yoast, so meta titles and descriptions (and their length clamping) have to be built. There is no plugin schema, so JSON-LD has to be templated. There is no built-in sitemap, so it has to be generated and streamed. The volume of code we add to a headless project for SEO is usually three to five times what a WordPress site already gives you for free — but that code is owned, version-controlled, testable, and does not break on the next plugin update.

WHAT DOES GEO AND AEO MEAN FOR YOUR PAGES

GEO and AEO are the names for two different ways AI features eat search traffic. AEO (Answer Engine Optimisation) is the older term — it covers Google features like Featured Snippets, People Also Ask, and Knowledge Panels that pull a passage from your page and display it as the answer. GEO (Generative Engine Optimisation) is the newer term — optimising for AI Overviews, ChatGPT search, Perplexity, and Bing Copilot, where the assistant generates a paragraph and cites your page as a source.

What that means structurally

Both surfaces want the same thing: a citation-ready passage. The structural rules that win across all of them are the same. Use a question as your H2, not a topic label. Put the answer in the first one or two sentences after the heading. Keep that answer under 250 words. Make sure the answer is rendered server-side, not inside a JavaScript component that hydrates after page load. AI crawlers and Google extractors mostly do not run JavaScript — if your answer needs JS to appear, you are invisible.
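
As an illustration, this is roughly what a citation-ready block looks like as a server-rendered React component; the copy is placeholder text, and the point is that the question and answer land in the initial HTML with nothing left to hydrate:

    // A server component: no client-side JS is needed for the answer to exist
    // in the HTML that crawlers and extractors fetch. Copy is illustrative.
    export function AnswerBlock() {
      return (
        <section>
          <h2>How long does a technical SEO audit take?</h2>
          {/* Direct answer in the first sentence, whole passage under 250 words. */}
          <p>
            A focused audit on a 200-page site takes three to five weeks and covers
            crawl health, schema, hreflang, and Core Web Vitals field data.
          </p>
        </section>
      );
    }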

Where GEO diverges from AEO

Three things matter more for GEO specifically. Entity authority — Google and the LLMs build a graph of who is allowed to talk about what, and unlinked brand mentions feed it. Schema with about and mentions arrays — they make your page parseable as part of an entity graph rather than a wall of text. And llms.txt — an emerging standard file at /llms.txt that gives AI tools a curated map of your site, the same way robots.txt and sitemap.xml help search crawlers. We deploy llms.txt on every site we build and update it whenever a major page lands.
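
On a Next.js site, llms.txt can be served from an ordinary route handler so it stays version-controlled and easy to regenerate; this sketch uses an illustrative page list, and because llms.txt is still an emerging convention the exact format should be treated as provisional:

    // app/llms.txt/route.ts -- serve a curated llms.txt (illustrative content).
    export async function GET() {
      const body = [
        "# Example Co",
        "> Technical SEO for WordPress and headless builds.",
        "",
        "## Key pages",
        "- [Technical SEO](https://www.example.com/technical-seo): services overview",
        "- [Case studies](https://www.example.com/work): shipped results",
      ].join("\n");
      return new Response(body, {
        headers: { "Content-Type": "text/plain; charset=utf-8" },
      });
    }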

WHAT SCHEMA MARKUP DO YOU ACTUALLY NEED

You need fewer schema types than most plugins emit, but the ones you ship have to be valid and consistent. A short list covers nine out of ten projects.

  • Organization — a single sitewide graph in your layout, with logo, sameAs (only real social accounts), address, and contactPoint. Never duplicate this per page.
  • WebSite — once, in the layout, with potentialAction for sitelinks search if you have on-site search.
  • BreadcrumbList — on every non-home page, with locale-correct URLs (a French page must point to /fr/ ancestors, not /en/).
  • Article or BlogPosting — on long-form content, with about and mentions arrays for entity-graph reinforcement.
  • Service — on commercial pages, with serviceType, provider, areaServed, audience, and offers.priceSpecification when you can publish a price range.
  • Product — on ecommerce, with offers.priceCurrency, availability, and aggregateRating only if you have real reviews.
  • FAQPage — when there are at least two genuine questions on the page. Faking FAQs just to win the rich result is the most common manual-action trigger we clean up.
  • LocalBusiness or one of its subtypes — on physical-location pages, with geo coordinates and openingHoursSpecification.

Three things kill schema in production. Inventing properties that schema.org does not define. Inventing values for sameAs (fake LinkedIn URLs, abandoned Twitter handles). And shipping conflicting Organization graphs from multiple plugins. We run schema validators in the build linter and fail the build on any of those three patterns.
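
For reference, this is roughly what a single sitewide Organization and WebSite graph looks like when emitted once from the root layout; every URL and sameAs value below is a placeholder and must point at accounts that actually exist:

    // Sitewide graph component, rendered once in the root layout (illustrative).
    const orgGraph = {
      "@context": "https://schema.org",
      "@graph": [
        {
          "@type": "Organization",
          "@id": "https://www.example.com/#org",
          name: "Example Co",
          url: "https://www.example.com/",
          logo: "https://www.example.com/logo.png",
          sameAs: ["https://www.linkedin.com/company/example-co"], // real accounts only
        },
        {
          "@type": "WebSite",
          "@id": "https://www.example.com/#website",
          url: "https://www.example.com/",
          publisher: { "@id": "https://www.example.com/#org" },
        },
      ],
    };

    export function OrgSchema() {
      return (
        <script
          type="application/ld+json"
          dangerouslySetInnerHTML={{ __html: JSON.stringify(orgGraph) }}
        />
      );
    }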

HOW DO YOU DO HREFLANG AT SCALE WITHOUT BREAKING IT

Hreflang fails at scale because the constraint is bidirectional and the dataset is sparse. Every locale variant of a page must self-reference, must reference every other variant, and must be referenced back by every other variant. Miss one direction on one page in one locale and Google silently downgrades the cluster.

The pattern that scales

Store a content_group_id (or equivalent) on every translatable row. Every locale variant of a page shares one ID. The hreflang emitter, the sitemap emitter, and the canonical emitter all derive their cluster from that ID. Never compute hreflang from URL pattern matching alone — it falls apart on edge cases (a Spanish page that has no Hindi translation breaks the cluster if your code assumes "if pageX exists in locale A, it exists in locale B").
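
A sketch of that pattern with illustrative row and column names; every emitter answers the same question, which rows share my content_group_id, instead of guessing from URL shape:

    // Hreflang derived from a shared content_group_id (column names are assumptions).
    type PageRow = { locale: string; path: string; content_group_id: string };

    export function hreflangAlternates(
      current: PageRow,
      cluster: PageRow[], // every row sharing current.content_group_id, incl. current
      origin = "https://www.example.com"
    ) {
      const links = cluster.map((row) => ({
        hrefLang: row.locale,            // e.g. "en", "fr", "zh-Hant"
        href: `${origin}${row.path}`,
      }));
      // Every cluster needs an x-default; fall back to the English variant here.
      const fallback = cluster.find((r) => r.locale === "en") ?? current;
      links.push({ hrefLang: "x-default", href: `${origin}${fallback.path}` });
      return links;
    }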

What kills hreflang in practice

Three patterns we see repeatedly. Locale regex ordering — putting "zh" before "zh-Hant" in your locale detection regex, which captures the wrong locale and writes a broken hreflang. Forgetting x-default — every cluster needs an x-default fallback, usually pointing at the English version. Cluster ID drift — a translation gets a new ID instead of inheriting the source ID, silently splitting a single cluster into two unrelated ones, neither of which has a full reciprocal set.

We always add a build-time hreflang linter that crawls a sample of pages, walks the hreflang clusters they reference, and fails the build if any cluster is incomplete or asymmetric.

WHAT IS PROGRAMMATIC SEO AND HOW DO YOU DO IT SAFELY

Programmatic SEO is generating thousands of pages from a structured data source plus a template — directories, comparison pages, location pages, glossary pages. Done right it can hit the long-tail at a scale single-author content cannot match. Done wrong it triggers a manual action and removes most of your indexed pages overnight.

What separates a clean programmatic build from a thin-content one

Three things. Real data per page — every URL has at least one fact, number, or detail unique to it; thin programmatic pages share 95% of their content. A meaningful template — the template adds context, comparison, recommendation, or aggregation around the unique data, not just a search-optimised wrapper. And quality gating — pages with insufficient unique data are kept out of the sitemap, blocked from index, or held in a draft state until the data layer fills in.
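
A sketch of the quality gate with hypothetical field names and thresholds; the point is that the publish decision is computed from the data a page actually has rather than assumed:

    // Publish gate for programmatic pages (thresholds are illustrative, not a rule).
    type ListingRecord = {
      slug: string;
      uptimePct?: number;
      priceFrom?: number;
      datacenterCount?: number;
      reviewCount?: number;
    };

    function uniqueDataPoints(r: ListingRecord): number {
      return [r.uptimePct, r.priceFrom, r.datacenterCount, r.reviewCount]
        .filter((v) => v !== undefined && v !== null).length;
    }

    export function publishDecision(r: ListingRecord) {
      const score = uniqueDataPoints(r);
      if (score >= 3) return { index: true, inSitemap: true };   // healthy page
      if (score === 2) return { index: true, inSitemap: false }; // borderline, keep quiet
      return { index: false, inSitemap: false };                 // hold as draft
    }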

What we have learned from running this at scale

I built HostList.io as a programmatic SEO platform with around 28,000 web hosting company pages on Next.js plus Supabase. The pages that survived two years of Google updates were the ones with at least three unique data points per URL plus a template that compared, scored, or recommended on top of that data. The pages we de-indexed were the ones where the unique data was just a name and a price. The cost of pulling thin pages out of the index was small; the cost of leaving them in was a sitewide ranking hit when the March 2024 core update (which absorbed the helpful content system) landed.

We bring that operating playbook to client programmatic builds — Next.js or Astro front-end, Supabase or Postgres data, an ingest pipeline that scores pages on uniqueness before publish, a sitemap that streams in chunks because a single sitemap file is capped at 50,000 URLs, and an internal-link graph that pulls every leaf into a topical cluster.
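
A sketch of the chunked-sitemap piece, using the 50,000-URL protocol cap and an illustrative child-sitemap naming scheme:

    // Build a sitemap index whose children each stay under the 50,000-URL cap.
    const CHUNK = 50_000;

    export function buildSitemapIndex(totalUrls: number, origin: string): string {
      const chunks = Math.ceil(totalUrls / CHUNK);
      const entries = Array.from({ length: chunks }, (_, i) =>
        `  <sitemap><loc>${origin}/sitemaps/pages-${i + 1}.xml</loc></sitemap>`
      ).join("\n");
      return `<?xml version="1.0" encoding="UTF-8"?>\n` +
        `<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n` +
        `${entries}\n</sitemapindex>`;
    }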

HOW DO YOU KEEP CORE WEB VITALS GREEN AT SCALE

Pass Core Web Vitals at the 75th percentile of field data, not in a controlled lab Lighthouse run. Field data from CrUX is what Google uses; Lighthouse is a debugging tool. The two often disagree by 30% or more on real sites.

Where the budget actually goes

LCP is almost always the hero image and almost always solved by re-encoding to WebP at 80% quality, sizing to the actual display dimensions plus 2x retina, adding a preload tag in the head, and setting fetchpriority="high". A 1 MB hero image becoming a 30 KB WebP is the single highest-impact change on most projects. CLS comes from images and ads without explicit dimensions — explicit width and height attributes on every image, fixed-height ad slots, and reserved space for any client-side widget. INP comes from heavy JavaScript on interaction — usually a third-party tag manager or an over-eager analytics library. The fix is debouncing, lazy-loading, or replacing with a lighter equivalent.
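
As one illustration of the hero-image fix on a Next.js project, next/image with the priority flag covers the preload hint, the high fetch priority, and the explicit dimensions in one place; the file name and dimensions here are placeholders:

    // Hero image marked as the LCP candidate (illustrative asset and dimensions).
    import Image from "next/image";

    export function Hero() {
      return (
        <Image
          src="/hero-1200x630.webp"
          alt="Team reviewing a Core Web Vitals dashboard"
          width={1200}      // explicit dimensions reserve layout space (CLS)
          height={630}
          priority          // emits the preload hint and raises fetch priority (LCP)
        />
      );
    }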

Where most projects miss

Two patterns. First, an LCP image inside a Carousel component or a JavaScript-driven layout — the image only renders after the JS runs, and your LCP is the carousel's loading skeleton, not the photo. Second, web fonts loaded without font-display: swap and without preload — text is invisible for 200-400 ms while fonts download, and your LCP gets pushed past the 2.5 s threshold even though the image is fast. Both are caught by a CrUX field-data review, not a single Lighthouse run.

WHAT IS A BUILD-TIME SEO LINTER

A build-time SEO linter is a script that runs at the end of your build, samples a slice of rendered HTML files from the output directory, and fails the build if it finds patterns that would degrade SEO in production. It is the single highest-impact habit we add to client codebases.

What ours checks

  • Every page has exactly one H1.
  • Meta description on every indexable URL is between 120 and 155 characters.
  • html lang attribute matches the locale path (a /fr/ page has lang="fr").
  • Hreflang clusters are complete and bidirectional on translatable routes.
  • JSON-LD on every page is valid against schema.org definitions.
  • No banned content patterns — fake social URLs in sameAs, hardcoded test placeholder text, banned generic copywriting words.
  • Canonical URL emitted by the template matches the canonical stored in the CMS.
  • Image WebP and explicit dimensions check on a sample of templates.

The linter runs as the last step of npm run build. Any violation fails the build, which fails the deploy. Without it, every regression — a meta description that grew over the limit, an SEO field someone forgot to set, a schema emitter that broke when a property was renamed — silently ships. With it, the regression is caught before it leaves the developer's laptop.
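
A trimmed sketch of such a linter; the output directory, sample size, and locale list are assumptions to adapt per project, and the full version adds the schema, hreflang, canonical, and image checks listed above:

    // scripts/seo-lint.ts -- last step of the build (illustrative subset of checks).
    import { readFile, readdir } from "node:fs/promises";
    import { join } from "node:path";

    const OUT_DIR = "./dist";
    const errors: string[] = [];

    async function htmlFiles(dir: string): Promise<string[]> {
      const entries = await readdir(dir, { recursive: true }); // Node 18.17+
      return entries.filter((p) => p.endsWith(".html")).map((p) => join(dir, p));
    }

    function lint(file: string, html: string) {
      // Naive regex checks for brevity; a production linter parses the HTML.
      const h1Count = (html.match(/<h1[\s>]/gi) ?? []).length;
      if (h1Count !== 1) errors.push(`${file}: expected 1 <h1>, found ${h1Count}`);

      const desc = html.match(/<meta name="description" content="([^"]*)"/i);
      if (!desc) errors.push(`${file}: missing meta description`);
      else if (desc[1].length < 120 || desc[1].length > 155)
        errors.push(`${file}: meta description is ${desc[1].length} chars`);

      const lang = html.match(/<html[^>]*\blang="([^"]+)"/i)?.[1];
      const locale = file.match(/\/(fr|de|es)\//)?.[1]; // illustrative locale set
      if (locale && lang !== locale)
        errors.push(`${file}: lang="${lang}" does not match /${locale}/ path`);
    }

    const files = (await htmlFiles(OUT_DIR)).slice(0, 50); // sample, not a full crawl
    for (const f of files) lint(f, await readFile(f, "utf8"));

    if (errors.length) {
      console.error(errors.join("\n"));
      process.exit(1); // failing the lint fails the build, which fails the deploy
    }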

HOW DO YOU GET AI OVERVIEWS AND PERPLEXITY TO CITE YOUR PAGES

Get cited by writing every relevant page as a stack of citation-ready passages. A citation-ready passage is a question-form H2, a direct one-or-two-sentence answer immediately after, supporting nuance for the next 100-200 words, and zero JavaScript-rendered content in the answer block. AI extractors lift that opening sentence and cite the page.

What helps beyond passage structure

  • Entity authority — Wikipedia presence, consistent organisation schema, real sameAs accounts, brand mentions across the open web.
  • llms.txt at the site root — a curated map of the site for AI tools, separate from the robots.txt and sitemap.xml that serve traditional crawlers.
  • Schema with about and mentions on long-form content — declares the entity graph the page sits inside.
  • AI-crawler robots.txt allow-list — explicitly allow GPTBot, PerplexityBot, ClaudeBot, OAI-SearchBot, Google-Extended, Applebot-Extended, CCBot, and Anthropic-AI. Blocking even one of these is a self-inflicted citation outage.
  • Speakable schema property on the answer-rich sections of long-form pages — a hint to voice and AI extractors that this is the citable passage.

Tracking citations is the part most teams skip. Otterly, Profound, and AthenaHQ track AI Overview and Perplexity citation share by domain. We add weekly citation tracking on top of every engagement and report it alongside organic traffic. If you are not measuring citations you cannot tell whether your GEO work is doing anything.

HOW DO YOU MAKE SURE AI CRAWLERS CAN READ YOUR SITE

AI crawlers can read your site only if your robots.txt explicitly allows their user-agent and your hosting layer does not block them at the network edge. Both checks need to pass. Default Cloudflare settings, default Vercel WAF rules, and default WordPress security plugins routinely block AI bots without warning.

The robots.txt allow-list we ship

  • GPTBot, ChatGPT-User, and OAI-SearchBot — ChatGPT and OpenAI search products.
  • PerplexityBot and Perplexity-User.
  • ClaudeBot, Claude-Web, and anthropic-ai.
  • Google-Extended — Gemini training and grounding.
  • Applebot-Extended.
  • CCBot — Common Crawl, which feeds many open-source models.
  • Cohere-AI.
  • Meta-ExternalAgent.
  • Bytespider — ByteDance's crawler (the company behind TikTok); usually blocked because it is aggressive and most projects do not want their content lifted into TikTok.
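
If the front-end is Next.js, the same allow-list can live in the robots metadata route so it ships version-controlled with the rest of the code; this sketch mirrors the list above and should be trimmed to your own policy:

    // app/robots.ts -- robots.txt generated by the Next.js metadata route.
    import type { MetadataRoute } from "next";

    const AI_AGENTS = [
      "GPTBot", "ChatGPT-User", "OAI-SearchBot",
      "PerplexityBot", "Perplexity-User",
      "ClaudeBot", "Claude-Web", "anthropic-ai",
      "Google-Extended", "Applebot-Extended",
      "CCBot", "cohere-ai", "Meta-ExternalAgent",
    ];

    export default function robots(): MetadataRoute.Robots {
      return {
        rules: [
          { userAgent: "*", allow: "/" },
          ...AI_AGENTS.map((userAgent) => ({ userAgent, allow: "/" })),
          { userAgent: "Bytespider", disallow: "/" }, // usually blocked, as above
        ],
        sitemap: "https://www.example.com/sitemap.xml",
      };
    }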

What you also have to do

Robots.txt is necessary but not sufficient. Three other checks matter. Cloudflare bot fight mode and bot management — disable or tune the rules that block AI agents, or whitelist the agents by IP range and user-agent. Vercel WAF and Edge Middleware — make sure neither matches AI user agents with a generic bot regex. WordPress security plugins like Wordfence — they often ship rules that block GPTBot and PerplexityBot by default, so whitelist those agents explicitly. Test by curling each user agent against three representative URLs and confirming a 200 response.
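
The final check can be scripted rather than curled by hand. This sketch only catches user-agent-based blocking at the application or WAF layer, so IP-reputation rules still need reviewing in the Cloudflare or Vercel dashboards; the URLs and agent list are illustrative:

    // Request representative URLs as each AI crawler and expect a plain 200.
    const URLS = [
      "https://www.example.com/",
      "https://www.example.com/pricing",
      "https://www.example.com/blog/some-post",
    ];
    const AGENTS = ["GPTBot", "PerplexityBot", "ClaudeBot", "Google-Extended"];

    for (const ua of AGENTS) {
      for (const url of URLS) {
        const res = await fetch(url, { headers: { "User-Agent": ua } });
        const ok = res.status === 200;
        console.log(`${ok ? "PASS" : "FAIL"} ${res.status} ${ua} ${url}`);
        if (!ok) process.exitCode = 1; // non-zero exit flags the blocked agent
      }
    }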

BUILT TO GDS SERVICE STANDARD

We build technical SEO foundations that align with the UK Government Digital Service Standard and the GOV.UK Design System. The GDS Service Standard is the most rigorously documented set of digital-service quality principles in the public domain: progressive enhancement, accessibility to WCAG 2.2 AA, performance, semantic HTML, and graceful degradation when JavaScript fails. We follow it on every Seahawk technical SEO engagement.

Why this matters for SEO specifically: GDS-aligned sites pass Core Web Vitals consistently, rank well in classic organic, and are uniquely well-suited to AI Overview citation because the structural clarity that GDS demands is the same structural clarity that AI surfaces extract from. The standard predates the AI search era but maps perfectly onto it.

For UK enterprise, public sector, and regulated-industry clients, GDS alignment is a real procurement signal. For everyone else, it is a quality marker that sets the work apart from agencies that ship to a lower bar.

WHAT DOES A TECHNICAL SEO ENGAGEMENT WITH US ACTUALLY LOOK LIKE

Three to ten weeks, three phases, fixed price. Phase one is audit — full crawl, GSC and Ahrefs review, JSON-LD validation, hreflang cluster check, Core Web Vitals field-data pull, AI citation baseline. Phase two is remediation — we ship the fixes, your team or ours. Phase three is the linter and monitoring layer that prevents regressions.

The audit phase deliverables

  • Full Screaming Frog crawl in CSV plus a written brief on the issues that matter, ranked by impact.
  • Search Console export — coverage report, queries, page-level performance — with annotations on what is anomalous.
  • Schema validator pass over a sample of templates and a written note on every type that is broken or missing.
  • Hreflang cluster integrity report on the translatable routes.
  • Core Web Vitals field-data pull from CrUX with a chart of the last 25 weeks per metric.
  • AI Overview and Perplexity citation baseline — current share, gap analysis against three named competitors.

The remediation phase

Fixed-scope work. Each ticket has a clear before-state and after-state. We ship them in priority order and you can stop the engagement at any milestone if the budget runs out. The first batch we ship is always the items that fail the build linter — they have the shortest path to value because they will keep regressing without the linter, and once the linter is in, every fix sticks.

The monitoring phase

Weekly automated linting on staging and production, monthly Core Web Vitals report, monthly AI citation tracking, and a kept-warm Slack channel where I respond to anything Google or your team flag. We hand off the dashboards on close so you keep visibility whether you renew or not.

FREQUENTLY ASKED QUESTIONS

What is technical SEO?

Technical SEO is the practice of making sure search engines and AI assistants can crawl, render, and understand your site at the level of HTTP, HTML, structured data, and Core Web Vitals. It sits beneath content and links and decides whether either of those ever gets rewarded.

Do I need technical SEO if my content is good?

Yes. Good content on a site that returns 5xx under load, ships 4 MB of JavaScript before the H1, has invalid JSON-LD, leaks 302 redirects everywhere, or hides text inside a client-only React component will not rank — and will not be cited by AI Overviews. Technical SEO is the floor; content is the ceiling.

How long does a technical SEO engagement take?

A focused audit and remediation cycle on a 200-page WordPress site is 3-5 weeks. A headless Next.js or Astro site with multi-locale and ISR usually runs 5-10 weeks. Programmatic platforms with over 50,000 URLs need a longer arc — 8-16 weeks of phased work plus an ongoing linting and monitoring layer.

How is technical SEO different on Next.js vs Astro vs Nuxt vs WordPress?

Each stack has its own footguns. Next.js with the App Router is excellent if you handle ISR cache invalidation, hreflang in metadata, and the streaming sitemap pattern. Astro is the simplest static path, but an answer block moved into a client-hydrated island drops out of the server-rendered HTML. Nuxt 3 mostly behaves but has historically struggled with hreflang and trailing-slash hygiene. WordPress out of the box is the noisiest — duplicate canonicals, plugin schema fights, redirect chains from page builders. The remediation playbook differs for each.

What is the difference between SEO, GEO, and AEO?

SEO is optimising for blue-link search results in Google and Bing. AEO (Answer Engine Optimisation) is optimising for Google features like Featured Snippets, People Also Ask, and Knowledge Panels — answer boxes that pull from indexed pages. GEO (Generative Engine Optimisation) is optimising for AI assistants like ChatGPT, Perplexity, Bing Copilot, and Google AI Overviews — getting cited inside generated answers, which is a different ranking mechanism altogether.

Will AI search kill blue-link traffic?

It is already eating into informational traffic — zero-click queries are up sharply since AI Overviews launched in 2024. Commercial and navigational queries are more resilient. The honest read is that traffic profiles are shifting; sites that are cited inside AI Overviews keep relevance, sites that are not lose it. Adapting your content structure for passage extraction is the work.

Do I need to rewrite all my content for AI search?

No. The expensive change is structural. Convert topic-label H2s ("Our approach") into question H2s ("How does our approach differ from competitors?"), put the answer in the first 1-2 sentences after each heading, keep each answer under 250 words, and make sure none of those answers live inside a JavaScript-rendered component. The wording mostly stays.

What schema do I actually need?

Per page type: Article or BlogPosting on long-form, Service on commercial pages, Product on ecommerce, FAQPage when there are at least two real questions, BreadcrumbList on every non-home page, and a single sitewide Organization graph in the layout. Add the about and mentions arrays on Article pages where the entity graph matters. Skip the rest until you can measure it.

What is a build-time SEO linter?

A script that runs at the end of your build, samples a slice of rendered HTML files, and fails the build if it finds missing H1s, oversized meta descriptions, broken hreflang clusters, invalid JSON-LD, or banned content patterns. Without it, SEO regressions ship to production every time someone forgets to set seo_description or breaks a schema emitter. We add one to every site we build.

Do you handle WordPress and headless?

Yes. Most of our work is split across managed WordPress (Cloudways, Kinsta, WP Engine), headless WordPress paired with Next.js or Astro front-ends, and pure modern stacks — Next.js with Sanity or Payload, Astro with Storyblok, Nuxt with Strapi or Contentful. We pick the stack to match the brief; we do not push a default.

How do you measure technical SEO success?

A combination of crawl health (zero 5xx in Search Console, indexable URL count matching the expected URL count), Core Web Vitals field data from CrUX (75th percentile passing on LCP, INP, CLS), AI Overview citation tracking via Otterly or manual sampling, and organic traffic in GA4 weighted against impressions in GSC. We report against all four monthly during the engagement and hand off the dashboards on close.

WHAT THE FIRST 48 HOURS LOOK LIKE

Book a 30-minute call. Tell me your domain, your stack, and what is bothering you in Search Console or Ahrefs. By the end of the call you will have an honest read on whether your problem is technical, content, or trust — and what an audit and remediation engagement would look like for your specific stack. If we are not the right fit I will tell you who is.