
A 25,000-company hosting directory in Next.js with crawl-budget-tuned Core Web Vitals

The proof artefact for what a programmatic SEO directory looks like when you build it the right way — pages that pass Lighthouse, schema that survives a manual audit, content quality gates that keep Google's thin-content patrol away.

BOOK A 30-MIN CALL

The numbers

  • 25,000+ companies indexed
  • 6 page templates: company, list, comparison, country, datacenter, category
  • 95+ Lighthouse mobile score
  • 5–9 schema items per page, depending on page type

The brief

Web hosting is one of the most crowded SEO categories on the internet. The directory category itself is owned by HostingAdvice, WebsitePlanet, HostAdvice, and a handful of others — most running on outdated WordPress installs with the same recycled comparison content. The opening was: build a directory at modern-stack speed, with real schema rigour, and let the Core Web Vitals score do the discovery.

The constraint was the one that always matters with programmatic SEO — quality at scale. 25,000 pages cannot all be hand-written; if they are 25,000 templated pages with no real content differentiation, they trigger a thin-content penalty within 90 days. The architecture had to make every page genuinely distinct without requiring 25,000 hours of human writing.

The content pipeline

Six page templates, each with a defined structured-data shape and a content quality gate before publish. Company pages pull live performance data, recent reviews, pricing history, and a generated honest-take section that uses that data to write the actual prose. List and comparison pages aggregate from the same source of truth, so a number that changes once propagates across the whole cluster.
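To make "defined structured-data shape" concrete, here is a minimal sketch of how a company template might assemble its JSON-LD graph. The Company type, the buildCompanySchema helper, and the example domain are hypothetical, and the real templates carry five to nine schema items rather than the three node types shown.

```typescript
// Hypothetical sketch only: the Company type, the builder name, and the example
// domain are illustrative, not the production codebase.
type Plan = { name: string; priceUsd: number };

type Company = {
  name: string;
  slug: string;
  url: string;
  ratingValue: number;
  reviewCount: number;
  plans: Plan[];
};

// Builds the JSON-LD graph a company page embeds in a <script type="application/ld+json"> tag.
export function buildCompanySchema(company: Company) {
  return {
    "@context": "https://schema.org",
    "@graph": [
      {
        "@type": "Organization",
        name: company.name,
        url: company.url,
      },
      {
        "@type": "Product",
        name: `${company.name} web hosting`,
        aggregateRating: {
          "@type": "AggregateRating",
          ratingValue: company.ratingValue,
          reviewCount: company.reviewCount,
        },
        offers: company.plans.map((plan) => ({
          "@type": "Offer",
          name: plan.name,
          price: plan.priceUsd.toFixed(2),
          priceCurrency: "USD",
        })),
      },
      {
        "@type": "BreadcrumbList",
        itemListElement: [
          {
            "@type": "ListItem",
            position: 1,
            name: "Hosting companies",
            item: "https://example.com/hosting",
          },
          {
            "@type": "ListItem",
            position: 2,
            name: company.name,
            item: `https://example.com/hosting/${company.slug}`,
          },
        ],
      },
    ],
  };
}
```

Because every page type builds its graph from the same typed record, a pricing or rating change in the source data flows into the structured data on every page that references it.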

Quality gates: minimum word count, Flesch-Kincaid floor, schema.org validation pass, banned-word list (the AI-prose tells like "leverage", "robust", "seamless"), uniqueness check against the site's own existing content. Pages that fail go back to the queue with the specific failure flagged. This is the pattern the auto-blog scripts on this site evolved from.
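A minimal sketch of how those gates might compose, assuming readability is scored upstream of the gate; the Draft shape, the thresholds, and the similarity helper are assumptions, and the schema.org validation pass is omitted for brevity.

```typescript
// A minimal sketch of the publish gate, not the production code.
// Thresholds, the Draft shape, and the similarity helper are assumptions.
const BANNED_WORDS = ["leverage", "robust", "seamless"];

type Draft = {
  slug: string;
  text: string;
  wordCount: number;
  readabilityScore: number; // Flesch-Kincaid-style score computed before the gate runs
};

type GateResult = { passed: boolean; failures: string[] };

// Crude word-set Jaccard similarity as a stand-in for the real uniqueness check.
function similarity(a: string, b: string): number {
  const setA = new Set(a.toLowerCase().split(/\s+/));
  const setB = new Set(b.toLowerCase().split(/\s+/));
  const shared = [...setA].filter((word) => setB.has(word)).length;
  return shared / new Set([...setA, ...setB]).size;
}

export function runQualityGate(draft: Draft, existingPages: string[]): GateResult {
  const failures: string[] = [];

  if (draft.wordCount < 600) failures.push("below minimum word count");
  if (draft.readabilityScore < 50) failures.push("below readability floor");

  const hits = BANNED_WORDS.filter((word) => draft.text.toLowerCase().includes(word));
  if (hits.length > 0) failures.push(`banned words: ${hits.join(", ")}`);

  if (existingPages.some((page) => similarity(draft.text, page) > 0.6)) {
    failures.push("too similar to an existing page");
  }

  // (The real gate also runs a schema.org validation pass, omitted here.)
  // A failing draft goes back to the queue with the specific failures attached.
  return { passed: failures.length === 0, failures };
}
```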

The Core Web Vitals story

Crawl budget is the silent killer at directory scale. If every page on a 25,000-page site takes 4 seconds to render, Googlebot crawls a fraction of the catalogue and the long-tail rankings never materialise. The architecture is built for cheap crawls: static where possible, on-demand ISR where not, an image pipeline that crops per layout, and no client-side JavaScript on the listing pages; only the company-detail pages ship any meaningful JS.
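A sketch of what that static/ISR split could look like for a company route, assuming Next.js App Router conventions (generateStaticParams, revalidate); the data helpers, route path, and one-day revalidation window are illustrative rather than the production values.

```typescript
// A sketch for a route like app/hosting/[slug]/page.tsx, assuming the Next.js
// App Router; the data helpers and the revalidation window are hypothetical.
import { notFound } from "next/navigation";

export const revalidate = 86400;   // each company page re-renders at most once a day
export const dynamicParams = true; // pages outside the pre-rendered set render on demand

// Hypothetical data helpers standing in for the Supabase queries.
async function fetchTopCompanySlugs(limit: number): Promise<string[]> {
  return []; // e.g. the most-trafficked companies, capped at `limit`
}
async function fetchCompany(slug: string): Promise<{ name: string } | null> {
  return { name: slug.replace(/-/g, " ") };
}

export async function generateStaticParams() {
  // Pre-render only the head of the catalogue at build time; the long tail is ISR.
  const slugs = await fetchTopCompanySlugs(1000);
  return slugs.map((slug) => ({ slug }));
}

export default async function CompanyPage({ params }: { params: { slug: string } }) {
  const company = await fetchCompany(params.slug);
  if (!company) notFound();
  // Server-rendered detail page; the listing pages in this architecture ship no client JS.
  return <h1>{company.name}</h1>;
}
```

Keeping the long tail on on-demand ISR keeps builds cheap while still serving Googlebot fully rendered HTML for every page it requests.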

The result is a Lighthouse mobile score over 95 on every page type and a Googlebot crawl rate that covers the catalogue end-to-end on a weekly cycle. The traffic story follows from that: long-tail queries that would normally take six months to surface on a thin directory site show up here in eight to twelve weeks.

What this proves for the broader work

HostList is the working version of every claim made in /blog/programmatic-seo-quality-gates-2026/, in /blog/crawl-budget-large-sites-hostlist/, and across the directories topic. When a client asks whether modern-stack programmatic SEO can really work at scale, the answer is "yes, here is the live site". The architectural decisions are explicit; the cost numbers are explicit; the SEO considerations are explicit.

For clients in the directory or programmatic-SEO category, this case study is the most direct proof. The codebase is private but the patterns are public.

The stack

  • Next.js
  • Supabase
  • Vercel
  • Anthropic Haiku
  • Algolia

Timeline: 6 months from kickoff to 25k pages live


If your project shape matches this one — the conversation is short

If the architectural pattern here is close to your project shape, the next step is a 30-min call. Describe your brief; I tell you whether I am the right person; by the end you have a stack pick, a price range, and a delivery window. No deck, no qualification screen.