BUILDING A DIRECTORY SITE
Data model, templated rendering, indexability gates, and programmatic internal linking. From shipping HostList.io and Not Another Sunday at six-figure-page scale.
What this guide is
Directory sites and listing platforms are a specific category of programmatic-SEO project that I have shipped at meaningful scale. HostList.io has roughly 28,000 web hosting companies indexed across categories, regions, and pricing tiers, all generated programmatically from a structured data model. Not Another Sunday has 137,000 UK pubs, bars, and restaurants. Both run on the same architectural pattern: a clean data model, a templated page renderer, a strict indexability gate, and a programmatic internal-link graph.
This guide is the operator's view of how to build a directory or listing platform that holds up under search-engine scrutiny in 2026, what to avoid, and where the conventional advice is wrong in practice. It draws on shipping multiple directory platforms personally and through client engagements at Seahawk Media.
When a directory is the right product
A directory site wins when three things are true at once: a fragmented market with many competing or similar entities, an audience that searches with comparative intent (best, top, vs, cheapest), and the absence of a single dominant aggregator that already owns the SERPs.
Examples that worked: web hosting companies (fragmented, comparative, no single Google-owned aggregator), London restaurants (fragmented, query intent is intense, the existing aggregators are weakening), niche software tools by category (fragmented, the comparison sites are inconsistent in quality).
Examples that did not work: anything where Google itself is the dominant aggregator (jobs, flights, mortgages in some markets), anything where the entities are too few to support thousands of pages (top-tier law firms in a single city, for instance), anything where the comparison criteria are too subjective to encode in a data model.
The data model decision
Normalise hard up front
Every entity (the company, the location, the product) gets a single canonical row with a stable ID and a slug that never changes. Every attribute (category, price tier, region, feature) gets its own table or enum, never a freeform text field. Many-to-many relationships get junction tables. The cost of getting this wrong on day one is years of slug instability and broken redirects.
Slugs are forever
Once a directory page is indexed by Google, the slug should not change. Use slug-of-record patterns: deterministic generation from the entity name, collision handling via numeric suffix, and a lookup table that maps any historical slug to its current one for permanent 301 redirects. We have a slug-stability rule on HostList.io that has not changed in three years.
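A minimal sketch of the slug-of-record pattern: deterministic generation, numeric-suffix collision handling, and a historical-slug lookup that backs the permanent 301s. Function and variable names here are illustrative assumptions, not HostList.io's actual code.

```typescript
// Deterministic slug generation from an entity name.
function slugify(name: string): string {
  return name
    .toLowerCase()
    .normalize("NFKD")                  // decompose accented characters
    .replace(/[\u0300-\u036f]/g, "")    // strip combining marks
    .replace(/[^a-z0-9]+/g, "-")        // non-alphanumerics become hyphens
    .replace(/^-+|-+$/g, "");           // trim leading/trailing hyphens
}

// Collision handling: the second entity with the same name gets "-2", and so on.
function assignSlug(name: string, taken: Set<string>): string {
  const base = slugify(name);
  if (!taken.has(base)) return base;
  let n = 2;
  while (taken.has(`${base}-${n}`)) n++;
  return `${base}-${n}`;
}

// Historical-slug lookup: any old slug resolves (possibly through a chain)
// to the current one for a 301. Assumes the history map is acyclic.
function resolveSlug(slug: string, history: Map<string, string>): string {
  let current = slug;
  while (history.has(current)) current = history.get(current)!;
  return current;
}
```

Because `assignSlug` is deterministic and the history table only ever grows, every slug that Google has seen keeps resolving to a live page.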
The schema design that scales
On Supabase: an entities table with the master record, a categories table, an attributes table, junction tables for many-to-many, and a generated computed_slug that is unique. RLS policies allow public read for published rows, admin write only. Indexes on every column you query at the page level. The schema for HostList.io is roughly twelve tables and has not needed structural changes since launch.
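The relational shape can be sketched as row types; these names are illustrative assumptions and not the actual HostList.io schema, which the text says runs to roughly twelve tables.

```typescript
// Master record: one canonical row per entity, stable ID, generated slug.
interface Entity {
  id: string;             // stable ID, never reused
  name: string;
  computed_slug: string;  // generated, unique, never changes once indexed
  published: boolean;     // RLS: public read only when true, admin write only
  price_tier: "budget" | "mid" | "premium"; // enum, never a freeform text field
}

// Attribute tables are their own relations, not text columns on Entity.
interface Category {
  id: string;
  slug: string;
  parent_id: string | null; // supports category hierarchy
}

// Junction table for the many-to-many entity/category relationship.
// Both FK columns are indexed because they are queried at the page level.
interface EntityCategory {
  entity_id: string;   // FK -> Entity.id
  category_id: string; // FK -> Category.id
}
```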
The template renderer
One template, many faces
A directory site has roughly six page archetypes: the entity page, the category listing, the cross-category comparison, the regional page, the home, and the search. Each gets one template that renders thousands of pages. The discipline is: make the template good enough that no individual page would benefit from special treatment, and resist the urge to special-case.
Above-the-fold uniqueness
Each entity page must have unique content above the fold: a unique opening paragraph, unique key facts, unique pricing or feature data. The unique content can be programmatically generated from the data model (HostList.io does this with templated openings that interpolate the company name, founding year, primary market, and one differentiating fact), but it must read as if it was written for that entity, not as a fill-in-the-blanks template.
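A hedged sketch of a templated opening that interpolates those fields. The sentence shapes and field names are illustrative; the key property is that the template choice is deterministic per entity, so every rebuild renders identical copy.

```typescript
interface EntityFacts {
  name: string;
  founded: number;
  primaryMarket: string;
  differentiator: string; // one unique fact pulled from the data model
}

// Several sentence shapes so adjacent pages do not read identically.
const OPENINGS = [
  (e: EntityFacts) =>
    `${e.name} has served the ${e.primaryMarket} market since ${e.founded}, and stands out for ${e.differentiator}.`,
  (e: EntityFacts) =>
    `Founded in ${e.founded}, ${e.name} focuses on ${e.primaryMarket} and is best known for ${e.differentiator}.`,
  (e: EntityFacts) =>
    `${e.name} is notable for ${e.differentiator}, having operated in ${e.primaryMarket} since ${e.founded}.`,
];

function openingFor(e: EntityFacts): string {
  // Deterministic template choice via a simple name hash, stable across rebuilds.
  let h = 0;
  for (const c of e.name) h = (h * 31 + c.charCodeAt(0)) >>> 0;
  return OPENINGS[h % OPENINGS.length](e);
}
```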
Schema markup at scale
Every entity page emits Organization or LocalBusiness schema with name, url, address (where applicable), aggregateRating (where genuinely available), and sameAs links to verified social profiles. BreadcrumbList on every category and entity page. ItemList on every category listing. The schema layer is what makes the directory machine-readable for both Google and the AI surfaces.
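A minimal sketch of the entity-page emitter. The `@type`, `sameAs`, and `aggregateRating` property names are standard schema.org vocabulary; the input shape is an assumption. Note that `aggregateRating` is only emitted when real ratings exist.

```typescript
interface EntityRecord {
  name: string;
  url: string;
  sameAs: string[]; // verified social profiles only
  rating?: { value: number; count: number }; // present only when genuine
}

function organizationJsonLd(e: EntityRecord): Record<string, unknown> {
  const ld: Record<string, unknown> = {
    "@context": "https://schema.org",
    "@type": "Organization",
    name: e.name,
    url: e.url,
    sameAs: e.sameAs,
  };
  // Emit aggregateRating only when ratings are real and verifiable.
  if (e.rating && e.rating.count > 0) {
    ld.aggregateRating = {
      "@type": "AggregateRating",
      ratingValue: e.rating.value,
      reviewCount: e.rating.count,
    };
  }
  return ld;
}
```

The same pattern extends to LocalBusiness (add `address`), BreadcrumbList, and ItemList emitters, each driven entirely by the data model.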
The indexability gate that prevents disasters
The Helpful Content Update is the single greatest threat to a directory site, and the most common cause of a domain-wide ranking collapse on programmatic sites. The countermeasure is a per-page indexability gate that decides which pages are worth indexing.
Quality thresholds per page
Pages below a content quality threshold get noindex on the page itself, regardless of how the rest of the site is configured. Examples of quality thresholds we apply: minimum word count of unique content (300 words at HostList.io), minimum number of structured data fields populated (8 of 12 for hosting companies), minimum number of internal links pointing to the page (3 incoming links from other entity or category pages).
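The gate reduces to a pure function over per-page signals. The threshold numbers mirror the text; the signal names are illustrative assumptions.

```typescript
interface PageSignals {
  uniqueWordCount: number;      // words of unique (non-template) content
  populatedFields: number;      // structured data fields actually filled in
  totalFields: number;
  incomingInternalLinks: number; // links from other entity or category pages
}

const GATE = {
  minUniqueWords: 300,     // minimum unique content
  minPopulatedFields: 8,   // of 12, for hosting companies
  minIncomingLinks: 3,
};

// A page failing any threshold gets noindex, regardless of site-wide config.
function isIndexable(p: PageSignals): boolean {
  return (
    p.uniqueWordCount >= GATE.minUniqueWords &&
    p.populatedFields >= GATE.minPopulatedFields &&
    p.incomingInternalLinks >= GATE.minIncomingLinks
  );
}
```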
Sitemap follows the gate
The sitemap excludes any page that is gated. The sitemap is the most reliable signal you can send Google about which pages you want indexed; pages excluded from the sitemap but reachable via crawl get crawled less frequently and ranked weakly. The sitemap discipline keeps the indexed surface clean.
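A sketch of a sitemap generator that filters on the gate's stored decision, so the sitemap and the page-level noindex can never disagree. URLs and field names are illustrative.

```typescript
interface PageEntry {
  url: string;
  indexable: boolean;   // the gate's decision, stored alongside the page
  lastModified: string; // ISO date
}

function sitemapXml(pages: PageEntry[]): string {
  const urls = pages
    .filter((p) => p.indexable) // gated pages never appear in the sitemap
    .map(
      (p) =>
        `  <url><loc>${p.url}</loc><lastmod>${p.lastModified}</lastmod></url>`
    )
    .join("\n");
  return (
    `<?xml version="1.0" encoding="UTF-8"?>\n` +
    `<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n` +
    `${urls}\n</urlset>`
  );
}
```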
Robots and noindex are layered
We do not use robots.txt to enforce indexability decisions; robots.txt blocks crawl entirely, which removes Google's ability to see and honour the noindex directive. The right pattern is: allow crawl, set noindex on gated pages via a meta robots tag (or X-Robots-Tag header), and exclude them from the sitemap.
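In a Next.js App Router setup this reduces to a per-page robots decision; the `robots` shape below matches Next.js's Metadata API, while the helper and the admin wiring sketched in comments are illustrative assumptions.

```typescript
type Robots = { index: boolean; follow: boolean };

// Gated pages get noindex but keep follow, so crawl is allowed and
// link equity still flows through the page.
function robotsFor(indexable: boolean): Robots {
  return { index: indexable, follow: true };
}

// In a real page this would be returned from generateMetadata(), e.g.:
//
//   export async function generateMetadata({ params }) {
//     const page = await getPage(params.slug); // hypothetical loader
//     return { robots: robotsFor(page.indexable) };
//   }
```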
The internal-link graph
Programmatic linking, programmatically
At directory scale, internal linking has to be generated, not hand-curated. Each entity page links to its parent categories, its sibling entities (same category, similar attributes), the regional page if applicable, and the home. Each category page links to its child entities, parent categories, and lateral categories. The graph is computed at build time and updated whenever entities are added or removed.
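A build-time sketch of the entity side of that graph: each entity links to home, its parent categories, and a bounded set of same-category siblings. Data shapes and the sibling cap are illustrative assumptions.

```typescript
interface EntityNode {
  slug: string;
  categories: string[]; // category slugs
}

function buildLinkGraph(
  entities: EntityNode[],
  maxSiblings = 6
): Map<string, string[]> {
  // Index entities by category once so sibling lookup is cheap.
  const byCategory = new Map<string, string[]>();
  for (const e of entities)
    for (const c of e.categories) {
      if (!byCategory.has(c)) byCategory.set(c, []);
      byCategory.get(c)!.push(e.slug);
    }

  const graph = new Map<string, string[]>();
  for (const e of entities) {
    const links: string[] = ["/"];                              // home
    for (const c of e.categories) links.push(`/category/${c}`); // parents
    const siblings = (byCategory.get(e.categories[0] ?? "") ?? [])
      .filter((s) => s !== e.slug)
      .slice(0, maxSiblings);                                   // bounded siblings
    for (const s of siblings) links.push(`/entity/${s}`);
    graph.set(e.slug, links);
  }
  return graph;
}
```

Recomputing this whenever entities are added or removed keeps the graph consistent without any hand-curation.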
Anchor text variety
Internal anchor text must vary. Linking 8,000 pages to a category page with the same anchor text is a strong negative signal. We rotate anchor text across a set of templates: "[category] companies", "best [category] services", "[category] at [region]", "[entity] alternatives". The rotation is deterministic per source page so the graph is stable across rebuilds.
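The rotation can be sketched as a pure function of the source page, which is what makes the graph stable across rebuilds. The template strings mirror the examples above; the hashing scheme is an illustrative assumption.

```typescript
type AnchorTemplate = (cat: string, region?: string) => string;

const ANCHOR_TEMPLATES: AnchorTemplate[] = [
  (cat) => `${cat} companies`,
  (cat) => `best ${cat} services`,
  (cat, region) => (region ? `${cat} at ${region}` : `top ${cat} providers`),
];

function anchorFor(sourceSlug: string, category: string, region?: string): string {
  // Hash the source page's slug so the same page always picks the
  // same template, regardless of build order.
  let h = 0;
  for (const ch of sourceSlug) h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  return ANCHOR_TEMPLATES[h % ANCHOR_TEMPLATES.length](category, region);
}
```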
No more than three internal links per paragraph
Heavy internal linking is fine, but clustering links makes pages look auto-generated. We cap internal links at three per paragraph and thirty per page, distributing them across the page rather than stacking them in one block. The discipline is editorial, applied at the template level.
Hosting and infrastructure for directories
Static rendering with periodic revalidation
On Next.js: static generation at build time, ISR with a 24-hour revalidation window for entity pages, on-demand revalidation when an entity is updated through the admin. Vercel handles 28,000 pages comfortably; the cost lever is ISR write events, which we keep manageable by gating revalidation strictly to entity changes rather than every content tweak.
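The rendering setup reduces to two pieces of Next.js App Router configuration. `revalidate` and `revalidatePath` are real Next.js APIs; the route paths and the admin handler sketched in comments are illustrative assumptions.

```typescript
// app/entity/[slug]/page.tsx
// 24-hour ISR window for entity pages:
export const revalidate = 86400;

// app/api/admin/revalidate/route.ts
// On-demand revalidation, triggered only by entity changes in the admin
// (not every content tweak), which keeps ISR write events manageable:
//
//   import { revalidatePath } from "next/cache";
//
//   export async function POST(req: Request) {
//     const { slug } = await req.json();
//     revalidatePath(`/entity/${slug}`); // regenerate just this page
//     return Response.json({ revalidated: true });
//   }
```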
Database connection pooling
Direct Supabase queries from the page template do not scale; a full rebuild exhausts the connection pool. We use the static-generation pattern instead: at build time, fetch all pages in a few large queries and generate the pages from in-memory data. Page renders never hit the database directly. Rebuild times stay under ten minutes for HostList.io at 28,000 pages.
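A sketch of that pattern: a few large queries at build time, an in-memory index, and page generation that reads only from the index. The Supabase fetch is shown as a comment because it depends on your client setup; the table name and row shape are assumptions.

```typescript
interface Row { id: string; slug: string; name: string }

// At build time (assumed supabase-js client), one large query, e.g.:
//   const { data } = await supabase.from("entities").select("*");
//   const index = buildEntityIndex(data ?? []);

// Index all rows by slug so each page render is an O(1) Map lookup,
// never a database query.
function buildEntityIndex(rows: Row[]): Map<string, Row> {
  const bySlug = new Map<string, Row>();
  for (const r of rows) bySlug.set(r.slug, r);
  return bySlug;
}

// generateStaticParams then enumerates the index, so a 28,000-page
// rebuild issues a handful of queries rather than 28,000.
function renderParams(index: Map<string, Row>): { slug: string }[] {
  return [...index.keys()].map((slug) => ({ slug }));
}
```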
CDN and edge caching
Cloudflare goes in front of the origin, always. The free tier handles directory traffic comfortably; paid tiers add Argo for smarter global routing if traffic warrants it. CDN caching cuts origin load to roughly 5% of public traffic on a typical directory traffic profile.
The metrics that diagnose directory health
Three metrics I check weekly on every directory site I run:
Indexed pages versus published pages
Search Console > Pages > Indexed compared to your sitemap count. Healthy: 90%+. Below 80%: Google has signal-quality concerns; review the gating thresholds. Below 70%: structural problem, likely thin content or duplicate intent.
Average position by template archetype
Search Console filtered by URL pattern, segmented by entity / category / regional. Tells you which template is working and which is dragging the domain. We have caught template regressions on HostList.io within 7 days using this metric.
Clicks per indexed page
Total clicks divided by indexed pages, weekly. Healthy directory sites: 0.3 to 1.5. Below 0.2: the index is too broad and pages without traffic are diluting the domain. Tighten the gate.
The mistakes that tank directory sites
Five mistakes that have killed directory sites I have audited:
1. Hand-curated internal linking that did not survive a content addition. The graph needs to be programmatic from day one.
2. Slugs that changed when entity names were corrected. Even one slug change without a 301 costs significant ranking signal; ten slug changes can be terminal.
3. Indexable pages with no unique content. The Helpful Content Update is unforgiving on directories where category pages all read like the same boilerplate.
4. AggregateRating schema with fake or unverifiable ratings. Google detects this and devalues the schema globally. Use AggregateRating only when the ratings are real and verifiable.
5. Treating the directory as a launch project rather than an ongoing operational concern. Directories need active maintenance: stale entities removed, new entities added, broken external links pruned, schema kept in sync. Without that, directories degrade over 12 to 24 months.
The bottom line
A directory site that succeeds in 2026 is a clean data model, a disciplined template renderer, a strict indexability gate, a programmatic internal-link graph, and ongoing operational care. The architecture is repeatable; what varies is the data quality and the editorial judgement about which entities and categories deserve indexable surface.
You do not need to do all of this on day one. You do need to know which of these you have not yet built, and to fix the next one before Google notices.
We build directory and listing platforms at Seahawk Media starting from 18,000 USD. The HostList.io case study illustrates the pattern at scale; the conversation about what your specific directory should look like is free.
WHEN YOU ARE READY TO TALK
If you are mid-build on something this guide touches and want a second pair of eyes, the fastest path is a 30-minute call.
BOOK YOUR 30-MIN CALL