Back in 2022 I watched a client burn through £40,000 building a local services directory. Beautifully designed. 80,000 pages auto-generated from a clean Airtable base. Launched in March. By June it had 214 indexed pages and ranked for exactly nothing. The problem wasn't the idea — directories are still one of the few programmatic SEO plays that can compound into serious organic traffic. The problem was that they'd done everything technically correct and strategically wrong.
This post is about not making that mistake.
---
What "Programmatic SEO" Actually Means for a Directory in 2026
People throw this phrase around like it's one thing. It's not. For a directory specifically, programmatic SEO means generating hundreds or thousands of location-, category-, or attribute-scoped pages from a single template and a structured data source — and doing it in a way where each page gives Google a reason to rank it over a hand-written competitor.
That last part is where most directories fall flat.
The 2026 version of this game is harder than it was in 2019. Google's Helpful Content system has been baked into the core ranking algorithm since late 2023, which means thin templated pages get downweighted at a site level, not just a page level. One bad batch can tank your whole domain. I've seen it. Seahawk had a travel aggregator project in late 2023 where 12,000 city pages — each with roughly 90 words and a listings table — dragged the entire domain's crawl budget through the floor within eight weeks of launch.
So the baseline bar is higher. But the opportunity is still massive.
---
The Data Layer Is Everything
Start with a source that has depth, not just breadth
Most directory builders start by asking "how do I get 50,000 listings?" They should be asking "what do I actually know about each listing that nobody else does?"
I use Airtable for small-to-medium projects (under 100k records) and either Supabase or a straightforward PostgreSQL setup for anything larger. The tool matters less than the schema. Every listing in your database should have fields that can generate differentiated page content. Not just name, address, phone. Think: year founded, price range, average review sentiment, number of verified reviews, specialisms, last verified date, distance from city centre, whether they have a physical location vs. remote-only.
More fields = more angles for on-page differentiation. Simple as that.
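To make that concrete, here's a rough sketch of the kind of record I mean. The field names are illustrative, not a prescribed schema; the point is that every field beyond the basics is a potential content block on the page.

```ts
// Illustrative listing record. Field names are examples, not a fixed schema.
interface Listing {
  id: string;
  name: string;
  address: string;
  phone?: string;
  // Differentiation fields: each one is a potential dynamic block on the page.
  yearFounded?: number;
  priceRange?: "£" | "££" | "£££";
  verifiedReviewCount: number;
  averageReviewSentiment?: number; // e.g. -1 to 1 from a sentiment pass
  specialisms: string[];
  lastVerifiedAt?: string;         // ISO date, drives the "last verified" badge
  distanceFromCentreKm?: number;
  isRemoteOnly: boolean;
}
```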
Scraping vs. licensed data vs. user-submitted
Honest answer: all three have a role, and I've used all three.
- Scraped data is fast and cheap but degrades quickly. I ran a UK accountants directory in 2021 that scraped Companies House data. Within 14 months, 23% of the records were stale.
- Licensed data feeds (think Dun & Bradstreet, Yext, or vertical-specific APIs) are expensive but accurate. Worth it if your monetisation model supports it.
- User-submitted listings start slow but create the freshness signals Google rewards. Add a "claim your listing" flow from day one, even if you have two hundred listings total.
The directories that compound traffic over 18–24 months are almost always the ones that mix licensed seed data with ongoing user contribution.
---
Template Architecture: The Part Nobody Talks About
Here's the thing most tutorials skip. The difference between a programmatic directory that ranks and one that gets filtered into oblivion is usually at the template level — not the data level.
One template is not enough
You need at minimum three template tiers:
- Hub pages — "Best Solicitors in London" style. High competition, editorial tone, manually curated or heavily enriched. These are the pages you point links at.
- Category × location pages — "Family Law Solicitors in Manchester". Mid-tail. These can be more templated but need at least one dynamic section that pulls genuinely unique data (review counts, average fee bracket, notable listings).
- Individual listing pages — The leaf nodes. These live or die by data richness. If every listing page has the same 60-word description and a phone number, Google will figure that out fast.
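A rough way to express the tiering in code, if that helps you plan templates. The block names here are made up for illustration; what matters is that every tier assembles at least one section of genuinely unique data.

```ts
// Illustrative only: which content blocks each template tier assembles.
type Tier = "hub" | "categoryLocation" | "listing";

const blocksByTier: Record<Tier, string[]> = {
  // Editorial, manually curated, link-worthy
  hub: ["editorialIntro", "curatedTopListings", "cityStats", "faq"],
  // Templated, but with at least one genuinely unique data section
  categoryLocation: ["dynamicIntro", "listingTable", "reviewStats", "feeBrackets"],
  // Leaf nodes: live or die on data richness
  listing: ["businessProfile", "reviewSnippets", "nearbyListings", "lastVerifiedBadge"],
};
```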
I've tested this split on four directory projects in the last two years. The ones with a clear three-tier hierarchy consistently outperformed flat architectures in Google Search Console impression data within the first 90 days of indexing. Not a coincidence.
Dynamic content blocks that actually help
Stop stuffing pages with AI-generated boilerplate. Instead, build template logic that pulls:
- Related listings in the same postcode district
- "Also viewed" categories from your own analytics
- A "last updated" timestamp that's actually accurate (not just today's date injected by JS)
- User review snippets, even if you only have three reviews — three real ones beat zero fake ones
The goal is that a human landing on a leaf-node listing page walks away with something they couldn't have Googled for themselves.
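For the first item on that list, here's a minimal sketch of how a "related listings in the same postcode district" block might be pulled, assuming a Supabase backend via @supabase/supabase-js. The table name, column names, and environment variable names are all assumptions for illustration.

```ts
import { createClient } from "@supabase/supabase-js";

// Sketch only: table and column names ("listings", "postcode", "category_slug",
// "verified_review_count") and the env var names are assumptions.
const supabase = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_ANON_KEY!);

export async function relatedByPostcodeDistrict(listing: {
  id: string;
  postcode: string;
  categorySlug: string;
}) {
  const district = listing.postcode.split(" ")[0]; // "M4 1HQ" -> "M4"
  const { data, error } = await supabase
    .from("listings")
    .select("id, name, slug, verified_review_count")
    .eq("category_slug", listing.categorySlug)
    .ilike("postcode", `${district}%`)
    .neq("id", listing.id)
    .order("verified_review_count", { ascending: false })
    .limit(6);
  if (error) throw error;
  return data;
}
```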
---
Internal Linking: Your Most Underused Ranking Lever
I'll be blunt. Most programmatic directories have catastrophic internal linking. Pages exist. They point nowhere useful. Google's crawler visits once, sees a dead-end, and deprioritises the whole subdirectory.
A proper internal linking architecture for a directory looks roughly like this:
- Homepage → top hub pages (manually curated, 8–15 links)
- Hub pages → category × location pages (dynamic, based on listing count)
- Category × location pages → individual listings (paginated, max 20–25 per page)
- Individual listings → related category × location pages (2–3 contextual links)
- Individual listings → "nearby" listings via a distance-based query
That last one — nearby listings — is underrated. It creates a crawlable web inside your leaf nodes that keeps Googlebot moving through the site rather than bouncing back up to the hub. I implemented this on a dental directory for a client in Birmingham in early 2024 and the crawl rate from GSC went up 3.4x within six weeks.
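Here's a minimal sketch of the distance logic behind a "nearby listings" block. In production you would push this into the database (PostGIS or similar) rather than compute it in application code, but the principle is the same: every leaf page links out to its nearest neighbours.

```ts
// Haversine distance between two listings, used to build a "nearby" block.
interface GeoListing { id: string; name: string; lat: number; lng: number; }

function haversineKm(a: GeoListing, b: GeoListing): number {
  const toRad = (d: number) => (d * Math.PI) / 180;
  const dLat = toRad(b.lat - a.lat);
  const dLng = toRad(b.lng - a.lng);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a.lat)) * Math.cos(toRad(b.lat)) * Math.sin(dLng / 2) ** 2;
  return 6371 * 2 * Math.asin(Math.sqrt(h)); // Earth radius ~6371 km
}

// Returns the n closest other listings, each rendered as an internal link.
export function nearbyListings(current: GeoListing, all: GeoListing[], n = 5) {
  return all
    .filter((l) => l.id !== current.id)
    .map((l) => ({ listing: l, km: haversineKm(current, l) }))
    .sort((a, b) => a.km - b.km)
    .slice(0, n);
}
```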
Use Screaming Frog to audit your link graph before you launch, not after. The free tier handles up to 500 URLs, which is plenty for a sanity check on your templates.
---
Handling Indexation at Scale Without Getting Burned
Google will not index all 80,000 of your pages. Accept this. Work with it.
The practical approach I use:
- Submit only your hub and category × location pages to the sitemap on launch day
- Let Google discover leaf nodes through internal links, not the sitemap
- Use noindex aggressively on thin, duplicate, or low-data listing pages until you can enrich them
- Set up a crawl budget report in GSC (Settings → Crawl Stats) and check it weekly for the first three months
The noindex advice always gets pushback. "But I want all my pages indexed!" Yeah. And Google wants all of them to be good. You can't have 40,000 thin pages indexed and also have a healthy domain authority. Pick one.
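In practice this ends up as a simple gate on data depth at page-generation time. A sketch with made-up thresholds, not a rule; the point is that indexation is something a page earns from its data, not a default.

```ts
// Sketch: "is this page rich enough to index yet?" Thresholds are illustrative.
interface ListingPageData {
  descriptionWordCount: number;
  verifiedReviewCount: number;
  hasUniqueFields: boolean; // e.g. fees, specialisms, last-verified date
  tier: "hub" | "categoryLocation" | "listing";
}

export function shouldIndex(page: ListingPageData): boolean {
  if (page.tier !== "listing") return true; // hubs and category pages always go in
  return (
    page.descriptionWordCount >= 150 &&
    page.hasUniqueFields &&
    page.verifiedReviewCount > 0
  );
}

// In a Next.js App Router page this feeds the robots metadata, e.g.
// robots: shouldIndex(data) ? undefined : { index: false, follow: true }
```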
One more thing: pagination. Markup like rel="next" and rel="prev" does no harm, but Google confirmed back in 2019 that it no longer uses it as an indexing signal, so the more useful question is whether you need paginated category pages at all. On three recent projects I replaced paginated listings with a JS-loaded "show more" approach (with a static fallback for crawlers) and saw cleaner indexation patterns in GSC within 60 days.
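A rough sketch of that pattern in React: the first batch of listings ships in the server-rendered HTML so crawlers see real anchor links, and the button fetches more from an API route. The /api/listings endpoint and its query parameters are placeholders, not a real API.

```tsx
"use client"; // Next.js App Router: this is a client component
import { useState } from "react";

interface ListingLink { id: string; name: string; href: string }

// `initial` comes from the server render, so the first page of links is
// always in the static HTML. The endpoint below is a hypothetical example.
export function ListingResults({ initial, categorySlug }: {
  initial: ListingLink[];
  categorySlug: string;
}) {
  const [items, setItems] = useState(initial);

  async function showMore() {
    const res = await fetch(`/api/listings?category=${categorySlug}&offset=${items.length}`);
    const more: ListingLink[] = await res.json();
    setItems([...items, ...more]);
  }

  return (
    <>
      <ul>
        {items.map((l) => (
          <li key={l.id}><a href={l.href}>{l.name}</a></li>
        ))}
      </ul>
      <button onClick={showMore}>Show more</button>
    </>
  );
}
```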
---
Content Enrichment at Scale Without Losing Your Mind
Right. So you've accepted that thin pages are death. How do you actually enrich 20,000 listing pages without a team of content writers?
A few approaches that work in practice:
- Structured review aggregation. Pull from Google Business Profile data via their API, or scrape (carefully) from Trustpilot or Yelp where ToS allows. Even a star rating + review count displayed as structured data adds measurable differentiation.
- Automated freshness signals. Write a script that hits your listings weekly and checks whether the business website, phone, or address has changed. Update the record. Show the "last verified" date on the page. This alone reduced our bounce rate on a legal directory by 18% — people trust current data.
- LLM-assisted summaries, used carefully. I do use GPT-4 to generate structured summaries for listings where we have enough raw data. But the prompt is tightly constrained to the specific data fields for that listing — it's not generating generic blurb. And every summary is filtered through a similarity check (I use a basic cosine similarity script against the full corpus) to catch near-duplicate outputs before they go live.
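If you want to replicate that last check, here is a minimal version of the idea: term-frequency vectors plus cosine similarity, with an illustrative threshold. Embeddings work too; this is the cheap version.

```ts
// Build a term-frequency vector from a summary.
function termFrequency(text: string): Map<string, number> {
  const tf = new Map<string, number>();
  for (const word of text.toLowerCase().match(/[a-z£0-9']+/g) ?? []) {
    tf.set(word, (tf.get(word) ?? 0) + 1);
  }
  return tf;
}

// Cosine similarity between two term-frequency vectors.
function cosineSimilarity(a: Map<string, number>, b: Map<string, number>): number {
  let dot = 0;
  for (const [word, countA] of a) dot += countA * (b.get(word) ?? 0);
  const norm = (m: Map<string, number>) =>
    Math.sqrt([...m.values()].reduce((s, v) => s + v * v, 0));
  return dot === 0 ? 0 : dot / (norm(a) * norm(b));
}

// Flag a candidate summary if it is too close to anything already published.
// The 0.9 threshold is illustrative; tune it against your own corpus.
export function isNearDuplicate(candidate: string, corpus: string[], threshold = 0.9) {
  const candidateTf = termFrequency(candidate);
  return corpus.some((doc) => cosineSimilarity(candidateTf, termFrequency(doc)) >= threshold);
}
```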
---
The Monetisation Model Shapes Your SEO Architecture
This one catches people off guard. How you plan to make money from the directory directly affects which pages you prioritise, how much data depth you need, and whether you can afford the content enrichment that ranking requires.
The three models I've seen work consistently:
- Paid listings / featured placement. Simple. Businesses pay to appear higher or with enhanced profiles. Incentivises you to grow the free tier to create the marketplace dynamic.
- Lead gen. You capture enquiry form submissions and sell them to businesses. Higher revenue per conversion but requires significantly richer listing pages to earn the trust needed for form fills.
- Affiliate / referral. Works well in verticals like software, finance, or hospitality where there are established affiliate programmes. Niche directories in SaaS tool categories can hit £10k–£30k/month on this model with under 5,000 pages if the keyword targeting is right.
Pick your model before you design your templates. A lead-gen directory needs trust signals and conversion elements baked into every listing page from day one — adding them later is always messier than it sounds.
---
FAQ
Does programmatic SEO still work after Google's 2024 algorithm updates?
Yes, but the threshold for "good enough" is significantly higher than it was even two years ago. The March 2024 Google core update hit a lot of thin programmatic sites hard — particularly those relying on templated AI content with no unique data. Sites with genuine data depth and clear entity relationships weathered it fine. In some verticals, those sites actually gained ground as thin competitors got filtered out.
How many pages should I launch with on day one?
As few as you need to demonstrate the concept to Google. I'd rather launch with 500 genuinely good pages than 50,000 thin ones. Build your hub pages and top 20 category × location combinations first. Get them indexed, get some early ranking signals, then roll out the long tail in batches. Rushing to 100,000 pages in month one is almost always a mistake.
What CMS or tech stack should I use?
For most clients I still use WordPress with a custom post type and ACF Pro pulling from a database. It's not glamorous but it's fast to build, easy to hand off, and the plugin ecosystem for SEO (Rank Math, specifically) is mature. For higher-scale projects — over 50,000 pages — I'll typically go headless with Next.js and a PostgreSQL or Supabase backend. The SSG/ISR capabilities in Next.js are genuinely useful for keeping crawl behaviour clean at scale.
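For what it's worth, the ISR piece is only a few lines in the App Router. This is a sketch assuming Next.js 14-style conventions (in version 15, params becomes a Promise), and the data helper import is hypothetical.

```ts
// app/[category]/[city]/page.tsx (sketch). The "@/lib/data" helper is a
// hypothetical stand-in for however you query your listings database.
import { getTopCategoryCityCombos } from "@/lib/data";

// ISR: pages rebuild in the background at most once a day.
export const revalidate = 86400;

// Pre-render only the combinations worth pre-rendering; everything else
// renders on first request and is cached from then on.
export async function generateStaticParams() {
  const combos = await getTopCategoryCityCombos();
  return combos.map(({ category, city }: { category: string; city: string }) => ({
    category,
    city,
  }));
}
```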
How long before a programmatic directory starts ranking?
Realistically? Six to nine months for meaningful traffic, assuming you've done the architecture right and you're in a vertical where Google isn't explicitly preferring large established brands. I've seen exceptional cases hit traction in four months and disappointing ones take 18. The variable that matters most, honestly, is topical authority — how clearly your site establishes expertise in a specific vertical from day one.
---
The directory SEO playbook isn't dead. It's just that Google now discriminates far more ruthlessly between good and thin implementations. The operators who got burned in 2023–24 were mostly building for volume rather than value. Build for value first — deep data, honest enrichment, a link architecture that respects how Google actually crawls — and the volume takes care of itself over time. It always has.
