Directory websites that survive 28,000 pages without thin-content penalties.
Programmatic-SEO directory and listing platforms on Next.js plus Supabase. Built by the operator who runs HostList.io — about 28,000 web hosting company pages live since 2024 on this exact stack.
HostList.io: ~28,000 pages live · Next.js + Supabase + Vercel · Streaming sitemap past 50,000 URLs · Quality gates on every listing
WHAT KIND OF DIRECTORIES DO YOU BUILD
Pretty much any directory shape, given a structured data source. Over the last two years, the patterns I have shipped break down into four broad types, and most client projects are some flavour of one of these.
Industry directories list companies inside a vertical, segmented by category, location, size, and feature set. HostList.io is the canonical example I run myself — about twenty-eight thousand web hosting companies, sliced by hosting type, region, price band, and use case. Buyers find providers, providers get traffic, and the directory itself monetises through sponsored placements, affiliate links, or paid premium listings depending on what suits the vertical.
Local and location directories are the second pattern. Restaurant guides, pub guides, dentist directories, contractor directories. Every listing carries LocalBusiness schema with geo coordinates, opening hours, and ratings where you have data rights. Programmatic city-and-category pages — "best Italian restaurants in Manchester" or "pubs in Stoke Newington" — provide most of the long-tail SEO surface area on these sites.
Tool and software directories list software products inside a category. CRM tools. Project management apps. No-code platforms. AI tools. The traffic engine on these is comparison pages — Notion versus Linear versus ClickUp — and feature-matrix pages, where the searcher already knows the names and just wants a tiebreaker.
People and service directories are the fourth pattern. Agencies. Freelancers. Consultants. Photographers. Lawyers. The challenge with this one is that most people directories die of staleness: listings go out of date and nobody updates them. We build in expiry workflows and self-service profile editing on day one of the project rather than retrofitting them later.
WHAT IS THE HOSTLIST CASE STUDY
HostList.io is the directory I built solo to catalogue the entire web hosting industry. About twenty-eight thousand hosting company pages, live since spring 2024, on the same Next.js plus Supabase plus Vercel stack we now use for client directory builds.
What HostList does is catalogue every web hosting company we can verify, segmented by type — shared, VPS, managed WordPress, cloud, dedicated, reseller — region, price band, and use case. There are comparison pages between specific hosts, category pages for each segment, a search and filter UI that handles the twenty-eight-thousand-row dataset without noticeable query latency, schema markup on every listing, and a streaming sitemap because the URL count is already past what a single sitemap.xml can hold.
Three lessons from running it shape every client directory build now. First, data quality is the entire game. Pages with at least three unique data points beyond the entity name survive Google updates; pages with only a name and a generic description get de-indexed. Second, internal linking matters more than backlinks at this scale. The link graph between listings, categories, and comparison pages decides which leaf pages get crawled often enough to stay indexed. Third, programmatic does not mean lazy. Every page needs a reason to exist, and "we have a row in the database" is not a reason.
We held about fifteen percent of the database back from index because the unique-data threshold was not met on those rows. We cut category pages that had under five strong listings because they read as thin even when the underlying schema was correct. We added comparison pages between named competitors as a separate page type, and that template ended up driving some of the highest-converting traffic on the site. The same playbook is now standard on every directory we ship for clients.
WHY MOST DIRECTORY SITES FAIL
More directories die than survive, and the failure modes are predictable enough that I can usually tell on the first call which one a project is heading toward.
Thin-content de-indexing is the most common failure. A directory launches with five thousand listings, half of them carrying only a name and a one-line description, and Google indexes the first fifteen hundred and then stops. The site reads as a low-effort scrape. Six months later most of the indexed pages get de-indexed in a core update. The fix has to happen at data-collection time: every row needs at least three unique data points before it qualifies for the sitemap, not "we will fill it in later".
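To make that gate concrete, here is a minimal sketch in TypeScript. The field names and thresholds are hypothetical conventions for illustration, not HostList's actual schema, but the shape of the check is the point:

```ts
// Illustrative quality gate: a row qualifies for the sitemap only when it
// carries at least three data points beyond the entity name. The fields
// and weights are hypothetical, not HostList's actual schema.
type Listing = {
  name: string;
  description?: string | null;
  priceFrom?: number | null;
  features?: string[] | null;
  location?: string | null;
  rating?: number | null;
};

const UNIQUE_DATA_THRESHOLD = 3;

function uniqueDataPoints(l: Listing): number {
  let count = 0;
  // A one-line boilerplate description should not count as unique data.
  if (l.description && l.description.length > 80) count++;
  if (l.priceFrom != null) count++;
  if (l.features && l.features.length > 0) count++;
  if (l.location) count++;
  if (l.rating != null) count++;
  return count;
}

export function sitemapEligible(l: Listing): boolean {
  return uniqueDataPoints(l) >= UNIQUE_DATA_THRESHOLD;
}
```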
Stale data drift is the second pattern. A directory that listed accurate businesses in 2023 lists half-defunct businesses in 2026: nobody updated the rows, contact information has gone out of date, websites resolve to parking pages, and the directory loses trust with both Google and human visitors. We build in either crowd-sourced editing flows where the listed business can claim and edit its profile, automated freshness checks that disable dead listings, or both. Without a freshness layer the directory ages out of relevance regardless of how good the original data was.
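A freshness check can be as simple as a scheduled job that probes each listing's website and disables the row after repeated failures. The sketch below assumes hypothetical column names (website_url, failed_checks, status, last_verified) on a Supabase-backed listings table; it is illustrative, not the exact HostList pipeline:

```ts
// Sketch of an automated freshness check, assuming hypothetical columns on a
// Supabase-backed listings table. A listing is disabled after three
// consecutive failed probes.
import { createClient } from "@supabase/supabase-js";

const supabase = createClient(
  process.env.SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_KEY!
);

const MAX_FAILURES = 3;

export async function checkListing(id: string, websiteUrl: string) {
  let alive = false;
  try {
    const res = await fetch(websiteUrl, { method: "HEAD", redirect: "follow" });
    alive = res.ok;
  } catch {
    alive = false; // DNS failure, timeout, connection refused
  }

  if (alive) {
    await supabase
      .from("listings")
      .update({ failed_checks: 0, last_verified: new Date().toISOString() })
      .eq("id", id);
    return;
  }

  const { data } = await supabase
    .from("listings")
    .select("failed_checks")
    .eq("id", id)
    .single();
  const failures = (data?.failed_checks ?? 0) + 1;

  const patch: Record<string, unknown> = { failed_checks: failures };
  if (failures >= MAX_FAILURES) patch.status = "disabled"; // drops out of the sitemap too
  await supabase.from("listings").update(patch).eq("id", id);
}
```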
No moat is the third pattern. Three competing directories cover the same vertical with similar data. None has unique data, so none has a defensible reason to exist. Search-share fragments and none of them rank. The fix is the editorial layer — original analysis, scoring, recommendations, comparison frameworks — that the underlying data alone cannot provide. HostList competes on its scoring rubric, not on its hosting list, because the hosting list itself is not particularly defensible.
Index bloat from filters is the fourth pattern. A directory with eight filter dimensions can technically generate millions of URL combinations. If every combination is indexable, you flood Google with thin pages and dilute the strong ones. We always block thin filter combinations from index: anything with under three listings gets noindex, anything with no real query intent (sort orders, page 2 onwards) gets noindex, and only the canonical filter combinations that map to real searches stay indexable.
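As a sketch, the whole decision can live in one pure function. The parameter names here are illustrative, and the thresholds mirror the rules above rather than any Google-documented limit:

```ts
// Hedged sketch of the indexability decision for a filter URL.
// Parameter names are illustrative.
type FilterContext = {
  resultCount: number;       // listings matching this combination
  page: number;              // 1-based pagination
  sort?: string;             // e.g. "price_asc"; pure sort orders never index
  isCanonicalCombo: boolean; // curated combinations that map to real searches
};

export function isIndexable(f: FilterContext): boolean {
  if (f.resultCount < 3) return false; // thin page
  if (f.sort) return false;            // no query intent in a sort order
  if (f.page > 1) return false;        // page 2 onwards stays out of index
  return f.isCanonicalCombo;           // only whitelisted combinations index
}
```

In a Next.js App Router page, the result of a check like this would feed the robots field returned from generateMetadata, so the index rules live in one place instead of being scattered across templates.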
WHAT GOES INTO A DIRECTORY BUILD WE SHIP
A reference architecture for a directory ships with six layers. Each project flexes the specifics, but the spine repeats across builds.
The data layer is Postgres via Supabase or self-hosted, with proper indexes on every facet column. There is a dedicated listings table per entity type — companies, products, locations, people — and quality-gate columns alongside the content (uniqueness score, completeness percentage, last-verified timestamp). A sitemap-eligibility view filters out rows below the quality threshold automatically.
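For illustration, the sitemap generator should only ever read from that eligibility view, never the raw table. The sketch below uses supabase-js with hypothetical view, column, and threshold names; the SQL in the comment shows roughly what such a view would filter on:

```ts
// Sketch: the sitemap generator reads from a quality-filtered view, never
// the raw table. View, column names, and thresholds are illustrative.
//
// The view itself would be defined in a migration roughly as:
//   create view sitemap_eligible_listings as
//     select slug, updated_at from listings
//     where status = 'published'
//       and uniqueness_score >= 3
//       and completeness_pct >= 60;
import { createClient } from "@supabase/supabase-js";

const supabase = createClient(
  process.env.SUPABASE_URL!,
  process.env.SUPABASE_ANON_KEY!
);

export async function eligibleSlugs(offset: number, limit: number) {
  const { data, error } = await supabase
    .from("sitemap_eligible_listings")
    .select("slug, updated_at")
    .order("slug")
    .range(offset, offset + limit - 1); // supabase range is inclusive
  if (error) throw error;
  return data;
}
```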
The page templates split into a listing detail page (full data, related listings, schema, breadcrumb), a category page (paginated list with filter UI and ItemList schema), a comparison page for head-to-head between named entities, a location page with map embed and geo schema where geography matters, and about and methodology pages that carry the original editorial weight the underlying data cannot provide.
Search and filter use Postgres full-text search up to about ten thousand listings, then Algolia or Meilisearch for larger directories with low query-latency requirements. Server-rendered filter URLs give every filter combination a canonical URL, and noindex on thin or duplicate combinations prevents index bloat.

Submission and moderation, for directories where the model is crowd-fed, get a public submission form, an admin queue with quality-gate scores surfaced for moderator review, templated rejection emails with specific reasons, and a self-service edit flow for listed entities to claim and update their own profiles.
SEO scaffolding is the layer that decides whether the directory survives. Streaming sitemap with a chunk-per-template pattern, schema.org Organization or Product or Place or Service or LocalBusiness on every listing as appropriate, CollectionPage with ItemList on category pages, BreadcrumbList everywhere, canonical URL emitted from a single source of truth (the database, not the template), and a build-time SEO linter that fails the build on missing H1, oversized meta descriptions, or invalid JSON-LD.
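The linter is the piece people most often skip, so here is a hedged sketch of what ours does in spirit: parse each rendered page and fail the build on the three cheapest-to-catch mistakes. The length threshold is our own convention, and jsdom is one of several HTML parsers that would work:

```ts
// Hedged sketch of a build-time SEO linter run over rendered HTML. It fails
// the build on a missing H1, an oversized meta description, or JSON-LD that
// does not parse. Thresholds are our convention, not a Google rule.
import { JSDOM } from "jsdom";

const MAX_META_DESCRIPTION = 160;

export function lintPage(html: string, url: string): string[] {
  const errors: string[] = [];
  const doc = new JSDOM(html).window.document;

  if (!doc.querySelector("h1")) errors.push(`${url}: missing <h1>`);

  const meta =
    doc.querySelector('meta[name="description"]')?.getAttribute("content") ?? "";
  if (meta.length > MAX_META_DESCRIPTION)
    errors.push(`${url}: meta description over ${MAX_META_DESCRIPTION} chars`);

  for (const script of doc.querySelectorAll('script[type="application/ld+json"]')) {
    try {
      JSON.parse(script.textContent ?? "");
    } catch {
      errors.push(`${url}: invalid JSON-LD`);
    }
  }
  return errors;
}
```

The build script walks every generated route, collects the errors, and exits non-zero if any page fails, which is what actually stops a broken template from reaching production.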
Monetisation comes through featured listings (a boolean flag promotes a row to the top of category pages), sponsored category placements (a brand owns the top of one category for a billing period), affiliate-link tracking with proper rel="sponsored" attribution, and paid premium tiers for listed entities to get better placement, more rich data fields, and analytics access.
WHAT DATA SOURCE DO YOU NEED TO BUILD A DIRECTORY
The single biggest variable in a directory project is the data source itself. Most engagements live or die on the answer to one question: where will the data come from on day one, and how will it stay fresh after launch?
Manual editorial means a team writes every listing. Slow, expensive, but defensible. Suitable for under one thousand listings. Examples I have seen work: high-end hotel guides, curated agency directories, niche editorial sites where the act of being listed is itself the value.
Structured import means you bring a CSV or database export from somewhere reliable, and we clean, dedupe, enrich, and ingest it. Suitable for one thousand to one hundred thousand listings. Examples: industry directories with public data, government register imports, Companies House-style exports.
Automated scraping or API means listings get populated from a third-party API or a respectful scraping pipeline. Legally and ethically dependent on the source. Suitable for ten thousand to millions of listings where the data lives in a known canonical place. Examples: developer tool directories pulled from GitHub, hosting details scraped from public pages on the company sites themselves.
User-submitted means listings come from the people being listed. Cheap to launch, expensive to moderate. Best as a layer on top of editorial seed data, not as the only source. The hybrid pattern (editorial seed plus structured import plus annual editorial review) is what HostList runs and what most real directories end up doing whether they planned for it or not.
On the first call we will ask which combination matches your data reality. If you do not have a clear answer, the data question is itself the first phase of work; the build comes after.
HOW MUCH DOES A DIRECTORY BUILD COST AND HOW LONG DOES IT TAKE
Honest ranges based on real recent engagements rather than aspirational pricing on a sales deck. A small editorial directory under one thousand listings runs eighteen to thirty-five thousand US dollars over six to nine weeks. A mid-sized directory of one to ten thousand listings with a structured data import runs thirty to sixty thousand over ten to fourteen weeks. A large directory of ten to one hundred thousand listings, programmatic at scale, runs fifty to ninety thousand over twelve to eighteen weeks. A marketplace shape — two-sided, with bookings or transactions — runs sixty to one hundred fifty thousand over fourteen to twenty-two weeks.
All ranges include the SEO scaffolding (schema, sitemap, linter), the search-and-filter layer, and a basic admin dashboard. They do not include data acquisition itself (manual editorial, scraping infrastructure, third-party API costs), original brand and design work, or paid traffic acquisition. Care plans for ongoing operation run five hundred to three thousand US dollars per month after launch.
FREQUENTLY ASKED QUESTIONS
What is directory website development?
Directory website development is the process of building a site that catalogues and lists entities — companies, products, locations, tools, services — and surfaces them through search, filter, category, and individual listing pages. The work spans data modelling, programmatic SEO, schema markup, internal linking at scale, and a publish pipeline that handles thousands or hundreds of thousands of pages without breaking.
How is a modern directory site different from a 2010-era directory?
The 2010-era directory was a WordPress site with a custom post type, a category taxonomy, and a home-rolled search box. The modern directory is a programmatic SEO platform — one template plus a structured data source generates thousands of unique pages, each indexable, each schema-tagged, each Core-Web-Vitals-passing, each cross-linked to relevant siblings and parents. The bottleneck shifted from "how do we list things" to "how do we keep ten thousand pages indexed without thin-content penalties".
What stack do you use to build a directory site?
Default stack: Next.js App Router for the front-end, Supabase or Postgres for the database, Vercel for deployment, Algolia or Meilisearch for site search where the volume justifies it, and a streaming sitemap because directory sites pass 50,000 URLs faster than you expect. We used the same stack to build HostList.io, a programmatic-SEO directory of about 28,000 web hosting companies live since 2024.
Tell me about the HostList case study.
HostList.io is a directory I built solo to catalogue the entire web hosting industry. About 28,000 hosting company pages, every page programmatically generated from a structured data source, every page indexable, every page passing Core Web Vitals, every page schema-tagged with Organization plus the relevant offer types. Live since 2024 on Next.js plus Supabase plus Vercel. Lessons from running it at scale inform every directory we build for clients now — the data quality gates, the thin-content avoidance pattern, the streaming sitemap, the internal-link graph that pulls every leaf page into a topical cluster.
How do you avoid thin-content penalties on a programmatic site?
Three rules. Every page needs at least three unique data points beyond the entity name — a price, a description, a feature list, a location, a rating, anything that is not shared across all listings. The template adds context, comparison, recommendation, or aggregation around that unique data, not just an SEO wrapper. Pages with insufficient unique data are kept out of the sitemap and blocked from index until the data layer fills in. We hold roughly 15% of HostList's database back from index for this reason; the indexed pages are the ones with enough unique signal to deserve a spot.
Can you build a directory on WordPress instead of Next.js?
Yes, but only if the directory is under about 1,000 listings or you accept the performance ceiling. WordPress with a directory plugin (HivePress, Listify, GeoDirectory) ships fast for small directories. Past 1,000 listings, the editorial overhead and the front-end performance both degrade — search becomes slow, listing-page LCP slips past 3 seconds, and the index bloat from category and tag archive pages becomes a maintenance project of its own. We default to Next.js plus Supabase for anything over 1,000 listings.
How do you handle search and filter on a large directory?
Postgres full-text search handles up to about 10,000 listings before query latency becomes painful. Past that we add Algolia or Meilisearch for the search index, with Postgres remaining the source of truth. Filters are server-rendered as URL parameters, every filter combination has a canonical URL, and we use noindex on filter combinations that would generate thin or duplicate content (e.g. "hosting in Atlantis sorted by price" when Atlantis has zero results).
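At the Postgres-FTS stage, the query itself is small. A minimal sketch with supabase-js, assuming a precomputed tsvector column named fts (the table and column names are illustrative):

```ts
// Minimal sketch of the Postgres-FTS stage via supabase-js, assuming a
// precomputed tsvector column named "fts" on the listings table.
import { createClient } from "@supabase/supabase-js";

const supabase = createClient(
  process.env.SUPABASE_URL!,
  process.env.SUPABASE_ANON_KEY!
);

export async function searchListings(query: string) {
  const { data, error } = await supabase
    .from("listings")
    .select("slug, name, description")
    .textSearch("fts", query, { type: "websearch" }) // websearch_to_tsquery semantics
    .limit(20);
  if (error) throw error;
  return data;
}
```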
What schema markup goes on a directory site?
Per page type. Listing pages get either Organization, Product, Place, Service, or LocalBusiness depending on what the entity is — never invented types. Category and tag pages get CollectionPage with ItemList of the listings on that page. Home and about pages get Organization for your directory site itself. Comparison pages get a custom approach — we have built FAQPage plus Article schema combinations that work for "best X for Y" comparison pages without falsifying review aggregates.
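A minimal sketch of the emitter for listing pages; the entity shape is hypothetical, and the point is that the type is selected from real schema.org vocabulary per vertical rather than invented:

```ts
// Illustrative JSON-LD emitter for listing pages. The type is picked from
// real schema.org vocabulary; the entity shape is hypothetical.
type Entity = {
  kind: "Organization" | "Product" | "Place" | "Service" | "LocalBusiness";
  name: string;
  url: string;
  description?: string;
};

export function listingJsonLd(e: Entity): string {
  return JSON.stringify({
    "@context": "https://schema.org",
    "@type": e.kind,
    name: e.name,
    url: e.url,
    ...(e.description ? { description: e.description } : {}),
  });
}

// Rendered into the page as:
// <script type="application/ld+json">{listingJsonLd(entity)}</script>
```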
How do you handle the sitemap when there are 50,000-plus URLs?
Stream it. A single sitemap.xml caps at 50,000 URLs and 50 MB; past that you need a sitemap index pointing at multiple chunked sitemaps. We generate the sitemap index at build time and stream each chunk on demand from Postgres so memory usage stays flat regardless of URL count. HostList.io has been past 25,000 URLs since launch; the sitemap pipeline handles 100,000 without changes.
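A sketch of one chunk handler as a Next.js App Router route (for example app/sitemaps/[chunk]/route.ts), pulling rows in small batches so memory stays flat. It reuses the eligibleSlugs query from the data-layer sketch earlier; the batch sizes and the example.com host are illustrative, and newer Next.js versions type params as a Promise:

```ts
// Sketch of a streamed sitemap chunk. Rows come from Postgres in small
// batches so memory usage stays flat regardless of URL count.
import { type NextRequest } from "next/server";
import { eligibleSlugs } from "@/lib/sitemap"; // the quality-filtered view query sketched earlier

const CHUNK_SIZE = 10_000; // well under the 50,000-URL-per-file cap
const BATCH = 500;         // rows fetched from Postgres per round trip

export async function GET(
  _req: NextRequest,
  { params }: { params: { chunk: string } }
) {
  const chunkIndex = Number(params.chunk);
  const encoder = new TextEncoder();

  const stream = new ReadableStream({
    async start(controller) {
      controller.enqueue(encoder.encode(
        '<?xml version="1.0" encoding="UTF-8"?>\n<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
      ));
      for (let offset = 0; offset < CHUNK_SIZE; offset += BATCH) {
        const rows = await eligibleSlugs(chunkIndex * CHUNK_SIZE + offset, BATCH);
        if (rows.length === 0) break; // ran off the end of this chunk
        for (const r of rows) {
          controller.enqueue(encoder.encode(
            `<url><loc>https://example.com/${r.slug}</loc><lastmod>${r.updated_at}</lastmod></url>\n`
          ));
        }
      }
      controller.enqueue(encoder.encode("</urlset>\n"));
      controller.close();
    },
  });

  return new Response(stream, { headers: { "Content-Type": "application/xml" } });
}
```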
What about user-submitted listings and moderation?
Two-tier publish pipeline. New submissions land as draft rows with status = "pending". A moderation queue surfaces them in the admin dashboard with our quality gates run automatically on submit: minimum word count, banned-word screening, duplicate detection against existing rows, and image size and format checks. A human approves or rejects. Approved rows go live with status = "published" and trigger an on-demand sitemap regeneration. Rejection sends a templated email to the submitter with the specific reason.
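A sketch of the submission end of that pipeline, with hypothetical column names and stand-in check implementations; the real checks are more involved, but the shape (insert as pending, record flags for the queue) is the point:

```ts
// Sketch of the submission endpoint: rows land as "pending" with automated
// check results recorded for the moderation queue. Column names and check
// implementations are illustrative stand-ins.
import { createClient } from "@supabase/supabase-js";

const supabase = createClient(
  process.env.SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_KEY!
);

type Submission = { name: string; description: string; websiteUrl: string };

const MIN_WORDS = 40;
const BANNED = ["casino", "payday loan"]; // stand-in list

export async function submitListing(s: Submission) {
  const flags: string[] = [];

  if (s.description.trim().split(/\s+/).length < MIN_WORDS)
    flags.push("below minimum word count");
  if (BANNED.some((w) => s.description.toLowerCase().includes(w)))
    flags.push("banned word match");

  // Duplicate detection against existing rows, keyed on the website URL.
  const { data: dupe } = await supabase
    .from("listings")
    .select("id")
    .eq("website_url", s.websiteUrl)
    .maybeSingle();
  if (dupe) flags.push("duplicate of existing listing");

  const { error } = await supabase.from("listings").insert({
    name: s.name,
    description: s.description,
    website_url: s.websiteUrl,
    status: "pending",    // a human moderator flips this to "published"
    quality_flags: flags, // surfaced in the admin queue
  });
  if (error) throw error;
}
```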
Can the directory accept paid listings or sponsored placements?
Yes. The standard model is a "featured" boolean on the listings table that promotes a listing to the top of category pages and surfaces it in a sponsored slot on the home page. We also build sponsored category placements (a brand owns the top of a specific category) and full-page sponsored editorials (an article-format page with a clear "Sponsored" disclosure). All carry visible disclosure on the page and rel="sponsored" attribution on outbound links to avoid Google policy violations.
How long does it take to build a directory and what does it cost?
A directory built from scratch with a structured data source ready to import: 8-14 weeks. Pricing typically runs 25,000-90,000 USD depending on volume, search complexity, and admin features. If you bring 5,000 well-structured rows ready to import, the build is faster. If you bring an Excel file that needs cleansing, the data work is half the engagement.
WHAT THE FIRST 48 HOURS LOOK LIKE
Book a 30-minute call. Tell me your industry, your data source, your rough listing count, and what success looks like in 12 months. By the end of the call you will have an honest read on whether a directory is the right shape for your idea, what stack matches your scale, and a price range. If your idea works better as something else — a marketplace, a comparison site, a content site with a database angle — I will tell you that.