What I Learned Building Hostlist: 25,000 Web Hosts

Somewhere around host number 11,000, I genuinely questioned every decision I'd ever made. Not in a dramatic way — more like the quiet, specific dread of realising you've painted yourself into a corner with a dataset that keeps growing and a schema that was designed for maybe 500 entries. That was Hostlist. A directory of web hosts. All of them, or as close to all of them as I could get.

I'm going to tell you what actually happened — the architecture choices, the data nightmares, the moments where it clicked, and the bits I'd do completely differently if I started today.

Why a Web Hosting Directory

Honestly? I got annoyed. I was doing research for a Seahawk client (a mid-market SaaS that needed to migrate hosts) and I couldn't find a single directory that was both comprehensive and current. Most were either thin affiliate pages pretending to be neutral, or outdated lists that still featured hosts that had gone under in 2017.

The web hosting industry has thousands of active providers. Not dozens. Thousands. Shared hosts, managed WordPress hosts, VPS providers, bare-metal specialists, regional players you've never heard of. Nobody had mapped it properly. So I thought: I'll do it. Six weeks, I told myself.

It took considerably longer than six weeks.

The market validated the instinct, though. Look at what niche directories can do at even modest scale: Soak Oregon, a simple hot springs directory, pulls roughly $1,000 a month in ad revenue on just 25,000 monthly visitors. That's not a typo. 25,000 visitors. The economics of a well-targeted directory are genuinely different from a general content site.

The Data Problem Nobody Talks About

This is where most directory-building guides completely let you down. They'll tell you to set up categories and listing fields. Fine. What they won't tell you is that gathering 25,000 accurate, structured records is a different class of problem entirely.

My first approach was manual research plus a scraping layer I bolted together over a weekend. The scraper was fine. The data was chaos. Hosting providers change their pricing constantly. Some had three different brand names. Some were resellers of resellers: the same underlying infrastructure wearing fifteen different logos. Deduplication alone cost me three weeks.

A few things I wish I'd decided earlier:

  • One canonical record per legal entity, not per brand. Some hosts have four brands. They're still one host.
  • Freshness dating on every field. Not just "last updated" on the row: per field. Pricing goes stale faster than feature sets.
  • A human review queue from day one. Automated ingestion is fine for a first pass. But you need a process for flagging records that look wrong before they go live.

The third point especially. I skipped it early on and ended up with a chunk of listings that had completely wrong pricing tiers because a host had rebranded their plans and the scraper had matched on the old page structure. Took me ages to find it.
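
As a rough illustration of those three decisions (not the actual Hostlist schema), here is a minimal Python sketch. The class names, the field names, and the staleness budgets of 30 and 180 days are all assumptions made up for the example.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Any


@dataclass
class DatedValue:
    """A single field value plus the date it was last verified."""
    value: Any
    verified_on: date

    def is_stale(self, today: date, max_age_days: int) -> bool:
        return (today - self.verified_on) > timedelta(days=max_age_days)


# Hypothetical staleness budgets: pricing ages faster than feature sets.
MAX_AGE_DAYS = {"pricing": 30, "features": 180}


@dataclass
class HostRecord:
    """One canonical record per legal entity; brands hang off it."""
    legal_entity: str
    brands: list[str] = field(default_factory=list)
    fields: dict[str, DatedValue] = field(default_factory=dict)

    def stale_fields(self, today: date) -> list[str]:
        # Fields without an explicit budget fall back to 90 days (assumption).
        return sorted(
            name for name, dv in self.fields.items()
            if dv.is_stale(today, MAX_AGE_DAYS.get(name, 90))
        )

    def needs_review(self, today: date) -> bool:
        """True when the record should go to a human before publishing."""
        return bool(self.stale_fields(today)) or "pricing" not in self.fields


record = HostRecord(
    legal_entity="Example Hosting Ltd",
    brands=["ExampleHost", "ExampleCloud"],
    fields={
        "pricing": DatedValue("from 5/month", date(2024, 1, 1)),
        "features": DatedValue(["ssh", "daily backups"], date(2024, 1, 1)),
    },
)
today = date(2024, 3, 1)
print(record.stale_fields(today))   # ['pricing']
print(record.needs_review(today))   # True
```

Both values were verified on the same day, but only pricing is flagged, because each field carries its own date and its own budget. A single "last updated" column on the row could not express that.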

Choosing the Right Tech Stack

I went with WordPress. I know. But hear me out.

For a directory at this scale, you want something with a mature plugin ecosystem and a query layer you understand deeply. I'd used Directorist on smaller projects and it held up well: flexible schema, works with Gutenberg, sensible defaults. For Hostlist specifically I paired it with a custom post type layer on top, because I needed fields that no off-the-shelf plugin anticipated (things like data-centre locations, peering arrangements, control panel versions).

The four pages that actually matter — and I'd say this is true of any directory regardless of niche — are:

  1. Homepage with clear purpose, featured listings, and a dead-simple search
  2. Archive/browse page with fast filtering (this is where 80% of your users live)
  3. Single listing with the full record, structured data markup, and a way to claim/report
  4. Submission page (even if you're not doing user submissions initially, build it ready)

I can't stress the archive page enough. Users don't come to your homepage and then navigate. They land on an archive page from Google and decide within four seconds whether the data looks credible. Get that page right first.

What I'd Change About the Stack

Custom tables. I should have moved the core listing data out of post meta and into proper relational tables much earlier. WordPress post meta is fine up to maybe 5,000 records. Past that, the queries get painful. The performance considerations for large-scale web applications are real (RAM, query optimisation, caching strategy), none of which you plan for when you're just trying to get the thing launched.
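
To make the difference concrete, here is a small self-contained sketch using Python's built-in sqlite3 as a stand-in for MySQL. The `postmeta` table is a simplified imitation of the WordPress key/value layout, and the `hosts` table, its columns, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Key/value layout, shaped like WordPress post meta (simplified).
    CREATE TABLE postmeta (
        post_id    INTEGER NOT NULL,
        meta_key   TEXT    NOT NULL,
        meta_value TEXT
    );

    -- Hypothetical dedicated table: one row per host, typed columns.
    CREATE TABLE hosts (
        id            INTEGER PRIMARY KEY,
        name          TEXT NOT NULL,
        country       TEXT NOT NULL,
        monthly_price REAL NOT NULL
    );
    CREATE INDEX idx_hosts_country_price ON hosts (country, monthly_price);
""")

rows = [
    (1, "Alpha Host", "DE", 4.99),
    (2, "Beta Host", "DE", 12.00),
    (3, "Gamma Host", "US", 3.50),
]
for host_id, name, country, price in rows:
    conn.execute("INSERT INTO hosts VALUES (?, ?, ?, ?)", (host_id, name, country, price))
    conn.executemany(
        "INSERT INTO postmeta VALUES (?, ?, ?)",
        [
            (host_id, "name", name),
            (host_id, "country", country),
            (host_id, "monthly_price", str(price)),
        ],
    )

# Key/value: one self-join per field, and prices compared through a CAST.
META_QUERY = """
    SELECT n.meta_value
    FROM postmeta AS c
    JOIN postmeta AS p ON p.post_id = c.post_id AND p.meta_key = 'monthly_price'
    JOIN postmeta AS n ON n.post_id = c.post_id AND n.meta_key = 'name'
    WHERE c.meta_key = 'country' AND c.meta_value = ?
      AND CAST(p.meta_value AS REAL) < ?
"""

# Dedicated table: one query over typed, indexed columns.
TABLE_QUERY = "SELECT name FROM hosts WHERE country = ? AND monthly_price < ?"

meta_result = [r[0] for r in conn.execute(META_QUERY, ("DE", 10))]
table_result = [r[0] for r in conn.execute(TABLE_QUERY, ("DE", 10))]
print(meta_result, table_result)  # ['Alpha Host'] ['Alpha Host']
```

Both queries return the same answer. The key/value version needs an extra join for every field it touches and cannot use a numeric index on price, which is the kind of cost that grows with the number of records and filters.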

Hosting the Directory Itself (Genuinely Awkward)

There's a particular irony in building a web hosting directory and then having to choose a host for it. I went through three hosts in the first year.

The first was a managed WordPress host that I won't name. It choked on the import process: 25,000 posts going in via WP-CLI was not something their infrastructure was designed for. The second was a VPS where I handled everything myself: Nginx as a reverse proxy, Redis for object caching, ufw for the firewall. That self-hosted architecture approach works brilliantly when you know what you're doing: total visibility, no mystery throttling, you control the cache headers. But it's also 11pm on a Thursday when something breaks and it's entirely your problem.

I landed on a managed VPS with root access. Best of both. I kept Nginx in front, added a CDN layer for the static assets, and that's held up since.

The lesson: whatever host you choose, test it with your actual data volume before you commit. Not a sample. Your real import. A host that handles a 500-post blog with flying colours will sometimes completely fall over when you throw 25,000 records at it during a database rebuild.

Monetisation: What I Tried, What Worked

Back in 2019 a client said to me, "the money's in the listing, not the traffic." I didn't fully understand it then. I do now.

Hostlist's revenue has come from a few places, in rough order of what actually moved the needle:

  • Featured/premium listings: hosts pay to appear at the top of relevant category pages. This works. The CPMs are good because the intent is high.
  • Verified badges with annual renewal: lighter-touch than a full premium listing, but it adds up.
  • Display advertising: I added this late and it's the weakest performer by quite a lot. The audience is too small and too specific for broad ad networks to value properly.
  • Lead gen / affiliate: I was cautious here because I didn't want Hostlist to look like every other biased comparison site. I have a small number of referral arrangements but they're disclosed and limited.

What I have not done is a freemium model where basic listings are free and upgrades are paid. I thought about it. The problem with web hosting specifically is that the providers worth having on your platform are also the ones least likely to need your directory for exposure. The smaller hosts benefit more from being listed, but they're also the ones with the smallest budgets. The economics are awkward.

Brilliant Directories and similar platforms have this figured out for more community-oriented directories (wedding vendors, parenting resources) where the members genuinely want to be found by locals. Web hosting is different. It's a global, hyper-competitive market.

SEO for a Large Directory: The Bits That Actually Helped

A directory with 25,000 entries is an SEO asset if you handle it right. It's an SEO liability if you don't.

The specific things that helped:

  1. Unique, templated but variable meta descriptions per listing: not just the host name + "web hosting review". I pulled in actual data points (price tier, primary use case, founding year) to generate descriptions that were genuinely different.
  2. Category and tag pages with real editorial content: not just a grid of cards. A 200-word intro explaining what "managed WordPress hosting" actually means, written once, applied to the category. Google wants to see that someone thought about the page.
  3. Structured data (Schema.org): every listing has LocalBusiness or Organization markup. Click-through rates improved noticeably after I added this properly.
  4. Canonicals on filter combinations: this nearly killed me. Faceted search generates thousands of URL combinations. If you don't canonical them back to the clean archive URL, you'll be crawl-budget bankrupt within a month.
  5. Indexed listings only for active hosts: I noindex anything I can't confirm is still operating. Dead listings are worse than no listing.
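
Points 1 and 3 are easy to picture in code. The sketch below is illustrative only, not the templates Hostlist uses: the dictionary keys (`use_case`, `price_from`, `founded`) and the 155-character cut-off are assumptions, and it emits the Organization variant of the markup.

```python
import json


def meta_description(host: dict) -> str:
    """Build a per-listing description from real data points."""
    parts = [f"{host['name']}: {host['use_case']} hosting"]
    if host.get("price_from"):
        parts.append(f"from {host['price_from']}/month")
    if host.get("founded"):
        parts.append(f"operating since {host['founded']}")
    text = ", ".join(parts) + "."
    # 155 characters is a common rule of thumb for snippet length (assumption).
    return text if len(text) <= 155 else text[:152].rstrip() + "..."


def organization_jsonld(host: dict) -> str:
    """Serialise minimal Schema.org Organization markup as JSON-LD."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": host["name"],
        "url": host["url"],
    }
    if host.get("founded"):
        data["foundingDate"] = str(host["founded"])
    return json.dumps(data, indent=2)


host = {
    "name": "Example Host",
    "url": "https://example.com",
    "use_case": "managed WordPress",
    "price_from": "$12",
    "founded": 2011,
}
print(meta_description(host))
# Example Host: managed WordPress hosting, from $12/month, operating since 2011.
print(organization_jsonld(host))
```

Because the description is assembled from whichever data points exist, two listings only produce identical text if their underlying data is identical. The JSON-LD string would go inside a `<script type="application/ld+json">` tag on the single listing page.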

The one thing I got wrong early: I indexed everything immediately. Including stubs with almost no data. Google crawled them, found thin pages, and partially discounted the whole domain for a while. Lesson: don't index it until it's worth indexing.
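
The canonical and noindex rules can be reduced to two small decisions per page. This is a hedged sketch rather than the production logic: the required-field list and the `confirmed_active` flag are hypothetical, and stripping every query parameter assumes pagination lives in the path rather than the query string.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical definition of "worth indexing": these fields must be filled in.
REQUIRED_FIELDS = ("name", "url", "pricing", "description")


def canonical_url(url: str) -> str:
    """Drop query string and fragment so facet URLs point at the clean archive."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def robots_directive(listing: dict) -> str:
    """Index only listings that are confirmed active and reasonably complete."""
    complete = all(listing.get(name) for name in REQUIRED_FIELDS)
    if listing.get("confirmed_active") and complete:
        return "index, follow"
    return "noindex, follow"


print(canonical_url("https://example.com/hosts/vps/?country=de&panel=cpanel"))
# https://example.com/hosts/vps/

stub = {"name": "Stub Host", "url": "https://example.org", "confirmed_active": True}
print(robots_directive(stub))  # noindex, follow
```

The stub is confirmed active but has no pricing or description, so it stays out of the index until the record is filled in.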

What I'd Do Differently

A few things, quickly:

  • Start with a smaller, tighter niche first. "Web hosting directory" is enormous. I should have launched with "managed WordPress hosts" — maybe 300-400 records — proven the concept, then expanded.
  • Build the data pipeline before the front end. I did it backwards. The front end was live before the import process was solid, which meant I was constantly patching live data.
  • Charge for listings from day one. Even £1/month. Free listings attract hosts who fill in the form badly and never respond to update requests. A tiny payment filters for quality.
  • Invest in a proper contributor system earlier. Some of the best data corrections I've received came from users who spotted errors. I had no structured way to accept those for the first eight months.

Honestly, building Hostlist has been one of the most technically interesting side projects I've worked on — and one of the most humbling. The directory format looks deceptively simple from the outside.

---

FAQ

How long did it take to build Hostlist?

The first version — rough, full of data gaps, but live — took about three months of evenings and weekends. Getting it to a state I was genuinely proud of took closer to a year. The data quality work never really stops.

What WordPress plugin did you use for the directory functionality?

Directorist as a base, then a substantial amount of custom development on top. For a smaller directory I'd use it more or less out of the box. At 25,000 entries, you'll eventually need to write custom queries anyway — the plugin just gives you a starting point.

Is a web hosting directory actually profitable?

It can be. Mine covers its costs and earns beyond that, but I won't pretend it's a passive income machine. The margins depend heavily on whether you can get premium listings sold. Display ads alone won't get you there at moderate traffic levels.

How do you keep 25,000 listings up to date?

Imperfectly. I have a combination of scheduled scrapers that check for pricing page changes, a community-reported corrections queue, and a manual review cycle for the top 500 hosts by traffic. The long tail degrades over time. I've accepted that.
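
One simple way a scheduled check like that could detect a changed pricing page is to compare content fingerprints between runs. The sketch below is an assumption about how such a check might work, not a description of the real scrapers. The function names are made up, and fetching the pages is left out.

```python
import hashlib
import re


def fingerprint(html: str) -> str:
    """Hash the visible text of a page, ignoring markup, case and whitespace."""
    text = re.sub(r"<[^>]+>", " ", html)  # crude tag strip, fine for a sketch
    text = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def changed_hosts(previous: dict[str, str], pages: dict[str, str]) -> list[str]:
    """Return host ids whose page fingerprint differs from the last run."""
    return sorted(
        host_id for host_id, html in pages.items()
        if previous.get(host_id) != fingerprint(html)
    )


# Fingerprints stored from the previous run.
previous = {
    "alpha": fingerprint("<p>Basic: $5</p>"),
    "beta": fingerprint("<p>Pro: $20</p>"),
}
# Freshly fetched pages: alpha only changed its whitespace, beta changed price.
pages = {
    "alpha": "<p>Basic:   $5</p>",
    "beta": "<p>Pro: $25</p>",
}
print(changed_hosts(previous, pages))  # ['beta']
```

Anything this returns is a candidate for the corrections queue rather than an automatic update, since a changed page does not say which field changed or whether the new value was parsed correctly.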

Would you recommend building a large directory as a first project?

No. Start with something you can do in 500 records. Prove that people use it and that there's a monetisation path. Then scale. The technical and data-management complexity of a large directory is genuinely non-trivial, and you want to encounter those problems after you've validated the idea, not before.

---

The thing about directories is they're a long game. You're building a data asset, not a content site. Traffic grows slowly, the work is unglamorous, and for the first six months you'll wonder if anyone cares. But when the data is good and the niche is right, directories develop a kind of gravitational pull that's hard to replicate with any other format. That's why I keep building them.
