
How I Run a Technical SEO Audit With Screaming Frog & GSC

A client once sent me a site that had been "SEO-optimised by a professional agency" for 18 months. Rankings were flat. Traffic was down year-on-year. The agency's report was 47 pages long and included a section on "brand voice alignment." What it didn't include was the fact that 3,400 pages were returning 200 status codes but had noindex tags baked into the meta. Three and a half thousand pages. Gone. Invisible. The agency had never actually crawled the site.

I fixed it in a week. With Screaming Frog and Google Search Console.

That's the thing about technical SEO — it rewards people who actually look at the data rather than talking about it. And honestly, for 90% of sites I audit through Seahawk, I don't need Ahrefs, Semrush, or any of the big platforms to find the problems that are genuinely hurting performance. Two tools. One process. Here it is.

---

Before You Crawl Anything, Set Up Screaming Frog Properly

Most people open Screaming Frog, paste a URL, and hit start. That's fine for a 50-page blog. For anything bigger, you'll be waiting 40 minutes for a crawl that gives you wrong data.

Configuration matters more than crawling speed

First thing I do: go to Configuration > Spider and make sure I'm crawling the correct protocol. If the site is on HTTPS (it should be), I'm starting from the canonical HTTPS homepage. I also turn off crawling of certain file types — PDFs, images, videos — unless I specifically want to audit those. It halves the crawl time.

Then I set Respect Canonical (under Configuration > Spider > Advanced in recent versions) to off. Counter-intuitive, I know. But I want to see every canonicalised URL so I can audit whether the canonicalisation is actually correct. With the option on, Screaming Frog drops canonicalised pages from its reports, and you'll never know they exist.

One more thing: under Configuration > Custom > Custom Extraction, I set up an extraction rule to pull the raw <title> and meta description directly from the HTML source. Why? Because some WordPress sites, particularly ones running Yoast alongside a page builder, output two title tags. Screaming Frog's default column only shows you the first one. The extraction rule shows you everything.
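
To spot-check a single page outside Screaming Frog, the same idea can be sketched with Python's standard library. This is a minimal sketch, not how Screaming Frog does it internally: the class and function names are made up for illustration, and it counts every <title> element in the document, including any that sit inside inline SVGs.

```python
from html.parser import HTMLParser


class HeadTagCollector(HTMLParser):
    """Collects every <title> text and meta description found in raw HTML."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self.descriptions = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            # Start a new title entry; its text arrives via handle_data.
            self._in_title = True
            self.titles.append("")
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.descriptions.append(attrs.get("content") or "")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.titles[-1] += data


def collect_head_tags(html):
    """Return (titles, descriptions) as lists, in document order."""
    parser = HeadTagCollector()
    parser.feed(html)
    parser.close()
    return [t.strip() for t in parser.titles], parser.descriptions
```

If collect_head_tags returns more than one title for a page, that page is outputting duplicate title tags and is worth a closer look.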

---

The First Pass: What I Look For in the Crawl Data

Once the crawl finishes, I don't start with broken links. Everyone starts with broken links. I start with the Response Codes tab and filter for 3xx redirects.

Back in 2021, Seahawk took on an e-commerce client: a mid-sized furniture retailer with about 8,000 URLs. Their dev team had been handling redirects ad hoc for two years. We found 19 redirect chains, some of them four hops long. Page A redirected to Page B, which redirected to Page C, which redirected to Page D, and so on. Google's documentation says Googlebot follows up to 10 hops, but in my experience anything beyond two hops wastes crawl budget and dilutes link equity. We collapsed everything to single-hop redirects. That change alone, with no content changes and no link building, moved three category pages from page 3 to page 1 within six weeks.
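
Collapsing chains is mechanical once you have the redirect pairs exported. Here is a rough sketch, assuming you've already loaded the redirects into a plain dict of source URL to target URL (the function names and the dict shape are my own illustration, not a Screaming Frog export format):

```python
def resolve_chain(start, redirects):
    """Follow source -> target mappings from `start`.

    Returns (chain, is_loop), where chain lists every URL visited in order.
    """
    chain = [start]
    seen = {start}
    current = start
    while current in redirects:
        current = redirects[current]
        chain.append(current)
        if current in seen:
            # We have come back to a URL already visited: a redirect loop.
            return chain, True
        seen.add(current)
    return chain, False


def single_hop_map(redirects):
    """Map every redirecting URL straight to its final destination.

    URLs caught in a loop are left out so they can be fixed by hand.
    """
    collapsed = {}
    for source in redirects:
        chain, is_loop = resolve_chain(source, redirects)
        if not is_loop:
            collapsed[source] = chain[-1]
    return collapsed
```

The number of hops for any URL is len(chain) - 1, so filtering for chains longer than two hops gives you the priority list.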

The order I work through tabs

  1. Response Codes → 3xx — redirect chains and loops
  2. Response Codes → 4xx — broken pages (filter by inlinks to prioritise)
  3. Indexability → Non-Indexable — noindex, canonicals pointing elsewhere, blocked by robots.txt
  4. Page Titles — missing, duplicated, over 60 characters
  5. Meta Description — missing or duplicated (not a ranking factor, but click-through matters)
  6. H1 — missing, duplicated, or more than one per page
  7. Images → Missing Alt Text — quick win, especially for product sites
  8. Directives → Canonical — check these match the actual indexable URL

That order is deliberate. I work from structural problems (redirects, broken pages) down to on-page issues. Fixing a broken redirect chain helps every page in that chain. Fixing a missing meta description helps one page.
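
Screaming Frog's own filters cover all of these checks, but if you've exported the crawl and want to re-run the page title check (step 4) on the raw data, it's only a few lines. A minimal sketch, assuming the export has been reduced to (url, title) pairs; the function name and the returned dict keys are hypothetical:

```python
from collections import defaultdict

MAX_TITLE_LENGTH = 60  # the character threshold from the checklist above


def audit_titles(pages):
    """pages: iterable of (url, title) pairs. Returns issue name -> sorted URLs."""
    by_title = defaultdict(list)
    issues = {"missing": [], "too_long": [], "duplicated": []}
    for url, title in pages:
        title = (title or "").strip()
        if not title:
            issues["missing"].append(url)
            continue
        if len(title) > MAX_TITLE_LENGTH:
            issues["too_long"].append(url)
        # Compare case-insensitively so "Oak Table" and "oak table" count as duplicates.
        by_title[title.lower()].append(url)
    for urls in by_title.values():
        if len(urls) > 1:
            issues["duplicated"].extend(urls)
    return {name: sorted(urls) for name, urls in issues.items()}
```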

---

Layering in Search Console: Where Things Get Interesting

Screaming Frog tells you what's on the site. Search Console tells you what Google thinks is on the site. The gap between those two data sets is where the real problems live.

Open Coverage (Indexing → Pages in the newer interface, which collapses these groups into "Indexed" and "Not indexed"). In the older report, you're looking at four things:

  • Error: pages Google tried to index and couldn't
  • Valid with warnings: usually "Indexed, though blocked by robots.txt," which is a mess you need to untangle
  • Excluded: pages Google chose not to index (crawled but not indexed, noindexed, submitted URL not selected as canonical, etc.)
  • Valid: pages Google has indexed

The "Excluded" bucket is criminally underused. Most people ignore it. I go straight there. Filter by "Crawled - currently not indexed." This is Google saying: I found this page, I read it, and I decided it wasn't worth indexing. That's almost always a thin content problem. Or it's a page that's genuinely fine but is too similar to another page, a classic issue with faceted navigation or tag archives.

Matching GSC exclusions against your Screaming Frog crawl

Export your Screaming Frog crawl to CSV. Export the "Excluded" URLs from Search Console. Load both into Google Sheets and run a VLOOKUP. Any URL that appears in the Screaming Frog crawl and in the GSC excluded list is a priority investigation.

I know people reach for Python scripts for this. You don't need to. VLOOKUP in Sheets takes four minutes and gives you the same answer.
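
That said, if you run this check every month across several clients, a script saves the copy-pasting. Here's a minimal sketch of the same join using only the standard library. The column names "Address" and "URL" are assumptions about the two export headers, so check them against your own files, and note it matches URLs exactly (no trailing-slash or case normalisation).

```python
import csv
import io


def read_url_column(csv_text, column):
    """Return the set of URLs found in `column` of a CSV supplied as text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[column].strip() for row in reader if row.get(column)}


def crawled_but_excluded(crawl_csv, gsc_csv, crawl_col="Address", gsc_col="URL"):
    """URLs present in both the crawl export and the GSC excluded export."""
    crawled = read_url_column(crawl_csv, crawl_col)
    excluded = read_url_column(gsc_csv, gsc_col)
    return sorted(crawled & excluded)
```

To use it on real exports, read each file into a string first, for example with pathlib.Path("crawl.csv").read_text(encoding="utf-8"), where the filename is whatever you saved the export as.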

---

Crawl Budget: Only Matters If Your Site Is Actually Big

Right, let's be honest. If your site has under 1,000 pages, crawl budget is not your problem. You can stop worrying about it.

But once you're past about 10,000 URLs — and a lot of WooCommerce or Magento stores hit this just from product variants and filtered URLs — crawl budget starts to bite. The Google Search Central documentation on crawl budget is actually one of the clearer things they've written. Worth reading properly.

The two places to look in Search Console are the Crawl Stats report and the URL Inspection tool. Crawl Stats shows you Google's crawl activity over the last 90 days: crawl requests per day, average response time, response codes. If you see a spike in 404s on a specific date, that usually points to a deployment that went wrong. If average response time is above 2 seconds, your server is the problem, not your SEO.

---

Internal Linking: The Thing Agencies Always Miss

I've audited well over a hundred sites at Seahawk where the client was spending real money on link building — guest posts, digital PR, the lot — and had orphaned pages that no internal link pointed to. Google can't prioritise what it can't find through your site structure.

In Screaming Frog, filter the crawl by Inlinks = 0. Any page with zero internal links is an orphan. One caveat: a crawl that only follows links can't reach a true orphan, so include the XML sitemap (or another URL source, such as a Search Console export) in the crawl first. Cross-reference the result against Search Console's indexed pages. If the page is indexed but has no internal links, it means Google found it through an XML sitemap or an external backlink. That's fragile. Give it an internal link from a relevant page and you're giving Google a structural signal that this page matters.
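
The orphan check boils down to a set comparison: URLs you know exist, minus URLs the crawl found at least one internal link to. A rough sketch, assuming a standard XML sitemap and a dict of URL to inlink count built from the crawl export (both function names are illustrative):

```python
import xml.etree.ElementTree as ET

# Namespace used by standard XML sitemaps.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml):
    """Extract every <loc> value from a <urlset> sitemap given as a string."""
    root = ET.fromstring(sitemap_xml)
    return {
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    }


def find_orphans(known_urls, inlink_counts):
    """URLs that are known to exist but have no internal links in the crawl.

    A URL missing from inlink_counts entirely is treated as having zero inlinks.
    """
    return sorted(url for url in known_urls if inlink_counts.get(url, 0) == 0)
```

This sketch handles a single urlset file only; a sitemap index that points at child sitemaps would need each child parsed in turn.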

A few things I watch for on internal linking

  • Pagination pages that link to product/article pages but those pages don't link back up to category pages
  • Blog posts published in 2019 that have never been linked to from any newer content
  • Pages that have dozens of inbound internal links but very low traffic in GSC — often a sign the page itself has a problem, not the linking

---

Core Web Vitals: Read the Data, Don't Panic

Search Console has a Core Web Vitals report. It pulls from real-user Chrome UX Report data, which is field data — actual users on actual devices, not a lab simulation. This is more meaningful than what you'd get from a one-off Lighthouse run.

The report groups URLs into "Good," "Needs improvement," and "Poor" by LCP, INP (which replaced FID in March 2024), and CLS. Don't try to fix everything at once. Start with the "Poor" group and look at which URL pattern has the most failing pages. Usually it's a single template: all product pages failing CLS, or all category pages with slow LCP. Fix the template, fix hundreds of pages at once.
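
If the list of failing URLs is too long to eyeball, grouping by path prefix usually reveals the template. A minimal sketch, assuming the first path segment is a fair proxy for the template (true for many shops and blogs, not all) and that you've collected the "Poor" URLs into a list yourself:

```python
from collections import Counter
from urllib.parse import urlsplit


def template_of(url):
    """Approximate a page template by the first path segment.

    /product/blue-sofa -> /product/*, /about -> /about, homepage -> /
    """
    segments = [s for s in urlsplit(url).path.split("/") if s]
    if not segments:
        return "/"
    if len(segments) == 1:
        return "/" + segments[0]
    return "/" + segments[0] + "/*"


def failing_templates(poor_urls):
    """Count failing URLs per approximate template, largest group first."""
    return Counter(template_of(u) for u in poor_urls).most_common()
```

The group at the top of the list is where a single template fix is likely to clear the most URLs.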

One thing I've learned the hard way: CLS issues on sites with ads or cookie banners almost always come from elements injecting above the fold after initial paint. Screaming Frog won't catch this. You need to look at the actual page. Use Chrome DevTools with the Layout Shift regions enabled in Rendering.

---

The Robots.txt and Sitemap Check (Takes 10 Minutes, Saves Weeks)

Go to yourdomain.com/robots.txt. Read every line. I have seen, with my own eyes, a live production site with Disallow: / in the robots.txt. Not a staging site. Production. A seven-year-old business. Their developer had copied the staging robots.txt during a migration and never checked it. They had been essentially invisible to Google for four months before they noticed.
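
Reading the file is the real check, but a quick automated sanity test after every deployment would have caught that one. Here's a sketch using Python's built-in urllib.robotparser. The function name and the list of paths are my own choices, and the standard-library parser doesn't replicate Google's matching rules exactly (it has no wildcard support, for instance), so treat it as a smoke test rather than a verdict.

```python
from urllib.robotparser import RobotFileParser


def blocked_paths(robots_txt, paths, user_agent="Googlebot"):
    """Return the paths from `paths` that robots.txt disallows for user_agent."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(user_agent, p)]


# A staging-style file that blocks everything:
staging_rules = "User-agent: *\nDisallow: /\n"
print(blocked_paths(staging_rules, ["/", "/products/"]))
```

If the homepage ever shows up in that list on production, stop everything else and fix it.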

In Search Console, go to Sitemaps. Check what's been submitted. Check the last time Google fetched it. If the sitemap hasn't been fetched in over a week, something is broken. Also check the submitted URL count vs the indexed URL count — if you've submitted 4,000 URLs and only 1,200 are indexed, that's a conversation you need to have about content quality, not about technical fixes.

---

FAQ

Do I need the paid version of Screaming Frog?

The free version caps at 500 URLs. For anything above that — which is most sites worth auditing — you need the paid licence. It's £259 per year as of writing. That's about the price of a single hour of agency time. Buy it.

How often should I run a technical audit?

For active sites that publish regularly or change products frequently, I'd say quarterly. For smaller, more static sites, twice a year is fine. Running an audit once and treating it as "done" is like changing the oil in a car once and expecting it to run forever.

Screaming Frog shows 200 status but GSC shows the page isn't indexed — why?

Almost always one of three things: a noindex meta tag, a noindex directive in the X-Robots-Tag HTTP header, or a canonical tag pointing elsewhere. Run the URL through Search Console's URL Inspection tool and it'll tell you exactly what it found. That tool is underrated: it shows you Google's last crawled version of the page, including the rendered HTML, which catches JavaScript-injected noindex tags that a basic HTTP request wouldn't see.
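
Those three checks are easy to script against the raw response if you want a first pass before opening URL Inspection. This is a minimal sketch that assumes you already have the response headers as a dict and the HTML as a string. The names are hypothetical, and because it only reads the raw HTML, it won't see anything injected by JavaScript.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects robots meta directives and the canonical link from raw HTML."""

    def __init__(self):
        super().__init__()
        self.directives = []
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() in ("robots", "googlebot"):
            self.directives.append((attrs.get("content") or "").lower())
        elif tag == "link" and (attrs.get("rel") or "").lower() == "canonical":
            self.canonical = attrs.get("href")


def indexability_blockers(url, headers, html):
    """Return the reasons (if any) a 200 page might be kept out of the index."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()

    reasons = []
    if any("noindex" in d for d in parser.directives):
        reasons.append("noindex meta tag")
    # Header names are case-insensitive, so compare them in lower case.
    x_robots = " ".join(v for k, v in headers.items() if k.lower() == "x-robots-tag")
    if "noindex" in x_robots.lower():
        reasons.append("noindex HTTP header")
    # Exact string comparison: a trailing-slash difference counts as "elsewhere".
    if parser.canonical and parser.canonical != url:
        reasons.append("canonical points elsewhere")
    return reasons
```

An empty list doesn't prove the page is indexable. It only means none of these three blockers showed up in the unrendered response.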

What about JavaScript-rendered sites?

Screaming Frog has a JavaScript rendering mode under Configuration > Spider > Rendering. Turn it on for JS-heavy sites. It's slower — significantly slower — but it's the only way to catch issues with content or links that are injected by JavaScript after the initial HTML loads. For a React or Next.js site, always crawl in JS rendering mode.

Is Google Search Console enough for keyword research?

For finding which queries your existing pages rank for, yes, it's excellent. For discovering new keyword opportunities, no — you'll need something else. But that's out of scope for a technical audit.

---

Two tools. A spreadsheet. A few hours. That's genuinely all this takes. The expensive platforms have their place — I'm not against them — but I've seen too many site owners assume that paying more means finding more. The problems are almost always in the basics. They just need someone to actually look.
