Technical SEO Audit with Screaming Frog and Search Console

A client sent me a site that had been "SEO-optimised by a professional agency" for 18 months. Rankings were flat. Traffic was down year-on-year. The agency's report was 47 pages long and had a section on "brand voice alignment." What it didn't include was the fact that 3,400 pages were returning 200 status codes but had noindex tags baked into their meta. Nearly three and a half thousand pages. Gone. Invisible. The agency had never crawled the site.

The key point: a Screaming Frog crawl cross-referenced with Search Console data still finds most technical SEO problems on any site; the method matters more than exotic tooling.

I fixed it in a week. With Screaming Frog and Google Search Console.

The thing about technical SEO is that it rewards people who actually look at the data rather than talk about it. And honestly, for 90% of the sites I've audited through Seahawk, I don't need Ahrefs, Semrush, or any of the big platforms to find the problems that are genuinely hurting performance. Two tools. One process. Here it is.

---

Before you crawl anything, set up Screaming Frog properly

Most people open Screaming Frog, paste in a URL, and hit start. For a 50-page blog, that's fine. For anything bigger, you'll wait 40 minutes for a crawl that gives you the wrong data.

Configuration matters more than crawl speed

First thing I do: go to Configuration > Spider and make sure I'm crawling the correct protocol. If the site is on HTTPS (it should be), I start from the canonical HTTPS homepage. I also turn off crawling of certain file types (PDFs, images, videos) unless I specifically want to audit those. That roughly halves the crawl time.

Then I turn the Respect Canonical option off (in recent versions it sits under Configuration > Spider > Advanced). Counter-intuitive, I know. But I want to see every canonicalised URL so I can audit whether the canonicalisation is actually correct. If Screaming Frog skips canonicalised pages, you'll never know they exist.

One more thing: under Configuration > Custom Extraction, I set up an extraction rule to pull the raw <title> and meta description directly from the HTML source. Why? Because some WordPress sites, particularly ones running Yoast alongside a page builder, output two title tags. Screaming Frog's default column only shows you the first one. The extraction rule shows you everything.
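
If you ever want to sanity-check a single page outside Screaming Frog, the same idea can be sketched in a few lines of Python. This is only an illustration, not how Screaming Frog does it: the names `TitleCollector` and `find_titles` are made up for the example, and it only sees the raw HTML you hand it, not the rendered page.

```python
from html.parser import HTMLParser


class TitleCollector(HTMLParser):
    """Collects the text of every <title> element found in raw HTML."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
            self.titles.append("")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only keep text that sits between <title> and </title>.
        if self._in_title:
            self.titles[-1] += data


def find_titles(raw_html):
    """Return every title in the document, in source order."""
    parser = TitleCollector()
    parser.feed(raw_html)
    parser.close()
    return [title.strip() for title in parser.titles]


# Hypothetical page where a plugin and a page builder each print a title.
page = "<html><head><title>Sofas | Shop</title><title>Sofas</title></head></html>"
titles = find_titles(page)
if len(titles) > 1:
    print("Multiple titles found:", titles)
```

One caveat: an inline SVG can carry its own <title> element, and this sketch would count that too, so treat a hit as a prompt to view the source, not as proof.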

---

First pass: what I look for in the crawl data

When the crawl finishes, I don't start with broken links. Everyone starts with broken links. I start with the Response Codes tab and filter for 3xx redirects.

Back in 2021, Seahawk took on an e-commerce client, a mid-sized furniture retailer with about 8,000 URLs. Their dev team had been handling redirects ad hoc for two years. We found 19 redirect chains, some of them four hops long. Page A redirected to Page B, which redirected to Page C, which redirected to Page D. Google says it follows up to 10 hops, but in practice, anything beyond two hops wastes crawl budget and dilutes link equity. We collapsed everything to single-hop redirects. That alone, with no content changes and no link building, moved three category pages from page 3 to page 1 within six weeks.
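
Collapsing chains is mechanical once you have the redirects as source and target pairs. Here is a minimal sketch of the idea, assuming you've already pulled those pairs out of the crawl export into a dictionary. The function names and the example paths are invented for illustration.

```python
def resolve_redirects(redirects, max_hops=10):
    """Follow each redirect to its end.

    `redirects` maps a source URL to the URL it redirects to.
    Returns {source: (final_url, hops, is_loop)}.
    """
    resolved = {}
    for start in redirects:
        seen = {start}
        current = start
        hops = 0
        is_loop = False
        while current in redirects and hops < max_hops:
            current = redirects[current]
            hops += 1
            if current in seen:
                # We came back to a URL already visited: a redirect loop.
                is_loop = True
                break
            seen.add(current)
        resolved[start] = (current, hops, is_loop)
    return resolved


def single_hop_map(resolved):
    """Rewrite every non-looping redirect to point straight at its final URL."""
    return {src: final for src, (final, hops, is_loop) in resolved.items() if not is_loop}


# Made-up example: one chain (A -> B -> C -> D) and one loop (X <-> Y).
redirects = {"/a": "/b", "/b": "/c", "/c": "/d", "/x": "/y", "/y": "/x"}
resolved = resolve_redirects(redirects)
chains = {src: info for src, info in resolved.items() if info[1] > 1 and not info[2]}
print("Chains to collapse:", chains)
print("Single-hop rules:", single_hop_map(resolved))
```

Loops are deliberately left out of the single-hop map, because there is no correct final destination to point them at. Those need a human decision.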

The order I work through the tabs

Response Codes → 3xx: redirect chains and loops
Response Codes → 4xx: broken pages (filter by inlinks to prioritise)
Indexability → Non-Indexable: noindex, canonicals pointing elsewhere, blocked by robots.txt
Page Titles: missing, duplicated, over 60 characters
Meta Description: missing or duplicated (not a ranking factor, but click-through matters)
H1: missing, duplicated, or more than one per page
Images → Missing Alt Text: quick win, especially for product sites
Directives → Canonical: check these match the actual indexable URL

That order is deliberate. I work from structural problems (redirects, broken pages) down to page-by-page issues. Fixing one broken redirect chain helps every page in that chain. Fixing one missing meta description helps one page.

---

Layering in Search Console: where it gets interesting

Screaming Frog tells you what is on the site. Search Console tells you what Google thinks is on the site. The gap between those two data sets is where the real problems live.

Open Coverage (or Indexing → Pages in the newer interface). You're looking at four things:

Error: pages Google tried to index and couldn't
Valid with warnings: often "Submitted URL not selected as canonical," which is a mess you need to untangle
Excluded: pages Google chose not to index (crawled but not indexed, noindexed, etc.)
Valid: pages Google has indexed

The "Excluded" bucket is criminally underused. Most people ignore it. I go straight there. Filter for "Crawled - currently not indexed." This is Google saying: I found this page, I read it, and I decided it wasn't worth indexing. That's almost always a thin content problem. Or it's a page that's genuinely fine but is too similar to another page, a classic issue with faceted navigation or tag archives.

Matching GSC exclusions to your Screaming Frog crawl

Export your Screaming Frog crawl to CSV. Export the "Excluded" URLs from Search Console. Load both into Google Sheets and run a VLOOKUP. Any URL that appears in the Screaming Frog crawl and in the GSC excluded list is a priority investigation.

I know people use Python scripts for this. You don't need to. A VLOOKUP in Sheets gives you the same answer in four minutes.
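
For anyone who does prefer a script, the match is just a set intersection. The sketch below assumes the crawl export has an "Address" column and the Search Console export has a "URL" column. Check the headers in your own files, because those column names, the helper names, and the demo rows are assumptions for illustration.

```python
import csv
import io


def read_urls(handle, column):
    """Return the set of non-empty values in `column` from an open CSV file."""
    return {
        row[column].strip()
        for row in csv.DictReader(handle)
        if (row.get(column) or "").strip()
    }


def priority_urls(crawl_csv, excluded_csv, crawl_column="Address", excluded_column="URL"):
    """URLs present in the crawl AND in the GSC excluded export."""
    crawled = read_urls(crawl_csv, crawl_column)
    excluded = read_urls(excluded_csv, excluded_column)
    return sorted(crawled & excluded)


# Tiny in-memory demo with made-up rows. With real exports you would pass
# open("your_crawl_export.csv", newline="", encoding="utf-8-sig") instead.
crawl = io.StringIO(
    "Address,Status Code\n"
    "https://example.com/a,200\n"
    "https://example.com/b,200\n"
)
excluded = io.StringIO(
    "URL,Last crawled\n"
    "https://example.com/b,2024-01-01\n"
)
print(priority_urls(crawl, excluded))
```

The comparison is an exact string match, so a trailing slash or an http/https mismatch between the two exports will hide a URL. The same is true of VLOOKUP.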

---

Crawl budget: only matters if your site is actually big

Okay, let's be honest. If your site has fewer than 1,000 pages, crawl budget is not your problem. You can stop worrying about it.

But once you cross about 10,000 URLs, and plenty of WooCommerce or Magento stores get there quickly through product variants and filtered URLs, crawl budget starts to bite. The Google Search Central documentation on crawl budget is actually one of the clearer things they've written. Worth reading properly.

You have two levers in Search Console: the Crawl Stats report and the URL Inspection tool. Crawl Stats shows you Google's crawl activity over 90 days: pages crawled per day, response times, response codes. If you see a spike in 404s on a specific date, that's a deployment that went wrong. If the average response time is above 2 seconds, your server is the problem, not your SEO.

---

Internal linking: the thing agencies always miss

I've audited more than a hundred sites at Seahawk where clients were spending real money on link building, guest posts, digital PR and so on, while they had orphaned pages that no internal link pointed to. Google can't prioritise what it can't find through your site structure.

In Screaming Frog, filter the crawl by Inlinks = 0. Any page with zero internal links is an orphan. (A pure link-following crawl can't reach a page that nothing links to, so this only surfaces orphans if the crawl was also fed the XML sitemap or a URL list.) Cross-reference it against Search Console's indexed pages. If the page is indexed but has no internal links, it means Google found it through an XML sitemap or an external backlink. That's fragile. Give it an internal link from a relevant page and you're giving Google a structural signal that this page matters.
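
The orphan check itself is a set difference: the URLs you know exist, minus the URLs that receive at least one internal link. A rough sketch, assuming you have the known URLs (from the sitemap or Search Console) as a list and the internal links as source and target pairs. The function name and the sample paths are invented for the example.

```python
def find_orphans(known_urls, internal_links):
    """Return known URLs that no other page links to.

    known_urls: URLs from the sitemap or Search Console.
    internal_links: iterable of (source_url, target_url) pairs from the crawl.
    """
    # A page linking to itself doesn't count as an inlink.
    linked = {target for source, target in internal_links if source != target}
    return sorted(set(known_urls) - linked)


# Made-up example: the old landing page is in the sitemap but nothing links to it.
links = [
    ("/", "/sofas/"),
    ("/sofas/", "/sofas/oak-frame/"),
    ("/sofas/", "/"),
]
known = ["/", "/sofas/", "/sofas/oak-frame/", "/old-landing-page/"]
print(find_orphans(known, links))
```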

A few things I look for on internal linking

Pagination pages that link to product or article pages, where those pages don't link back to the category page
Blog posts published in 2019 that have never been linked from newer content
Pages with dozens of inbound internal links but very little traffic in GSC, which often signals that the problem is the page itself, not the linking

---

Core Web Vitals: read the data, don't panic

Search Console has a Core Web Vitals report. It pulls from real-user Chrome UX Report data, which is field data: actual users on actual devices, not a lab simulation. This is more meaningful than what you'd get from a one-off Lighthouse run.

The report groups URLs into "Good," "Needs improvement," and "Poor" based on LCP, FID (now replaced by INP), and CLS. Don't try to fix everything at once. Sort the "Poor" group and look at which URL pattern has the most failing pages. It's usually a single template: all product pages failing CLS, or all category pages with slow LCP. Fix the template and hundreds of pages are fixed at once.
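
Spotting the failing template is easier if you bucket the "Poor" URLs by path. A minimal sketch, assuming the first path segment is a good enough stand-in for the template (true for many shops, not for every site). The function names and example URLs are made up.

```python
from collections import Counter
from urllib.parse import urlsplit


def template_of(url):
    """Use the first path segment as a rough proxy for the page template."""
    segments = [part for part in urlsplit(url).path.split("/") if part]
    return "/" + segments[0] + "/" if segments else "/"


def failing_templates(poor_urls):
    """Count failing URLs per template, most affected first."""
    return Counter(template_of(url) for url in poor_urls).most_common()


# Hypothetical list of URLs exported from the "Poor" group.
poor = [
    "https://example.com/product/oak-table",
    "https://example.com/product/pine-chair",
    "https://example.com/category/tables",
    "https://example.com/",
]
for template, count in failing_templates(poor):
    print(template, count)
```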

One thing I learned the hard way: CLS problems on sites with ads or cookie banners are almost always caused by elements that get injected above the fold after the initial paint. Screaming Frog won't catch this. You have to look at the actual page. Use Chrome DevTools with Layout Shift Regions enabled in the Rendering panel.

---

Robots.txt and sitemap checks (takes 10 minutes, saves weeks)

Go to yourdomain.com/robots.txt. Read every line. I have seen, with my own eyes, a live production site with Disallow: / in the robots.txt. Not a staging site. Production. A seven-year-old business. Their developer had copied the staging robots.txt during a migration and never checked it. They had been essentially invisible to Google for four months before they noticed.
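
Reading the file yourself is the real check, but the "is the whole site blocked?" question is also easy to automate after a deployment. A small sketch using Python's standard `urllib.robotparser`. The function name `blocks_whole_site` is made up, and as far as I know the standard-library parser doesn't replicate Google's wildcard matching, so treat it as a smoke test only.

```python
from urllib.robotparser import RobotFileParser


def blocks_whole_site(robots_txt, user_agent="Googlebot", site="https://yourdomain.com"):
    """Return True if the given robots.txt text stops `user_agent` fetching the homepage."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, site + "/")


# The staging file that got copied to production in the story above.
staging_copy = "User-agent: *\nDisallow: /"
# A more typical production file (hypothetical example).
normal = "User-agent: *\nDisallow: /wp-admin/"

print(blocks_whole_site(staging_copy))
print(blocks_whole_site(normal))
```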

In Search Console, go to Sitemaps. Check what's been submitted. Check the last time Google fetched it. If the sitemap hasn't been fetched in over a week, something is broken. Also compare the submitted URL count with the indexed URL count. If you've submitted 4,000 URLs and only 1,200 are indexed, that's a conversation you need to have about content quality, not about technical fixes.

---

FAQ

Do I need the paid version of Screaming Frog?

The free version is limited to 500 URLs. For anything above that, which covers most sites worth auditing, you need a paid licence. It's £259 per year as of writing. That's about the price of a single hour of agency time. Buy it.

How often should I run a technical audit?

For active sites that publish regularly or change products frequently, I'd say quarterly. For smaller, more static sites, twice a year is fine. Running an audit once and treating it as "done" is like changing the oil in your car once and expecting it to run forever.

Screaming Frog shows a 200 status but GSC says the page isn't indexed. Why?

Almost always one of three things: a noindex meta tag, a noindex HTTP header (X-Robots-Tag), or a canonical tag pointing elsewhere. Run the URL through Search Console's URL Inspection tool and it will tell you what it found. That tool is underrated: it shows you Google's last crawled version of the page, including the rendered HTML, which catches JavaScript-injected noindex tags that a basic HTTP request wouldn't see.
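
Those three checks can be sketched against a response you've already fetched. This is an illustration only: the class and function names are invented, it assumes canonical hrefs are absolute URLs, and it reads the raw HTML, so it will miss exactly the JavaScript-injected tags that URL Inspection catches.

```python
from html.parser import HTMLParser


class IndexSignals(HTMLParser):
    """Collects robots meta directives and the first canonical link."""

    def __init__(self):
        super().__init__()
        self.meta_robots = []
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = {key: (value or "") for key, value in attrs}
        if tag == "meta" and attrs.get("name", "").lower() in ("robots", "googlebot"):
            self.meta_robots.append(attrs.get("content", "").lower())
        elif tag == "link" and "canonical" in attrs.get("rel", "").lower().split():
            if self.canonical is None:
                self.canonical = attrs.get("href", "")


def why_not_indexed(url, headers, html):
    """Return the reasons (if any) a 200 page might be kept out of the index."""
    parser = IndexSignals()
    parser.feed(html)
    parser.close()
    reasons = []

    # 1. Meta robots: "none" is shorthand for "noindex, nofollow".
    for content in parser.meta_robots:
        directives = [part.strip() for part in content.split(",")]
        if "noindex" in directives or "none" in directives:
            reasons.append("noindex meta tag")
            break

    # 2. X-Robots-Tag response header (header names are case-insensitive).
    header = " ".join(v for k, v in headers.items() if k.lower() == "x-robots-tag")
    if "noindex" in header.lower():
        reasons.append("noindex HTTP header")

    # 3. Canonical pointing at a different URL (ignoring a trailing slash).
    if parser.canonical and parser.canonical.rstrip("/") != url.rstrip("/"):
        reasons.append("canonical points elsewhere")

    return reasons


# Made-up response that trips all three checks.
html = (
    '<html><head><meta name="robots" content="noindex, follow">'
    '<link rel="canonical" href="https://example.com/other"></head></html>'
)
print(why_not_indexed("https://example.com/page", {"X-Robots-Tag": "noindex"}, html))
```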

What about JavaScript-rendered sites?

Screaming Frog has a JavaScript rendering mode under Configuration > Spider > Rendering. Turn it on for JS-heavy sites. It's slower, significantly slower, but it's the only way to catch issues with content or links that are injected by JavaScript after the initial HTML loads. For a React or Next.js site, always crawl in JS rendering mode.

Is Google Search Console enough for keyword research?

For finding out which queries your existing pages rank for, yes, it's excellent. For discovering new keyword opportunities, no, you'll need something else. But that's out of scope for a technical audit.

---

Two tools. One spreadsheet. A few hours. That's all it takes. The expensive platforms have their place, and I'm not against them, but I've watched too many site owners assume that paying more means finding more. The problems are almost always in the basics. They just need someone to actually look.