The Problem
How to find "Ghost Pages" (Indexed 404s) and Sitemap errors
You are looking for a way to find "Ghost Pages" (Indexed 404s) and Sitemap errors. Most people would tell you to buy a SaaS subscription for this.
We say: Build it yourself for free.
The Solution
The Automation Blueprint
Copy the logic below into a tool like Gemini CLI or Claude Code. It includes the role, constraints, and multi-step workflow needed to find "Ghost Pages" (Indexed 404s) and Sitemap errors.
# What This Does
Google Search Console (GSC) often gets out of sync with your real website. It might think a deleted page (`/old-page`) is still "Indexed", or it might claim a fixed 404 is still broken.
This agent cross-references your **GSC Coverage Report** with your **Live Site Status** to give you a definitive "Fix List".
# What You Need
1. **GSC Export:** Go to GSC > Pages > Export > CSV. Save the main table as `gsc_coverage.csv`.
2. **Sitemap:** Your `sitemap.xml` file.
# What You Get
- `ghost_pages.csv`: URLs that Google lists as "Indexed" but that are actually broken (404) or redirected (301) on your live site.
- `sitemap_errors.csv`: URLs in your sitemap that do not return 200 OK on the live site.
- `action_plan.md`: A summary of which GSC issues are "Real" vs "False Positives" (so you can click "Validate Fix").
# How to Use
1. Export your coverage report from Google Search Console.
2. Save it as `gsc_coverage.csv` in your folder.
3. Ensure your `sitemap.xml` is also in the folder.
4. Run this blueprint.
---
# Prompt
You are a **Technical SEO Auditor**. Your goal is to reconcile the difference between "What Google Thinks" (GSC Data) and "What Is Real" (Live Site Status).
**Phase 1: Ingest Data**
1. Read `gsc_coverage.csv`. (Key columns: `URL`, `Status`).
2. Read `sitemap.xml`. (Extract all `<loc>` URLs).
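As a rough illustration of what Phase 1 asks for, here is a minimal Python sketch. The function names are hypothetical, and it assumes the export really has `URL` and `Status` column headers and that the sitemap is a plain URL set rather than a sitemap index.

```python
import csv
import xml.etree.ElementTree as ET


def read_gsc_coverage(path):
    """Return {url: status} from the GSC export (assumes `URL` and `Status` headers)."""
    # utf-8-sig tolerates the byte-order mark that spreadsheet exports often carry.
    with open(path, newline="", encoding="utf-8-sig") as f:
        return {row["URL"].strip(): row["Status"].strip() for row in csv.DictReader(f)}


def read_sitemap_urls(path):
    """Return the set of <loc> values, ignoring the XML namespace prefix."""
    root = ET.parse(path).getroot()
    urls = set()
    for el in root.iter():
        # Tags look like "{namespace}loc"; keep only the local name.
        # Note: this also picks up <loc> in extension namespaces (e.g. images).
        if el.tag.rsplit("}", 1)[-1] == "loc" and el.text:
            urls.add(el.text.strip())
    return urls
```

The union of the dictionary keys and the sitemap set gives the list of URLs to check in the next phase.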
**Phase 2: Live Diagnostics**
For every URL found in EITHER file, perform a **Live Status Check**:
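- **Fetch:** Send an HTTP GET request to the URL (for example with `curl`) without auto-following redirects, so a 301 is recorded as 301 rather than as its destination's 200.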
- **Record:** HTTP Status Code (200, 301, 404, 500).
- **Record:** Final Destination (if redirected).
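The live status check could be sketched in Python as follows, using only the standard library. `check_url`, `NoRedirect`, `max_hops`, and the `opener` parameter are hypothetical names chosen for this example. The sketch records the first status code and then follows `Location` headers by hand to find the final destination.

```python
import urllib.error
import urllib.request
from urllib.parse import urljoin


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so the first status code stays visible."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def check_url(url, timeout=10, max_hops=5, opener=None):
    """Return (first_status, final_url); final_url is None unless the URL redirected."""
    opener = opener or urllib.request.build_opener(NoRedirect)
    first_status, current = None, url
    for hop in range(max_hops + 1):
        try:
            with opener.open(current, timeout=timeout) as resp:
                status, location = resp.status, None
        except urllib.error.HTTPError as err:
            # With redirects disabled, 3xx responses land here alongside 4xx/5xx.
            status, location = err.code, err.headers.get("Location")
            err.close()
        except (OSError, ValueError):
            # Network failure, timeout, or malformed URL: no status available.
            status, location = None, None
        if hop == 0:
            first_status = status
        if status is None or not (300 <= status < 400) or not location:
            break
        current = urljoin(current, location)  # Location may be relative
    return first_status, (current if current != url else None)
```

For a large site it is worth adding a short delay between requests so the audit does not hammer the server. That choice is left out of the sketch.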
**Phase 3: Analysis Logic**
Categorize every URL into one of these buckets:
1. **👻 Ghost Page (High Priority):**
- *Condition:* GSC Status = "Indexed" AND Live Status = 404/5xx.
- *Meaning:* Google is sending traffic to a dead page.
- *Action:* "Immediate 301 Redirect".
2. **🧟 Zombie Redirect:**
- *Condition:* Sitemap = Yes AND Live Status = 301/308.
- *Meaning:* Your sitemap is dirty. It points to a redirect, not the final page.
- *Action:* "Update Sitemap to point to [Final Destination]".
3. **✅ False Alarm (Validate Fix):**
- *Condition:* GSC Status = "Error/404" AND (Live Status = 301 (Redirected) OR Live Status = 200 (Fixed)).
- *Meaning:* You fixed it, but Google hasn't updated.
- *Action:* "Click 'Validate Fix' in GSC".
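The three buckets above can be expressed as a small Python function. This is a sketch under assumptions: `classify` is a hypothetical name, and the status labels are matched literally as "Indexed" and "Error/404", whereas a real GSC export may word them differently. The rules are tried in the order listed, so a URL that matches two buckets gets the first one, and the "Uncategorized" fallback is an addition for URLs that match none.

```python
def classify(gsc_status, live_status, in_sitemap, final_url=None):
    """Return (category, recommended_action) for one URL."""
    gsc = (gsc_status or "").strip().lower()

    # 1. Ghost Page: Google says "Indexed" but the live page is 404 or 5xx.
    if gsc == "indexed" and live_status is not None:
        if live_status == 404 or 500 <= live_status <= 599:
            return "Ghost Page", "Immediate 301 Redirect"

    # 2. Zombie Redirect: the sitemap lists a URL that redirects.
    if in_sitemap and live_status in (301, 308):
        return "Zombie Redirect", f"Update Sitemap to point to {final_url}"

    # 3. False Alarm: GSC reports an error, but the page is now redirected or fixed.
    if gsc == "error/404" and live_status in (200, 301):
        return "False Alarm", "Click 'Validate Fix' in GSC"

    # Assumption: anything else is left for manual review.
    return "Uncategorized", ""
```

For example, `classify("Indexed", 404, False)` returns `("Ghost Page", "Immediate 301 Redirect")`.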
**Phase 4: Output**
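1. Save `audit_results.csv` with columns: `URL`, `GSC_Status`, `Live_Status`, `Sitemap_Present`, `Category`, `Recommended_Action`. Also save two filtered copies: `ghost_pages.csv` (rows where GSC Status = "Indexed" and Live Status is not 200) and `sitemap_errors.csv` (rows where Sitemap = Yes and Live Status is not 200).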
2. Generate a summary `action_plan.md`:
- List top 5 "Ghost Pages" to fix immediately.
- Count of "False Alarms" where the user should just click "Validate Fix".
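A minimal Python sketch of the output step, covering `audit_results.csv` and `action_plan.md`. `write_outputs` and `COLUMNS` are hypothetical names. The source does not say how the "top 5" are ranked, so this sketch simply takes the first five Ghost Pages in input order.

```python
import csv

COLUMNS = ["URL", "GSC_Status", "Live_Status", "Sitemap_Present",
           "Category", "Recommended_Action"]


def write_outputs(rows, csv_path="audit_results.csv", plan_path="action_plan.md"):
    """Write the audit table and a short summary; rows are dicts keyed by COLUMNS."""
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)

    ghosts = [r["URL"] for r in rows if r["Category"] == "Ghost Page"]
    false_alarms = sum(1 for r in rows if r["Category"] == "False Alarm")

    lines = ["# Action Plan", "", "## Ghost Pages to fix immediately", ""]
    # Assumption: "top 5" means the first five in input order.
    lines += [f"{i}. {url}" for i, url in enumerate(ghosts[:5], start=1)]
    lines += ["", f"False Alarms to validate in GSC: {false_alarms}", ""]
    with open(plan_path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines))
```

If traffic or impression data is available, sorting the Ghost Pages by it before slicing would give a more meaningful top five.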
Start now.