The Problem

How to find broken links and non-indexable pages in your sitemap

You are looking for a way to find broken links and non-indexable pages in your sitemap. Most people would tell you to buy a SaaS subscription for this.

We say: Build it yourself for free.

The Automation Blueprint

Copy the logic below into a tool like Gemini CLI or Claude Code. It includes the role, objective, and multi-step workflow needed to find broken links and non-indexable pages in your sitemap.


# Agent Configuration: The Sitemap Health Auditor

## Role
Parses an XML sitemap, checks the HTTP status code of every listed URL, and flags any that are broken (404), redirected (301/302), or returning server errors (5xx).

## Objective
Find broken links and non-indexable pages in your sitemap.
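Status codes alone will not catch every non-indexable page: a URL can return 200 and still opt out of indexing with a `noindex` directive. As an optional extension that the workflow below does not include, here is a minimal Python sketch (standard library only) that checks an already-fetched page's HTML and its `X-Robots-Tag` header value. `is_noindex` and `_RobotsMetaParser` are hypothetical names, and the header check is deliberately simplified: it treats any `noindex` as applying, without parsing user-agent prefixes.

```python
from html.parser import HTMLParser


class _RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives = set()  # lowercase directive tokens seen so far

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names; values can be None.
        if tag != "meta":
            return
        values = {name: (value or "") for name, value in attrs}
        if values.get("name", "").lower() in ("robots", "googlebot"):
            for token in values.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def is_noindex(html: str, x_robots_tag: str = "") -> bool:
    """True if the page opts out of indexing via the header or a robots meta tag."""
    # Simplification (assumption): any "noindex" in the header counts, whatever bot it targets.
    if "noindex" in x_robots_tag.lower():
        return True
    parser = _RobotsMetaParser()
    parser.feed(html)
    # "none" is shorthand for "noindex, nofollow".
    return bool(parser.directives & {"noindex", "none"})
```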

## Workflow

### Phase 1: Initialization & Seeding
1.  **Check:** Does `sitemap.xml` exist?
2.  **If Missing:** Create `sitemap.xml` using the `sampleData` provided in this blueprint.
3.  **If Present:** Load the data for processing.
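This page does not show the `sampleData` that step 2 refers to, so the sketch below seeds `sitemap.xml` with three hypothetical `example.com` URLs instead. It is a minimal Python illustration of the check, seed, and load logic, not the blueprint's own data; `PLACEHOLDER_URLS`, `build_sitemap`, and `load_or_seed` are names made up for this example.

```python
from pathlib import Path
from xml.sax.saxutils import escape

# Hypothetical stand-in URLs; the blueprint's own sampleData is not shown on this page.
PLACEHOLDER_URLS = [
    "https://example.com/",
    "https://example.com/about",
    "https://example.com/old-page",
]


def build_sitemap(urls: list[str]) -> str:
    """Return a minimal sitemap in the sitemaps.org 0.9 format, XML-escaping each URL."""
    entries = "\n".join(f"  <url><loc>{escape(u)}</loc></url>" for u in urls)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}\n"
        "</urlset>\n"
    )


def load_or_seed(path: str = "sitemap.xml") -> str:
    """Load the sitemap if it exists; otherwise write the placeholder version first."""
    p = Path(path)
    if not p.exists():
        p.write_text(build_sitemap(PLACEHOLDER_URLS), encoding="utf-8")
    return p.read_text(encoding="utf-8")
```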

### Phase 2: The Loop
You are a **Technical SEO specialist**. Your job is to validate the site architecture by checking every URL the sitemap lists.

**Step 1: Parsing**
1. Read `sitemap.xml`.
2. Extract all URLs between `<loc>` tags.
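The extraction above is easier to get right with an XML parser than with string matching, because standard sitemaps declare a default namespace that prefixes every tag name. A minimal sketch using Python's built-in `xml.etree.ElementTree`, assuming the sitemaps.org 0.9 namespace; `extract_urls` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Namespace used by sitemaps that follow the sitemaps.org 0.9 protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def extract_urls(xml_text: str) -> list[str]:
    """Return the trimmed text of every <loc> element, in document order."""
    root = ET.fromstring(xml_text)
    urls = []
    # root.iter() walks the whole tree; accept both namespaced and un-namespaced <loc>.
    for el in root.iter():
        if el.tag in (SITEMAP_NS + "loc", "loc") and el.text:
            urls.append(el.text.strip())
    return urls
```

Run against a sitemap index (`<sitemapindex>`), the same function returns the child sitemap URLs, which would then need to be fetched and parsed in turn. ElementTree is not hardened against maliciously constructed XML, so for sitemaps from untrusted sources the Python documentation points to the `defusedxml` package.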

**Step 2: Validation**
For each URL:
1.  **Check:** Send an HTTP GET request without following redirects, and record the status code.
2.  **Classify:**
    *   **200 OK:** Good.
    *   **301/302 Redirect:** Bad (Sitemaps should point to the final destination).
    *   **404 Not Found:** Critical Error.
    *   **5xx Error:** Server issue.
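The classification only works if redirects are not followed automatically; otherwise a 301 that ends at a live page shows up as 200. A minimal standard-library sketch of the check, assuming plain GET requests via `urllib`: `fetch_status` and `classify` are hypothetical names, the 10-second timeout and `User-Agent` string are arbitrary choices, and returning status `0` for "no HTTP response" is a convention invented for this example.

```python
import urllib.error
import urllib.request
from typing import Optional, Tuple


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Refuse to follow redirects so a 301/302 is reported instead of the final 200."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError carrying the 3xx code


# Passing a subclass of HTTPRedirectHandler replaces urllib's default redirect handler.
_OPENER = urllib.request.build_opener(_NoRedirect)


def fetch_status(url: str, timeout: float = 10.0) -> Tuple[int, Optional[str]]:
    """GET the URL once and return (status_code, Location header or None)."""
    req = urllib.request.Request(
        url, method="GET", headers={"User-Agent": "sitemap-audit/0.1"}
    )
    try:
        with _OPENER.open(req, timeout=timeout) as resp:
            return resp.status, None
    except urllib.error.HTTPError as err:
        # Unfollowed 3xx, plus all 4xx and 5xx responses, arrive here.
        return err.code, err.headers.get("Location")
    except OSError:
        # URLError (DNS failure, refused connection) and socket timeouts: no HTTP response.
        return 0, None


def classify(status: int) -> str:
    """Map a status code onto the categories listed above."""
    if status == 200:
        return "OK"
    if 300 <= status < 400:
        return "Redirect"
    if status == 404:
        return "Not Found"
    if 500 <= status < 600:
        return "Server Error"
    return "Other"
```

Returning the `Location` header alongside the code gives the "final destination" a 301/302 entry should be replaced with, though a chain of redirects would need to be followed hop by hop to reach the true end point.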

**Step 3: Output**
Save to `sitemap_report.csv` (Columns: `URL`, `Status_Code`, `Action`).
*   *Action Logic:*
    *   If 200 -> "Keep".
    *   If 301/302 -> "Update to Final URL".
    *   If 404 -> "Remove from Sitemap".
    *   Otherwise (5xx or any other code) -> "Review Manually".
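The output step maps directly onto Python's `csv` module. In this sketch, `action_for` and `write_report` are hypothetical names, 301 and 302 are treated alike (matching the redirect category above), and any code without a listed action falls back to a "Review Manually" label chosen for illustration.

```python
import csv


def action_for(status: int) -> str:
    """Translate a status code into the Action column value."""
    if status == 200:
        return "Keep"
    if status in (301, 302):
        return "Update to Final URL"
    if status == 404:
        return "Remove from Sitemap"
    # Assumption: 5xx and any other code get flagged for a human to review.
    return "Review Manually"


def write_report(results: list[tuple[str, int]], path: str = "sitemap_report.csv") -> None:
    """Write one URL, Status_Code, Action row per (url, status) pair."""
    # newline="" lets the csv module control line endings, as its docs recommend.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["URL", "Status_Code", "Action"])
        for url, status in results:
            writer.writerow([url, status, action_for(status)])
```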

Start now.

Want the Full Library?

I have over 500 blueprints just like this one for every part of your Sales & Marketing stack.
