The Problem

How to protect your crawl budget by guiding search bots away from low-value pages

You are looking for a way to protect your crawl budget by guiding search bots away from low-value pages. Most people would tell you to buy a SaaS subscription for this.

We say: Build it yourself for free.

The Automation Blueprint

Copy the logic below into a tool like Gemini CLI or Claude Code. It includes the role, constraints, and multi-step workflow needed to generate a robots.txt file that guides search bots away from low-value pages and protects your crawl budget.


# Agent Configuration: The robots.txt Rules Architect

## Role
Generates a standard robots.txt file based on your site structure, specifically blocking common high-crawl/low-value directories like /search, /tags, and /temp.

## Objective
Protect your crawl budget by guiding search bots away from low-value pages.

## Workflow

### Phase 1: Initialization & Seeding
1.  **Check:** Does `site_structure.txt` exist?
2.  **If Missing:** Create `site_structure.txt` using the `sampleData` provided in this blueprint.
3.  **If Present:** Load the data for processing.
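
**sampleData** (illustrative placeholders; swap in your own platform, sitemap URL, and directory list):

```
Platform: WordPress
Sitemap: https://example.com/sitemap.xml

Directories:
/search/
/tags/
/temp/
```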

### Phase 2: The Loop
You are a **Technical SEO Specialist**. Your job is to manage bot access via robots.txt.

**Step 1: Analysis**
1. Read `site_structure.txt`.

**Step 2: Rule Generation**
Generate a standard `robots.txt` file following these best practices:
1.  **User-agent:** Start with `User-agent: *` so the rules apply to all bots.
2.  **Disallow:** Every directory listed in the `Directories` section of the input.
3.  **Specific Disallows:** Always include standard CMS junk if the platform is recognized (e.g., for WordPress, block `/wp-admin/` but allow `/wp-admin/admin-ajax.php`).
4.  **Sitemap:** Include the `Sitemap` URL at the very bottom.
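
For example, run against the illustrative `sampleData` above, the generated file would look like this (the domain and directory names are placeholders):

```
User-agent: *
Disallow: /search/
Disallow: /tags/
Disallow: /temp/
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

Sitemap: https://example.com/sitemap.xml
```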

**Step 3: Output**
Save the final text to `robots.txt`.

Start now.
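
Prefer plain code over an agent run? Below is a minimal Python sketch of the same workflow. It assumes the illustrative `site_structure.txt` format shown above, so treat the field names (`Platform:`, `Sitemap:`, `Directories:`) as placeholders rather than a fixed standard.

```python
from pathlib import Path

def build_robots(structure_path: str = "site_structure.txt") -> str:
    """Parse the site structure file and return robots.txt contents."""
    platform, sitemap, directories = "", "", []
    in_dirs = False
    for raw in Path(structure_path).read_text().splitlines():
        line = raw.strip()
        if not line:
            continue
        lower = line.lower()
        if lower.startswith("platform:"):
            platform = line.split(":", 1)[1].strip()
        elif lower.startswith("sitemap:"):
            sitemap = line.split(":", 1)[1].strip()
        elif lower.startswith("directories"):
            in_dirs = True  # subsequent lines list paths to block
        elif in_dirs and line.startswith("/"):
            directories.append(line)

    rules = ["User-agent: *"]  # apply the rules to all bots
    rules += [f"Disallow: {d}" for d in directories]
    if platform.lower() == "wordpress":
        # Standard WordPress exception: block the admin area but keep
        # admin-ajax.php crawlable, since many themes load content through it.
        rules += ["Disallow: /wp-admin/", "Allow: /wp-admin/admin-ajax.php"]
    if sitemap:
        rules += ["", f"Sitemap: {sitemap}"]  # sitemap goes at the very bottom
    return "\n".join(rules) + "\n"

if __name__ == "__main__":
    Path("robots.txt").write_text(build_robots())
```

Run it from the directory that contains `site_structure.txt` and it writes `robots.txt` next to it.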

Want the Full Library?

I have 500+ blueprints just like this one for every part of your Sales & Marketing stack.

Browse All 500 Blueprints