Traffic Control

Spyglasses gives you powerful control over which AI agents and bots can access your website. You can block malicious scrapers, AI model trainers, and other unwanted traffic while ensuring legitimate bots like search engines can still crawl your site.

What You'll Learn

In this guide, you'll learn how to:

  • Configure basic bot blocking settings
  • Create custom block and allow rules
  • Exclude specific paths from monitoring
  • Implement traffic control on different platforms
  • Use advanced pattern matching for fine-grained control

Configuration Options

Spyglasses offers several configuration options to control traffic to your website:

Block AI Model Trainers

The simplest way to protect your content from being used to train AI models is to enable the blockAiModelTrainers option. This automatically blocks known AI model training bots like GPTBot, ClaudeBot, and others.
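
In code-based setups this is a single option. A minimal Next.js middleware using the same createSpyglassesMiddleware covered in full later in this guide:

// middleware.ts
import { createSpyglassesMiddleware } from '@spyglasses/next';

export default createSpyglassesMiddleware({
  apiKey: process.env.SPYGLASSES_API_KEY,
  blockAiModelTrainers: true, // Blocks GPTBot, ClaudeBot, and similar training bots
});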

Custom Block Rules

Use customBlocks to block specific bots or categories of bots. You can specify:

  • Categories: Block entire categories like category:Scraper
  • Patterns: Block specific bot names like pattern:SomeBot
  • User agents: Block specific user agent strings
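
For example (pattern rules are regular expressions, as the full middleware example below shows):

customBlocks: [
  'category:Scraper',        // Block the entire Scraper category
  'pattern:SomeBot',         // Block a specific bot by name
  'pattern:.*scraper.*',     // Block anything with "scraper" in its name
]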

Custom Allow Rules

Use customAllows to override blocks and ensure important bots can always access your site. Allow rules take precedence over block rules.
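
For example, to keep Googlebot accessible even when a block rule would otherwise match it:

customAllows: [
  'pattern:Googlebot',       // Allowed even if a block rule also matches
]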

Path Exclusions

Use excludePaths to exclude certain paths from monitoring entirely. This is useful for health checks, admin pages, or API endpoints.
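
Paths can be exact strings or regular expressions; both forms appear in the full example below:

excludePaths: [
  '/health',                 // Exact match: health check endpoint
  /^\/admin/,                // Regex: the entire admin section
]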

Platform Implementation

Next.js (Code Configuration)

For Next.js applications and other code-based sites, you configure traffic control directly in your middleware. Here's a comprehensive example:

// middleware.ts
import { createSpyglassesMiddleware } from '@spyglasses/next';
 
export default createSpyglassesMiddleware({
  apiKey: process.env.SPYGLASSES_API_KEY,
  debug: process.env.SPYGLASSES_DEBUG === 'true',
  
  // Block AI model trainers
  blockAiModelTrainers: true,
  
  // Custom block rules
  customBlocks: [
    'category:Scraper',        // Block all scrapers
    'category:Crawler',        // Block aggressive crawlers
    'pattern:BadBot',          // Block specific bot
    'pattern:.*scraper.*',     // Block anything with "scraper" in name
  ],
  
  // Custom allow rules (override blocks)
  customAllows: [
    'pattern:Googlebot',       // Always allow Google
    'pattern:Bingbot',         // Always allow Bing
    'pattern:facebookexternalhit', // Allow Facebook previews
  ],
  
  // Exclude paths from monitoring
  excludePaths: [
    '/health',                 // Health check endpoint
    '/api/status',            // Status endpoint
    /^\/admin/,               // Admin section (regex)
    /^\/internal/,            // Internal tools
  ],
});
 
export const config = {
  matcher: ['/((?!_next|api|favicon.ico|.*\\.(jpg|jpeg|gif|png|svg|ico|css|js)).*)'],
};

WordPress (Plugin Interface)

For WordPress sites and other platforms that use plugins, Spyglasses provides a user-friendly admin interface to configure traffic control without touching code.

WordPress Traffic Control Settings (screenshot): The Bot Blocking Settings interface, showing the main toggle for blocking AI model trainers and category-based blocking rules. Each category (AI Visitors, AI Model Trainers, Crawler, Scraper, etc.) can be individually configured with Block or Allow settings.

Category-Based Rules

The WordPress plugin organizes bots into logical categories, making it easy to apply rules to entire groups:

  • AI Visitors: AI assistants such as ChatGPT, Claude, and Perplexity visiting on behalf of users
  • AI Model Trainers: Bots specifically designed to collect training data (GPTBot, ClaudeBot, etc.)
  • Crawler: General web crawlers and search engine bots
  • Scraper: Content scrapers and data collection bots
  • Special Purpose: Specialized bots for specific functions
  • Unknown: Unclassified bot traffic

You can quickly block or allow entire categories with a single click, and the interface provides immediate visual feedback on your current settings.

Pattern-Based Rules

For more granular control, switch to the "By Pattern" tab to manage individual bot patterns:

WordPress Pattern-Based Rules (screenshot): The pattern-based interface, showing specific bot user agents with their categories and individual Block/Allow settings. Note how GPTBot is set to "Block" while Googlebot is set to "Allow", demonstrating fine-grained control.

This view shows:

  • Individual bot patterns with their exact user agent strings
  • Hierarchical categorization (e.g., "AI Visitors > AI Assistants")
  • Current status with clear Block/Allow indicators
  • Search functionality to quickly find specific patterns
  • Visual color coding: blocked items show in red, allowed items in green

The interface makes it easy to override category settings for specific bots. For example, you might block the entire "AI Model Trainers" category but allow a specific research bot that you trust.

Advanced Configuration Examples

Protecting Specific Content

Block bots from accessing your most valuable content while allowing them to crawl general pages:

export default createSpyglassesMiddleware({
  apiKey: process.env.SPYGLASSES_API_KEY,
  blockAiModelTrainers: true,
  customBlocks: [
    'category:AI',             // Block AI content bots
  ],
  excludePaths: [
    /^\/premium\//,           // Don't monitor premium section
    /^\/members-only\//,      // Don't monitor member content
  ],
});

E-commerce Protection

Protect product data while allowing legitimate shopping bots:

export default createSpyglassesMiddleware({
  apiKey: process.env.SPYGLASSES_API_KEY,
  blockAiModelTrainers: true,
  customBlocks: [
    'category:Scraper',        // Block price scrapers
    'pattern:.*price.*',       // Block price monitoring bots
  ],
  customAllows: [
    'pattern:Googlebot',       // Allow Google Shopping
    'pattern:ShoppingBot',     // Allow legitimate shopping bots
  ],
});

Content Publisher Setup

Ideal for blogs and news sites that want to protect their content:

export default createSpyglassesMiddleware({
  apiKey: process.env.SPYGLASSES_API_KEY,
  blockAiModelTrainers: true,
  customBlocks: [
    'category:AI',             // Block AI content harvesters
    'category:Scraper',        // Block content scrapers
  ],
  customAllows: [
    'pattern:Googlebot',       // Allow search engines
    'pattern:Bingbot',
    'pattern:facebookexternalhit', // Allow social previews
    'pattern:TwitterBot',
  ],
  excludePaths: [
    '/sitemap.xml',           // Don't monitor sitemaps
    '/robots.txt',            // Don't monitor robots.txt
    /^\/feed/,                // Don't monitor RSS feeds
  ],
});

Testing Your Configuration

After implementing traffic control, you can test your configuration:

  1. Check the Spyglasses dashboard to see which bots are being blocked
  2. Monitor your server logs for blocked requests
  3. Use browser developer tools to test excluded paths
  4. Verify search engine access using Google Search Console
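
For a quick programmatic check, you can request a page with different user agents and compare the responses. The following is a minimal sketch in TypeScript, assuming Node 18+ (for built-in fetch and top-level await) and that blocked requests receive an HTTP 403; adjust the URL, user agents, and expected status for your deployment:

// test-traffic-control.ts
// Sends the same request with different user agents and reports each status.
const baseUrl = process.env.TEST_URL ?? 'http://localhost:3000';

const userAgents = [
  'GPTBot/1.0',  // AI model trainer: expect this to be blocked
  'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)', // Expect allowed
];

for (const ua of userAgents) {
  const res = await fetch(baseUrl, { headers: { 'User-Agent': ua } });
  // Assumption: blocked requests return 403; verify against your own setup.
  console.log(`${res.status} ${res.status === 403 ? 'blocked' : 'allowed'}: ${ua}`);
}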

Best Practices

Start Conservative

Begin with basic settings and gradually add more restrictive rules:

// Start with this
export default createSpyglassesMiddleware({
  apiKey: process.env.SPYGLASSES_API_KEY,
  blockAiModelTrainers: true, // Start here
});
 
// Then add custom rules as needed

Always Allow Search Engines

Make sure legitimate search engines can access your content:

customAllows: [
  'pattern:Googlebot',
  'pattern:Bingbot',
  'pattern:DuckDuckBot',
  'pattern:YandexBot',
]

Monitor Impact

Regularly check your analytics to ensure you're not blocking legitimate traffic. The Spyglasses dashboard provides detailed reports on blocked requests.

Use Exclusions Wisely

Exclude paths that don't need protection or monitoring:

excludePaths: [
  '/health',                // Health checks
  '/api/public',           // Public APIs
  /^\/static\//,           // Static assets
]

Troubleshooting

Bot Still Getting Through

If unwanted bots are still accessing your site:

  1. Check if they match an allow rule
  2. Verify your patterns are correct
  3. Look for new bot user agents in your logs
  4. Contact support for help with custom patterns

Legitimate Traffic Blocked

If you're accidentally blocking legitimate traffic:

  1. Add specific allow rules for important bots
  2. Check that your custom block patterns aren't too broad
  3. Review your exclusion paths
  4. Test with debug mode enabled
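
Debug mode is the same debug option shown in the full middleware example. A quick way to turn it on while troubleshooting, then check your server logs for its output:

export default createSpyglassesMiddleware({
  apiKey: process.env.SPYGLASSES_API_KEY,
  debug: true, // Temporarily enable verbose logging while troubleshooting
  blockAiModelTrainers: true,
});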