The Problem: Two Failure Modes
AI agents access websites in one of two ways today: through browser automation, which is expensive, fragile, and prone to hallucinating from mis-parsed HTML, or through unauthorized scraping that happens entirely outside your control. Both are already happening. Neither is acceptable long-term.
The Architecture: Two Layers
Layer 1 invites authorized AI agents in through typed, versioned tool contracts. Layer 2 ensures unauthorized scrapers receive no useful harvest. Both layers run independently — Layer 1 requires no server infrastructure, Layer 2 operates at the CDN edge.
```
       Incoming AI agent request
                   │
        ┌──────────▼──────────┐
        │ Using WebMCP tools? │
        └─────┬───────────┬───┘
          YES │           │ NO
   ┌──────────▼──┐   ┌────▼──────────────────┐
   │   LAYER 1   │   │       LAYER 2         │
   │             │   │                       │
   │ .well-known │   │ Honeypot content      │
   │ /webmcp.json│   │ Tarpit latency        │
   │             │   │ Legal enforcement     │
   │ Typed JSON  │   │ (Amazon v.            │
   │ responses   │   │  Perplexity, 2026)    │
   └─────────────┘   └───────────────────────┘
   Clean, ~280 tokens   No productive harvest
   Schema-versioned     Fingerprinted + denied
```

Layer 1: Authorized Tool Access
Three components. A discovery manifest. Static tool endpoints. Browser registration. All three run on a static site with no server infrastructure.
1. Discovery — /.well-known/webmcp.json
The agent fetches this manifest first. From it, the agent discovers every available tool along with its input schema, endpoint URL, risk level, and rate limit. No documentation site required: the manifest is self-describing.
{ "version": "1.0", "publisher": { "name": "Omar Corral", "url": "https://omar-corral.com" }, "tools": [ { "name": "getProfile", "riskLevel": "low", "endpoint": "/data/profile.json" }, { "name": "getServices", "riskLevel": "low", "endpoint": "/data/services.json" }, { "name": "getCaseStudies", "riskLevel": "low", "endpoint": "/data/case-studies.json" }, { "name": "getSEOResources","riskLevel": "low", "endpoint": "/data/seo-resources.json" }, { "name": "getContact", "riskLevel": "low", "endpoint": "/data/contact.json" }, { "name": "getInsights", "riskLevel": "low", "endpoint": "/data/insights.json" } ] } - 2Static endpoints — /data/*.json
Each file is a typed, versioned JSON response carrying a "schema": "oc-mcp/v1" key. No server required: this works on GitHub Pages, Vercel, Netlify, or any CDN. Example response structure:
{ "schema": "oc-mcp/v1", "tool": "getProfile", "data": { "name": "Omar Corral", "title": "Digital Strategist", "specialization": "SEO, AI Search & Organic Growth", "yearsExperience": 12, "expertise": ["Technical SEO", "AI Search Optimization / GEO", "..."] }, "generated": "2026-05-06", "ttl": 604800 } - 3Browser registration — navigator.modelContext.registerTool()
Progressive enhancement. This registers the tools in the browser session for agents with WebMCP support, and is a silent no-op in all current stable browsers. When Chrome ships stable WebMCP support, every tool will already be registered.
```tsx
// MCPTools.tsx — runs in <head> on every page
import { useEffect } from 'react';

export function MCPTools() {
  useEffect(() => {
    const nav = navigator as Navigator & {
      modelContext?: { registerTool: (cfg: object) => void };
    };
    if (!nav.modelContext?.registerTool) return; // no-op in all current browsers

    nav.modelContext.registerTool({
      name: 'getProfile',
      description: "Returns Omar Corral's professional profile and expertise",
      inputSchema: { type: 'object', properties: {} },
      execute: async () => fetch('/data/profile.json').then(r => r.json()),
    });
    // … repeated for all 6 tools
  }, []);

  return null;
}
```
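Putting the three components together, the agent-side flow is two HTTP requests. A hypothetical consumption sketch follows; the WebMCPManifest interface and callTool helper are illustrative, not part of any published spec:

```ts
// Illustrative agent-side flow: discover tools via the manifest, then
// call one static endpoint. No browser, no HTML parsing.
interface WebMCPManifest {
  version: string;
  publisher: { name: string; url: string };
  tools: { name: string; riskLevel: string; endpoint: string }[];
}

async function callTool(origin: string, toolName: string): Promise<unknown> {
  // Request 1: the self-describing discovery manifest.
  const manifest: WebMCPManifest = await fetch(
    `${origin}/.well-known/webmcp.json`,
  ).then((r) => r.json());

  const tool = manifest.tools.find((t) => t.name === toolName);
  if (!tool) throw new Error(`Tool not found: ${toolName}`);

  // Request 2: the typed, versioned JSON response for that tool.
  return fetch(`${origin}${tool.endpoint}`).then((r) => r.json());
}

// Usage:
callTool('https://omar-corral.com', 'getServices').then(console.log);
```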
Layer 2: Unauthorized Scraper Defense
Unauthorized agents that bypass WebMCP tools are routed to infrastructure that wastes their resources and generates no useful data. Amazon v. Perplexity (March 2026) confirmed platforms have legal standing to enforce this proactively.
1. Honeypot content
Pages that look structurally plausible but contain no real data. A scraper that accesses a honeypot is fingerprinted, as sketched below. Legitimate users and compliant crawlers never reach these pages: no links from real content, no sitemap entries.
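A minimal sketch of the fingerprinting step, written as a Cloudflare Workers-style edge handler; the /internal-data/ honeypot prefix, the KVLike interface, and the SCRAPER_LOG binding name are all illustrative assumptions, not details of the deployed site:

```ts
// Illustrative edge handler (Cloudflare Workers-style fetch API).
// Honeypot paths are never linked from real content or the sitemap,
// so any request that lands here is fingerprinted as a scraper.

// Minimal stand-in for a KV-style storage binding (assumption).
interface KVLike {
  put(key: string, value: string): Promise<void>;
}

export default {
  async fetch(request: Request, env: { SCRAPER_LOG: KVLike }): Promise<Response> {
    const url = new URL(request.url);

    if (url.pathname.startsWith('/internal-data/')) { // hypothetical honeypot prefix
      const fingerprint = [
        request.headers.get('user-agent') ?? 'unknown',
        request.headers.get('cf-connecting-ip') ?? 'unknown',
      ].join('|');

      // Record the fingerprint so later requests can be denied at the edge.
      await env.SCRAPER_LOG.put(fingerprint, new Date().toISOString());

      // Serve structurally plausible but useless content.
      return new Response('<html><body><ul><li>…</li></ul></body></html>', {
        headers: { 'content-type': 'text/html' },
      });
    }

    return fetch(request); // real traffic passes through untouched
  },
};
```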
2. Tarpit responses
Unrecognized or flagged user agents receive deliberate latency at the CDN edge: a 30–60 second TTFB, or a chunked response that never completes. The scraper's thread blocks, consuming the attacker's compute budget rather than yours. The approach is modeled on the open-source Nepenthes tarpit; a sketch follows.
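One way to implement the never-completing variant, sketched in the same Workers-style API; the 15-second trickle interval is an arbitrary choice, and the fixed-latency variant would simply await a 30–60 second timer before sending the first byte:

```ts
// Illustrative tarpit: a chunked response that trickles one byte at a
// time and never terminates, holding the scraper's connection open at
// negligible cost to the origin.
function tarpitResponse(): Response {
  const stream = new ReadableStream<Uint8Array>({
    async pull(controller) {
      // One whitespace byte every 15 seconds (arbitrary interval);
      // the stream never calls controller.close().
      await new Promise((resolve) => setTimeout(resolve, 15_000));
      controller.enqueue(new TextEncoder().encode(' '));
    },
  });
  return new Response(stream, { headers: { 'content-type': 'text/html' } });
}
```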
3. Legal boundary
robots.txt explicitly prohibits unauthorized agent access. Post-Amazon v. Perplexity, this is an enforceable prohibition rather than a mere convention: the legal standing is now judicially established.
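A minimal robots.txt sketch consistent with this posture; the user-agent tokens and disallowed path are illustrative examples, not a vetted policy:

```text
# Unauthorized agent access is prohibited.
# Authorized agents: use the typed tool contracts at
# /.well-known/webmcp.json instead of scraping pages.

# Known scraping agents are disallowed entirely (example tokens).
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

# Compliant crawlers are kept away from honeypot paths (example prefix).
User-agent: *
Disallow: /internal-data/
```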
Live Proof: OC MCP on omar-corral.com
Layer 1 is running on this site right now. All six tools are deployed. Call any endpoint directly — no authentication, no API key, no scraping required.
| Tool | Endpoint | Returns |
|---|---|---|
| getProfile | /data/profile.json | Bio, expertise, credentials |
| getServices | /data/services.json | 4 services with scope + outcomes |
| getCaseStudies | /data/case-studies.json | 2 case studies with metrics |
| getSEOResources | /data/seo-resources.json | Resource center map + posts |
| getContact | /data/contact.json | Engagement process + CTA |
| getInsights | /data/insights.json | Recent analysis + focus areas |
Task: “What services does this person offer?”

Playwright path: launch a headless browser, load and render the page, then parse the HTML, with all the cost and fragility described in the problem statement above.

OC MCP path: a single GET to /data/services.json, returning a clean, typed JSON response of roughly 280 tokens.
How to Build This: Any Site, Four Phases
Start with Layer 1 — it takes days, not sprints, on any stack. Layer 2 follows once you understand your traffic patterns and have legal sign-off on active-defense routing.
| Phase | Task | Timeline | What to do |
|---|---|---|---|
| 1 | Define your tools | Days 1–3 | What would an AI agent actually need from your site? Write the tool names and descriptions before writing any code. Aim for 4–8 tools, named by the agent's goal, not your data model. |
| 2 | Author static endpoints | Days 4–7 | One JSON file per tool in public/data/. Publish /.well-known/webmcp.json pointing to them. Zero server infrastructure required. Test by fetching the manifest manually (a smoke-test sketch follows this table). |
| 3 | Browser registration | Week 2 | Add navigator.modelContext.registerTool() calls in a client component. Test with Claude Projects or GPT with Browse. Verify tool discovery in the DevTools console. |
| 4 | Layer 2 deployment | Weeks 3–4 | Honeypot pages (no links, no sitemap). Tarpit at the CDN edge. robots.txt update. Legal review before activating active-defense routing. |
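The phase-2 manual test can be scripted. A small smoke test along these lines, assuming a local dev server on port 3000; the ORIGIN value and the pass/fail output format are illustrative:

```ts
// Illustrative smoke test for phases 2–3: fetch the manifest, then
// verify every listed endpoint answers with a versioned oc-mcp/v1 body.
const ORIGIN = 'http://localhost:3000'; // assumption: local dev server

async function validate(): Promise<void> {
  const manifest = await fetch(`${ORIGIN}/.well-known/webmcp.json`)
    .then((r) => r.json());

  for (const tool of manifest.tools) {
    const res = await fetch(`${ORIGIN}${tool.endpoint}`);
    const body = await res.json();
    const ok = res.ok && body.schema === 'oc-mcp/v1' && body.tool === tool.name;
    console.log(`${ok ? 'PASS' : 'FAIL'}  ${tool.name} → ${tool.endpoint}`);
  }
}

validate().catch(console.error);
```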
Want this for your site?
The architecture is not complex. The decision is whether to build it before you need it or after the first agent intermediates your funnel.