An AI agent that controls a real web browser using Playwright, running on Cloudflare Workers. Ask it to browse the web, fill forms, extract information, and more — all through a chat interface with a live browser view.
Built with the Cloudflare Agents SDK, Workers AI, and @cloudflare/playwright.
- Browser automation — Navigate, click, type, scroll, and extract text from any public website
- Live browser view — Real-time screencast of the browser alongside the chat
- Interactive Live View — Full Chrome DevTools access for inspecting DOM, debugging JS, and monitoring network requests
- Human-in-the-loop — The agent asks for help when it encounters CAPTCHAs, login screens, or unexpected issues
- SSRF protection — Internal and reserved IP addresses are blocked to prevent server-side request forgery
- Real-time — WebSocket connection with automatic reconnection and message persistence
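The SSRF protection listed above can be pictured as a URL check that runs before every navigation. The following is a minimal sketch, not the project's actual code: `isUrlAllowed`, `parseIPv4`, `isReservedIPv4`, and the blocked-hostname list are hypothetical names chosen for illustration, and the real checks in `src/types.ts` may cover different ranges.

```typescript
// Hypothetical sketch of an SSRF guard; the real checks in src/types.ts may differ.
type IPv4 = [number, number, number, number];

// Assumed blocklist of hostnames that always point at internal services.
const BLOCKED_HOSTNAMES = new Set(["localhost", "metadata.google.internal"]);

/** Parse a dotted-quad IPv4 string into four octets, or return null. */
function parseIPv4(host: string): IPv4 | null {
  const parts = host.split(".");
  if (parts.length !== 4) return null;
  const octets = parts.map((p) => (/^\d{1,3}$/.test(p) ? Number(p) : NaN));
  if (!octets.every((o) => o >= 0 && o <= 255)) return null;
  return octets as IPv4;
}

/** True for loopback, private, link-local, CGNAT, multicast, and reserved ranges. */
function isReservedIPv4([a, b]: IPv4): boolean {
  return (
    a === 0 ||
    a === 10 ||
    a === 127 ||
    (a === 100 && b >= 64 && b <= 127) ||
    (a === 169 && b === 254) ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168) ||
    a >= 224
  );
}

/** Decide whether the agent may navigate to the given URL. */
function isUrlAllowed(raw: string): boolean {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return false; // not a parseable absolute URL
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return false;

  const host = url.hostname.toLowerCase();
  if (BLOCKED_HOSTNAMES.has(host) || host.endsWith(".localhost")) return false;

  // Conservative simplification: reject every IPv6 literal such as [::1].
  if (host.startsWith("[")) return false;

  const v4 = parseIPv4(host);
  return v4 === null || !isReservedIPv4(v4);
}
```

A check like this only inspects the URL string. A public hostname that resolves to an internal address would still pass, so a production guard would also need to validate the resolved IP.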
git clone https://github.com/harshil1712/agent-browsing.git
cd agent-browsing
npm install
cp .env.example .env
# Edit .env with your Cloudflare credentials
npm run dev

Open http://localhost:5173 and ask the agent to browse the web.
Try these prompts:
- "Search for iPhone 17 on Amazon"
- "Find a hotel in Lisbon"
- "Book a table at the Famous Indian Restaurant"
src/
  server.ts      # Chat agent with Playwright browser tools
  app.tsx        # Chat UI with browser view panel
  screencast.ts  # CDP screencast for real-time browser streaming
  types.ts       # Shared types, SSRF protection, validation
  client.tsx     # React entry point
  styles.css     # Tailwind + Kumo styles
The agent uses Workers AI (@cf/moonshotai/kimi-k2.5) to understand your requests and control a headless browser via Playwright tools:
- navigate — Go to a URL
- page_snapshot — Get a list of interactive elements with ref IDs
- click — Click an element by its ref number
- fill — Type text into an input field
- press — Press keyboard keys (Enter, Tab, Escape, etc.)
- scroll — Scroll the page up or down
- extract_text — Read visible text content from the page
- ask_user — Ask for help when stuck (CAPTCHAs, logins, etc.)
The agent follows an observe-act loop: it takes a page snapshot to "see" the page, then interacts with elements by their ref IDs, and takes another snapshot to verify the result.
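The observe-act loop can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `src/server.ts`: the `BrowserTools` interface, the `Action` union, the `Decide` callback (a stand-in for the model call), and the `maxSteps` limit are all hypothetical.

```typescript
// Hypothetical shapes for illustration; the real tool signatures may differ.
interface SnapshotElement {
  ref: number; // ID the agent uses to target this element
  role: string;
  name: string;
}

type Action =
  | { tool: "click"; ref: number }
  | { tool: "fill"; ref: number; text: string }
  | { tool: "done"; answer: string };

interface BrowserTools {
  pageSnapshot(): Promise<SnapshotElement[]>;
  click(ref: number): Promise<void>;
  fill(ref: number, text: string): Promise<void>;
}

// Stand-in for the model: picks the next action from the latest snapshot.
type Decide = (goal: string, snapshot: SnapshotElement[]) => Promise<Action>;

async function observeActLoop(
  browser: BrowserTools,
  decide: Decide,
  goal: string,
  maxSteps = 10,
): Promise<string> {
  for (let step = 0; step < maxSteps; step++) {
    // Observe: a fresh snapshot also shows the effect of the previous action.
    const snapshot = await browser.pageSnapshot();
    const action = await decide(goal, snapshot);
    if (action.tool === "done") return action.answer;

    // Guard against a ref that is not present in the current snapshot.
    if (!snapshot.some((el) => el.ref === action.ref)) {
      throw new Error(`Unknown ref ${action.ref}`);
    }

    // Act: dispatch to the matching browser tool.
    if (action.tool === "click") await browser.click(action.ref);
    else await browser.fill(action.ref, action.text);
  }
  throw new Error(`Gave up after ${maxSteps} steps`);
}
```

Taking the snapshot at the top of each iteration means one call serves both purposes: it verifies the last action and supplies the ref IDs for the next one.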
Set these in your .env file for local development, or via wrangler secret put for production:
| Variable | Description |
|---|---|
| CLOUDFLARE_ACCOUNT_ID | Your Cloudflare Account ID (found in the dashboard) |
| BROWSER_RENDERING_API_TOKEN | API token with "Browser Run - Edit" permissions |
The Live View feature requires both secrets. Without them, the app still works with the screencast-only view.
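The fallback described here could be decided with a small check like the one below. It is a sketch only: the `Env` shape, the `ViewMode` type, and `resolveViewMode` are hypothetical names, and only the two variable names come from the table above.

```typescript
// Hypothetical env shape; only the two variable names come from the README.
interface Env {
  CLOUDFLARE_ACCOUNT_ID?: string;
  BROWSER_RENDERING_API_TOKEN?: string;
}

type ViewMode = "live-view" | "screencast-only";

/** Enable Live View only when both secrets are present and non-empty. */
function resolveViewMode(env: Env): ViewMode {
  const hasAccountId = Boolean(env.CLOUDFLARE_ACCOUNT_ID?.trim());
  const hasToken = Boolean(env.BROWSER_RENDERING_API_TOKEN?.trim());
  return hasAccountId && hasToken ? "live-view" : "screencast-only";
}
```

Treating an empty or whitespace-only value as missing avoids enabling Live View when a placeholder line from `.env.example` was left unfilled.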
- Go to API Tokens
- Click "Create Token"
- Use the "Custom token" template
- Grant Browser Run - Edit permission
- Copy the token into your .env file
npm run deploy

Then set the secrets on your deployed Worker:
npx wrangler secret put CLOUDFLARE_ACCOUNT_ID
npx wrangler secret put BROWSER_RENDERING_API_TOKEN

- Cloudflare Workers — Serverless execution
- Workers AI — AI model inference (no API key needed)
- Browser Run — Headless browser on the edge
- @cloudflare/playwright — Playwright bindings for Workers
- Agents SDK — Durable Object-based agent framework
- Kumo — Cloudflare's design system
MIT