Agent Browsing

An AI agent that controls a real web browser using Playwright, running on Cloudflare Workers. Ask it to browse the web, fill forms, extract information, and more — all through a chat interface with a live browser view.

Built with the Cloudflare Agents SDK, Workers AI, and @cloudflare/playwright.

Features

Browser automation — Navigate, click, type, scroll, and extract text from any public website
Live browser view — Real-time screencast of the browser alongside the chat
Interactive Live View — Full Chrome DevTools access for inspecting DOM, debugging JS, and monitoring network requests
Human-in-the-loop — The agent asks for help when it encounters CAPTCHAs, login screens, or unexpected issues
SSRF protection — Internal and reserved IP addresses are blocked to prevent server-side request forgery
Real-time — WebSocket connection with automatic reconnection and message persistence
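
As a rough illustration of the SSRF protection listed above, a URL check of the following kind could run before any navigation. This is a minimal sketch, not the project's code in types.ts: the names isBlockedIPv4 and isUrlAllowed are hypothetical, it only inspects literal IPv4 hosts and localhost, rejects every IPv6 literal outright, and does not resolve DNS names (a production check would also need to consider what a hostname resolves to).

```typescript
// Hypothetical helper: true for IPv4 literals in private, loopback,
// link-local, or otherwise reserved ranges.
function isBlockedIPv4(host: string): boolean {
  const match = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (!match) return false; // not an IPv4 literal
  const a = Number(match[1]);
  const b = Number(match[2]);
  return (
    a === 0 || // "this network"
    a === 10 || // private
    a === 127 || // loopback
    (a === 100 && b >= 64 && b <= 127) || // carrier-grade NAT
    (a === 169 && b === 254) || // link-local, includes cloud metadata endpoints
    (a === 172 && b >= 16 && b <= 31) || // private
    (a === 192 && b === 168) || // private
    a >= 224 // multicast and reserved
  );
}

// Hypothetical gate for the navigate tool: only public http(s) URLs pass.
function isUrlAllowed(raw: string): boolean {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return false; // unparseable input is never allowed
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return false;
  const host = url.hostname.toLowerCase();
  if (host === "localhost" || host.endsWith(".localhost")) return false;
  if (host.startsWith("[")) return false; // IPv6 literal: rejected wholesale in this sketch
  return !isBlockedIPv4(host);
}
```

Parsing with the URL constructor first means the range checks see the normalized hostname rather than the raw string the model produced.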

Quick start

git clone https://github.com/harshil1712/agent-browsing.git
cd agent-browsing
npm install
cp .env.example .env
# Edit .env with your Cloudflare credentials
npm run dev

Open http://localhost:5173 and ask the agent to browse the web.

Try these prompts:

"Search for iPhone 17 on Amazon"
"Find a hotel in Lisbon"
"Book a table at the Famous Indian Restaurant"

Project structure

src/
  server.ts      # Chat agent with Playwright browser tools
  app.tsx        # Chat UI with browser view panel
  screencast.ts  # CDP screencast for real-time browser streaming
  types.ts       # Shared types, SSRF protection, validation
  client.tsx     # React entry point
  styles.css     # Tailwind + Kumo styles

How it works

The agent uses Workers AI (@cf/moonshotai/kimi-k2.5) to understand your requests and control a headless browser via Playwright tools:

navigate — Go to a URL
page_snapshot — Get a list of interactive elements with ref IDs
click — Click an element by its ref ID
fill — Type text into an input field
press — Press keyboard keys (Enter, Tab, Escape, etc.)
scroll — Scroll the page up or down
extract_text — Read visible text content from the page
ask_user — Ask for help when stuck (CAPTCHAs, logins, etc.)
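
One way to picture the tool set above is as a discriminated union of tool calls. The tool names come from the list; every field name (url, ref, text, key, direction, question) and the describeToolCall helper are assumptions made for illustration, not the schemas defined in server.ts.

```typescript
// Hypothetical shapes: tool names match the list above,
// field names are illustrative assumptions.
type ToolCall =
  | { tool: "navigate"; url: string }
  | { tool: "page_snapshot" }
  | { tool: "click"; ref: number }
  | { tool: "fill"; ref: number; text: string }
  | { tool: "press"; key: string }
  | { tool: "scroll"; direction: "up" | "down" }
  | { tool: "extract_text" }
  | { tool: "ask_user"; question: string };

// Turns a tool call into a short human-readable line, e.g. for a chat log.
// The switch is exhaustive, so adding a new tool without handling it
// becomes a compile-time error.
function describeToolCall(call: ToolCall): string {
  switch (call.tool) {
    case "navigate":
      return `navigate to ${call.url}`;
    case "page_snapshot":
      return "snapshot interactive elements";
    case "click":
      return `click element ${call.ref}`;
    case "fill":
      return `fill element ${call.ref} with "${call.text}"`;
    case "press":
      return `press ${call.key}`;
    case "scroll":
      return `scroll ${call.direction}`;
    case "extract_text":
      return "extract visible text";
    case "ask_user":
      return `ask user: ${call.question}`;
  }
}
```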

The agent follows an observe-act loop: it takes a page snapshot to "see" the page, then interacts with elements by their ref IDs, and takes another snapshot to verify the result.
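
The observe-act loop described above can be sketched against an in-memory stand-in for a page. Everything here is hypothetical (SnapshotElement, PageLike, FakePage, fillAndVerify), and the calls are synchronous for brevity, whereas real Playwright calls are asynchronous. The point is only the order of operations: snapshot, act by ref ID, snapshot again to verify.

```typescript
// Hypothetical snapshot entry: one interactive element with its ref ID.
interface SnapshotElement {
  ref: number;
  role: string;
  name: string;
  value?: string;
}

// Hypothetical minimal page interface used by the loop.
interface PageLike {
  snapshot(): SnapshotElement[];
  fill(ref: number, text: string): void;
}

// In-memory fake so the sketch runs without a browser.
class FakePage implements PageLike {
  private elements: SnapshotElement[] = [
    { ref: 1, role: "textbox", name: "Search", value: "" },
    { ref: 2, role: "button", name: "Go" },
  ];

  snapshot(): SnapshotElement[] {
    // Return copies so callers cannot mutate page state directly.
    return this.elements.map((el) => ({ ...el }));
  }

  fill(ref: number, text: string): void {
    const el = this.elements.find((e) => e.ref === ref);
    if (!el) throw new Error(`No element with ref ${ref}`);
    el.value = text;
  }
}

function fillAndVerify(page: PageLike, fieldName: string, text: string): boolean {
  // Observe: find the target element in a fresh snapshot.
  const before = page.snapshot();
  const target = before.find((e) => e.role === "textbox" && e.name === fieldName);
  if (!target) return false; // a real agent might fall back to ask_user here

  // Act: interact with the element by its ref ID.
  page.fill(target.ref, text);

  // Observe again: confirm the page now reflects the action.
  const after = page.snapshot();
  return after.some((e) => e.ref === target.ref && e.value === text);
}
```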

Configuration

Required secrets

Set these in your .env file for local development, or via wrangler secret put for production:

Variable	Description
`CLOUDFLARE_ACCOUNT_ID`	Your Cloudflare Account ID (found in the dashboard)
`BROWSER_RENDERING_API_TOKEN`	API token with "Browser Run - Edit" permissions

The Live View feature requires both secrets. If either one is missing, the app still runs but falls back to the screencast-only view.
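
The fallback behavior can be pictured as a small check on the two secrets. This is a sketch under assumptions: the variable names match the table above, but the LiveViewEnv interface, the ViewMode type, and resolveViewMode are hypothetical and not taken from the project's source.

```typescript
// Hypothetical view of the environment: both secrets are optional at runtime.
interface LiveViewEnv {
  CLOUDFLARE_ACCOUNT_ID?: string;
  BROWSER_RENDERING_API_TOKEN?: string;
}

type ViewMode = "live-view" | "screencast-only";

// Live View needs both secrets; blank or whitespace-only values count as missing.
function resolveViewMode(env: LiveViewEnv): ViewMode {
  const hasAccount = Boolean(env.CLOUDFLARE_ACCOUNT_ID?.trim());
  const hasToken = Boolean(env.BROWSER_RENDERING_API_TOKEN?.trim());
  return hasAccount && hasToken ? "live-view" : "screencast-only";
}
```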

Create the API token

Go to the API Tokens page in the Cloudflare dashboard
Click "Create Token"
Use the "Custom token" template
Grant the "Browser Run - Edit" permission
Copy the token into your .env file

Deploy

npm run deploy

Then set the secrets on your deployed Worker:

npx wrangler secret put CLOUDFLARE_ACCOUNT_ID
npx wrangler secret put BROWSER_RENDERING_API_TOKEN

Tech stack

Cloudflare Workers — Serverless execution
Workers AI — AI model inference (no API key needed)
Browser Run — Headless browser on the edge
@cloudflare/playwright — Playwright bindings for Workers
Agents SDK — Durable Object-based agent framework
Kumo — Cloudflare's design system

License

MIT
