Each session runs in an isolated sandbox with:
- VM isolation via Kata Containers (microVMs)
- Persistent storage via JuiceFS (copy-on-write, snapshots)
- Fast startup via warm pool (pre-booted VMs)
- Network isolation via Kubernetes NetworkPolicy
```
┌─────────────────────────────────────────────────────────────┐
│                       Kubernetes Node                       │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────┐  │
│  │   Sandbox Pod   │  │   Sandbox Pod   │  │  Warm Pool  │  │
│  │  (Kata microVM) │  │  (Kata microVM) │  │    Pods     │  │
│  │                 │  │                 │  │             │  │
│  │  ┌───────────┐  │  │  ┌───────────┐  │  │  (ready to  │  │
│  │  │   Agent   │  │  │  │   Agent   │  │  │   assign)   │  │
│  │  └───────────┘  │  │  └───────────┘  │  │             │  │
│  │        │        │  │        │        │  │      │      │  │
│  │   JuiceFS PVC   │  │   JuiceFS PVC   │  │   JuiceFS   │  │
│  └─────────────────┘  └─────────────────┘  └─────────────┘  │
│           │                    │                  │         │
│           └────────────────────┴──────────────────┘         │
│                                │                            │
│                       JuiceFS CSI Driver                    │
│                                │                            │
└───────────────────────────────┼─────────────────────────────┘
                                │
                         ┌──────┴──────┐
                         │  S3 Bucket  │
                         │   (data)    │
                         └──────┬──────┘
                                │
                         ┌──────┴──────┐
                         │    Redis    │
                         │ (metadata)  │
                         └─────────────┘
```
Each sandbox runs in a Kata Container - a lightweight VM using Cloud Hypervisor.
Why Kata?

- Strong isolation: a separate VM, not just namespaces
- Agents can safely run arbitrary code and use sudo without risking the host
- Full Docker-in-Docker support

Requirements: the node must support nested virtualization (DigitalOcean and Vultr work; Hetzner Cloud doesn't).
Resources: Default 4 vCPUs, 4GB RAM per VM (configurable via KATA_VM_CPUS, KATA_VM_MEMORY_MB).
JuiceFS provides POSIX-compliant persistent storage with:
- Copy-on-write: Efficient snapshots
- S3 backend: Data stored in object storage
- Redis metadata: Fast file operations
Agent Pod ──► virtiofs ──► JuiceFS mount ──► S3 + Redis
The JuiceFS CSI driver mounts volumes into pods. For Kata Containers, virtiofs passes the mount into the VM with caching enabled.
```
/agent/                    # PVC mount point (persistent)
├── workspace/             # User's code
├── docker/                # Docker data
├── .local/share/mise/     # Installed tools
├── .cache/                # Package caches
├── .claude/               # SDK session data
└── .session-mapping.json  # Session ID mapping
```
JuiceFS with S3 backend has high latency for small file operations. Caching is essential:
| Configuration | IOPS |
|---|---|
| No caching | ~30 |
| + JuiceFS writeback | ~400 |
| + virtiofs cache | ~650 |
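For intuition, sequential small-file IOPS is roughly the inverse of per-operation latency. A back-of-the-envelope sketch (`ms_per_op` is illustrative, not from the codebase):

```python
def ms_per_op(iops: float) -> float:
    """Average per-operation latency implied by a sequential IOPS figure."""
    return 1000.0 / iops

# ~30 IOPS with no caching implies ~33 ms per operation (roughly an
# object-store round-trip), while ~650 IOPS with both caches implies ~1.5 ms.
print(ms_per_op(30), ms_per_op(650))
```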
Configuration in `infra/k8s/juicefs-config.yaml`:

```yaml
mountOptions:
  - writeback                 # Async writes
  - cache-dir=/var/jfsCache
  - cache-size=102400         # 100GB cache
```

See the JuiceFS Maintenance Guide for:

- Garbage collection
- Trash cleanup
- Monitoring Redis memory
The warm pool keeps pre-booted VMs ready for instant session allocation.
- SandboxWarmPool maintains N ready pods
- Session creation claims a pod from the pool
- Agent calls control-plane API to get session config
- Pool replenishes automatically
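The flow above can be sketched in a few lines (an illustrative model; `WarmPool`, `claim`, and the pod names are hypothetical, not the controller's actual API):

```python
from dataclasses import dataclass, field
from itertools import count

_pod_ids = count()  # gives each illustrative warm pod a unique name

@dataclass
class WarmPool:
    """Toy model of a SandboxWarmPool: keeps `replicas` pre-booted pods ready."""
    replicas: int
    ready: list = field(default_factory=list)

    def replenish(self) -> None:
        # In reality each of these is a ~30s microVM boot, done ahead of time.
        while len(self.ready) < self.replicas:
            self.ready.append(f"warm-pod-{next(_pod_ids)}")

    def claim(self, session_id: str) -> str:
        pod = self.ready.pop(0)  # ~1s: the VM is already booted
        self.replenish()         # the pool immediately boots a replacement
        return pod

pool = WarmPool(replicas=2)
pool.replenish()
pod = pool.claim("sess-abc123")  # pod serves the session; pool is back at 2
```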
The pool is defined by a `SandboxWarmPool` resource:

```yaml
apiVersion: extensions.agents.x-k8s.io/v1alpha1
kind: SandboxWarmPool
metadata:
  name: netclode-pool
spec:
  replicas: 2                # Number of warm pods
  templateRef:
    name: netclode-agent     # SandboxTemplate to use
```

Enable in the control-plane:

```yaml
env:
  - name: WARM_POOL_ENABLED
    value: "true"
```

| Mode | Startup Time |
|---|---|
| Cold start (no warm pool) | ~30s |
| Warm pool | ~1s |
Since warm pool pods start before session assignment, they can't receive per-session env vars at boot. Instead, agents connect via gRPC and receive config when a session is assigned:
- Agent reads the Kubernetes ServiceAccount token from `/var/run/secrets/kubernetes.io/serviceaccount/token`
- Agent connects to the control-plane via gRPC with the token
- Control-plane validates the token via the Kubernetes TokenReview API (prevents impersonation)
- When a SandboxClaim binds to this pod, the control-plane pushes a `SessionAssigned` message with the config
This provides mutual authentication - the control-plane cryptographically verifies the agent's pod identity.
A `Sandbox` resource represents a running sandbox pod:
```yaml
apiVersion: agents.x-k8s.io/v1alpha1
kind: Sandbox
metadata:
  name: sandbox-sess-abc123
spec:
  runtimeClassName: kata-clh
  template:
    spec:
      containers:
        - name: agent
          image: ghcr.io/angristan/netclode-agent:latest
```

A `SandboxClaim` claims a pod from the warm pool:
```yaml
apiVersion: extensions.agents.x-k8s.io/v1alpha1
kind: SandboxClaim
metadata:
  name: claim-sess-abc123
spec:
  poolRef:
    name: netclode-pool
```

A `SandboxTemplate` defines the pod template for the warm pool:
```yaml
apiVersion: extensions.agents.x-k8s.io/v1alpha1
kind: SandboxTemplate
metadata:
  name: netclode-agent
spec:
  template:
    spec:
      runtimeClassName: kata-clh
      containers:
        - name: agent
          # ...
  volumeClaimTemplates:
    - metadata:
        name: workspace
      spec:
        storageClassName: juicefs-sc
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 50Gi
```

Sandboxes are network-isolated via Kubernetes NetworkPolicy.
Sandboxes can:
- Reach the control-plane (for config, events)
- Reach the secret-proxy (for API requests)
- Resolve DNS
- Access the public internet (default)
Sandboxes cannot:
- Reach other pods (10.42.0.0/16)
- Reach services (10.43.0.0/16) except control-plane and secret-proxy
- Reach private networks (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16)
- Reach Tailnet (100.64.0.0/10) by default
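The egress rules boil down to a CIDR denylist. A quick sketch with Python's `ipaddress` module (CIDRs taken from the bullets above; the control-plane/secret-proxy exceptions and the `--tailnet` opt-in are ignored, and `egress_allowed` is a hypothetical name):

```python
import ipaddress

BLOCKED = [ipaddress.ip_network(cidr) for cidr in (
    "10.42.0.0/16",    # other pods
    "10.43.0.0/16",    # services
    "10.0.0.0/8",      # private ranges
    "172.16.0.0/12",
    "192.168.0.0/16",
    "100.64.0.0/10",   # Tailnet (blocked unless --tailnet)
)]

def egress_allowed(dst: str) -> bool:
    ip = ipaddress.ip_address(dst)
    return not any(ip in net for net in BLOCKED)

# Public internet is reachable; cluster and private ranges are not.
```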
See docs/network-access.md for policy details.
Enable with the `--tailnet` flag when creating a session:

```shell
netclode sessions create --repo owner/repo --repo owner/other --tailnet
```

This allows the sandbox to reach other devices on your Tailscale network.
Exposed ports allow inbound traffic from the Tailnet:
```yaml
ingress:
  - ports:
      - port: 3000
    from:
      - ipBlock:
          cidr: 100.64.0.0/10   # Tailscale range
```

API keys (Anthropic, OpenAI, etc.) are protected using a two-tier proxy architecture. Real secrets never enter the sandbox microVM.
```
┌────────────────────────────────────────────────────────────────────────┐
│                         KATA MICROVM (Sandbox)                         │
│                                                                        │
│  ┌─────────┐      HTTP_PROXY       ┌─────────────┐                     │
│  │   SDK   │ ───────────────────── │  auth-proxy │                     │
│  │ (Claude)│    localhost:8080     └──────┬──────┘                     │
│  └─────────┘                              │                            │
│       │                                   │                            │
│       │  ANTHROPIC_API_KEY=               │  Adds: Proxy-Authorization │
│       │  NETCLODE_PLACEHOLDER_xxx         │  Bearer <SA token>         │
│       │                                   │                            │
│       │  (NO real secrets)                │  (NO real secrets)         │
└───────┼───────────────────────────────────┼────────────────────────────┘
        │                                   │
        │                                   ▼
        │                   ┌───────────────────────────────┐
        │                   │     secret-proxy Service      │
        │                   │     (OUTSIDE the microVM)     │
        │                   │                               │
        │                   │  1. Validate token with       │
        │                   │     control-plane             │
        │                   │  2. Check SDK type → hosts    │
        │                   │  3. Replace placeholder       │
        │                   │     with real secret          │
        │                   │                               │
        │                   │      (HAS real secrets)       │
        │                   └───────────────┬───────────────┘
        │                                   │
        │                                   ▼
        │                           ┌───────────────┐
        │                           │   Internet    │
        │                           └───────────────┘
```
- Placeholder injection: Agent sees `ANTHROPIC_API_KEY=NETCLODE_PLACEHOLDER_anthropic`
- Local proxy: `HTTP_PROXY=localhost:8080` routes traffic through auth-proxy
- Token auth: auth-proxy reads the mounted ServiceAccount token and adds it to the request
- Validation: secret-proxy validates the token with the control-plane (token → pod → session → SDK type)
- Secret injection: If the target host is allowed for the SDK type, the placeholder is replaced with the real secret
| SDK Type | Allowed API Hosts |
|---|---|
| Claude | api.anthropic.com |
| OpenCode | api.anthropic.com, api.openai.com, api.mistral.ai, openrouter.ai, api.openrouter.ai, api.opencode.ai, open.bigmodel.cn |
| Copilot | api.github.com, copilot-proxy.githubusercontent.com, api.anthropic.com |
| Codex | api.openai.com |
- No secret exfiltration: even with RCE, an attacker only sees placeholder values
- Host restriction: secrets are only sent to allowlisted API endpoints
- Per-session authorization: a Claude session can't use an OpenAI key
- Cryptographic identity: Token-based auth via K8s TokenReview API
For detailed documentation, see Secret Proxy Architecture.
```
create ──► creating ──► ready ◄──► running
   │           │                      │
   │           ▼                      │
   │         paused ◄─────────────────┘
   │           │
   └───────────┴────────► deleted
```
- Control-plane creates Sandbox (or SandboxClaim for warm pool)
- Kata boots a microVM
- JuiceFS PVC is mounted
- Agent starts and registers with control-plane
Triggered manually, by capacity limit (MAX_ACTIVE_SESSIONS), or by idle timeout (IDLE_TIMEOUT_MINUTES):
- Control-plane deletes Sandbox (VM stops)
- Session anchor ConfigMap preserves PVC
- PVC retains workspace data
On resume:
- Control-plane creates a new Sandbox with the same PVC
- New VM boots with preserved workspace
- Agent registers and resumes SDK session
On delete:
- Session anchor ConfigMap deleted
- PVC garbage collected
- JuiceFS data eventually cleaned up
When paused, the Sandbox CR is deleted but we need to keep the PVC. A ConfigMap "anchor" acts as a second owner:
- Session created → ConfigMap `session-anchor-<id>` created
- PVC gets two `ownerReferences`: Sandbox + ConfigMap
- Pause → Sandbox deleted, ConfigMap keeps PVC alive
- Resume → new Sandbox uses the existing PVC
- Delete → ConfigMap deleted, PVC garbage collected
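The anchor trick relies on standard Kubernetes garbage collection: an object is collected only once *every* owner in its `ownerReferences` is gone. A toy model (names illustrative):

```python
def pvc_is_garbage(owner_refs: list, live_objects: set) -> bool:
    # Kubernetes GC collects an object only when ALL of its owners are gone.
    return all(owner not in live_objects for owner in owner_refs)

owners = ["Sandbox/sandbox-sess-abc123", "ConfigMap/session-anchor-abc123"]

# Pause: the Sandbox is deleted, but the anchor ConfigMap still owns the PVC.
paused = pvc_is_garbage(owners, live_objects={"ConfigMap/session-anchor-abc123"})

# Delete: the anchor is removed too, so the PVC is collected.
deleted = pvc_is_garbage(owners, live_objects=set())
```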
Check available resources:

```shell
kubectl describe node | grep -A5 "Allocated resources"
```

Check warm pool status:

```shell
kubectl --context netclode -n netclode get sandboxwarmpool
kubectl --context netclode -n netclode get pods -l agents.x-k8s.io/pool
```

Check CSI driver logs:

```shell
kubectl --context netclode -n kube-system logs -l app=juicefs-csi-driver
```

Verify the secret exists:

```shell
kubectl --context netclode -n netclode get secret juicefs-secret
```

Check the containerd config:

```shell
ssh root@netclode cat /var/lib/rancher/k3s/agent/etc/containerd/config.toml.tmpl | grep kata
```

Verify the Kata runtime:

```shell
ssh root@netclode /opt/kata/bin/kata-runtime kata-env
```

List anchors:

```shell
kubectl --context netclode -n netclode get configmap -l netclode.dev/component=session-anchor
```

Check PVC ownership:

```shell
kubectl --context netclode -n netclode get pvc <pvc-name> -o jsonpath='{.metadata.ownerReferences}' | jq
```

On a small node:
- Max concurrent sessions: 1-2
- Warm pool replicas: 1
- Scale down CoreDNS: 2 replicas
On a larger node:
- Max concurrent sessions: 3-5
- Warm pool replicas: 2-3
- Consider separate Redis for JuiceFS metadata