Code Mode - Script Evaluation Tool for Dynamic LLM-Generated Code #710

@manusa

Description

Background

The kubernetes-mcp-server currently provides a set of pre-defined MCP tools for Kubernetes operations. While these tools cover common use cases, there are scenarios where LLMs could benefit from more flexibility to perform custom operations and transformations.

Feature Request

Implement a new "Code Mode" feature that enables LLMs to provide dynamically generated scripts that the MCP server evaluates at runtime. This allows AI assistants to:

  1. Retrieve Kubernetes resources via pre-configured clients (dynamic, core, typed, metrics)
  2. Transform and filter output to produce more digestible responses that consume fewer tokens
  3. Perform complex multi-step operations that would otherwise require multiple tool calls

Use Case Example

An LLM could generate a script that:

  • Lists all pods across namespaces
  • Filters only those with restart count > 0
  • Extracts only name, namespace, and restart count
  • Returns a compact summary instead of full pod specs

This reduces token consumption and provides tailored responses.
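
The use case above could be sketched as the following script. This is only an illustration: the `k8s` object here is a local stub so the snippet runs on its own, and the call chain `coreV1().pods(ns).list()` is an assumed shape (the real SDK surface, including how `ctx` and list options are passed, is not specified in this issue). The empty namespace string follows the client-go convention for "all namespaces".

```javascript
// Stub standing in for the `k8s` global. The method chain
// coreV1().pods(ns).list() is an ASSUMED shape, not the confirmed SDK.
const k8s = {
  coreV1: () => ({
    pods: (namespace) => ({
      list: () => ({
        items: [
          { metadata: { name: "api-7d9f", namespace: "prod" },
            status: { containerStatuses: [{ restartCount: 3 }, { restartCount: 1 }] } },
          { metadata: { name: "web-5c2a", namespace: "prod" },
            status: { containerStatuses: [{ restartCount: 0 }] } },
          // A pod that has not reported container statuses yet.
          { metadata: { name: "job-x1", namespace: "batch" }, status: {} },
        ],
      }),
    }),
  }),
};

// Sum restart counts across all containers of a pod; a pod without
// container statuses counts as zero restarts.
function totalRestarts(pod) {
  const statuses = (pod.status && pod.status.containerStatuses) || [];
  return statuses.reduce((sum, c) => sum + (c.restartCount || 0), 0);
}

// "" means "all namespaces" here (assumption mirroring client-go).
const summary = k8s.coreV1().pods("").list().items
  .map((pod) => ({
    name: pod.metadata.name,
    namespace: pod.metadata.namespace,
    restarts: totalRestarts(pod),
  }))
  .filter((entry) => entry.restarts > 0);

// Last expression: under the proposed tool this would be the result.
JSON.stringify(summary);
```

Only three fields per restarting pod reach the LLM, instead of the full pod specs.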

Technical Solution

JavaScript via Goja is used for script execution:

  • Pure Go implementation (no cgo)
  • LLMs are highly proficient in JavaScript
  • Automatic Go struct/method exposure via reflection
  • Good sandboxing (isolated runtime, execution timeout support)
  • Battle-tested (used by Grafana k6 and Ethereum's go-ethereum)

SDK Design

The tool exposes the following globals to scripts:

Global      Description
k8s         Kubernetes client with access to typed and dynamic clients
ctx         Request context for cancellation (values masked for security)
namespace   Default namespace from kubeconfig

k8s API Clients

  • k8s.coreV1() - pods, services, configMaps, secrets, namespaces, nodes, etc.
  • k8s.appsV1() - deployments, statefulSets, daemonSets, replicaSets
  • k8s.batchV1() - jobs, cronJobs
  • k8s.networkingV1() - ingresses, networkPolicies
  • k8s.rbacV1() - roles, roleBindings, clusterRoles, clusterRoleBindings
  • k8s.metricsV1beta1Client() - pod and node metrics (CPU/memory usage)
  • k8s.dynamicClient() - any resource by GVR
  • k8s.discoveryClient() - API discovery

LLM-Friendly Features

  • Case-insensitive method resolution: CoreV1(), coreV1(), COREV1() all work
  • Standard Kubernetes YAML/JSON structure: Objects with metadata wrapper are transparently converted
  • API introspection: Scripts can discover available methods and resources at runtime
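
To illustrate the case-insensitive method resolution listed above, here is a minimal sketch of the idea using a JavaScript `Proxy`. The actual feature would live on the Go side of the Goja bridge, so this is not the proposed implementation; `caseInsensitive` and the stub `client` object are hypothetical names introduced only for this example.

```javascript
// Wrap an object so that property lookups fall back to a
// case-insensitive match when the exact name does not exist.
function caseInsensitive(target) {
  // Index the target's own property names by their lowercase form.
  const index = new Map(
    Object.keys(target).map((key) => [key.toLowerCase(), key])
  );
  return new Proxy(target, {
    get(obj, prop, receiver) {
      if (typeof prop === "string" && !(prop in obj)) {
        const actual = index.get(prop.toLowerCase());
        if (actual !== undefined) return Reflect.get(obj, actual, receiver);
      }
      // Exact matches, symbols, and unknown names take the normal path.
      return Reflect.get(obj, prop, receiver);
    },
  });
}

// Hypothetical stand-in for the Go client exposed to scripts.
const client = caseInsensitive({
  CoreV1: () => "core/v1 client",
});

// All three spellings resolve to the same method.
const results = [client.CoreV1(), client.coreV1(), client.COREV1()];
```

One caveat with this approach: two members whose names differ only by case would collide, so the lookup table needs a defined tie-breaking rule.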

Tool Interface

{
  "name": "evaluate_script",
  "inputSchema": {
    "type": "object",
    "properties": {
      "script": {
        "type": "string",
        "description": "JavaScript code to execute. The last expression is returned as the result."
      },
      "timeout": {
        "type": "integer",
        "description": "Execution timeout in milliseconds (default: 30000, max: 300000)"
      }
    },
    "required": ["script"]
  }
}
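
For reference, a client could build a call to this tool roughly as follows. The sketch wraps the arguments in an MCP `tools/call` JSON-RPC request and applies the default and maximum timeouts from the schema description. The helper name `buildEvaluateScriptCall` is hypothetical, and clamping an oversized timeout (rather than rejecting it) is an assumption; the issue does not say which behavior the server should have.

```javascript
// Limits taken from the schema description above.
const DEFAULT_TIMEOUT_MS = 30000;
const MAX_TIMEOUT_MS = 300000;

// Build a JSON-RPC 2.0 "tools/call" request for evaluate_script.
// Hypothetical helper; oversized timeouts are clamped (assumption).
function buildEvaluateScriptCall(id, script, timeoutMs = DEFAULT_TIMEOUT_MS) {
  if (typeof script !== "string" || script.length === 0) {
    throw new TypeError("script is required and must be a non-empty string");
  }
  if (!Number.isInteger(timeoutMs) || timeoutMs <= 0) {
    throw new RangeError("timeout must be a positive integer (milliseconds)");
  }
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: {
      name: "evaluate_script",
      arguments: { script, timeout: Math.min(timeoutMs, MAX_TIMEOUT_MS) },
    },
  };
}

// A request whose timeout exceeds the maximum is clamped to 300000 ms.
const request = buildEvaluateScriptCall(1, "namespace", 600000);
```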

Security Requirements

  • Sandboxed execution environment (Goja runtime isolation)
  • Execution timeout to prevent infinite loops (default 30s, max 5min)
  • Call stack depth limit to prevent stack overflow (max 1024)
  • No file system access from scripts
  • No network access from scripts (except via provided K8s clients)
  • No access to environment variables or system information
  • No timer functions (setTimeout, setInterval)
  • Context values masked to prevent credential leakage (safeContext)
  • Existing access control (denied_resources, RBAC) enforced through K8s clients

Note: Memory limits are not built into Goja. For production deployments with untrusted scripts, run the MCP server in a container with memory limits.
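
Several of these requirements can be demonstrated with an analogous setup in Node.js, shown below as a rough sketch. It uses Node's built-in `vm` module in place of Goja, so it illustrates the shape of the behavior (no timers, no `require` or `process`, an execution timeout, last expression returned) and not the proposed Go implementation. Node's `vm` module is documented as not being a security mechanism, and `evaluateScript` is a hypothetical helper name.

```javascript
const vm = require("node:vm");

// Evaluate a script in a fresh context that contains only the given
// globals. Hypothetical helper: an analogy for the Goja sandbox.
function evaluateScript(script, globals = {}, timeoutMs = 30000) {
  // A new context has ECMAScript built-ins only: no setTimeout,
  // no require, no process, unless explicitly passed in.
  const sandbox = vm.createContext({ ...globals });
  try {
    // The completion value of the script (its last expression)
    // is returned by runInContext.
    const result = vm.runInContext(script, sandbox, { timeout: timeoutMs });
    return { ok: true, result };
  } catch (err) {
    const timedOut = err && err.code === "ERR_SCRIPT_EXECUTION_TIMEOUT";
    return { ok: false, error: timedOut ? "timeout" : String(err && err.message) };
  }
}

// Timers and host APIs are absent inside the context.
const timers = evaluateScript("typeof setTimeout");
const host = evaluateScript("typeof require + '/' + typeof process");

// An infinite loop is interrupted once the timeout elapses.
const loop = evaluateScript("while (true) {}", {}, 50);

// Explicitly provided globals are visible to the script.
const ns = evaluateScript("namespace", { namespace: "default" });
```

Checks of this kind map naturally onto the security tests listed in the acceptance criteria.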

Acceptance Criteria

  • New toolset code created under pkg/toolsets/code/
  • Toolset is opt-in (disabled by default)
  • Single evaluate_script tool implemented using Goja
  • KubernetesClient interface exposed to scripts as k8s global
  • Additional globals: ctx (safe context), namespace (default namespace)
  • Execution timeout enforced with configurable limit
  • Script output captured and returned as tool result
  • Tool description includes SDK documentation for LLM consumption
  • Unit tests covering script evaluation scenarios
  • Integration tests validating Kubernetes operations from scripts
  • Security tests verifying sandbox restrictions
