feat(local-executor): add opt-in sandbox posture to LocalCommandLineCodeExecutor #7598

Draft
xr843 wants to merge 1 commit into microsoft:main from xr843:feat/local-executor-sandbox-flag
xr843 commented Apr 17, 2026

Summary

Refs #7462. Supersedes closed #7467.

Adds an explicit three-state sandbox parameter to LocalCommandLineCodeExecutor so callers cannot accidentally rely on an easily-suppressed UserWarning as their only signal that LLM-generated code is running unsandboxed.

| sandbox | Behavior |
| --- | --- |
| None (default, legacy) | Emits DeprecationWarning and logger.warning(). Execution is unchanged and fully backward compatible. In a future release this parameter becomes required. |
| False | Explicit acknowledgement of unsandboxed execution. Silent: the caller has accepted the risk. |
| True | Best-effort in-process hardening (details below). |
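
To make the three states concrete, here is a minimal sketch of how a constructor might branch on the parameter. It is not the code in this PR: `resolve_sandbox_posture`, the logger name, and the message text are hypothetical, chosen only to illustrate the table above.

```python
import logging
import warnings
from typing import Optional

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("local_executor_sketch")

# Hypothetical message text; the real wording lives in the PR diff.
_UNSANDBOXED_MSG = (
    "Running LLM-generated code without a sandbox. Pass sandbox=True for "
    "best-effort hardening, sandbox=False to acknowledge the risk, or use "
    "a Docker-based executor for strong isolation."
)


def resolve_sandbox_posture(sandbox: Optional[bool]) -> bool:
    """Return True when hardening should be applied to child processes."""
    if sandbox is None:
        # Legacy default: surface the risk on both channels, change nothing else.
        warnings.warn(_UNSANDBOXED_MSG, DeprecationWarning, stacklevel=2)
        logger.warning(_UNSANDBOXED_MSG)
        return False
    # True and False are explicit choices, so neither emits a warning here.
    return sandbox
```

Logging in addition to warning matters because Python hides DeprecationWarning by default outside `__main__`, while a log record still reaches structured logging pipelines.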

sandbox=True hardening

  1. Environment scrub: entries whose name matches credential patterns (TOKEN, SECRET, API_KEY, PASSWORD, PRIVATE_KEY, CREDENTIAL, SESSION, COOKIE, AUTH) are removed from the child process — the most common path by which LLM-generated code exfiltrates provider keys.
  2. POSIX rlimits via preexec_fn: RLIMIT_CPU, RLIMIT_AS, RLIMIT_NOFILE, RLIMIT_NPROC cap runaway memory and fork-bomb payloads.
  3. Windows degrade path: preexec_fn is unavailable → env scrub applies but a UserWarning directs callers to DockerCommandLineCodeExecutor for strong isolation.
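
The three steps above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the PR's implementation: the helper names (`scrub_env`, `apply_rlimits`, `run_hardened`) and every numeric limit are hypothetical, and substring matching on the upper-cased name is one plausible reading of "matches credential patterns".

```python
import os
import subprocess
import sys
from typing import Dict, Mapping

if os.name == "posix":
    import resource  # POSIX-only module; not available on Windows

# Pattern list taken from the PR description.
CREDENTIAL_PATTERNS = (
    "TOKEN", "SECRET", "API_KEY", "PASSWORD", "PRIVATE_KEY",
    "CREDENTIAL", "SESSION", "COOKIE", "AUTH",
)

# Assumed limit values for illustration; the PR may use different numbers.
_LIMITS = {
    "RLIMIT_CPU": 30,        # CPU seconds
    "RLIMIT_AS": 1 << 30,    # address space in bytes (1 GiB)
    "RLIMIT_NOFILE": 256,    # open file descriptors
    "RLIMIT_NPROC": 64,      # processes for the user (note: counted per user, not per child)
}


def scrub_env(env: Mapping[str, str]) -> Dict[str, str]:
    """Drop entries whose name contains any credential pattern."""
    return {
        name: value
        for name, value in env.items()
        if not any(pattern in name.upper() for pattern in CREDENTIAL_PATTERNS)
    }


def apply_rlimits() -> None:
    """Lower resource limits; intended to run in the child via preexec_fn."""
    for name, wanted in _LIMITS.items():
        kind = getattr(resource, name, None)
        if kind is None:
            continue  # this limit does not exist on the current platform
        _, hard = resource.getrlimit(kind)
        # Never request more than the existing hard limit allows.
        capped = wanted if hard == resource.RLIM_INFINITY else min(wanted, hard)
        resource.setrlimit(kind, (capped, capped))


def run_hardened(code: str, timeout: float = 10.0) -> subprocess.CompletedProcess:
    """Run a Python snippet with a scrubbed environment and, on POSIX, rlimits."""
    kwargs = {}
    if os.name == "posix":
        kwargs["preexec_fn"] = apply_rlimits
    # On Windows only the environment scrub applies, matching the degrade path.
    return subprocess.run(
        [sys.executable, "-c", code],
        env=scrub_env(os.environ),
        capture_output=True,
        text=True,
        timeout=timeout,
        **kwargs,
    )
```

Substring matching is deliberately broad, so it also removes variables such as `SSH_AUTH_SOCK`. That is usually the safer direction for a scrub, at the cost of occasionally hiding a harmless variable.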

This is NOT a substitute for DockerCommandLineCodeExecutor. The docstring says so verbatim. Adversarial payloads can still read files, call out to the network, and write within work_dir. The opt-in flag exists to:

Backward compatibility

  • Constructor signature is additive (sandbox: Optional[bool] = None).
  • LocalCommandLineCodeExecutorConfig gains the same field so serialization round-trips the posture — a declarative deployment cannot silently downgrade to the default warning path.
  • Existing callers keep working; they just receive a DeprecationWarning with actionable guidance.
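
Why the config needs the field can be shown with a small stand-in. The sketch below uses a plain dataclass and JSON rather than the project's real config class and component loader, so `ExecutorConfigSketch`, `dump_config`, and `load_config` are hypothetical names. The point is only that None, False, and True must survive serialization as three distinct values.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ExecutorConfigSketch:
    """Stand-in for a serializable executor config; not the real class."""

    work_dir: str = "."
    timeout: int = 60
    sandbox: Optional[bool] = None  # three states: None, False, True


def dump_config(config: ExecutorConfigSketch) -> str:
    """Serialize the config, including the sandbox posture."""
    return json.dumps(asdict(config))


def load_config(payload: str) -> ExecutorConfigSketch:
    """Rebuild the config; an absent field falls back to the legacy default."""
    return ExecutorConfigSketch(**json.loads(payload))
```

If the field were omitted from the serialized form, a deployment saved with `sandbox=True` would reload as `sandbox=None` and quietly return to the legacy warning path.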

Threading to the issue's open questions

Test plan

Four new tests in test_commandline_code_executor.py:

  • test_sandbox_default_emits_deprecation_warning: the default path emits DeprecationWarning
  • test_sandbox_false_is_silent_opt_out: explicit acknowledgement emits nothing
  • test_sandbox_true_strips_credential_env (POSIX-only): the subprocess sees <missing> for MY_API_KEY/SOME_TOKEN, while HARMLESS_VAR survives
  • test_sandbox_roundtrips_through_config: dump_component / load_component preserve the posture and don't re-trigger the default warning
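
As an illustration of the `<missing>` probe used in the env-stripping test, the sketch below spawns a child interpreter and asks it what it can see. It does not reproduce the PR's test: `scrubbed` and `probe_child_env` are hypothetical helpers, and the scrub here is a plain substring filter standing in for whatever the executor does internally.

```python
import os
import subprocess
import sys
from typing import Dict, List, Mapping

# Pattern list taken from the PR description.
PATTERNS = (
    "TOKEN", "SECRET", "API_KEY", "PASSWORD", "PRIVATE_KEY",
    "CREDENTIAL", "SESSION", "COOKIE", "AUTH",
)


def scrubbed(env: Mapping[str, str]) -> Dict[str, str]:
    """Hypothetical stand-in for the executor's environment scrub."""
    return {k: v for k, v in env.items() if not any(p in k.upper() for p in PATTERNS)}


def probe_child_env(names: List[str], env: Mapping[str, str]) -> Dict[str, str]:
    """Ask a child interpreter for each variable; absent ones read '<missing>'."""
    code = (
        "import os, sys; "
        "print('|'.join(os.environ.get(n, '<missing>') for n in sys.argv[1:]))"
    )
    result = subprocess.run(
        [sys.executable, "-c", code, *names],
        env=dict(env),
        capture_output=True,
        text=True,
        check=True,
    )
    return dict(zip(names, result.stdout.strip().split("|")))


def test_credentials_are_stripped() -> None:
    # Start from the real environment so the child interpreter can still start.
    env = dict(os.environ, MY_API_KEY="k", SOME_TOKEN="t", HARMLESS_VAR="ok")
    seen = probe_child_env(["MY_API_KEY", "SOME_TOKEN", "HARMLESS_VAR"], scrubbed(env))
    assert seen == {
        "MY_API_KEY": "<missing>",
        "SOME_TOKEN": "<missing>",
        "HARMLESS_VAR": "ok",
    }
```

Probing from inside the child checks what generated code would actually observe, which is a stronger assertion than inspecting the dictionary passed to the subprocess call.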

Local run: pytest tests/code_executors/test_commandline_code_executor.py → 17 passed, 1 skipped (pre-existing venv skip). ruff check, ruff format --check, and mypy all clean on the touched files.

Follow-ups (not in scope here)

Happy to split this into two PRs (warning upgrade vs. sandbox=True hardening) if maintainers prefer smaller reviews. Drafting so the direction can be sanity-checked before polish on the Windows path.

…odeExecutor

Refs microsoft#7462. Supersedes closed PR microsoft#7467.

The legacy `UserWarning` at construction was easily suppressed by production
warning filters and `python -W ignore`, leaving unsandboxed execution of
LLM-generated code as the silent default. This change introduces an explicit
three-state sandbox posture parameter:

- sandbox=None   (default, legacy behavior for backward compatibility):
                 DeprecationWarning + logger.warning() surface the risk in
                 both Python warning channels and structured logging
                 pipelines. A future release will make this parameter
                 required.
- sandbox=False  Caller explicitly acknowledges unsandboxed execution;
                 no warning is emitted.
- sandbox=True   Best-effort in-process hardening:
                 * Environment entries whose name contains credential
                   patterns (TOKEN, SECRET, API_KEY, PASSWORD,
                   PRIVATE_KEY, CREDENTIAL, SESSION, COOKIE, AUTH) are
                   stripped from the child process.
                 * On POSIX, per-child rlimits (RLIMIT_CPU, RLIMIT_AS,
                   RLIMIT_NOFILE, RLIMIT_NPROC) are applied via
                   preexec_fn so runaway memory/fork-bomb payloads are
                   capped.
                 * On Windows, env scrub applies but preexec is
                   unavailable; a UserWarning directs callers to the
                   Docker executor for strong isolation.

The docstring is updated, and LocalCommandLineCodeExecutorConfig gains the
same field so the posture round-trips through serialization and declarative
deployments cannot silently downgrade.

This is NOT a substitute for DockerCommandLineCodeExecutor — adversarial
payloads can still read files, make outbound connections, and write to
work_dir. The docstring states this explicitly.

Tests cover: default DeprecationWarning, explicit opt-out silence, env
scrubbing on POSIX, and config round-trip.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>