config-linux: Add security considerations for linux.devices raw block I/O #1313

## Problem

The `linux.devices` and `linux.resources.devices` sections of `config-linux.md` describe how to configure device access for containers but include no security guidance about the implications of granting `r` (read) or `w` (write) access to block devices.

When a block device is configured in `linux.devices` and `linux.resources.devices` grants `access: "rw"` or `"rwm"`, the container process can perform **raw block-level I/O** via standard `read()` and `write()` syscalls, regardless of the process's capability set.

Specifically:

- `read()` on a block device fd does **not** require `CAP_SYS_RAWIO` or any other capability
- `write()` on a block device fd does **not** require `CAP_SYS_RAWIO` or any other capability
- `mount()` correctly requires `CAP_SYS_ADMIN`

This means a container with a block device entry and **only the default unprivileged capability set** can read the entire contents of the host device (including all filesystem data, credentials, and keys) and potentially write to it (modifying or corrupting the host filesystem at the block level).
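
The distinction in the list above can be illustrated with a minimal Python sketch. It assumes a device node such as `/dev/hostdisk` (the name used in the test output in this issue) is visible to the process; `read_raw` and `write_raw` are hypothetical helper names. The only syscalls involved are `open()`, `lseek()`, `read()`, and `write()`, so whether the calls succeed depends on the device cgroup rule and the node's file permissions, not on capabilities.

```python
import os


def read_raw(path: str, offset: int, length: int) -> bytes:
    """Read up to `length` bytes at `offset` using plain open/lseek/read."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.lseek(fd, offset, os.SEEK_SET)
        return os.read(fd, length)
    finally:
        os.close(fd)


def write_raw(path: str, offset: int, data: bytes) -> int:
    """Write `data` at `offset` using plain open/lseek/write; returns bytes written."""
    fd = os.open(path, os.O_WRONLY)
    try:
        os.lseek(fd, offset, os.SEEK_SET)
        return os.write(fd, data)
    finally:
        os.close(fd)
```

Nothing in either helper is specific to block devices: the same code works on a regular file, which is why capability-based hardening does not intercept it.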

The specification does not document this behavior. As a result, runtime implementors and container orchestrators may assume that Linux capabilities serve as a security boundary for device access — which they do for `mount()`, but not for raw I/O.

## Impact

The gap affects the entire container ecosystem that consumes this specification:

- **Container runtimes** (runc, crun, youki) faithfully implement the spec and create device nodes with the specified access — no additional validation is performed on block devices
- **Container orchestrators** (containerd, CRI-O, Docker) populate `linux.devices` based on higher-level configuration (`--device`, device plugins, `hostPath` volumes of `type: BlockDevice`) without security warnings
- **Kubernetes** exposes host devices via `hostPath` volumes of `type: BlockDevice`, device plugins (GPU, FPGA, SR-IOV), and CSI raw block volumes, all of which result in `linux.devices` entries; any block devices among them are subject to the raw I/O behavior described above
- **Security tooling** (admission controllers, policy engines) commonly audit capabilities and seccomp profiles but rarely inspect device cgroup rules for block device access
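
For concreteness, the sketch below builds the pair of `config.json` entries that these higher-level paths end up producing. The field names follow the runtime-spec; the device path `/dev/hostdisk`, the `8:0` major/minor numbers, the `0660` mode, and the helper name `block_device_config` are assumptions chosen purely for illustration.

```python
import json


def block_device_config(path: str, major: int, minor: int, access: str) -> dict:
    """Build a `linux` fragment exposing one block device (illustrative values)."""
    return {
        "linux": {
            # Device node created inside the container.
            "devices": [
                {"path": path, "type": "b", "major": major, "minor": minor,
                 "fileMode": 0o660, "uid": 0, "gid": 0}
            ],
            "resources": {
                # Deny everything, then allow this one block device.
                "devices": [
                    {"allow": False, "access": "rwm"},
                    {"allow": True, "type": "b", "major": major,
                     "minor": minor, "access": access},
                ]
            },
        }
    }


if __name__ == "__main__":
    # fileMode is serialized as a decimal integer (0o660 == 432).
    print(json.dumps(block_device_config("/dev/hostdisk", 8, 0, "rw"), indent=2))
```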

### Verified behavior

Tested with runc 1.3.4 on cgroup v2 (eBPF device controller), default seccomp profile active:

```
# Container capabilities (default set, no SYS_ADMIN, no SYS_RAWIO):
CapPrm: 0x00000000a80425fb

# mount() — correctly blocked:
mount: permission denied (are you root?)

# Raw read via dd: succeeds, recovers host /etc/passwd and /etc/shadow entries:
$ dd if=/dev/hostdisk bs=4096 count=38400 2>/dev/null | strings | grep '^root:'
root:x:0:0:root:/root:/bin/sh
root:*::0:::::

# Raw write via dd — succeeds:
$ echo TEST | dd of=/dev/hostdisk bs=1 seek=153000000 count=5 conv=notrunc
5+0 records in
5+0 records out
```
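
The `CapPrm` mask in the output above can be decoded to confirm that neither capability is present. The sketch uses the capability bit numbers from `linux/capability.h` (`CAP_SYS_RAWIO` = 17, `CAP_SYS_ADMIN` = 21, `CAP_MKNOD` = 27); `has_cap` is an illustrative helper, not part of any runtime.

```python
# Bit positions as defined in linux/capability.h.
CAP_SYS_RAWIO = 17
CAP_SYS_ADMIN = 21
CAP_MKNOD = 27


def has_cap(mask: int, cap: int) -> bool:
    """Return True if capability bit `cap` is set in `mask`."""
    return bool((mask >> cap) & 1)


cap_prm = 0x00000000A80425FB  # CapPrm value from the test container

print(has_cap(cap_prm, CAP_SYS_ADMIN))  # False
print(has_cap(cap_prm, CAP_SYS_RAWIO))  # False
print(has_cap(cap_prm, CAP_MKNOD))      # True
```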

## Proposed Changes

### 1. Add security note to `linux.devices` section

After the existing description of `linux.devices`, add:

> **Security consideration**: Creating a block device node (type `"b"`) and granting `r` or `w` access in `linux.resources.devices` allows the container process to perform raw block-level I/O on the underlying host device using standard `read()` and `write()` syscalls. These syscalls are not gated by any Linux capability: the device cgroup rules and the Unix file permissions on the device node are the controls that apply, along with any LSM policy (for example SELinux or AppArmor) that the runtime has configured. Removing `CAP_SYS_ADMIN` prevents `mount()` but does not prevent raw data access.
>
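> Runtimes and orchestrators SHOULD warn when block devices are configured with read or write access. Effective defenses include user namespaces (UID 0 inside the namespace maps to an unprivileged host UID and cannot open device nodes owned by host root) and running container processes as non-root users, provided the node's `fileMode`, `uid`, and `gid` do not grant that user access.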
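
One way a runtime or admission tool might implement such a warning is sketched below. It is a rough illustration, not the logic of runc or any orchestrator: `find_rw_block_devices` is a hypothetical helper, it treats an omitted `type`, `major`, or `minor` in a rule as a wildcard, it skips rules that have no `access` string, and it ignores rule ordering (a later deny rule is not considered), so it may over-report.

```python
def find_rw_block_devices(config: dict) -> list:
    """Return (path, granted) pairs for block devices allowed r and/or w access."""
    linux = config.get("linux", {})
    rules = linux.get("resources", {}).get("devices", [])
    findings = []
    for dev in linux.get("devices", []):
        if dev.get("type") != "b":
            continue  # only block devices are of interest here
        for rule in rules:
            if not rule.get("allow"):
                continue
            # Assumption: a missing type/major/minor matches any device.
            if rule.get("type", "a") not in ("a", "b"):
                continue
            if rule.get("major", dev["major"]) != dev["major"]:
                continue
            if rule.get("minor", dev["minor"]) != dev["minor"]:
                continue
            granted = set(rule.get("access", "")) & {"r", "w"}
            if granted:
                findings.append((dev["path"], "".join(sorted(granted))))
                break  # one finding per device is enough
    return findings
```

A caller could log each returned pair as a warning, or reject the config outright in stricter environments.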

### 2. Add note to `linux.resources.devices` access field

After the `access` field description, add:

> **Note**: The `r` and `w` permissions are enforced by the cgroup v1 devices controller or, on cgroup v2, by an eBPF program of type `BPF_PROG_TYPE_CGROUP_DEVICE` attached to the cgroup. When applied to block devices, these permissions enable raw block-level I/O that is independent of Linux capabilities. `CAP_SYS_RAWIO` is not required for `read()` or `write()` on block device file descriptors.

## References

- [GHSA-g54h-m393-cpwq](https://github.com/opencontainers/runc/security/advisories/GHSA-g54h-m393-cpwq) — runc devices resource list treated as denylist (related)
- [kernel.org: devices cgroup v1](https://www.kernel.org/doc/Documentation/cgroup-v1/devices.txt) — `r`/`w` control `read()`/`write()` on device inodes
- [superuser.com/q/842525](https://superuser.com/questions/842525/does-dd-require-cap-sys-rawio) — confirms `dd` uses `read()`/`write()`, not raw I/O ioctls
- [config-linux.md#devices](https://github.com/opencontainers/runtime-spec/blob/main/config-linux.md#devices) — current spec text
- [PR #1214](https://github.com/opencontainers/runtime-spec/pull/1214) — ongoing work to deprecate device access denial (related device cgroup work)
- [PR #1148](https://github.com/opencontainers/runtime-spec/pull/1148) — device node location clarification (related)

