The Datadog Agent is a comprehensive monitoring and observability agent written primarily in Go. It collects metrics, traces, logs, and security events from systems and applications, forwarding them to the Datadog platform. This is the main repository for Agent versions 6 and 7.
- `/cmd/` - Entry points for various agent components
  - `agent/` - Main agent binary
  - `cluster-agent/` - Kubernetes cluster agent
  - `dogstatsd/` - StatsD metrics daemon
  - `trace-agent/` - APM trace collection agent
  - `system-probe/` - System-level monitoring (eBPF)
  - `security-agent/` - Security monitoring
  - `process-agent/` - Process monitoring
  - `privateactionrunner/` - Executing actions
- `/pkg/` - Core Go packages and libraries
  - `aggregator/` - Metrics aggregation
  - `collector/` - Check scheduling and execution
  - `config/` - Configuration management
  - `logs/` - Log collection and processing
  - `metrics/` - Metrics types and handling
  - `network/` - Network monitoring
  - `security/` - Security monitoring components
  - `trace/` - APM tracing components
- `/comp/` - Component-based architecture modules
  - `core/` - Core components
  - `metadata/` - Metadata collection
  - `logs/` - Log components
  - `trace/` - Trace components
- `/tasks/` - Python invoke tasks for development: build, test, lint, and deployment automation
- `/rtloader/` - Runtime loader for Python checks
This project uses extensive custom Go build tags. Most source files are ignored by the standard Go toolchain unless the correct tags are passed. The `dda inv` wrapper tasks (defined in `tasks/`) compute the right build tags automatically. Never run these commands directly:
| Instead of | Use |
|---|---|
| `go build …` | `dda inv agent.build`, `dda inv cluster-agent.build`, etc. |
| `go test …` | `dda inv test --targets=./pkg/…` |
| `go mod tidy` | `dda inv tidy` |
| `go vet …` | `dda inv linter.go` |
| `golangci-lint run …` | `dda inv linter.go` |
This also applies to indirect usage: do not shell out to `go build` or `go test` for compilation checks. If you need to verify that code compiles, build the relevant component with `dda inv *.build`.
```shell
# Install dda on macOS
brew install --cask dda

# Install development tools
dda inv install-tools

# Build the main agent
dda inv agent.build --build-exclude=systemd

# Build specific components
dda inv dogstatsd.build
dda inv trace-agent.build
dda inv system-probe.build

# Run all tests
dda inv test

# Test a specific package
dda inv test --targets=./pkg/aggregator

# Run Go linters
dda inv linter.go

# Run all linters
dda inv linter.all

# Create a dev config with a testing API key
echo "api_key: 0000001" > dev/dist/datadog.yaml

# Run the agent
./bin/agent/agent run -c bin/agent/dist/datadog.yaml
```

The development configuration file should be placed at `dev/dist/datadog.yaml`. After building, it is copied to `bin/agent/dist/datadog.yaml`.
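For local development, a slightly fuller config can be useful. A minimal sketch for illustration (a dummy `api_key` suffices for runs that don't ship data; `log_level` is a standard agent option):

```yaml
# dev/dist/datadog.yaml - illustrative dev config
api_key: "0000001"   # dummy key for local testing
log_level: debug     # verbose output while developing
```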
- Checks are Python or Go modules that collect metrics
- Located in `cmd/agent/dist/checks/`
- Can be autodiscovered via Kubernetes annotations/labels
- Main config: `datadog.yaml`
- Check configs: `conf.d/<check_name>.d/conf.yaml`
- Supports environment variable overrides with the `DD_` prefix
- Checks using eBPF probes require the system-probe module to be running
- Examples: tcp_queue_length, oom_kill, seccomp_tracer
- Module code (system-probe): `pkg/collector/corechecks/ebpf/probe/<check>/`
- Check code (agent): `pkg/collector/corechecks/ebpf/<check>/`
- System-probe modules: `cmd/system-probe/modules/`
- Configuration: set `<check_name>.enabled: true` in the system-probe config
- See `pkg/collector/corechecks/ebpf/AGENTS.md` for the detailed structure
- Quick reference: `.cursor/rules/system_probe_modules.mdc` for common patterns and pitfalls
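Enabling one of these checks in the system-probe config looks roughly like this (a sketch; the config file location and surrounding keys vary by platform and deployment):

```yaml
# system-probe config - illustrative: enable one eBPF-backed check
tcp_queue_length:
  enabled: true
```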
eBPF programs, runtime compilation bundles, and cgo godefs type definitions are built with Bazel. Two convenience targets in `pkg/ebpf/BUILD.bazel` cover the most common workflows:
```shell
# Build every eBPF .o program and runtime flattened .c file at once
bazel build //pkg/ebpf:all_ebpf_programs

# Verify all committed cgo godefs files are up to date.
# Covers both Linux and Windows targets; incompatible tests are
# skipped automatically via target_compatible_with.
bazel test //pkg/ebpf:verify_generated_files
```

When a `verify_generated_files` test fails, run the corresponding `write_source_file` target to update the committed file:

```shell
# Update a single cgo godefs output
bazel run //pkg/ebpf:types_godefs
```

Runtime compilation integrity hash files (`pkg/ebpf/bytecode/runtime/*.go`) are `.gitignore`d and generated during the build by `bazel_build_ebpf()`. To update one locally: `bazel run //pkg/ebpf/bytecode:<name>_verify`.
Key Bazel macros:
- `ebpf_prog` / `ebpf_program_suite` (`bazel/rules/ebpf/ebpf.bzl`): compile `.c` to `.o`
- `cgo_godefs` (`bazel/rules/ebpf/cgo_godefs.bzl`): `go tool cgo -godefs` plus `write_source_file` verification
- `runtime_compilation_bundle` (`bazel/rules/ebpf/runtime_compilation.bzl`): flatten headers and generate the integrity hash `.go` file
- Go tests run via `dda inv test` (not raw `go test`)
- Python tests use pytest
- Run with `dda inv test --targets=<package>`
- E2E tests live in `test/new-e2e/tests/` and use the framework in `test/e2e-framework/`
- E2E tests provision real AWS, GCP, or Azure infrastructure, deploy the agent, and assert payloads arrive in fakeintake (a mock Datadog intake). By default the fakeintake forwards payloads to the `dddev` org account.
- Key docs: `test/e2e-framework/AGENTS.md` (framework), `test/fakeintake/AGENTS.md` (intake mock), `docs/public/how-to/test/e2e.md` (setup & running)
- Use the `/write-e2e` skill or read those docs directly to write new E2E tests
- Run locally: `dda inv new-e2e-tests.run --targets=./tests/<area>/...`
- Go: golangci-lint via `dda inv linter.go`
- Python: various linters via `dda inv linter.python`
- YAML: yamllint
- Shell: shellcheck
The project uses Python's Invoke framework with custom tasks. Main task categories:
- `agent.*` - Core agent tasks
- `test` - Testing tasks
- `linter.*` - Linting tasks
- `docker.*` - Docker image tasks
- `release.*` - Release management
Go build tags control feature inclusion. Some examples:
- `kubeapiserver` - Kubernetes API server support
- `containerd` - containerd support
- `docker` - Docker support
- `ebpf` - eBPF support
- `python` - Python check support
- ...and many more; see `./tasks/build_tags.py` for the full reference.
- `datadog.yaml` - Main agent configuration
- `modules.yml` - Go module definitions
- `release.json` - Release version information
- `.gitlab-ci.yml` - CI/CD pipeline configuration
- `/docs/` - Internal documentation
- `/docs/dev/` - Developer guides
- `README.md` - Project overview
- `CONTRIBUTING.md` - Contribution guidelines
- Primary CI system
- Defined in `.gitlab-ci.yml` and the `.gitlab/` directory
- Runs tests, builds, and deployments
- Secondary CI for specific workflows
- Tests for pull-request settings and repository configuration
- Release automation workflows
PRs should follow `.github/PULL_REQUEST_TEMPLATE.md` and the guidelines in `docs/public/guidelines/` (contributing, coding style, components, etc.). When a PR changes behavior, configuration options, or APIs, update the corresponding documentation in the same PR, not as a follow-up.
Code reviewer plugins for Go and Python are available from the Datadog Claude Marketplace:
- `/go-review`, `/go-improve` - Go code review and iterative improvement
- `/py-review`, `/py-improve` - Python code review and iterative improvement
See the marketplace README for installation instructions.
- Never commit API keys or secrets
- Use secret backend for credentials
The project uses Go modules with multiple sub-modules. TODO: Describe specific strategies for managing modules, including any invoke tasks.
- Linux: Full support (amd64, arm64)
- Windows: Full support (Server 2016+, Windows 10+)
- macOS: Supported
- AIX: No support in this codebase
- Container: Docker, Kubernetes, ECS, containerd, and more
- Always run linters before committing: `dda inv linter.go`
- Always test your changes: `dda inv test --targets=<your_package>`
- Follow Go conventions: use gofmt and the project structure
- Update documentation: keep docs in sync with code changes
- Check for security implications: review security-sensitive changes carefully
- Missing tools: run `dda inv install-tools`
- CMake errors: clean the rtloader build with `dda inv rtloader.clean`
- Flaky tests: check `flakes.yaml` for known issues
- Coverage issues: use the `--coverage` flag
The following are areas of particular concern for this codebase. They highlight project-specific risks that have led to production bugs in the Datadog Agent.
The E2E framework (`test/new-e2e/`) uses fakeintake, a mock Datadog intake
that captures metrics, logs, traces, and check runs. When a change affects
user-visible behavior (new metrics, changed log output, modified payloads),
check whether an E2E test asserts the expected data arrives in fakeintake. Unit
tests alone are not sufficient for validating the agent's end-to-end data
pipeline.
Most E2E tests only run on `main`, release branches (`N.N.x`), and RC tags, not on PR branches. This means some classes of bugs cannot be caught before merge. Be extra careful reviewing:
- Packaging or installation changes (MSI, deb, rpm, BUILD.bazel)
- Agent startup/shutdown sequences
- Cross-component communication (e.g. system-probe ↔ agent)
These changes are likely to need `qa/rc-required`.
The agent ships on Linux, Windows, and macOS. Platform-specific code paths (via `runtime.GOOS`, build tags, OS-specific file paths) are a frequent source of bugs; typically the "other" platform is untested. The same applies to packaging: Windows MSI and Linux deb/rpm have independent logic that can silently diverge.
The agent runs many concurrent goroutines with explicit `Start()`/`Stop()` lifecycles. The most common bugs are send-on-closed-channel during shutdown and goroutine leaks. Changes that introduce goroutines or modify component lifecycle should have tests exercising startup and graceful shutdown.
Components initialize in stages — some dependencies may not be ready when others start. Functions exposed to UIs or APIs should return safe defaults when a dependency is unavailable, not propagate errors or panic.
If a PR changes behavior but doesn't update the corresponding docs, comments, or doc strings, flag it. Stale docs lead to bugs: contributors build on incorrect assumptions.
AI agents read AGENTS.md, CLAUDE.md, and skill files to understand the
codebase. These files must stay accurate — stale guidance causes recurring
mistakes across sessions.
```
AGENTS.md                        ← repo-wide: architecture, workflow, review guidelines
├── bazel/AGENTS.md              ← Bazel build system: conventions, pitfalls, rule writing
├── test/e2e-framework/AGENTS.md ← E2E framework: environments, provisioners, agentparams
├── test/fakeintake/AGENTS.md    ← fakeintake: endpoints, client API, extension guide
├── pkg/.../AGENTS.md            ← package-level: structure, patterns, pitfalls
└── .claude/skills/*/SKILL.md    ← task-specific: step-by-step procedures
```

Each level inherits context from its parent via CLAUDE.md (`@../../CLAUDE.md` → `@AGENTS.md`). Keep information at the right level: don't duplicate repo-wide rules in sub-project files.
| File | Update when |
|---|---|
| `AGENTS.md` (root) | Architecture, workflow, build commands, or review guidelines change |
| Sub-project `AGENTS.md` | APIs, conventions, or extension patterns in that sub-project change |
| `.claude/skills/*/SKILL.md` | A skill's steps, examples, or recommendations become outdated |
Keep rules generalizable. A good guideline covers a class of bugs, not a single incident. Think bias/variance: too specific and it only catches one bug; too generic and it's noise.
AI agents: when working on any task (reviewing, writing code, running
tests), if you notice a gap or inaccuracy in an AGENTS.md or skill file, fix
it — either in the same PR or as a follow-up. Small, incremental improvements
are preferred over large rewrites. This creates a feedback loop where every
session leaves the context more accurate for the next one.