Use this worksheet before adding new features to your agent.
It takes 5–10 minutes. The goal is to find your weakest link — not to score well.
- Read each dimension.
- Circle or mark the description that best fits your system right now.
- Be honest. "Partial" is a real answer.
- Find the dimension where you marked Low — that's where to focus next.
If you have more than one Low, fix the one that would hurt most in production.
Do you treat the agent as a smart prompt, or as a system that will evolve?
| Rating | Description |
|---|---|
| 🔴 Low | One large prompt doing everything. Behavior changes are confusing or surprising. |
| 🟡 Partial | Some separation of concerns, but it's informal or inconsistent. |
| 🟢 Ready | Clear separation between reasoning, tools, and state. Change is expected and planned for. |
My rating: _______________
Notes / what's missing:
Can you swap parts of the agent faster than you can debug them?
| Rating | Description |
|---|---|
| 🔴 Low | Model, prompt, and logic are tightly coupled. Touching "working" parts feels risky. |
| 🟡 Partial | Some parts are swappable, but others are deeply entangled. |
| 🟢 Ready | Prompts and models are treated as swappable. Replacement is cheaper than repair. |
My rating: _______________
Notes / what's missing:
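One way to approach the "Ready" row is to treat the model and prompt as data rather than code, so swapping either is a configuration change. A minimal sketch; `AgentConfig` and `build_prompt` are illustrative names, not from any particular framework:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentConfig:
    """Everything that should be swappable without touching agent logic."""
    model_name: str
    prompt_template: str


def build_prompt(config: AgentConfig, task: str) -> str:
    # Agent logic depends only on the config interface, so a new model
    # or prompt is a new AgentConfig value, not a code change.
    return config.prompt_template.format(task=task)


v1 = AgentConfig(model_name="model-a",
                 prompt_template="You are a helpful agent. Task: {task}")
v2 = AgentConfig(model_name="model-b",
                 prompt_template="Plan first, then act. Task: {task}")

# Swapping configs changes behavior without editing build_prompt.
print(build_prompt(v1, "summarize the report"))
print(build_prompt(v2, "summarize the report"))
```

The test of replaceability is whether `v2` can go live without anyone re-reading the code that consumes it.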
When the agent fails, do you know why?
| Rating | Description |
|---|---|
| 🔴 Low | Failures are silent. Explanations are "it just didn't work." |
| 🟡 Partial | Some failures are visible, but others disappear without a trace. |
| 🟢 Ready | Failures leave artifacts (logs, traces, outputs). You can tell whether the model failed or the environment did. |
My rating: _______________
Notes / what's missing:
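"Failures leave artifacts" can be as small as a wrapper that writes a structured record before re-raising. A minimal sketch, assuming a JSONL log file; `run_step` and the field names are illustrative:

```python
import json
import time
import traceback


def run_step(step_name, fn, log_path="agent_failures.jsonl"):
    """Run one agent step; on failure, write an artifact instead of failing silently."""
    try:
        return fn()
    except Exception as exc:
        record = {
            "step": step_name,
            "time": time.time(),
            "error_type": type(exc).__name__,
            "error": str(exc),
            "trace": traceback.format_exc(),
        }
        with open(log_path, "a") as f:
            f.write(json.dumps(record) + "\n")
        raise  # still fail, but now with something to inspect
```

In the "Ready" state, a record like this is what lets you answer "which step failed, and how" after the fact instead of saying "it just didn't work."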
Can you see what the agent actually did, step by step?
| Rating | Description |
|---|---|
| 🔴 Low | Only final outputs are visible. Debugging relies on intuition and guesswork. |
| 🟡 Partial | Some steps are logged, but intermediate decisions are opaque. |
| 🟢 Ready | Intermediate decisions and tool calls are visible. You can compare runs over time. |
My rating: _______________
Notes / what's missing:
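Step-by-step visibility doesn't require heavy tooling; an append-only trace per run is enough to start. A minimal sketch, with `RunTrace` and the event shape as illustrative assumptions:

```python
import json
import time


class RunTrace:
    """Append-only record of what an agent actually did, step by step."""

    def __init__(self, run_id):
        self.run_id = run_id
        self.events = []

    def record(self, kind, **detail):
        self.events.append({"run_id": self.run_id, "kind": kind,
                            "time": time.time(), **detail})

    def dump(self):
        return json.dumps(self.events, indent=2)


trace = RunTrace("run-001")
trace.record("decision", thought="need current data, call search tool")
trace.record("tool_call", tool="search", args={"query": "q3 revenue"})
trace.record("tool_result", tool="search", ok=True)
# Persist trace.dump() per run so runs can be diffed over time.
```

The point of keeping traces, not just final outputs, is that "compare runs over time" becomes a diff rather than a guess.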
What happens when a tool changes or misbehaves?
| Rating | Description |
|---|---|
| 🔴 Low | Tools are assumed to always work. The agent confidently reports success even when tools fail. |
| 🟡 Partial | Some tool failures are caught, but not consistently handled. |
| 🟢 Ready | Tool failures are explicitly handled. "Agent logic failed" is distinguishable from "environment failed." |
My rating: _______________
Notes / what's missing:
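The "Ready" distinction between agent failure and environment failure can be made mechanical by splitting the error types at the tool boundary. A minimal sketch; the exception names and the `required_args`/`run` tool shape are assumptions, not a real tool protocol:

```python
class ToolError(Exception):
    """The environment failed: API down, timeout, malformed response."""


class AgentLogicError(Exception):
    """The agent failed: bad arguments, impossible request."""


def call_tool(tool, args: dict):
    # Validate the agent's request first, so a bad call is blamed on
    # agent logic rather than on the tool.
    required = getattr(tool, "required_args", set())
    missing = required - args.keys()
    if missing:
        raise AgentLogicError(f"missing arguments: {sorted(missing)}")
    try:
        return tool.run(**args)
    except Exception as exc:
        # Never report success when the environment failed.
        name = getattr(tool, "name", type(tool).__name__)
        raise ToolError(f"{name} failed: {exc}") from exc
```

With this split, a spike in `AgentLogicError` means the prompt or planning changed; a spike in `ToolError` means the environment did.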
Do you know what this agent costs to run — and why?
| Rating | Description |
|---|---|
| 🔴 Low | Costs are only noticed after surprise bills or problems. No limits in place. |
| 🟡 Partial | Rough sense of cost, but no explicit limits or per-behavior tracking. |
| 🟢 Ready | Explicit cost limits exist. You know which behaviors are expensive and why. |
My rating: _______________
Notes / what's missing:
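"Explicit limits" and "per-behavior tracking" can live in one small object. A minimal sketch, assuming you can attribute a dollar cost to each behavior; `CostBudget` is a hypothetical name:

```python
class CostBudget:
    """Track spend per behavior and stop before the surprise bill."""

    def __init__(self, limit_usd):
        self.limit_usd = limit_usd
        self.spent = {}  # behavior name -> USD spent so far

    def charge(self, behavior, usd):
        self.spent[behavior] = self.spent.get(behavior, 0.0) + usd
        if sum(self.spent.values()) > self.limit_usd:
            raise RuntimeError(
                f"cost limit ${self.limit_usd} exceeded after '{behavior}'")

    def most_expensive(self):
        return max(self.spent, key=self.spent.get)


budget = CostBudget(limit_usd=1.00)
budget.charge("planning", 0.10)
budget.charge("web_search", 0.40)
print(budget.most_expensive())  # prints "web_search"
```

The per-behavior breakdown is what answers the "and why" half of the question: it names which behaviors are expensive, not just that the total is high.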
Do you expect behavior to change over time?
| Rating | Description |
|---|---|
| 🔴 Low | "It used to work" is a common explanation. Changes are patched reactively. |
| 🟡 Partial | Drift is acknowledged, but there's no systematic way to detect or explain it. |
| 🟢 Ready | Behavior change is expected. You can detect drift and explain what changed. |
My rating: _______________
Notes / what's missing:
Where do humans step in — and why?
| Rating | Description |
|---|---|
| 🔴 Low | Full autonomy by default. Humans only get involved when things break badly. |
| 🟡 Partial | Some handoff points exist, but they're informal or inconsistent. |
| 🟢 Ready | Clear handoff points are defined. Humans act as a stabilizing mechanism, not a last resort. |
My rating: _______________
Notes / what's missing:
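A "clear handoff point" can be a single defined rule rather than an ad hoc rescue. A minimal sketch, assuming the agent can report a confidence score and whether an action is reversible; the function and threshold are illustrative:

```python
def next_action(proposed_action, confidence, reversible, threshold=0.8):
    """Decide whether the agent proceeds or hands off to a human.

    Handoff is a designed mechanism, not a last resort: low confidence
    or irreversible actions route to a person on purpose.
    """
    if confidence < threshold or not reversible:
        return ("handoff_to_human", proposed_action)
    return ("execute", proposed_action)


# High confidence, reversible: the agent proceeds.
print(next_action("send draft reply", confidence=0.95, reversible=True))
# Irreversible action: a human reviews it, regardless of confidence.
print(next_action("delete customer record", confidence=0.99, reversible=False))
```

The value is less in the specific rule than in the fact that the handoff condition is written down and testable.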
Fill this in after rating all dimensions.
| Dimension | Rating |
|---|---|
| 1. Mental Model | |
| 2. Replaceability | |
| 3. Failure Legibility | |
| 4. Observability | |
| 5. Tool Boundaries | |
| 6. Cost Awareness | |
| 7. Drift Awareness | |
| 8. Human-in-the-Loop | |
My lowest-readiness dimension: _______________
What I'll address before the next feature: _______________
There's no passing score.
There's no certification.
A single Low on the right dimension — especially Failure Legibility or Observability — will cause more pain than five Partials elsewhere.
Fix your weakest link first. Then reassess.
Part of the Agent Readiness Rubric.