JetBrains Air Help

Agentization cookbook

This cookbook shows a practical, tool-agnostic way to prepare a software repository for development by AI coding agents.

Agents are not deterministic by nature – they may guess missing requirements, widen scope, retry failures forever, or “fix” something without proving it. The goal of agentization is to remove uncertainty from the workflow so outcomes become predictable.

This guide does that by making the process deterministic at every stage:

  • Deterministic inputs: Tasks are normalized into a standard Task Brief with scope, acceptance criteria, constraints, and verification.

  • Deterministic execution: The repo has canonical commands and a single verification entrypoint.

  • Deterministic boundaries: Agents have explicit permissions, stop conditions, and protected areas.

  • Deterministic evidence: Every run produces durable artifacts (status log, PR evidence, escalation notes) so work can be reviewed, resumed, or reverted.

The result is an agent workflow that behaves like a disciplined developer – small changes, clear proof, and safe escalation when uncertain.

0. Preconditions

Agentic workflows amplify whatever is already true about your repo:

  • If builds or tests are flaky or environment-dependent, agents will “fix” symptoms, churn diffs, and burn time in loops that never converge.

  • If the repo isn’t runnable from a clean checkout, agents will invent missing steps, misdiagnose failures, and you’ll get commits that only work on the agent’s machine/CI.

  • If verification isn’t explicit, agents will ship unverified code (they can’t feel uncertainty, so they need gates).

How to make a repo “agent-executable”

You’re done with Preconditions when a stranger (or agent) can:

  • Clone the repo.

  • Run one command to verify.

  • Get deterministic results.

  • And if it fails, the failure is actionable.
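The checklist above can be rehearsed as a script. A minimal sketch: the “remote” here is a throwaway local repo so the sketch is self-contained – in practice you would clone your real repository URL, and the generated verify.sh is a stand-in for your real one.

```shell
# "Stranger test": a fresh clone plus one command must be enough.
# The origin repo is created on the fly for the demo; in a real setup,
# replace "$src" with your repository URL.
set -eu

src="$(mktemp -d)/origin"
mkdir -p "$src" && cd "$src"
git init -q
git config user.email demo@example.com && git config user.name demo
mkdir scripts
printf '#!/bin/sh\necho "verify: pass"\n' > scripts/verify.sh
chmod +x scripts/verify.sh
git add . && git commit -qm "runnable from clean checkout"

# The actual test: clone into an empty directory, then run the one command.
work="$(mktemp -d)"
git clone -q "$src" "$work/checkout"
cd "$work/checkout"
./scripts/verify.sh
```

If the last command fails, the failure message itself is what the stranger (or agent) sees – which is why it must be actionable.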

1. Define a single canonical “verify” entrypoint

Your repo must have one command that represents “is this change safe to merge?”

  1. Create one of these (or similar) based on your stack:

    • ./scripts/verify.sh (recommended, cross-language)

    • make verify

    • task verify (Taskfile)

  • npm run verify / pnpm verify

    verify MUST include (as applicable):

    • format (or format-check)

    • lint

    • typecheck

    • unit tests

    • minimal integration/smoke test (golden path)

  2. Apply this rule: CI runs exactly the same verify entrypoint (no hidden extra steps).

Example (Node/TS repo)

# scripts/verify.sh
set -euo pipefail

pnpm -v
pnpm install --frozen-lockfile
pnpm lint
pnpm typecheck
pnpm test
pnpm build

Example (Python)

# scripts/verify.sh
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
ruff check .
mypy .
pytest -q

(Optional) Make verify output machine-readable

If you want agents (or tools) to route failures correctly, make verify print a small JSON summary at the end. In this case, an agent can read the result without parsing text logs.

Rules:

  • Keep human logs as usual.

  • Print JSON as the last line (or to a separate file like artifacts/verify.json).

  • CI should keep the JSON as an artifact when verify fails.

Minimum shape:

{
  "status": "pass|fail",
  "steps": [
    { "name": "lint", "status": "pass|fail" },
    { "name": "typecheck", "status": "pass|fail" },
    { "name": "tests", "status": "pass|fail" }
  ]
}
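The rules above fit in a few lines of plain shell at the end of verify.sh. A minimal sketch – the step commands (`true`) are placeholders standing in for your real lint/typecheck/test calls:

```shell
# Sketch: append a machine-readable summary to verify.sh.
# The step commands (`true`) are placeholders for your real lint/typecheck/tests.
set -u

mkdir -p artifacts
overall=pass
steps_json=""

run_step() {
  name="$1"; shift
  if "$@"; then status=pass; else status=fail; overall=fail; fi
  steps_json="${steps_json}{\"name\":\"${name}\",\"status\":\"${status}\"},"
}

run_step lint true          # placeholder for: pnpm lint
run_step typecheck true     # placeholder for: pnpm typecheck
run_step tests true         # placeholder for: pnpm test

# Human logs stay as-is above; JSON goes last and into an artifact file.
summary="{\"status\":\"${overall}\",\"steps\":[${steps_json%,}]}"
echo "$summary" | tee artifacts/verify.json
```

An agent can then route on `artifacts/verify.json` (or the last stdout line) instead of parsing text logs.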

2. Make the repo runnable from a clean checkout

There should be no tribal knowledge in your project.

  1. Make sure that the following works from a fresh clone on a clean machine (or a container):

    1. Install dependencies.

    2. Run the verify script.

    3. (Optional) Run the app.

    Requirements:

    • No manual steps like export FOO=bar unless documented and checked

    • No “it works if you already have X running locally” unless you provide docker compose up

  2. (Optional) Apply hardening moves:

  • Add .env.example.

  • Add scripts/bootstrap.* if setup is complex.

  • Prefer containerized dev environments for multi-service systems.

3. Pin the environment (versions + dependencies)

The goal: Same inputs always lead to same outputs.

  1. Implement the required minimum in your project:

    • Runtime version pinned (e.g., .tool-versions, .nvmrc, engines in package.json, .python-version, go.mod).

    • Dependency lockfile enforced (package-lock.json, pnpm-lock.yaml, poetry.lock, Cargo.lock, etc.).

    • CI uses the same runtime version.

  2. (Optional) Use containerized dev environments whose configuration is strictly set by Dockerfile, devcontainer.json, etc. This helps you avoid “works on my machine” drift.
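A cheap way to enforce the pin is a guard at the top of verify.sh. A minimal sketch, assuming a Node repo pinned via .nvmrc; the pin file and the “actual” version are stubbed here so the sketch runs anywhere:

```shell
# Sketch: fail fast when the active runtime does not match the pinned version.
# .nvmrc is created here only for the demo; in a real repo the pin already exists,
# and "actual" would come from: node --version | sed 's/^v//'
set -u

echo "20.11.1" > .nvmrc
pinned="$(cat .nvmrc)"
actual="20.11.1"   # demo stub; replace with the real version probe

if [ "$actual" != "$pinned" ]; then
  echo "Runtime mismatch: have $actual, need $pinned" >&2
  exit 1
fi
echo "runtime ok: $actual"
```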

4. Get rid of flaky tests

Kill or quarantine flakiness before you scale agents. Tests must be a trustworthy signal.

What causes flakiness most often:

  • Timing/sleeps vs proper waits.

  • Shared mutable state (DB, filesystem, globals).

  • Network dependencies.

  • Order dependence.

  • Clock/timezone randomness.

When facing flaky tests:

  1. Do not normalize retries as a fix – retries hide instability and erode trust. The process you want:

    1. Identify flaky tests (re-run the same commit N times, or use CI flake detection).

    2. Assign ownership (someone must fix or quarantine).

    3. Remove nondeterminism (time, network, async waits, shared state).

    4. Mark tests as flaky with an issue link + TTL if you must keep shipping.

Example policy

  • If a test flakes twice in 24h, quarantine it with a ticket + owner + “fix-by” date.

  • Quarantined tests do not gate merges, but run in a separate job and are tracked.
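The first step of the process (re-run the same commit N times) can be sketched as a small shell loop. TEST_CMD is a placeholder – point it at the single suspect test:

```shell
# Sketch: re-run one test command N times on the same commit to classify it.
# TEST_CMD is a placeholder, e.g. TEST_CMD="pytest tests/test_checkout.py -q";
# it defaults to `true` so the sketch runs anywhere.
set -u

TEST_CMD="${TEST_CMD:-true}"
N=5
failures=0

for i in $(seq 1 "$N"); do
  if ! $TEST_CMD >/dev/null 2>&1; then
    failures=$((failures + 1))
  fi
done

echo "failures=$failures/$N"
# 0/N: stable pass. N/N: stable fail. Anything in between: flaky -> quarantine.
```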

5. Make CI a source of truth

CI must be the source of truth, and cheap enough to run often, so that agents and humans can verify quickly and consistently.

  1. Implement minimum CI requirements:

    • Triggers on PR + main

    • Runs verify.sh

    • Caches dependencies

    • Surfaces artifacts/logs clearly

  2. Make sure your CI build runs fast enough. E.g., if CI takes 45 minutes, agents will either avoid running it or generate speculative changes that fail late.

6. Implement deterministic rollback

Your project configuration must allow rollback, so that any agent change is reversible.

  1. Implement minimum requirements:

    • Merge strategy defined (squash or merge commits).

    • Revert procedure documented (git revert <merge_commit> + follow-up).

    • Migrations (DB and schema) must include downgrade/compatibility notes.
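The revert procedure can be rehearsed on a throwaway repo. A self-contained sketch – in a real incident you would run only the `git revert` line against the bad merge commit:

```shell
# Sketch: deterministic rollback of a merge commit with `git revert -m 1`.
# Everything below builds a throwaway repo so the revert can be demonstrated.
set -eu

tmp="$(mktemp -d)" && cd "$tmp"
git init -q repo && cd repo
git config user.email demo@example.com && git config user.name demo

echo ok > app.txt && git add . && git commit -qm "baseline"
git checkout -qb feature
echo broken > app.txt && git commit -qam "bad change"
git checkout -q -                      # back to the default branch
git merge -q --no-ff feature -m "merge feature"

# The rollback itself: revert the merge, keeping the mainline parent (-m 1).
git revert -m 1 --no-edit HEAD >/dev/null
cat app.txt                            # back to the baseline content
```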

7. (Optional) Provide a standard container dev environment

If possible, make your project runnable in a container (locally or remotely). This provides a number of benefits:

  • Isolated and reproducible environment

  • Easy agent parallelization (one env per task)

  • Resource isolation (CPU/RAM/disk)

  • Fewer “works on my machine” issues

Possible options (pick one):

  • Dev Container (.devcontainer/)

  • Docker dev env (Dockerfile + docker compose)

  • Nix (flake.nix)

Requirements:

  • Environment is defined in repo (config-as-code).

  • Uses the same runtime versions as CI.

  • No interactive setup.

  • verify runs inside the environment.

Ephemeral environments (“ghost env” / CDE)

Use the same container setup to create an environment per task/branch, for example, in a cloud dev environment (CDE).

1. Governance

Governance only works if rules are hard to ignore. If you only write rules in a document, they’ll be violated by agents and by humans, because nothing physically prevents it. So you don’t start by “telling agents to behave.” You start by installing guardrails.

How to implement governance

A minimal governance package:

  • AGENTS.md policy file in the repo – Explains the workflow to humans and agents.

  • Branch protection – Forces “PR + review + passing checks” before merge.

  • CODEOWNERS – Ensures the right people are notified when sensitive areas change.

  • Pull Request (PR) template – Forces every PR to include “what changed / commands run / rollback”.

Once the baseline works, you can add more controls:

  • Stop conditions + escalation rules – Define when the agent must stop and ask a human (e.g., touching auth, adding dependencies, flaky tests).

  • Agent permissions/tool allowlist – Prevent “oops I ran a destructive command” and keep agent work reversible by default.

  • Protected areas policy – “These folders require stricter review / extra approvals / explicit sign-off.”

  • Audit hygiene – Commit signing/traceability where your org requires it.

1. Create AGENTS.md

Agents need a single place to read “how work is done here”: what’s allowed, what “done” means, and when to stop.

This step intentionally uses a starter role model (Planner / Implementer / Reviewer / QA). Don’t overthink it yet. It’s not “the only correct set of roles”; it’s the smallest set that gives you separation of responsibilities:

  • Planner – someone who defines the scope

  • Implementer – someone who changes code

  • Reviewer – someone who checks the diff

  • QA – someone who verifies and breaks it

In a single-agent workflow, one agent can play multiple roles sequentially. In a multi-agent workflow, these roles are often separate agents, allowing parallelization and more independent reviews. We’ll explain role design and how to customize it in the Multi-agent workflow chapter later.

How to

  1. Create AGENTS.md in repo root with this starter content (copy/paste):

    # AGENTS.md

    ## Purpose
    This repository uses an agent-assisted workflow. Agents may propose changes, but merges follow enforced gates.

    ## Workflow (high level)
    1) Planner ensures a Task Brief is implementation-ready (creates/normalizes; decomposes if needed)
    2) Implementer makes a scoped change on a branch and opens a PR
    3) Reviewer checks scope + policy compliance
    4) QA verifies (commands + tests)
    5) Human (maintainer) merges after gates pass

    ## Roles (responsibilities, not artifacts)

    ### Planner
    Goal: make the task implementation-ready before coding starts.
    Responsibilities:
    - Create or normalize the Task Brief.
    - Decompose if the task is too broad.
    - Escalate when requirements are unclear.
    Restrictions:
    - Does NOT edit production code

    ### Implementer
    Responsibilities:
    - Implement only what is in the approved Task Brief.
    - Keep diffs small and scoped.
    - Add/update tests for behavior changes.

    ### Reviewer
    Responsibilities:
    - Block scope creep and policy violations.
    - Check maintainability and boundary discipline.

    ### QA
    Responsibilities:
    - Prove the change works by running verification.
    - Add missing tests or request them explicitly.

    ## Definition of Done (PR-ready criteria)
    - Change matches Task Brief scope (no unrelated edits)
    - Verification passes (CI + required checks)
    - Tests added/updated for behavior changes
    - PR includes risk note + rollback plan
  2. (Claude Code only) Claude Code reads project instructions from CLAUDE.md. This file is tool-specific. AGENTS.md stays the source of truth.

    Create CLAUDE.md in repo root with:

    # CLAUDE.md
    - Governance rules: `AGENTS.md`

You now have one place that tells agents and humans “how we work here,” and it’s written in enforceable terms (roles, DoD, stop conditions).

2. Configure branch protection

A written workflow is easy to ignore. Branch protection makes it physically hard to merge unsafe changes.

This step enforces three things on your default branch (usually main):

  • Merges happen via PR.

  • PRs need approval.

  • PRs can’t merge unless required checks pass.

How to

For example, this is how you can configure branch protection in GitHub:

  1. Open Branch protection settings: Repo → Settings → Branches → Add branch protection rule.

  2. Set Branch name pattern to main (or whatever your default branch is).

  3. Turn on the minimum required toggles:

    • Require a pull request before merging

    • Require approvals

    • Dismiss stale approvals when new commits are pushed (recommended)

    • Require status checks to pass before merging

      • Select your check(s): typically your verify/CI job.

    • Restrict who can push to matching branches (so nobody can bypass PRs)

    • If you’re using CODEOWNERS and want owners to be mandatory reviewers, also enable Require review from Code Owners

  4. GitHub only lets you select checks that have run successfully recently in the repo (the docs mention a 7-day window). If you can’t find your verify check, trigger the workflow once, then come back.

After this step, agents (and humans) can’t “accidentally” merge unreviewed or failing code because the platform blocks it.

3. Add CODEOWNERS

Agentic development dies when risky changes don’t reach the right people. CODEOWNERS is the simplest way to automatically route reviews. For example:

  • Change touches /auth → security people get requested

  • Change touches /infra → platform people get requested

  • Change touches /db/migrations → backend owners get requested

GitHub supports this natively: a CODEOWNERS file assigns owners to paths, and GitHub uses it to request reviews.

How to

  1. Create the file in one of the supported locations (pick one location, don’t create multiple):

    • .github/CODEOWNERS (most common)

    • CODEOWNERS (repo root)

    • docs/CODEOWNERS (less common)

  2. Start with a small, coarse file, then iterate. Start with:

    • Default owners for *

    • A few “sensitive zones” (/auth, /infra, /migrations, /ci)

    For example:

    # Default owners for everything
    * @your-org/core

    # Sensitive areas
    /auth/ @your-org/security
    /infra/ @your-org/platform
    /.github/workflows/ @your-org/platform
    /db/migrations/ @your-org/backend

    # Frontend / Backend (if you split)
    /frontend/ @your-org/frontend
    /backend/ @your-org/backend

    # Docs
    /docs/ @your-org/docs

    Use either:

    • users: @username

    • teams: @org/team

    GitHub supports listing multiple owners per pattern.

  3. Learn the two rules that matter:

    1. Last match wins – if multiple patterns match, the last matching rule takes precedence.

    2. Multiple owners must be on the same line – If you want two owners for one path, put them on the same line. If you split them across lines, only the last one applies.

    # Correct: both owners apply
    /auth/ @your-org/security @your-org/platform
  4. Decide if “owner review” is required or just requested. CODEOWNERS always helps route review requests, but whether approval is mandatory depends on branch protection settings. GitHub’s branch protection rule has an option “Require review from Code Owners”. When enabled, approval from a code owner is required for affected files. Also note a subtle behavior: if you don’t require code owner review, GitHub may convert a team request into individual requests based on org/team review settings.

Example (what “good enough” looks like in practice)

For most repos, “good enough” on day 1 is:

  • * default owner (so every PR has an owner)

  • protected owners for auth, security, infra, etc.

4. Define stop conditions and escalation rules

Agents are good at pushing forward. They’re bad at knowing when to stop. Without explicit stop conditions, an agent will keep trying: widening the scope, retrying failures, or touching risky areas. This leads to churn (infinite loops) and unsafe changes.

How to

  1. Add a “Stop conditions” section to AGENTS.md. Those numbers (500 LOC / 10 files) are defaults. Adjust later, but pick something. Without size limits, “small PRs” is just a wish.

    ## Stop conditions (agents must escalate to a human)

    Stop immediately and ask for human input if any of the following is true:

    ### Scope / requirements
    - The task brief is missing acceptance criteria or expected behavior is unclear
    - The change requires modifying files not listed in the Task Brief
    - You discover a second feature request "while you're here"

    ### Risky areas (default protected)
    - Any change under: /auth, /infra, /.github/workflows, /db/migrations
    - Anything involving secrets, credentials, tokens, encryption, permissions
    - Any external network/security-sensitive behavior

    ### Dependencies & tooling
    - Adding or upgrading dependencies (runtime/build/dev) not explicitly requested
    - Changing build tooling, linters, formatters, test frameworks

    ### Verification failures
    - CI or local verify fails twice and the cause is not obvious
    - A test appears flaky (same commit passes and fails)
    - The fix would require weakening or removing tests/lints (disabling rules, skipping tests)

    ### Size limits
    - Diff exceeds 500 LOC or touches more than 10 files (unless explicitly approved)
  2. Define what “escalate” means. Add this to AGENTS.md right below the stop conditions:

    ## Escalation protocol (what to do when you stop)

    When a stop condition triggers, do NOT keep coding.
    Instead, produce a short escalation note containing:
    1) What you were trying to do (1-2 sentences)
    2) What you found (facts, errors, constraints)
    3) Options (2-3 paths forward)
    4) Recommended next step
    5) What you need from a human (a decision, credentials, clarification, approval)

    Save the note as: `docs/agent/escalations/<date>-<topic>.md`

    Work resumes after a human records the decision (in the task brief or the escalation note) and the agent is re-run.

    The escalation note in the repo is the minimum required mechanism (durable + auditable). How humans get notified depends on your setup: it can be as simple as an agent chat message, a Slack notification, etc. We cover notification/automation options in the next chapters.

  3. For multi-agent workflows, define coordination rules. This avoids the classic multi-agent failure: two agents make different assumptions and diverge silently. For example:

    ## Coordination rule (multi-agent)

    Agents may not coordinate through chat history. Shared state must be written to the repo.
    If a decision matters, write it into the Task Brief, PR, or escalation note (not only in chat).

    Minimum shared artifacts:
    - Task Brief: `docs/agent/tasks/<task-id>.md`
    - Status log: `docs/agent/tasks/<task-id>.status.md` (what's done / next / blockers)
    - Escalations: `docs/agent/escalations/...`

Example (how escalation may look)

Say the implementer hits a stop condition: “need new dependency.”

Escalation note:

  • Goal: add feature X

  • Blocker: requires library Y; currently, no dependency policy exists

  • Options: (a) add Y, (b) implement minimal subset without Y, (c) use existing lib Z

  • Recommendation: add Y because it reduces code and is actively maintained

  • Need from human: approve adding dependency Y; confirm license is acceptable

5. Set agent permissions

Keep it simple: least privilege by default. This prevents agent failures such as:

  • Running a destructive shell command

  • Pushing/merging without you noticing

  • Reading secrets or exporting them into logs

  • and others

Claude Code supports permission rules for tools (allow/ask/deny) and provides /permissions UI plus project-level configuration (.claude/settings.json) that you can check into the repo.

How to

  1. Decide on your baseline policy. Use this default unless you have a reason not to:

    • Allow - Safe, local, reversible actions (editing files).

    • Ask - Anything that can publish, destroy, or alter environments (shell commands, pushes, running containers).

    • Deny - Obviously dangerous or out-of-scope actions (deleting large paths, reading secrets directories).

    Claude Code’s permission model is explicitly built around this: allow, ask, and deny rules.

  2. Put the rules in the repo (so the team shares them). Create a project-level config file .claude/settings.json. You can also manage these interactively via the /permissions command in Claude Code.

  3. Start with a minimal, conservative configuration. Below is a starter configuration that biases toward safety:

    {
      "permissions": {
        "allow": ["Edit"],
        "ask": ["Bash(*)"],
        "deny": [
          "Bash(rm -rf:*)",
          "Bash(sudo:*)",
          "Bash(curl:*)",
          "Bash(wget:*)"
        ]
      }
    }

    Notes:

    • Keep shell commands on “ask” until you trust the setup.

    • Don’t allow git push/deploy commands by default.

  4. Gradually allow safe commands (only after you see good behavior). Once things are stable, you can move a few safe commands from ask to allow, for example:

    • Bash(git status:*)

    • Bash(git diff:*)

    • Bash(git commit:*) (only if you’re comfortable)

    Anthropic explicitly gives examples of adding Edit and Bash(git commit:*) to the allowlist. Do this gradually. One command family at a time.

Example

What this looks like in practice:

  • Day 1: The agent can edit files without prompts, but every shell command prompts (ask).

  • Day 3: You allowlist a couple of read-only git commands.

  • Later: You still keep git push, container runs, network fetches, and deployment commands on ask or deny.

This keeps the workflow usable while preventing irreversible accidents.

2. Project agentization

Governance defines rules (“what is allowed”). Project agentization makes the repo executable for agents (“how to work here without guessing”).

An agent fails when it has to guess the basics:

  • Verification - How to run tests and verification.

  • Project context - Where modules live and what boundaries exist.

  • Output - What outputs it must produce (status, PR evidence).

  • Workflow - How it coordinates with other roles (handoffs).

In theory, you can paste this context into every prompt. In practice, it does not scale. Project agentization stores these facts and templates in the repo, so every agent run starts from the same source of truth.

Project agentization consists of two parts:

  • Working model (AGENTS.md + templates)

    • Role model (Planner / Implementer / Reviewer / QA).

    • Gates (DoR / DoD), stop conditions, escalation.

    • Required outputs (task status + PR evidence).

    • Handoffs (shared artifacts, no hidden state).

    • Task and PR templates (e.g., in docs/agent/templates/*, referenced from AGENTS.md).

  • Context pack (repo knowledge)
    A small set of files under docs/agent/ that describes how to work in this repo:

    • Canonical commands (install/test/build/verify)

    • Module map and boundaries

    • Repo-specific conventions and constraints

This chapter is tool-agnostic by default. As an optional section, we show how to mirror the same info into Claude Code’s expected file (CLAUDE.md) if you use it.

Minimal package

A minimal package may look like this:

.
├── AGENTS.md                 # working model: roles, gates, stop conditions, escalation, required outputs
├── scripts/
│   └── verify.sh             # single verification entrypoint (run locally/CI)
└── docs/
    └── agent/
        ├── CONTEXT.md        # repo facts for agents
        ├── commands.md       # canonical commands (install/test/build/verify)
        ├── architecture.md   # module map + boundaries (what touches what)
        └── templates/
            ├── task-status.md   # template for docs/agent/tasks/<task-id>.status.md
            └── pr-evidence.md   # template text to paste into PR description

1. Define agent roles

Roles are a control mechanism. They separate responsibilities so the same agent does not plan, implement, and approve its own work without checks.

A role is not a “person”. A role is a responsibility boundary. One agent can play multiple roles sequentially, or multiple agents can play different roles.

Default role set

This guide assumes you have already adopted the default roles in AGENTS.md:

  • Planner: turns an incoming task into an implementation-ready Task Brief (and decomposes it if needed).

  • Implementer: changes code on a branch/PR according to the Task Brief.

  • Reviewer: checks the PR against scope and repo rules.

  • QA: proves the change works (verification + tests).

Keep this set unless you have a specific failure mode you want to prevent. It is the smallest set that provides separation of concerns.

Customizing roles

You can customize roles in two ways:

  • Change who performs the role (human vs agent) without changing the role model.

  • Add new roles only when they enforce a real gate or produce a required output.

Rule for adding a role: If the role does not block a merge and does not produce an artifact, do not add it.

Possible additional roles:

  • Tech Writer – Docs gate. Add when changes are user-facing, and documentation is part of “done”. This prevents the common failure: “code shipped, docs later”.

    This is how it may look in AGENTS.md:

    ### Tech Writer (optional)

    Use when:
    - The Task Brief includes doc changes in scope, or the change is user-facing.

    Responsibility:
    - Ensure required docs/examples are updated and correct.

    Merge gate:
    - If docs are required by the Task Brief, the PR cannot be approved until docs changes are present and reviewed.
  • Security – Risk gate. Add when tasks touch auth/permissions/crypto/secrets.

  • Release – Rollout gate. Add when merges affect deployment, migrations, feature flags, or backwards compatibility constraints.

2. Define the outputs of an agent run

After an agent run, a human must be able to answer in 30 seconds: what was changed, whether it was verified, and what remains.

Every agent run must produce:

  • A branch or PR with the code change.

  • Verification evidence (exact commands run + result).

  • A short risk note and rollback plan.

  • An updated task status in the repo (so the work can be restarted).

How to

  1. Create a shared template for task status updates. For example, at docs/agent/templates/task-status.md. Add the following content to it:

    # <task-id> status

    ## Current
    State: planning / implementing / verifying / reviewing / blocked / ready-for-pr
    Next: <one line>

    ## Iteration log

    ### <YYYY-MM-DD HH:MM> - QA
    Commands:
    - `scripts/verify.sh`
    Result: PASS/FAIL
    Notes: ...

    ### <YYYY-MM-DD HH:MM> - Review
    Verdict: PASS/FAIL
    Findings:
    - ...
    Required changes (if FAIL):
    - ...
  2. Create a shared template for PR evidence. For example, at docs/agent/templates/pr-evidence.md:

    Task Brief:
    - `docs/agent/tasks/<task-id>.md`

    Summary:
    - ...

    Verification (commands run):
    - `scripts/verify.sh`
    - <other commands>
    Result: PASS/FAIL

    Risk: low / medium / high
    Rollback:
    - <1-2 lines>
  3. Add a rule to AGENTS.md that points to these templates:

    ## Execution outputs (required)

    After working on a task, the Implementer must leave durable evidence:
    1) A branch or PR with the code change.
    2) A task status file:
       - Create/update: `docs/agent/tasks/<task-id>.status.md`
       - Use template: `docs/agent/templates/task-status.md`
    3) PR evidence:
       - Add the PR evidence section to the PR description.
       - Use template: `docs/agent/templates/pr-evidence.md`
  4. (GitHub only, optional) Add PR evidence to the PR template.

    If you use GitHub, copy the content of docs/agent/templates/pr-evidence.md into .github/pull_request_template.md.

Example

After the agent finishes work on YT-3812, you should have:

  • docs/agent/tasks/YT-3812.status.md created from docs/agent/templates/task-status.md

  • PR description includes the section from docs/agent/templates/pr-evidence.md

3. Define role artifacts and handoffs

In a multi-agent loop, agents must hand work to each other without relying on chat history. The simplest way is to define where shared state lives (files + PR) and what each role is allowed to update.

This prevents two common failures:

  • Agents diverge because they assumed different facts.

  • Work cannot be continued because progress only exists in chat.

How to

  1. Create a handoff rules section in AGENTS.md:

    ## Handoffs (shared artifacts)

    Agents coordinate through repo artifacts (files/PR), not chat history.

    Shared artifacts:
    - Task Brief: `docs/agent/tasks/<task-id>.md`
    - Task status: `docs/agent/tasks/<task-id>.status.md`
    - PR description: must include PR evidence
    - Escalations: `docs/agent/escalations/<date>-<topic>.md`

    Rule: If a decision matters, record it in the Task Brief, PR, or escalation note (not only in chat).
  2. (Optional) Clarify who updates what. Keep it short.

    If you want explicit ownership, add this under the section above:

    Ownership:
    - Planner updates: Task Brief (+ subtasks/child briefs)
    - Implementer updates: code, task status, PR evidence
    - Reviewer/QA update: PR review/comments (and tests if needed)

Example

  • Planner writes/updates docs/agent/tasks/YT-3812.md.

  • Implementer updates docs/agent/tasks/YT-3812.status.md and fills PR evidence in the PR description.

  • Reviewer/QA leave review notes in the PR and request fixes if needed.

4. Create the context pack

Agents waste most time and tokens on basic repo questions: “Which command is the real one?”, “Where is the backend vs the frontend?”, and so on. You can fix this with a “context pack”.

How to

  1. Create the following folder structure:

    docs/agent/
      CONTEXT.md
      commands.md
      architecture.md
  2. Add the following content to CONTEXT.md:

    # Agent context (tool-agnostic)

    ## What this repo is
    <1-3 sentences>

    ## Golden rules (repo-specific)
    - <rule 1>
    - <rule 2>

    ## Where things are
    - `<dir>`: <what it contains>
    - `<dir>`: <what it contains>

    ## What is risky / protected
    - `/auth`
    - `/infra`
    - `/db/migrations`
    (Adjust to your repo)

    ## Where to look first
    - Commands: `docs/agent/commands.md`
    - Architecture/boundaries: `docs/agent/architecture.md`
    - Governance rules: `AGENTS.md`
  3. Add the following content to commands.md:

    # Commands (source of truth)

    ## Install
    - `<command>`

    ## Verify (must pass before PR)
    - `scripts/verify.sh` (or `<command>`)

    ## Test
    - `<command>`

    ## Lint
    - `<command>`

    ## Typecheck (if applicable)
    - `<command>`

    ## Build
    - `<command>`

    ## Run locally
    - `<command>`
  4. Add the following content to architecture.md:

    # Architecture & boundaries (minimal map)

    ## System shape (in plain words)
    - <Example: The frontend calls the backend API. The backend reads/writes the database>
    - <Example: A background worker processes queued jobs>

    ## Main modules
    - `/frontend`: <what it does>
    - `/backend`: <what it does>
    - `/db`: <what it does>
    - `/infra`: <what it does> (protected)

    ## Dependencies (what calls what)
    - `<module A>` depends on `<module B>` because <reason>
    - `<module C>` must not depend on `<module D>` because <reason>

    ## Protected paths (high risk)
    - `/auth` -- <why protected>
    - `/infra` -- <why protected>
    - `/db/migrations` -- <why protected>

5. (Claude Code only) Update CLAUDE.md to point to the context pack

If you use Claude Code, you already have CLAUDE.md from the Governance chapter. Now update it to point to the context pack as well.

Open CLAUDE.md and make sure it lists both governance rules and context pack files:

Read these files before making changes:
- Governance rules: `AGENTS.md`
- Repo context (entry point): `docs/agent/CONTEXT.md`
- Commands: `docs/agent/commands.md`
- Architecture/boundaries: `docs/agent/architecture.md`

Rules:
- Run `scripts/verify.sh` before opening a PR.
- If a stop condition triggers, follow `AGENTS.md` and write an escalation note.

6. (Optional) Add agent skills

Agent Skills are a standard way to store reusable instructions in the repo as skills/<skill-name>/SKILL.md. The open specification is available at https://agentskills.io/specification.

How skills are different from the context pack

  • The context pack (docs/agent/*) is repo facts that apply to almost every task: commands, module boundaries, conventions.

  • Skills are how-to playbooks for specific tasks: coding standards, patterns, review checklists, doc style, migration rules. Keeping these as Skills prevents the context pack from turning into a wiki.

How agents use Skills

Some agents can find and load Skills automatically (for example, Claude Code and GitHub Copilot). In that case, you can simply tell the agent: “Use the coding-standards skill”. Other agents do not automatically load Skills. In that case, you point the agent to the file path in the prompt or task brief: “Read skills/coding-standards/SKILL.md and follow it.”

Essentials (the minimum)

Start with:

  • a skills/ directory

  • 1–3 skills that you repeat often (do not create 20 skills on day one)

  • each skill contains:

    • SKILL.md (required)

    • optional references/ or templates/ only if you really need them

The spec defines the minimal structure as “one folder per skill, containing at minimum SKILL.md”, with optional supporting directories like scripts/, references/, assets/. For example, this is how coding standard skills might look (see below).

How to

  1. Create a similar folder structure for required skills:

```
skills/
  coding-standards/
    SKILL.md
    examples.md
```
  2. Add content to SKILL.md. For example:

```markdown
---
name: Coding standards
description: Repo-specific coding rules and examples. Follow this for all code changes.
---

## Naming
- Classes: `PascalCase`
- Functions/methods: `camelCase`
- Constants: `UPPER_SNAKE_CASE`
- Files: `<your rule here>`

## Formatting
- Use the formatter: `<command>` (do not hand-format)
- Max line length: <number>
- Imports: <sorted/how>

## Patterns (do / don't)

### Object creation
Do:
- Prefer factories for complex objects: `Foo.create(...)`
Don't:
- Don't call constructors with 6+ params directly

### Errors
Do:
- Use typed errors / error codes: `<example>`
Don't:
- Don't throw raw strings

## Examples
See:
- `skills/coding-standards/examples.md`
```
  3. Add content to examples.md. For example:

```markdown
# Coding standards: examples

## Naming

### Good
~~~ts
const createdAtMs = Date.now();

function isAdult(user: User): boolean {
  return user.age >= 18;
}
~~~

### Bad
~~~ts
const d = Date.now();

function f(u: any) {
  return u.a >= 18;
}
~~~

## Object creation

### Good (factory + named params)
~~~ts
const user = User.create({
  id,
  email,
  roles: ["admin"],
});
~~~

### Bad (long constructor argument list)
~~~ts
const user = new User(id, email, true, false, "admin", Date.now());
~~~

## Errors

### Good (typed error with context)
~~~ts
throw new ValidationError("Email is invalid", { email });
~~~

### Bad (throw raw string)
~~~ts
throw "invalid email";
~~~

## Functions

### Good (small function, single responsibility)
~~~ts
function formatUserLabel(user: User): string {
  return `${user.firstName} ${user.lastName}`.trim();
}
~~~

### Bad (does too much, unclear intent)
~~~ts
function doStuff(u: any) {
  const s = u.f + " " + u.l;
  if (!u.e) throw "no email";
  return s;
}
~~~
```
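The folder structure above can be scaffolded with a small script so every skill starts from the same minimum. This is an illustrative sketch, not required tooling; the `new_skill` name and the front-matter stub are assumptions based on the spec's "one folder per skill, containing at minimum SKILL.md".

```shell
# Hypothetical helper: create one skill folder containing the minimal SKILL.md.
# Front-matter fields follow the examples in this guide; adjust for your repo.
new_skill() {
  local name="$1"
  mkdir -p "skills/$name"
  printf -- '---\nname: %s\ndescription: <one line>\n---\n' "$name" \
    > "skills/$name/SKILL.md"
  echo "skills/$name/SKILL.md"
}
```

For example, `new_skill coding-standards` creates `skills/coding-standards/SKILL.md` with a front-matter stub to fill in.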

7. (Optional) Add an MCP interface (tools, not docs)

If you can, expose repo actions as MCP tools. This removes guessing and keeps one source of truth.

Goal: Agents do not figure out how to run tests. They call run_tests(...). They do not guess how to run verification. They call verify().

How to

  1. Add an MCP spec file:

    • docs/agent/mcp.md (tool list + parameters + output format)

  2. Add an MCP server implementation in the repo (code + start command). Requirement: the repo must provide one canonical way to start it:

    • ./scripts/mcp-start.sh (recommended), or

    • docker compose up mcp, or

    • npm run mcp:start/pnpm mcp:start

  3. Define a minimal tool set (start small):

    • verify(mode?, scope?)
      Runs the canonical verify entrypoint (scripts/verify.sh).

    • run_tests(target?, pattern?, changed_only?)
      Runs a subset of tests.

    Optional (only if useful for your repo):

    • build(target?)

    • lint(scope?)

    • typecheck(scope?)

Rules

  • Treat tool names + parameters as an API (stable, reviewed).

  • Tools must call canonical entrypoints (do not re-implement logic).

  • Tool outputs must be machine-readable (JSON summary + logs).

  • Keep the file-based context pack (docs/agent/*) as the fallback when MCP is not available.
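To make the "machine-readable outputs" and "call canonical entrypoints" rules concrete, here is a minimal sketch of the glue a `verify()` tool could call: it runs the canonical entrypoint and prints a JSON summary instead of re-implementing any verification logic. The wrapper name and JSON fields are illustrative assumptions, not part of any MCP SDK.

```shell
# Illustrative glue for an MCP `verify` tool: call the canonical entrypoint
# and emit a JSON summary (status, exit code, path to the full log).
verify_json() {
  local cmd="${1:-scripts/verify.sh}" status code log
  log="$(mktemp)"
  if "$cmd" >"$log" 2>&1; then
    status=pass
    code=0
  else
    code=$?
    status=fail
  fi
  printf '{"status":"%s","exit_code":%d,"log":"%s"}\n' "$status" "$code" "$log"
}
```

The MCP server can return this JSON verbatim as the tool result, so agents never parse raw build output.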

3. Task normalization

Most incoming tasks are vague. That is bad for humans (unclear scope, unclear “done”) and even worse for agents (they will guess missing details, widen scope, or loop).

In practice, you cannot run an agent directly from a raw tracker ticket and expect consistent results. Before an agent starts, the task must be rewritten into a clear, standard format that removes ambiguity and defines how to verify the change.

How to implement task normalization

To do this, we use two layers:

  • Issue/task templates: improve the quality of incoming tasks at the source, so you (as a human) have enough information to work with.

  • Task Brief: a normalized task file in the repo docs/agent/tasks/<task-id>.md, so an agent has enough information to work with.

A Task Brief can be written manually or drafted by a Planner agent and then approved by a human (Definition of Ready). This chapter shows how to set up both layers and the Task Brief format.

1. Define task source

This is an organizational decision. The agent should not care whether the task came from Jira, GitHub, Slack, or email. Implement a simple rule. For example: Agents do not pull tasks from trackers. A human (or an automation) must create a Task Brief file in the repo and start the agent from it.

  1. Pick your “source of truth” for incoming work (Jira/GitHub/etc.), then enforce this contract:

  • Every task must have docs/agent/tasks/<task-id>.md.

  • The Task Brief must link to the original ticket.

  • Agents only start work when given the Task Brief path.

  2. Add this rule to AGENTS.md. For example:

```markdown
## Task intake
- Agents do not pull tasks from Jira/GitHub/Slack.
- A human (or automation) creates a Task Brief in the repo:
  `docs/agent/tasks/<task-id>.md` (must link to the original ticket)
- A Task Brief can be written by a human or drafted by the Planner,
  but it must pass the Definition of Ready before implementation.
```
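The intake contract can also be enforced mechanically before a run starts. A minimal sketch, assuming the file layout used in this guide (the `require_brief` helper itself is an illustration, not part of any tool):

```shell
# Guard matching the intake contract: refuse to start unless the Task Brief
# exists; warn if it doesn't appear to reference the original ticket.
require_brief() {
  local brief="docs/agent/tasks/$1.md"
  if [ ! -f "$brief" ]; then
    echo "No Task Brief at $brief -- create it before starting the agent" >&2
    return 1
  fi
  grep -qi 'ticket' "$brief" || echo "warning: no ticket reference in $brief" >&2
  echo "$brief"
}
```

A runner script (or a pre-run checklist) can call this with the task id and pass the printed path to the agent as its single input.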

2. Create issue/task templates

An issue/task template is a fixed set of questions/fields shown when someone creates a new ticket. It exists to stop low-quality tasks at the source.

This step does not make tasks “agent-ready”. It makes them “workable”: a developer can understand the request and later write a proper Task Brief without guessing. The agent still starts from the Task Brief file in the repo (next step).

Minimum input principle

For most teams, you need two templates:

  • Bug report that includes:

    • Steps to reproduce

    • Expected vs actual

    • Relevant info: environment, version, logs, etc.

  • Feature/change request that includes:

    • What problem do we solve and for whom

    • Acceptance criteria

    • Constraints (what we must not change)

How to

Do this in your issue tracking system. For example:

  • YouTrack: There is no “issue form YAML” in the repo, so you enforce minimum input in YouTrack itself:

    • Add the fields you need as custom fields, and mark the critical ones as mandatory (required) for the project.

    • If you need “templates”, use Issue Template URLs (pre-filled new issue form) and share/bookmark them.

  • GitHub Issues: Add Issue Templates / Issue Forms under .github/ISSUE_TEMPLATE/. GitHub supports both Markdown templates and YAML issue forms with required fields. 

  • Jira: Create matching fields in your issue type(s) or use a template app/automation (the exact mechanics depend on your Jira setup).

For example, for GitHub Issues, you can create two files:

  • .github/ISSUE_TEMPLATE/bug.yml

```yaml
name: Bug report
description: Report a reproducible bug
body:
  - type: textarea
    id: repro
    attributes:
      label: Steps to reproduce
      description: Exact steps. Include inputs and links if needed.
    validations:
      required: true
  - type: textarea
    id: expected
    attributes:
      label: Expected result
    validations:
      required: true
  - type: textarea
    id: actual
    attributes:
      label: Actual result
    validations:
      required: true
  - type: textarea
    id: env
    attributes:
      label: Environment
      description: Version, OS, browser, logs, config, etc.
    validations:
      required: false
```
  • .github/ISSUE_TEMPLATE/feature.yml

```yaml
name: Feature / change
description: Request a feature or a change
body:
  - type: textarea
    id: problem
    attributes:
      label: Problem
      description: What problem are we solving and for whom?
    validations:
      required: true
  - type: textarea
    id: acceptance
    attributes:
      label: Acceptance criteria
      description: Bullet list. What must be true for this to be "done"?
    validations:
      required: true
  - type: textarea
    id: constraints
    attributes:
      label: Constraints / non-goals
      description: What must not change? What is explicitly out of scope?
    validations:
      required: false
```

3. Normalize tasks into Task Briefs

A tracker issue/task (even created from a template) is still not a good input for an agent. It is written for tracking and discussion. It often lacks a strict scope, clear “done” criteria, and a verification plan.

A Task Brief is the agent-ready form of the task. It is stricter than an issue template:

  • The issue template collects minimum information at intake (enough to understand and triage).

  • The Task Brief defines the implementation boundaries (scope, acceptance criteria, constraints, and verification).

You can create a Task Brief in two ways:

  • Manually: a human rewrites the ticket into the Task Brief template.

  • Agent-assisted: a Planner agent drafts the Task Brief from the ticket and repo context, then a human reviews and approves it (Definition of Ready).

Depending on your choice, follow either 3.1 or 3.2.

3.1 Manual normalization

For every accepted task:

  1. Create a Task Brief file docs/agent/tasks/<task-id>.md.

  2. Link to the original ticket (YouTrack/Jira/GitHub).

  3. Treat this file as the single input for agent work – the standard format your team provides to agents. You can use the following Task Brief template:

```markdown
# <task-id>: <short title>

## Source
- Ticket: <link>

## Problem
What is wrong or missing? 1-3 sentences.

## Reproduction / Scenario (must be self-contained)
For bugs (repro):
1. ...
2. ...
Expected:
- ...
Actual:
- ...

For features (scenario):
- User: ...
- Goal: ...
- Flow: ...

Environment (if relevant):
- Version/build:
- OS/browser:
- Role/permissions:
- Data/setup prerequisites:

## Scope
In scope:
- ...
Out of scope:
- ...

## Acceptance criteria
- [ ] ...
- [ ] ...

## Constraints
- Must not change: <paths> / modules / APIs
- Risky areas: <auth>/infra/migrations/etc.
- Dependencies: <allowed> / needs approval

## Plan (high level)
- ...
- ...

## Verification
Run:
- `scripts/verify.sh`
Add/Update tests:
- <what> test proves the change

## Rollback
- <how> to revert safely in 1-2 lines
```

3.2 Agent-assisted normalization

The Planner produces a Task Brief draft. This can happen in two places:

  • Outside the dev loop (manual trigger): a human runs the Planner, reviews the draft (DoR), commits the Task Brief, then starts implementation.

  • Inside the dev loop (recommended for automation): the loop starts with the Planner drafting the Task Brief, then a human approves it (DoR), and only then implementation begins.

Either way, the output is the same file: docs/agent/tasks/<task-id>.md. If the Planner decides the task is too broad, it may also add a ## Subtasks section or produce additional child Task Brief files (parent + children).

  1. Decide how the Planner gets the ticket content:

    • If the Planner has tracker access via your tooling, it can fetch the ticket.

    • If not, the runner (human or automation) must pass the ticket text to the Planner (copy/paste into the prompt is fine).

  2. Planner action: draft the Task Brief. Give the Planner the ticket content (by access or paste) and require it to output a self-contained Task Brief.

    Planner prompt skeleton (works in both access/no-access cases):

```
You are the Planner.

Repo rules:
- Read AGENTS.md
- Read docs/agent/CONTEXT.md, docs/agent/commands.md, docs/agent/architecture.md

Input ticket:
<EITHER: "Fetch ticket <ID>" OR paste ticket text here>

Task:
Draft a Task Brief at: docs/agent/tasks/<TASK-ID>.md

If the task is too broad (likely >1 PR, exceeds size limits, crosses boundaries,
unclear verification, multiple risks):
- Include a `## Subtasks` section with 3–8 verifiable subtasks, each with:
  Output, Target, Do not touch, Verify
OR, if multiple PRs are clearly needed:
- Propose child Task Brief filenames and include their full Markdown content.

Output requirements:
- Self-contained
- Include scope, acceptance criteria, constraints, verification, rollback
- If unclear: add "Open questions"
```
  3. DoR gate (human approval): Review the Task Brief with Definition of Ready.

    • If it fails DoR: update the brief (or answer open questions and re-run Planner).

    • If it passes: approve and continue to implementation.

  4. Make the Task Brief visible to the rest of the loop

    • If you run work through PRs, commit the Task Brief to a branch / PR (often a “brief-only PR”).

    • If you run locally, commit to the working branch before implementation.

4. Set Definition of Ready

A Task Brief exists, but it may still be too weak to start work. Definition of Ready (DoR) is a simple gate: the task is not allowed to enter implementation (human or agent) until the Task Brief meets a minimum quality bar.

This prevents the most common failure mode: starting work with missing acceptance criteria or no verification plan, then looping.

How to

Note: Definition of Ready (DoR) and Definition of Done (DoD) are different things. DoR is a pre-work gate for the Task Brief. DoD is a pre-merge gate for the change. DoR requires a verification plan; DoD requires executed verification and resulting evidence.

Add a DoR checklist to AGENTS.md (or to the Task Brief template itself). Use it as a hard rule: if any item is missing, the task goes back for clarification.

```markdown
## Definition of Ready
- [ ] Task Brief exists: `docs/agent/tasks/<task-id>.md`
- [ ] Source ticket is linked
- [ ] Scope is written (in scope + out of scope)
- [ ] Acceptance criteria are clear (testable, not "works better")
- [ ] Constraints are listed (what must not change / risky areas)
- [ ] Verification is defined (`scripts/verify.sh` + what tests prove the change)
- [ ] Rollback is defined (how to revert safely)
- [ ] Verification plan includes what will prove the change (tests to add/update, and commands to run)
```
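Part of this gate can be checked mechanically before a human even reads the brief. A minimal sketch, assuming the section headings from the Task Brief template in this guide (the helper complements human review, it does not replace it):

```shell
# Mechanical pre-check for Definition of Ready: verify the Task Brief
# contains the required sections from the template.
dor_check() {
  local brief="$1" section missing=0
  for section in "## Scope" "## Acceptance criteria" "## Constraints" \
                 "## Verification" "## Rollback"; do
    grep -qF "$section" "$brief" || { echo "DoR FAIL: missing '$section'"; missing=1; }
  done
  [ "$missing" -eq 0 ] && echo "DoR PASS"
  return "$missing"
}
```

Wiring this into CI on "brief-only PRs" catches incomplete briefs before any agent starts work.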

Example: raw ticket → templated task → Task Brief

Human-written ticket

```
Title: Export sometimes fails

Hey, exporting a report fails for me. It worked last week, now I get
"Export failed". Not sure what changed. Please fix.
(I'm using Chrome. I clicked Export in the Reports page.)
```

Ticket created with a template

```
Title: Export fails with "Export failed" on Reports page

Steps to reproduce
1. Open /reports
2. Select report "Q4 Revenue"
3. Click Export → PDF
4. Wait ~5-10 seconds

Expected result
A PDF file is downloaded.

Actual result
A toast appears: "Export failed". No file is downloaded.

Frequency
Happens 3/3 times for me.

Environment
- App version: 2.14.3 (build 9812)
- Browser: Chrome 121, macOS
- User role: Analyst
- Network: corporate VPN

Logs / screenshots
- Screenshot attached
- Browser console shows: POST /api/export 500
```

Task Brief file

```markdown
# YT-3812: Fix report export failure (500 from /api/export)

## Source
- Ticket: <link to YT-3812>

## Problem
Exporting a report as PDF fails on /reports. The UI shows "Export failed".
The browser console shows `POST /api/export` returns HTTP 500.

## Reproduction
Steps:
1. Open `/reports`
2. Select report “Q4 Revenue”
3. Click Export → PDF
4. Wait ~5–10 seconds

Expected:
- A PDF file is downloaded.

Actual:
- UI toast: “Export failed”
- Network: `POST /api/export` returns 500

Environment (if relevant):
- App version: 2.14.3 (build 9812)
- Browser: Chrome 121, macOS
- Role: Analyst

## Scope
In scope:
- Identify and fix the server-side cause of HTTP 500 for PDF export.
- Add/adjust tests to prevent regression.

Out of scope:
- UI redesign of the export flow.
- New export formats.
- Performance optimizations not required to fix the failure.

## Acceptance criteria
- [ ] Export PDF for "Q4 Revenue" succeeds and downloads a file.
- [ ] `/api/export` does not return 500 for this case.
- [ ] A regression test covers the failing case.

## Constraints
- Must not change: authentication flow, user permissions.
- Protected areas: `/db/migrations` (do not touch), `/.github/workflows` (do not touch).
- Dependencies: do not add new dependencies.

## Plan (high level)
- Reproduce locally using the ticket steps.
- Find the exception behind the 500 (server logs).
- Implement minimal fix.
- Add regression test for the failing input/case.

## Verification
Run:
- `scripts/verify.sh`

Add/Update tests:
- Add a test that calls the export path with the same report selection
  and asserts HTTP 200 + file response.

## Rollback
- Revert the PR commit(s). No schema changes.
```

4. Task decomposition

Task decomposition means splitting one Task Brief into smaller work items. Decomposition is optional. If a Task Brief is already small enough, implementation can start immediately.

Decomposition can happen at different levels:

  • Tracker-level (YouTrack/Jira/GitHub Issues) – when you need separate owners or separate prioritization.

  • Repo-level – when you mainly need smaller, safer implementation steps.

This chapter shows the repo-level approach: how to split a Task Brief into small, verifiable subtasks that humans and agents can follow.

Important: If you use a Planner agent to draft Task Briefs, decomposition can be handled during planning: the Planner decides whether to add subtasks or create child Task Briefs. In that case, you can skip this chapter entirely.

1. Decide whether you need decomposition

Start from the Task Brief. Do not decompose by default.

Decompose only if the Task Brief is still too broad. Use these signals:

  • You expect more than one PR.

  • The change will likely exceed your size limits (for example, >500 LOC or >10 files).

  • The change crosses multiple boundaries (for example, frontend and backend) or touches protected areas like migrations/auth/CI.

  • Verification cannot be expressed as a clear, repeatable check.

  • The task contains multiple independent risks (e.g., behavior change and auth change).
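The size signal can be checked with a quick script instead of eyeballing the diff. A sketch under the assumption that your limits match the examples above (>500 lines, >10 files); it reads `git diff --numstat` output on stdin so you can point it at any base branch:

```shell
# Hypothetical size gate: read `git diff --numstat <base>` from stdin and
# flag changes above the example limits (>10 files or >500 changed lines).
too_big() {
  awk '{ files += 1; lines += $1 + $2 }
       END { printf "files=%d lines=%d\n", files, lines
             exit !(files > 10 || lines > 500) }'
}
```

Usage: `git diff --numstat main | too_big` exits 0 (too big, decompose) or 1 (small enough for one PR).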

How to

Read the Task Brief and answer: “Can this be delivered as one small PR with a clear verification step?”

  • If yes, do not decompose. Start implementation.

  • If no, decompose before implementation.

2. Choose the decomposition form

If you decided to decompose, choose how you will represent subtasks.

Use one of these two forms:

  • Subtasks section in a Task Brief – Use this when you still expect a single PR. You keep the existing Task Brief file and add a checklist of subtasks.

  • Multiple Task Brief files – Use this when you expect multiple PRs, parallel work, or separate review/merge gates. You keep one “parent” Task Brief and create smaller “child” Task Briefs for each deliverable. For example:

```
docs/agent/tasks/
  YT-3812.md                 (parent: overview + links to children)
  YT-3812-1-repro-test.md    (child: add failing test / repro)
  YT-3812-2-fix.md           (child: implement minimal fix)
  YT-3812-3-hardening.md     (child: extra coverage / edge cases)
```

3. Create subtasks

Now that you have chosen the decomposition form, create 3–8 subtasks.

How to

  1. Write subtasks as verifiable deliverables (not “areas”). Good subtask types:

    • Add a failing test that reproduces the bug

    • Implement the minimal fix to make the test pass

    • Add/adjust integration coverage

    • Add a feature flag scaffolding, then implement behavior, then enable it

    • Add migration in a backward-compatible way

    Bad subtask types:

    • “Backend work”

    • “Frontend work”

    • “QA”

  2. For each subtask, include four fields:

    • Output: what will be produced (test/code/config/doc).

    • Target: which files/directories are expected to change.

    • Do not touch: protected or unrelated areas.

    • Verify: command(s) or test(s) that prove it is done.

  3. Write the subtasks in the form you chose in the previous step:

    • If you chose “one Task Brief + Subtasks section”: Add a `## Subtasks` checklist to the existing Task Brief.

    • If you chose “multiple Task Brief files”: Create one child Task Brief per subtask. Each child brief must still be a complete Task Brief (scope, acceptance criteria, verification, etc.).

  4. Stop if subtasks reveal hidden scope. If any of the following is true, stop and escalate before implementation:

    • Requirements/acceptance criteria are unclear

    • A new dependency is needed

    • You must touch protected areas that were not explicitly approved

    • You need more than 8 subtasks (the task is too big; split into multiple Task Briefs)

Example of a decomposed task

Below is the same task “Export PDF fails with 500” decomposed in both forms.

Option 1: One Task Brief + Subtasks section

(inside docs/agent/tasks/YT-3812.md)

```markdown
## Subtasks
- [ ] 1) Add failing test that reproduces the 500
      Output: regression test fails on current code
      Target: `backend/src/export/*`, `backend/tests/export/*`
      Do not touch: `db/migrations/*`, `.github/workflows/*`
      Verify: run the new test and confirm it fails
- [ ] 2) Implement minimal server-side fix
      Output: export endpoint no longer throws for this case
      Target: `backend/src/export/*`
      Do not touch: `db/migrations/*`, auth/permissions code
      Verify: the new regression test passes
- [ ] 3) Add/adjust integration coverage (if needed)
      Output: higher-level test coverage for export flow
      Target: `backend/tests/integration/*`
      Do not touch: UI code
      Verify: integration tests pass
- [ ] 4) Run full verification
      Output: all checks green
      Target: (no additional code)
      Do not touch: (n/a)
      Verify: `scripts/verify.sh`
```

Option 2: Multiple Task Brief files

```
docs/agent/tasks/
  YT-3812.md                 (parent overview + links)
  YT-3812-1-repro-test.md    (child: reproduce via test)
  YT-3812-2-fix.md           (child: implement fix)
  YT-3812-3-hardening.md     (child: integration/edges)
```

In the parent brief (YT-3812.md), add:

```markdown
## Decomposition
This task is delivered via:
- `YT-3812-1-repro-test.md`
- `YT-3812-2-fix.md`
- `YT-3812-3-hardening.md`
```

This is how YT-3812-2-fix.md may look:

```markdown
# YT-3812-2: Fix export endpoint (minimal server-side fix)

## Source
- Parent Task Brief: `docs/agent/tasks/YT-3812.md`
- Ticket (reference): <link>

## Problem
`POST /api/export` returns HTTP 500 when exporting report "Q4 Revenue" as PDF.

## Reproduction / Scenario (self-contained)
Precondition:
- The regression test from `YT-3812-1` exists and fails on current code.

Repro:
- Run the regression test added in `YT-3812-1`.

Expected:
- Test passes and export returns a successful response.

Actual:
- Test fails because the endpoint returns 500.

## Scope
In scope:
- Fix the server-side cause of the 500 for this case.
- Keep changes minimal and local to export logic.

Out of scope:
- UI changes.
- Schema changes / migrations.
- Adding new dependencies.
- Performance optimizations not required to fix the failure.

## Acceptance criteria
- [ ] The regression test from `YT-3812-1` passes.
- [ ] Export returns success for report "Q4 Revenue" PDF (no 500).
- [ ] No changes outside the target area.

## Constraints
- Target: `backend/src/export/*`
- Do not touch: `db/migrations/*`, `.github/workflows/*`, auth/permissions modules
- Dependencies: do not add new dependencies

## Plan (high level)
- Inspect the failing test output and server logs/stack trace.
- Identify the exception and its trigger input.
- Implement the smallest fix that makes the failing case succeed.
- Keep behavior unchanged for other export types/inputs.

## Verification
Run:
- `scripts/verify.sh`

Must pass:
- regression test from `YT-3812-1`

Add/Update tests:
- none (this task relies on the test added in `YT-3812-1`)

## Rollback
- Revert this PR/commit(s). No schema changes.
```

5. Development loop

So far, you have prepared the repo (governance rules, working model, context pack, task format). This chapter explains how work is executed: how an agent takes a Task Brief, changes code, verifies the result, and produces something a human can review and merge.

In real life, there are two nested loops:

  • Inner loop: plan → implement → verify → review → iterate. Runs on your machine or a remote one until the change is “PR-ready”.

  • Outer loop (PR): PR review + checks → fix commits → repeat → merge. Starts after you open a PR.

You can run the inner loop in two modes:

  • Single-agent mode: One agent plays roles sequentially.

  • Multi-agent mode: Each role runs in a new agent session (fresh context). This gives a couple of advantages:

    • Fresh context – less “anchoring” on earlier reasoning. Each agent works more independently, which is especially important for the Reviewer role.

    • Optional parallelism – different tasks (or different roles) can run at the same time without sharing chat context.

If you do not need parallelism or independent review yet, start with single-agent mode.

1. Start a run

  1. Confirm your task input:

    • Task Brief exists at docs/agent/tasks/<task-id>.md: the Planner is not needed.

    • No Task Brief yet: You have a tracker ticket (or pasted text), and the Planner will create the brief.

  2. Create a branch for this task. E.g., with:

    git checkout -b <task-id>-<short-name>

    If your policy allows the agent to run git commands, it may perform this step instead.

  3. Start your agent tool in the repo root.

2. Inner loop

In the single-agent mode, you run each phase in the same agent session. In the multi-agent mode, each phase is in a new session.

Inner-loop goal

The inner loop ends when the change meets Definition of Done as far as it can be proven locally:

  • Scope matches Task Brief.

  • Verification plan executed locally (scripts/verify.sh and task-specific checks).

  • Tests added/updated where behavior changed.

  • PR evidence prepared (risk + rollback + commands/results).

  • CI/required checks are validated in the outer loop after opening the PR.

How artifacts evolve across iterations

  • Task Brief (docs/agent/tasks/<task-id>.md): stays stable across iterations; it may be updated if new details are required.

  • Status (docs/agent/tasks/<task-id>.status.md): appended per iteration (this is a run log).

  • PR evidence (docs/agent/tasks/<task-id>.pr.md): overwritten on each iteration (this is the latest PR description text).

  • Escalations: new file per escalation (docs/agent/escalations/...).
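The append-vs-overwrite distinction can be captured in two tiny helpers, so every iteration writes artifacts the same way. The paths follow this guide's layout; the entry format is an illustrative assumption:

```shell
# Status log: append one entry per iteration. PR evidence: replace wholesale,
# so the file always holds the latest PR description text.
append_status() {  # usage: append_status <task-id> <text>
  printf -- '## Iteration (%s)\n%s\n\n' "$(date -u +%F)" "$2" \
    >> "docs/agent/tasks/$1.status.md"
}
write_pr_evidence() {  # usage: ... | write_pr_evidence <task-id>
  cat > "docs/agent/tasks/$1.pr.md"
}
```

The status file grows into a reviewable run history, while the PR evidence file can be copied into the PR description as-is at any point.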

  1. (Optional) Phase 1 – Planner

    1. Start with the Planner to create the Task Brief.
      Important: Skip this step if the Task Brief is already approved and stable.
      Prompt example:

```
Act as Planner.

Read:
- AGENTS.md
- docs/agent/CONTEXT.md, docs/agent/commands.md, docs/agent/architecture.md

Input task:
- <paste the raw ticket text here>
  OR "Use tracker ticket <id>" (if your setup has tracker access)

Task:
- Create the Task Brief at docs/agent/tasks/<task-id>.md so it meets Definition of Ready.
- If the task is too broad, add `## Subtasks` or propose child Task Brief files.
- If requirements are unclear, add "Open questions" and stop.

Output: Only the Task Brief markdown (or child briefs content if you split).
Do not edit production code.
```
    2. Commit the Task Brief before implementation with:

```shell
git add docs/agent/tasks/<task-id>.md
git commit -m "<task-id>: task brief"
```

      If your policy allows the agent to run git commits, it may perform this step instead.

  2. Phase 2 – Implementer

    1. Start the Implementer. Prompt example:

```
Act as Implementer.

Read:
- AGENTS.md
- docs/agent/tasks/<task-id>.md

Implement exactly what the Task Brief requires on the current branch.
Follow stop conditions.

Outputs required:
- Update task status in docs/agent/tasks/<task-id>.status.md
  using docs/agent/templates/task-status.md as a template.
- Create/update docs/agent/tasks/<task-id>.pr.md with PR text
  using docs/agent/templates/pr-evidence.md as a template.
  Fill it with real commands/results (initially may be partial until QA runs).
```
    2. Commit work as usual (your team’s commit style). For example:

```shell
git add -A
git commit -m "<task-id>: implement"
```

      If your policy allows the agent to run git commits, it may perform this step instead.

  3. Phase 3 – QA

    1. Run verification. Prompt example:

```
Act as QA.

Read:
- AGENTS.md
- docs/agent/tasks/<task-id>.md

Task: Run the verification defined in the Task Brief (at minimum scripts/verify.sh).

If verification fails:
- Fix only if the fix is clearly within scope.
- Otherwise, escalate.

Update:
- Append commands run and PASS/FAIL to docs/agent/tasks/<task-id>.status.md.
- Update docs/agent/tasks/<task-id>.pr.md with final commands/results.
```
    2. Commit the status:

```shell
git add docs/agent/tasks/<task-id>.status.md
git commit -m "<task-id>: QA"
```

      If your policy allows the agent to run git commits, it may perform this step instead.

  4. Phase 4 – Reviewer

    1. Run self-review. The Reviewer writes its verdict to docs/agent/tasks/<task-id>.status.md. Prompt example:

```
Act as Reviewer.

Read:
- AGENTS.md
- docs/agent/tasks/<task-id>.md
- docs/agent/tasks/<task-id>.status.md
- the current git diff

Task: Review the current diff against:
- Task Brief scope + acceptance criteria
- AGENTS.md policies, including Definition of Done (PR-ready criteria)

Update: Append a "Review" entry to docs/agent/tasks/<task-id>.status.md:
- Verdict: PASS/FAIL
- Findings: bullets
- Required changes (if FAIL): bullets
```
    2. Commit the review:

```shell
git add docs/agent/tasks/<task-id>.status.md
git commit -m "<task-id>: review"
```

      If your policy allows the agent to run git commits, it may perform this step instead.

  5. Phase 5 – Iterate
    Iterate if any of these is true:

    1. Reviewer requested changes

    2. QA verification failed

    3. Task Brief open questions were resolved

    4. You made additional commits after review

    To iterate, rerun all the phases with the same prompts as before.

    Iteration rules

    • If Reviewer requested changes, rerun Implementer, then QA, then Reviewer.

    • If QA failed verification, rerun Implementer to fix, then QA (and Reviewer if the fix changed behavior/scope).

    • If you changed the Task Brief (scope/criteria), rerun Planner (to update it), commit it, then rerun Implementer.

    Exit condition (PR-ready)

    Stop the inner loop when:

    • QA PASS is recorded in status.md (latest iteration)

    • Reviewer PASS is recorded in status.md (latest iteration)

    • The change meets DoD locally (scope, tests, verification evidence, risk + rollback)

    Then open a PR (step 4) and let CI/required checks run in the outer loop.

3. Stop and escalate (inner loop)

When a stop condition triggers (unclear requirements, risky area, repeated test failures), the agent must stop, record the problem in the repo, and resume only after a human decision is written down. The way the human is triggered depends on the team: agent chat, Slack, PR comments, etc.

  • The main rule is that the escalation is logged in an escalation file in the repo. In this cookbook, we use docs/agent/escalations/<YYYY-MM-DD>-<task-id>-<topic>.md

  • Human replies in the same channel where the agent asked.

  • After the human answers, the agent records the decision to the same file.

  • Depending on the flow, either the human reruns the blocked phase or the agent continues its work right after receiving the answer.
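Creating the escalation note can be scripted so the naming convention is never mistyped. A sketch; the section headings inside the note are illustrative, not mandated by this cookbook:

```shell
# Create an escalation note named docs/agent/escalations/<date>-<task>-<topic>.md
# with a placeholder for the human decision.
escalate() {  # usage: escalate <task-id> <topic> <reason>
  local file="docs/agent/escalations/$(date -u +%F)-$1-$2.md"
  mkdir -p docs/agent/escalations
  printf '# Escalation: %s (%s)\n\n## Blocked because\n%s\n\n## Decision\n(pending human reply)\n' \
    "$1" "$2" "$3" > "$file"
  echo "$file"
}
```

After the human answers (in whatever channel your team uses), the agent fills in the "Decision" section of the same file before resuming.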

4. Open a PR (handoff to the outer loop)

The inner loop ends when the change is “PR-ready”. Opening a PR moves the work to the outer (team-facing) loop – review, checks, and merge.

  1. Check the exit condition:

    • Task Brief exists and meets Definition of Ready

    • Verification passes locally

    • No unresolved escalations

    • Required artifacts exist:

      1. docs/agent/tasks/<task-id>.status.md with QA and Review PASS

      2. docs/agent/tasks/<task-id>.pr.md

  2. Make sure the changes are committed and push the branch

```shell
git status
git push -u origin <task-id>-<short-name>
```
  3. Open a PR using your normal platform (GitHub/GitLab/etc.), targeting your main branch.

  4. Fill the PR description by copying the contents of docs/agent/tasks/<task-id>.pr.md.

  5. (Optional) Update the Task Brief to include the PR link and commit it.

After this, you are in the outer loop (next step).

5. Outer loop

After you open a PR, you are in the outer loop. Work is no longer “local-only”: it’s driven by two inputs:

  • Review feedback (comments / requested changes)

  • Checks (CI/status checks required by branch protection)

Automated PR-based iteration (agent runs from PR comments)

Use this if your agent is integrated with your PR system (example: GitHub Copilot coding agent).

Flow:

  1. Reviewer leaves feedback on the PR (batch comments if possible).

  2. An event (e.g., a review status change) triggers the agent to iterate (for GitHub Copilot: mention it in PR comments).

  3. Agent pushes follow-up commits to the same PR branch.

  4. Checks rerun automatically on new commits; repeat until approved and green.

Manual iteration (human runs agent locally using review/check output)

Use this if your agent is local (CLI/IDE) and does not “see” PR comments / CI logs.

Flow:

  1. Reviewer finishes and you collect the feedback (review comments + failing check output).

  2. You run the agent manually on the PR branch and provide the feedback as input.

  3. Agent makes fixes, you push commits, checks rerun, repeat until approved and green.

Minimum input you provide to the agent (copy/paste):

  • PR link

  • Reviewer feedback (bullets)

  • Failing check output (error summary + key logs)

  • Reminder: “stay within Task Brief scope”
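For example, a manual hand-off to the agent might look like this (a sketch; adapt it to your team's format):

```
PR: <link>

Reviewer feedback:
- <comment 1>
- <comment 2>

Failing checks:
<error summary + key log lines>

Reminder: stay within the Task Brief scope; do not widen the change.
```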

Merge condition (common to both modes)

Merge when:

  • Required checks are green (branch protection)

  • Review requirements are satisfied

  • status.md ends with latest QA PASS + Review PASS

  • pr.md reflects the final verification + rollback

6. Multi-agent workflow patterns

This chapter describes how to scale the workflow beyond “one person runs agents locally”.

What this chapter should cover (short, practical):

  1. Running roles as separate sessions

    1. when it’s worth it (independence + parallelism)

    2. common setup: Planner/Implementer/QA/Reviewer as separate runs

  2. Parallel execution patterns

    1. “Planner pipeline”: Planner prepares briefs for N tasks ahead

    2. “Verification queue”: QA runs verification across multiple branches/PRs

    3. “Review assist”: Reviewer agent runs on demand before requesting human review

  3. Shared state discipline

    1. what must be in repo files vs what can live in chat

    2. how to avoid divergence when multiple agents touch adjacent areas

  4. Conflict handling

    1. two agents touch same files

    2. rebasing/updating branches

    3. stop conditions for merge conflicts / flaky checks


7. Remote/automated agent runs (orchestration)

Once Air Cloud is available, this chapter can cover:

  • local vs remote runners

  • triggers (PR comment, label, slash command, Slack bot)

  • minimal automation contract: “Task Brief in repo + verify command + write artifacts”

  • security: tokens, secrets, least privilege

24 February 2026