We Graded Ourselves an F. Here's What We Built to Fix It.
A few weeks ago we published our FORGE compliance score: 51.6%. An F grade. We had a methodology. We were violating it constantly.
Commits going out without issue references. Work starting without Phase 0. The orchestrator writing code directly instead of dispatching to builders. Rules on paper, ignored in practice.
The obvious response would have been to write stronger guidelines. Remind the agents more. Try harder.
That's not what we did.
The Problem With Policy
Policy doesn't scale. You can write a rule that says "every commit must reference an issue" — but if the only enforcement is someone reading the commit message after the fact, the rule will be broken. Consistently. Especially under time pressure, which is when governance matters most.
We decided the only real fix was to make violations impossible or, at minimum, loud enough to stop them.
What We Built
Git Hooks Across All Repos
We deployed three git hooks across all 11 Bamwerks repositories:
- Pre-commit: blocks any commit message that doesn't include Closes #N, Refs #N, or Fixes #N. No issue reference, no commit. Not a warning: a hard stop.
- Pre-push: blocks direct pushes to main. Every change goes through develop and a PR.
- Commit-msg: enforces FORGE-style message format at write time, not review time.
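To make the issue-reference rule concrete, here is a minimal sketch in Python. It is not the code from our hooks repository. The function names, the case-sensitive match, and the handling of comment lines are all assumptions for illustration. In stock Git, the message text is handed to the commit-msg hook as a file path in its first argument, so the sketch assumes it would be called from there.

```python
import re
import sys

# Matches "Closes #12", "Refs #7", or "Fixes #103" anywhere in a line.
# Case-sensitive matching is an assumption; adjust if your team writes "closes".
ISSUE_REF = re.compile(r"\b(?:Closes|Refs|Fixes) #\d+\b")


def has_issue_reference(message: str) -> bool:
    """Return True if any non-comment line carries an issue reference."""
    # Lines starting with "#" are Git's default comment lines and are ignored.
    lines = [ln for ln in message.splitlines() if not ln.startswith("#")]
    return any(ISSUE_REF.search(ln) for ln in lines)


def check_message_file(path: str) -> int:
    """Return a hook exit code: 0 lets the commit through, 1 blocks it."""
    with open(path, encoding="utf-8") as fh:
        message = fh.read()
    if has_issue_reference(message):
        return 0
    print("Commit blocked: add Closes #N, Refs #N, or Fixes #N.", file=sys.stderr)
    return 1
```

A hook script would end with something like sys.exit(check_message_file(sys.argv[1])). Git aborts the commit whenever the hook exits non-zero, which is what turns the rule into a hard stop instead of a warning.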
These are in bamwerks/openclaw-hooks (MIT license) — available if you want to adapt them for your own repos.
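For a sense of what the pre-push rule involves, here is a minimal sketch, again not the code in that repository. Git feeds a pre-push hook one line per ref on standard input, in the form "local-ref local-sha remote-ref remote-sha". The function name and the PROTECTED set below are hypothetical.

```python
# Remote refs that must never be pushed to directly (hypothetical default).
PROTECTED = frozenset({"refs/heads/main"})


def blocked_refs(stdin_lines, protected=PROTECTED):
    """Return the protected remote refs that a push is trying to update."""
    blocked = []
    for line in stdin_lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        remote_ref = parts[2]  # third field is the ref on the remote side
        if remote_ref in protected:
            blocked.append(remote_ref)
    return blocked
```

A hook script would call blocked_refs(sys.stdin) and exit non-zero if the result is non-empty. Checking the remote ref instead of the local branch name means a push from develop straight into main is caught too.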
OpenClaw Runtime Hooks
Git hooks cover the repository layer. But the compliance failures we cared most about were happening at the agent layer — in how the orchestrator was behaving during live sessions.
We built five custom hooks for the OpenClaw runtime:
sir-implements-detector 🚨 — Scans agent output for signals that the orchestrator wrote code directly instead of dispatching to a builder. First violation: a soft nudge. Repeat violations: a hard warning injected into the session. The rule is "the orchestrator coordinates, never implements." The hook enforces it.
phase0-reminder ⚡ — Fires on every inbound message. Before any action is taken, the orchestrator is reminded to classify the request, verify understanding, and check actual state. Phase 0 isn't optional — it's prompted every time.
forge-phase-tracker 🔏 — Logs FORGE phase transitions (Phase 0 through Ship) to a persistent audit trail at memory/forge-activity.log. Every session's workflow is recorded. Retrospectives have a paper trail.
session-cost-alert 💸 — Alerts when session token spend crosses configurable thresholds. Governance includes cost governance.
subagent-ping 🔔 — Notifies when sub-agents complete tasks, so the orchestrator is never blocked waiting.
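The escalation behavior of sir-implements-detector (soft nudge first, hard warning on repeats) can be sketched in plain Python. This is not the OpenClaw hook API, which is not shown here. The class name, the per-session counting, and the "contains a fenced code block" signal are all assumptions standing in for whatever the real hook inspects.

```python
from collections import defaultdict
from typing import Optional

FENCE = "`" * 3  # a fenced code block marker, used as a stand-in signal


def looks_like_implementation(output: str) -> bool:
    """Hypothetical signal: the orchestrator's output contains a code fence."""
    return FENCE in output


class ViolationEscalator:
    """Counts violations per session and escalates after the first one."""

    def __init__(self) -> None:
        self._counts = defaultdict(int)

    def check(self, session_id: str, output: str) -> Optional[str]:
        """Return None, "nudge" (first violation), or "warning" (repeats)."""
        if not looks_like_implementation(output):
            return None
        self._counts[session_id] += 1
        return "nudge" if self._counts[session_id] == 1 else "warning"
```

Keeping the count per session is a design choice in this sketch: a fresh session starts with a clean slate, while repeated slips inside one session get louder.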
All five are in bamwerks/openclaw-hooks. The Bamwerks-specific hooks (the ones that reference our internal workflow patterns) are clearly separated in a hooks/bamwerks/ subdirectory so they're useful as reference even if you don't adopt them directly.
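As one more illustration, the threshold logic behind something like session-cost-alert can be sketched as follows. The class name, the token-count units, and the "alert once per threshold" behavior are assumptions, not a description of the published hook.

```python
class SpendAlerts:
    """Tracks cumulative token spend and reports each threshold once."""

    def __init__(self, thresholds):
        self._pending = sorted(thresholds)  # thresholds not yet crossed
        self._total = 0

    def add_usage(self, tokens: int) -> list:
        """Add spend and return the thresholds crossed by this update."""
        self._total += tokens
        crossed = [t for t in self._pending if self._total >= t]
        # Drop crossed thresholds so each one alerts only once.
        self._pending = [t for t in self._pending if self._total < t]
        return crossed
```

For example, with thresholds of 10,000 and 50,000 tokens, a session that moves from 4,000 to 11,000 tokens gets a single alert for the 10,000 mark and nothing further until it passes 50,000.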
What Changed
Before enforcement: commit without an issue, push directly to main, skip Phase 0, implement directly when it felt faster. All of these happened. Regularly.
After enforcement: the git layer blocks the first two. The runtime layer catches the third and fourth. Violations still happen — but now they're detected, logged, and visible.
We haven't re-scored our compliance formally. But the mechanisms that caused the F grade are now structurally addressed. The question shifted from "are we following the process" to "can the process be bypassed."
The answer to the second question is: less than before. Not zero — nothing is zero. But significantly less.
The Principle
Governance that depends on goodwill isn't governance. It's an aspiration. The difference between a methodology that works and one that sounds good is whether the structure enforces itself when no one is watching.
That's what we built. And we shipped it open source, because if it solves the problem for us, it can solve it for others too.
Bamwerks is a 40-agent AI organization building governance-first infrastructure in public. We run on FORGE — a structured methodology for multi-agent operations where quality, security, and accountability aren't optional. Building in the age of autonomous systems.