Policy as Code
The deterministic layer beneath agentic development.
AI rules describe behavior in natural language and trust the model to follow them. Policy as Code describes behavior in executable terms and does not trust anything. It simply runs. The two systems are complements, not substitutes, and the codebases that combine them reach a ceiling neither layer hits alone.
Drawing on 30+ client engagements building wixor_policy and its predecessors, the paper argues that Policy as Code is the missing layer beneath modern agentic development: project-specific rules, evidence-based execution, deterministic outcomes. Combined with AI directives, the system reaches 80-90% accuracy on the invariants teams care about. AI rules alone top out at 60-75%. Generic static analysis tops out at 40-55%.
What's inside
- The deterministic floor problem: why linters, type systems, tests, and AI review all miss the same class of project-specific invariants
- The complementarity principle: why AI rules and policy rules catch different failures, and why the combined system reliably hits 80-90% accuracy
- The evidence hierarchy: runtime introspection, AST, framework metadata, filesystem, and regex, and why regex is a last resort, not a source of truth
- The rule contract, finding schema, gate model, waiver system, and evidence cache that make policy systems maintainable at scale
- Drift as the primary failure mode: four metrics (evidence freshness, rule hit rate, waiver age, convention coverage) and the quarterly rebaseline pattern
- Operational patterns: policy in CI, the developer loop, AI sessions, and review, plus the organizational investment model for long-term health
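To make the rule contract, finding schema, and gate model concrete, here is a minimal sketch in Python. It is not the wixor_policy API. Every name in it (`Finding`, `no_print_in_services`, `gate`, the rule ID `SVC001`, the sample path) is a hypothetical stand-in, and the invariant it checks is invented for illustration. The rule gathers AST evidence rather than matching text, so a `print(` that appears inside a string literal is not reported.

```python
import ast
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical finding schema: one record per violated invariant."""
    rule_id: str
    path: str
    line: int
    message: str
    evidence: str  # which evidence source produced the finding


def no_print_in_services(path: str, source: str) -> list[Finding]:
    """Hypothetical rule: service modules must not call print() directly."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        # Match real call expressions only; text inside strings or
        # comments never becomes an ast.Call node.
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "print"
        ):
            findings.append(
                Finding(
                    rule_id="SVC001",
                    path=path,
                    line=node.lineno,
                    message="use the project logger instead of print()",
                    evidence="ast",
                )
            )
    return findings


def gate(findings: list[Finding], waived_rule_ids: set[str]) -> bool:
    """Hypothetical gate: pass only if every finding is covered by a waiver."""
    return all(f.rule_id in waived_rule_ids for f in findings)


sample = (
    "def handler():\n"
    "    print('debug')\n"
    "    note = 'print(x) inside a string is fine'\n"
)
findings = no_print_in_services("services/orders.py", sample)
passed_without_waiver = gate(findings, waived_rule_ids=set())
passed_with_waiver = gate(findings, waived_rule_ids={"SVC001"})
```

A regex such as `print\(` would flag both line 2 and line 3 of the sample. The AST walk flags only the real call on line 2, which is the sense in which regex sits at the bottom of the evidence hierarchy.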
Who this is for
Platform leads, QE owners, and senior engineers building project-specific rule systems alongside AI agents. Teams past the point where everyone can hold the codebase's conventions in their heads, typically five or more engineers or 100k-plus lines of code. Leaders evaluating whether deterministic enforcement belongs in their AI-assisted development stack.
Drift: the primary failure mode
In our engagement data, 72% of policy system failures after the first six months trace to drift, not to initial authoring. A policy that describes the codebase as it existed six months ago is worse than no policy at all. The paper introduces four drift metrics and a quarterly rebaseline checklist that teams can adopt immediately.
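As a rough illustration of how the four drift metrics might be tracked, the sketch below computes them from a single snapshot. This page does not give the paper's definitions, so each formula here is an assumption: evidence freshness is taken as days since the evidence cache was last built, rule hit rate as the share of rules that produced at least one finding, waiver age as the age of the oldest open waiver, and convention coverage as the share of documented conventions backed by a rule. `DriftSnapshot` and `drift_metrics` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DriftSnapshot:
    """Hypothetical inputs collected at rebaseline time."""
    evidence_built: date       # when the evidence cache was last rebuilt
    rules_total: int           # rules currently registered
    rules_with_hits: int       # rules that produced at least one finding
    waiver_dates: list[date]   # grant date of each open waiver
    conventions_total: int     # conventions the team has written down
    conventions_covered: int   # conventions enforced by at least one rule


def drift_metrics(s: DriftSnapshot, today: date) -> dict[str, float]:
    """Compute the four metrics under the assumed definitions above."""
    oldest_waiver = max(((today - d).days for d in s.waiver_dates), default=0)
    return {
        "evidence_freshness_days": (today - s.evidence_built).days,
        "rule_hit_rate": s.rules_with_hits / s.rules_total if s.rules_total else 0.0,
        "oldest_waiver_days": oldest_waiver,
        "convention_coverage": (
            s.conventions_covered / s.conventions_total
            if s.conventions_total
            else 0.0
        ),
    }


snapshot = DriftSnapshot(
    evidence_built=date(2024, 1, 1),
    rules_total=40,
    rules_with_hits=10,
    waiver_dates=[date(2023, 10, 1), date(2024, 2, 1)],
    conventions_total=20,
    conventions_covered=15,
)
metrics = drift_metrics(snapshot, today=date(2024, 3, 1))
```

The thresholds that should trigger a rebaseline are left out on purpose, since they depend on the team and are not stated here.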