Loop · Agentic QA · 2026

Do more with less in QA.

Reduce low-value testing. Apply AI where it actually compounds. Reposition QA around quality value instead of test execution.

AI-Native QE Readiness

Is your QA team set up for success in the age of AI?

Walk the full tiered checklist. Critical prerequisites first, then the eight readiness categories. Get a gap report your team can act on this week.

Who Loop is for

QA leaders being asked to do more with less.

If any of these sound like the conversation in your head this quarter, you're in the right place.

01
“My team is shrinking, but the regression suite isn't. We're drowning in flaky tests.”
Sarah, QA Director at

Series-B fintech · ~50 engineers

We name the 30% of your suite eating 80% of CI minutes. And the move that gets it back without breaking confidence.

See the audit brief
02
“Leadership keeps asking about AI. I don't yet have a defensible answer.”
Marcus, Head of Quality at

Healthcare SaaS · 120 engineers

We separate AI leverage from AI theater. And give you the boss-ready memo before you sign another vendor contract.

See the entry course
03
“QA feels less valuable every quarter. I need to reposition the function.”
Priya, QA Director at

E-commerce platform · 200+ engineers

We turn QA from test execution into the quality intelligence layer your CTO will forward to the board.

See the operating-model reset

Names + companies anonymized at the speakers' request.

Watch · Latest

From the channel

Subscribe on YouTube · @benfellows-dev
Set Up Policy as Code in 1 Hour (Control AI Code Fast)

Apr 28, 2026

If you want to start controlling AI-generated code today, this is the simplest way I’ve found to do it. In the previous videos, I talked about why agentic development breaks at scale and introduced the concept of policy as code as a way to fix it. In this video, I’m showing how to actually get started.

The idea is straightforward. Instead of relying only on prompts, rules, or memory to guide AI, you introduce a deterministic layer that scans your codebase and flags violations. Think of it as a much more comprehensive, fully customizable linting system that works alongside tools like Claude.

What surprised me is how easy it is to get a first version working. In this walkthrough, I show how you can go from zero to a basic policy-as-code setup in a very short amount of time. We generate a small set of rules, wire up a simple scanner, and immediately run it against a real codebase. Even with a basic setup, you’ll start catching issues and inconsistencies right away.

This is not the full system I use in production. At scale, this turns into hundreds or even thousands of rules, with more advanced concepts like evidence layers, caching, and reporting. But the goal of this video is to show that you don’t need any of that to begin.

If you’re using AI to write code and you’re starting to see drift, inconsistency, or quality issues over time, this is a practical way to start putting guardrails in place. What I’ve found is that as you add more rules, the amount of drift drops significantly and the system becomes more reliable without slowing development down.

If you haven’t watched the earlier videos in this series, I’d recommend starting with those for more context on why this approach exists and how it fits into a larger agentic workflow. If you try this yourself, I’d be interested to hear what kinds of rules you end up writing and what it catches in your codebase.
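To make the "rules plus a simple scanner" idea concrete, here is a minimal sketch of what a first policy-as-code pass could look like. This is not the setup from the video; the rule names, patterns, and messages are invented for illustration, and a real rule set would be far larger and tied to your own architecture.

```python
import re
from dataclasses import dataclass

@dataclass
class Rule:
    name: str
    pattern: str   # regex that flags a violation when it matches a line
    message: str

# A tiny, hypothetical rule set; real setups grow to hundreds of rules.
RULES = [
    Rule("no-print-debugging", r"^\s*print\(", "Use the logger, not print()."),
    Rule("no-wildcard-import", r"^from .+ import \*", "Wildcard imports hide dependencies."),
]

def scan(source: str, path: str = "<memory>") -> list[str]:
    """Return one violation string per (rule, line) match."""
    violations = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for rule in RULES:
            if re.search(rule.pattern, line):
                violations.append(f"{path}:{lineno} [{rule.name}] {rule.message}")
    return violations

code = "from os import *\nprint('debug')\n"
for v in scan(code, "example.py"):
    print(v)
```

Pointing `scan` at real files and failing CI when it returns anything nonempty is enough to get the deterministic layer the video describes, before any caching or reporting machinery.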

Watch on YouTube →
I Tried Building with Agentic Factories. They Failed. Here’s What Worked Instead.

Apr 27, 2026

I spent time building with “agentic factories” - multi-agent pipelines that promise fully autonomous workflows. On paper, they look like the future. In practice, they broke down in ways that matter: reliability, coordination, and real-world constraints. In this video, I break down where these systems failed, why they fail structurally, and what actually worked instead in production. If you're building with AI agents, this will save you time (and probably some pain).

Watch on YouTube →
How We Use Policy as Code to Control Claude and AI Agents

Apr 24, 2026

Claude and other AI agents are incredibly good at writing code. The problem is they don’t stay consistent over time. In the first few iterations, everything looks great. Output is fast, patterns are mostly correct, and it feels like you’ve unlocked a new level of development speed. But as the codebase grows, small inconsistencies start to compound. Patterns drift, structure degrades, and eventually the system becomes harder to maintain than it was before. That’s the problem this video is about.

In this walkthrough, I break down how we use a concept called policy as code to control AI-generated code in real systems. Instead of relying only on prompts, rules files, or memory, we introduce a deterministic layer that enforces how code is allowed to be written. Every time an agent makes changes, those changes are checked against a large set of rules. If something doesn’t match the expected patterns, it fails. The agent has to fix it before moving forward.

This ends up acting like a much more comprehensive version of linting, but tailored specifically to your architecture, your patterns, and your codebase. The result is that we’re able to keep the speed benefits of AI while dramatically reducing drift and long-term degradation.

This video focuses on how the system works in practice: what kinds of rules we write, how they’re structured, and how they integrate into an agentic workflow using tools like Claude. If you’re experimenting with AI coding and running into issues with inconsistency or quality over time, this is one approach that has worked well for us.

I’ll also be doing follow-up videos on how to implement this from scratch and how it fits into larger agentic pipeline systems. If you’ve tried something similar or have different approaches to controlling AI-generated code, I’d be interested to hear about it.
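The "checked against rules, must fix before moving forward" loop can be sketched in a few lines. This is an illustrative skeleton, not the production system: `generate` stands in for a model call and `scan` for the rule checker, both supplied by the caller.

```python
def enforce(generate, scan, max_attempts: int = 3) -> str:
    """Ask the agent for code, scan it, and feed violations back until clean.

    generate(feedback) -> code string; feedback is None on the first try,
    otherwise the list of violations from the previous attempt.
    scan(code) -> list of violation strings (empty means the code passes).
    """
    feedback = None
    for _ in range(max_attempts):
        code = generate(feedback)
        violations = scan(code)
        if not violations:
            return code  # passes policy; the agent may move forward
        feedback = violations
    raise RuntimeError(f"still failing policy after {max_attempts} attempts: {violations}")

# Demo with stubs: the first draft violates a rule, the second passes.
attempts = iter(["print('debug')", "logger.info('debug')"])

def fake_generate(feedback):
    # Stand-in for a model call; a real agent would use the feedback.
    return next(attempts)

def fake_scan(code):
    return ["no-print-debugging"] if "print(" in code else []

print(enforce(fake_generate, fake_scan))  # → logger.info('debug')
```

The important property is that the gate is deterministic: the same code always produces the same violations, regardless of what the agent remembers or was prompted with.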

Watch on YouTube →

Track record

What Loop's last year of engagements looks like in numbers.

30+

Engagements shipped

94%

On-time releases

−42%

Avg. regression CI minutes

0

Critical escapes (last 12 mo)

Numbers reflect engagements where Loop ran the operating-model reset or the transformation sprint. See the client roster for the full case set.

Resources

Templates, calculators, and guides we use with our clients.

Drop your email and we'll send the asset. No drip funnel, no sales calendar. One email, the file, and you're done.

Template · Editable doc + Notion template

90-Day QA Leverage Plan

Coming soon

The exact week-by-week plan QA leaders use to defend headcount and prove output in a single quarter.

Template · Sheets + Looker Studio

QA Metrics Dashboard

Coming soon

Six metrics your CTO actually cares about. Escape rate, regression drag, recovery time, leverage ratio, AI yield, ownership clarity.

Template · RACI worksheet

Quality Ownership Matrix

Coming soon

Stop QA-as-bottleneck. Map every test layer to a named owner so engineering can't push everything down to your team.

Diagnostic · Personalized PDF report

QA Leverage Scorecard

Coming soon

12 questions. Honest score. Tells you whether your team is a cost center, a guardrail, or a leverage multiplier.

Diagnostic · Quarterly + annual loss model

Flaky Test Cost Calculator

Coming soon

Plug in your CI minutes, retry rate, and team size. Get the dollar figure flaky tests are costing you this quarter.
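While the calculator itself is still coming, the shape of the model is simple enough to sketch. Every rate below is an invented placeholder for illustration; the real calculator's inputs and coefficients may differ.

```python
def flaky_test_cost(ci_minutes_per_month: float,
                    retry_rate: float,
                    team_size: int,
                    ci_cost_per_minute: float = 0.008,   # assumed compute rate ($/min)
                    loaded_hourly_rate: float = 95.0,    # assumed loaded engineer cost ($/hr)
                    wait_fraction: float = 0.25,         # share of retry time someone is blocked
                    triage_hours_per_month: float = 2.0  # assumed per-engineer flake triage
                    ) -> float:
    """Rough quarterly dollar cost of flaky tests, in three buckets:
    retried compute, engineers blocked on reruns, and flake triage time."""
    retry_minutes = ci_minutes_per_month * retry_rate * 3       # minutes per quarter
    compute = retry_minutes * ci_cost_per_minute
    blocked = retry_minutes * wait_fraction * (loaded_hourly_rate / 60)
    triage = team_size * triage_hours_per_month * 3 * loaded_hourly_rate
    return round(compute + blocked + triage, 2)

# Example: 40k CI minutes/month, 12% retries, team of 8.
print(flaky_test_cost(ci_minutes_per_month=40_000, retry_rate=0.12, team_size=8))
```

Even with conservative placeholder rates, the people cost of waiting and triage tends to dwarf the raw compute bill, which is the point the calculator makes.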

Guide · 32-page PDF, 25-minute read

The QA Director's Guide to Doing More With Less

Coming soon

How to keep release safety from collapsing when your team is shrinking and your scope is growing.

Three doors

Pick the one that matches where you are.
