The Pre-Merge Eval Gate (For Code That Touches AI)
Code that touches AI features should not merge without an eval pass. This note covers the gate, its latency budget, and the team behaviours it changes.
The gate
On every PR that touches a prompt, an LLM call, or a related code path, the eval suite must pass.
Failure blocks merge by default. Override is allowed but logged.
Latency budget: the eval suite runs in under 5 minutes. A slower suite slows the whole team, so tune it until it fits the budget.
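The gate rules above can be sketched as a single decision function. This is a minimal illustration, not a prescribed implementation: the names `GateDecision`, `decide`, `override_by`, and `audit_log` are hypothetical, and treating an over-budget run as a warning rather than a block is an assumption, since the text only says to tune.

```python
from dataclasses import dataclass
from typing import Optional

# The 5-minute budget stated in the text, in seconds.
LATENCY_BUDGET_SECONDS = 5 * 60


@dataclass
class GateDecision:
    allow_merge: bool
    reason: str
    over_budget: bool  # assumption: a warning flag only, it does not block


def decide(
    evals_passed: bool,
    duration_seconds: float,
    audit_log: list,
    override_by: Optional[str] = None,
) -> GateDecision:
    """Apply the gate: failure blocks by default, overrides are logged."""
    over_budget = duration_seconds > LATENCY_BUDGET_SECONDS

    if evals_passed:
        return GateDecision(True, "eval suite passed", over_budget)

    if override_by is not None:
        # An override is allowed, but it must leave a record.
        audit_log.append(
            {"override_by": override_by, "duration_seconds": duration_seconds}
        )
        return GateDecision(True, f"override by {override_by} (logged)", over_budget)

    return GateDecision(False, "eval suite failed", over_budget)
```

In a CI job, the script would exit non-zero when `allow_merge` is false and print a warning when `over_budget` is true, so a slow suite is visible without blocking anyone.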
Scope
All prompt files. All files that call LLM APIs. All files that parse LLM output.
Out of scope: pure infrastructure code. The gate is for AI behaviour changes, not refactors.
The reviewer enforces scope. Disagreements are escalated.
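One way to give the reviewer a starting point is to flag in-scope files automatically from the PR's changed paths. The sketch below assumes a hypothetical repository layout (`prompts/`, `src/llm/`, `src/parsers/`); the patterns and function names are illustrative and would need to match the real codebase. It uses an allow-list, so infrastructure paths fall out of scope simply by not matching.

```python
from fnmatch import fnmatchcase

# Hypothetical path patterns for the three in-scope categories from the text.
SCOPE_PATTERNS = {
    "prompt file": ["prompts/*"],
    "LLM API call": ["src/llm/*"],
    "LLM output parsing": ["src/parsers/*"],
}


def files_in_scope(changed_files):
    """Map each in-scope changed file to the category that matched it."""
    hits = {}
    for path in changed_files:
        for category, patterns in SCOPE_PATTERNS.items():
            if any(fnmatchcase(path, pattern) for pattern in patterns):
                hits[path] = category
                break  # first matching category is enough
    return hits


def gate_applies(changed_files):
    """The eval gate runs if at least one changed file is in scope."""
    return bool(files_in_scope(changed_files))
```

For example, `gate_applies(["infra/terraform/main.tf", "README.md"])` is false under these patterns, while a PR touching `prompts/summarize.txt` triggers the gate. Path matching cannot tell a refactor from a behaviour change, so the reviewer still makes the final call.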
What it changes
Engineers think about evals before writing prompts. Evals stop being an afterthought.
Regressions are caught before merge, not in production.
Quality compounds. Every PR is measured against the eval scores, the suite grows, and the system becomes more reliable.