The Pre-Merge Eval Gate (For Code That Touches AI)

Code that touches AI features should not merge without an eval pass. This piece covers the gate, its latency budget, and the team behaviours it changes.

The gate

The gate runs on every AI-touching PR. The eval suite must pass, and a failure blocks the merge by default. Overrides are allowed but logged so the team can audit them. A latency budget of under 5 minutes per suite keeps the gate from becoming the thing engineers route around. A named owner per suite prevents stale or noisy evals from accumulating.
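The rules above can be sketched as a small decision function. This is a minimal illustration, not a reference implementation: the names `SuiteResult`, `GateDecision`, and `evaluate_gate` are hypothetical, and two behaviours are assumptions the text does not specify: an override needs both a person and a reason, and a suite that exceeds the latency budget produces a warning for its owner rather than blocking the merge.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# From the text: each suite should finish in under 5 minutes.
LATENCY_BUDGET_SECONDS = 5 * 60


@dataclass
class SuiteResult:
    name: str
    owner: str                # the named owner for this suite
    passed: bool
    duration_seconds: float


@dataclass
class GateDecision:
    allow_merge: bool
    reasons: List[str] = field(default_factory=list)        # suite failures found
    warnings: List[str] = field(default_factory=list)       # non-blocking notes for owners
    audit_entries: List[str] = field(default_factory=list)  # logged overrides


def evaluate_gate(
    results: List[SuiteResult],
    override_by: Optional[str] = None,
    override_reason: Optional[str] = None,
) -> GateDecision:
    """Decide whether an AI-touching PR may merge."""
    decision = GateDecision(allow_merge=True)

    for r in results:
        if not r.passed:
            decision.reasons.append(f"suite '{r.name}' failed (owner: {r.owner})")
        # Assumption: going over budget warns the owner but does not block.
        if r.duration_seconds >= LATENCY_BUDGET_SECONDS:
            decision.warnings.append(
                f"suite '{r.name}' ran {r.duration_seconds:.0f}s, "
                f"over the latency budget (owner: {r.owner})"
            )

    if not decision.reasons:
        return decision  # every suite passed

    # Assumption: an override is valid only with both a name and a reason.
    if override_by and override_reason:
        decision.audit_entries.append(
            f"override by {override_by}: {override_reason}; "
            f"bypassed: {'; '.join(decision.reasons)}"
        )
        return decision  # merge allowed, but the override is on record

    decision.allow_merge = False  # failure blocks merge by default
    return decision
```

Calling `evaluate_gate(results)` with a failing suite returns a decision with `allow_merge` set to `False`; passing `override_by` and `override_reason` flips it to `True` and leaves one entry in `audit_entries` for later review.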

Scope

The gate is for AI behaviour changes, not pure infrastructure refactors. Prompt changes trigger it because prompts directly shape model output; files that call LLM APIs or parse model output trigger it because both the call and the parsing surface affect behaviour. The reviewer enforces scope; disagreements escalate to the suite owner.
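One way to give the reviewer a starting point is to flag changed paths automatically. The sketch below assumes the team keeps prompts, LLM client code, and output parsers in recognisable locations. The path patterns and the function name `gate_triggers` are hypothetical placeholders, and the reviewer still makes the final scope call.

```python
from fnmatch import fnmatchcase
from typing import Dict, List

# Hypothetical layout: adjust the patterns to the repository in question.
# Note that fnmatch's "*" also matches "/" so "prompts/*" covers subfolders.
AI_TOUCHING_PATTERNS: Dict[str, List[str]] = {
    "prompt": ["prompts/*", "*.prompt.txt"],
    "llm_call": ["src/llm/*"],
    "output_parsing": ["src/parsers/*"],
}


def gate_triggers(changed_paths: List[str]) -> Dict[str, str]:
    """Map each AI-touching path to the first category it matched.

    An empty result means the change looks like pure infrastructure
    and the gate does not need to run.
    """
    hits: Dict[str, str] = {}
    for path in changed_paths:
        for category, patterns in AI_TOUCHING_PATTERNS.items():
            if any(fnmatchcase(path, pattern) for pattern in patterns):
                hits[path] = category
                break  # the first matching category is enough to trigger
    return hits
```

A PR that changes only something like `infra/terraform/main.tf` yields an empty mapping, while one that edits a file under `prompts/` is flagged with the category `prompt`.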

What it changes

The gate changes engineering culture more than it changes any single PR. Evals stop being an afterthought because they are now the thing standing between the PR and merge. Regressions get caught pre-merge rather than in production. Quality compounds as the suite grows alongside features.