Feature: Eval Harness

A testing framework that gates AI model changes.

Overview

The Nova eval harness is the testing framework that gates AI model changes. A single benchmark measures capability at one point in time; the harness provides continuous quality assurance by checking every model change against per-task suites that match the actual workload.

The approach

The practical approach has five parts: per-task suites that match each capability, threshold gating that blocks under-performing promotions, CI integration that runs the eval on every change, regression detection across versions, and documented eval criteria for each task. This discipline produces predictable AI quality rather than vibes-based promotion.
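A minimal sketch of how threshold gating and regression detection could fit together, assuming each task suite reports a single score between 0 and 1. The names TaskResult, gate, and max_regression, the task names, and the 0.02 tolerance are all hypothetical illustrations, not the harness's actual API or settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskResult:
    """One suite's outcome for a candidate model (hypothetical shape)."""
    task: str         # name of the per-task suite
    score: float      # candidate model's score on this suite, 0..1
    threshold: float  # documented minimum score for promotion
    baseline: float   # score of the currently promoted version


def gate(results, max_regression=0.02):
    """Return (passed, failures) for a candidate model.

    A task fails if its score is below the documented threshold, or if it
    clears the threshold but drops more than max_regression below the
    baseline version (assumed tolerance, chosen for illustration).
    """
    failures = []
    for r in results:
        drop = r.baseline - r.score
        if r.score < r.threshold:
            failures.append(
                f"{r.task}: score {r.score:.2f} below threshold {r.threshold:.2f}"
            )
        elif drop > max_regression:
            failures.append(
                f"{r.task}: regressed {drop:.2f} from baseline {r.baseline:.2f}"
            )
    return (not failures, failures)


# Example run with made-up numbers.
candidate = [
    TaskResult("summarization", score=0.91, threshold=0.85, baseline=0.90),
    TaskResult("extraction", score=0.80, threshold=0.85, baseline=0.82),
    TaskResult("classification", score=0.90, threshold=0.85, baseline=0.95),
]
passed, failures = gate(candidate)
```

In a CI job, a wrapper could print each entry in failures and exit with a non-zero status when passed is False, so that the promotion is blocked on that change.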

Why this compounds

Eval harness discipline compounds across model changes: each model that passes evaluation preserves the existing quality bar, the team’s AI engineering practice grows, and new capabilities inherit the eval framework.

Running the eval harness is an engineering discipline that pays off over years. Nova AI Ops invests in model quality as a first-class engineering surface.