Test Coverage Floor
Minimum coverage required.
Set
Test coverage is a floor, not a target. The goal is to prevent the codebase from drifting into untested territory, not to hit some specific high number. Set the floor at a level you can defend, enforce it ruthlessly, and resist the temptation to set it higher than you can sustain.
How to set a coverage floor that holds up:
- Per service, not per repo: Different services have different coverage profiles based on their nature. A pure logic library can plausibly hit 95%; an integration-heavy service that talks to many backends might max out at 65%. The floor is per-service, calibrated to what the service can actually achieve with reasonable testing investment.
- 70 to 80% typical for most services: The sweet spot for application code. High enough to force most non-trivial code to be exercised by tests; low enough to avoid forcing useless tests on trivial getters or framework-driven boilerplate. Anything below 60% is too loose; anything above 85% becomes an exercise in chasing percentage points.
- PR coverage, not just baseline: The floor applies to coverage of the lines in the PR, not the whole codebase. A PR that drops the global coverage from 80.5% to 80.1% is fine; a PR whose own changes are at 30% coverage is not. The discipline is on each change, not the aggregate.
- Below the floor, PR rejected: When a PR's own coverage is below the floor, the CI pipeline blocks the merge. This is non-negotiable. Authors must add tests or document why specific lines are not testable. The block is the enforcement mechanism.
- Set once, maintain constantly: Picking the floor is a one-time conversation per service. Defending it is a continuous practice. Each PR that would lower the average gets the friction; the average drifts up over time as quality improves.
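The per-PR rule above reduces to a small computation: coverage of the changed lines only, compared against the floor. A minimal sketch, assuming the CI step already knows which lines the PR touched and which lines the coverage run marked as hit (all names here are illustrative, not any particular tool's API):

```python
def pr_coverage_gate(changed_lines, covered_lines, floor=0.75):
    """Check coverage of the lines touched by this PR against the floor.

    changed_lines: set of (path, line_no) tuples added or modified by the PR
    covered_lines: set of (path, line_no) tuples the coverage run marked hit
    floor: minimum fraction of changed lines that must be covered
    """
    if not changed_lines:
        return True, 1.0  # docs-only or deletion-only PRs pass trivially
    hit = len(changed_lines & covered_lines)
    ratio = hit / len(changed_lines)
    return ratio >= floor, ratio

# A PR whose own changes are 30% covered fails a 75% floor,
# even if the repo-wide average stays comfortably above 80%.
ok, ratio = pr_coverage_gate(
    changed_lines={("svc/api.py", n) for n in range(1, 11)},  # 10 changed lines
    covered_lines={("svc/api.py", n) for n in range(1, 4)},   # only 3 of them hit
)
```

Note that the global average never appears in the check: the gate looks only at the intersection of changed and covered lines, which is what keeps the discipline on each change rather than the aggregate.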
The floor is a discipline, not a metric. It exists to prevent a specific failure mode (untested code creeping in), not to measure overall health.
Enforce
A coverage floor that is not enforced is a wish. Most teams have an aspirational coverage number that nobody checks at PR time, so coverage gradually decays as untested code lands. The fix is mechanical enforcement at the CI gate.
- CI reports coverage on every PR: The PR check shows the coverage of the changed lines and compares it to the floor. The check passes or fails; the team does not have to read a report to know whether they are inside policy.
- Below floor blocks merge: A PR that fails the coverage check cannot be merged until the author adds tests. The block is at the platform level (GitHub branch protection, GitLab merge request rules) so it cannot be bypassed casually.
- Specific files exempted explicitly: Some files genuinely cannot be unit-tested (autogenerated code, framework boilerplate, integration-only test scaffolding). Exempt them by name in the coverage config, with a comment explaining why. Exemptions are deliberate, documented, and reviewed.
- Coverage trend visible: The team's coverage over time is on a dashboard. A team that is gradually drifting down is a team about to find out their floor is too low. A team that is gradually rising is doing the work. The trend is the leading indicator.
- Discipline, not heroics: The enforcement is constant and small. Each PR adds the tests it needs. Nobody is doing a "coverage sprint" to get the number up; the number stays up because the floor is enforced on each merge.
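The enforcement loop with explicit, documented exemptions can be sketched as follows. This is a hypothetical gate script, not any particular provider's implementation; the patterns, reasons, and floor value are illustrative:

```python
import fnmatch

# Files exempted by name, each with a documented reason; the dict is in
# version control, so adding an exemption goes through code review.
EXEMPTIONS = {
    "svc/generated/*_pb2.py": "autogenerated protobuf bindings",
    "svc/migrations/*.py": "framework-driven schema boilerplate",
}

def is_exempt(path):
    return any(fnmatch.fnmatch(path, pattern) for pattern in EXEMPTIONS)

def enforce_floor(per_file_coverage, floor=0.75):
    """per_file_coverage: {path: fraction of that file's changed lines covered}.
    Returns the non-exempt files below the floor; any entry blocks the merge."""
    return {path: cov for path, cov in per_file_coverage.items()
            if not is_exempt(path) and cov < floor}

failures = enforce_floor({
    "svc/api.py": 0.82,                # above the floor: passes
    "svc/generated/user_pb2.py": 0.0,  # exempt by name: ignored
    "svc/billing.py": 0.40,            # below the floor: blocks the merge
})
# failures == {"svc/billing.py": 0.40}; in CI the script would exit
# non-zero on any failure, turning the required check red.
```

Wiring the script's exit status to a required status check (branch protection on GitHub, merge request rules on GitLab) is what makes the block platform-level rather than a convention someone can skip.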
Enforcement turns coverage from a wish into a property. The team learns to treat tests as part of the change, not as a separate optional artifact.
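The trend signal described above needs nothing fancier than comparing the average coverage of recent merges against the window before them. A minimal sketch, with an illustrative window size and made-up history:

```python
def coverage_trend(history, window=5):
    """history: coverage fractions per merge, oldest first.
    Returns the drift between the last `window` merges and the window
    before; negative drift means the team is gradually losing ground."""
    if len(history) < 2 * window:
        return 0.0  # not enough history to call a trend
    recent = sum(history[-window:]) / window
    prior = sum(history[-2 * window:-window]) / window
    return recent - prior

# Half a point of downward drift per window is the leading indicator
# that the floor is set too low for how the team actually works.
drift = coverage_trend([0.82, 0.82, 0.81, 0.81, 0.80,
                        0.80, 0.79, 0.79, 0.78, 0.78])
```
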
Avoid
The most common coverage mistakes are treating coverage as a goal rather than a floor, and chasing percentage points without thinking about what is being tested. Coverage is game-able and the team that games it loses both ways: the metric stays high, the testing quality stays low.
- Avoid coverage as a goal in itself: "We need to get to 95% coverage" is not a useful goal. The pursuit produces tests that exercise lines without verifying behavior, just to bump the number. The metric improves; the actual safety against regressions does not.
- Quality matters more than quantity: A test that exercises a function but does not assert the function's behavior is worse than no test, because it gives false confidence. Reviewers should ask "what does this test prove?" not "does this test exist?" The latter is easy to answer; the former is the actual signal.
- Game-able metric: Coverage can be inflated with no-op tests that touch many lines but verify nothing, with auto-generated code that bumps the lines-under-test count, and with tests of trivial getters and setters. The team that wants to hit a high number can do so without improving anything.
- Mutation testing as a counter: Mutation testing (tools that introduce small bugs and check whether tests catch them) measures whether the tests would actually fail if the code were wrong. The mutation score is a much harder metric to game and a much better signal of test quality. Use it as the qualitative complement to the quantitative coverage floor.
- Avoid 100%-coverage culture: A team that demands 100% coverage on every line ends up writing tests for the autogenerated import statements. The cost in engineer time is real and the benefit is zero. Set a sane floor; let the floor be the floor.
Test coverage is a floor that prevents drift, not a target that measures health. Nova AI Ops integrates with code coverage providers (Codecov, Coveralls, native CI reports) to enforce per-PR and per-service floors, surfaces mutation testing scores alongside line coverage, and tracks the qualitative trend that distinguishes real test improvement from metric-chasing.