Build Determinism Discipline
Same input, same output.
What deterministic builds mean
Deterministic builds produce byte-identical artifacts from byte-identical inputs, independent of machine, time, or parallelism. The property exposes supply-chain tampering and reproducibility failures, and it is increasingly expected under supply-chain frameworks such as SLSA and in regulated industries.
- Same inputs, same outputs. Byte-identical artifact regardless of machine, time, or parallel-build order.
- Catches supply-chain and reproducibility issues. Every build leaves an audit trail that can be checked independently, and "works on my machine" surprises surface before they ship.
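- Supports SLSA attestation. SLSA does not strictly mandate reproducible builds, but they let a third party rebuild an artifact and confirm that its provenance matches; regulated industries increasingly expect them.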
- Documented hash per build. Publishing each artifact's hash lets downstream consumers verify what they downloaded.
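A published hash is only useful if it is easy to produce and check. The following is a minimal sketch of computing one, assuming SHA-256 as the digest; the function names artifact_sha256 and checksum_line are hypothetical and not part of any particular build tool.

```python
import hashlib
from pathlib import Path


def artifact_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read fixed-size chunks so large artifacts do not need to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_line(path: Path) -> str:
    """Format "<digest>  <filename>", the layout checksum files commonly use."""
    return f"{artifact_sha256(path)}  {path.name}"
```

Writing one such line per artifact into a checksum file next to the release gives consumers something to compare against after download.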
Common sources of non-determinism
Non-determinism creeps in from predictable places. Timestamps embedded in artifacts, ordering of file globs and parallel outputs, and unpinned dependency versions are the usual culprits.
- Timestamps in artifacts. Embedded build times and file mtimes in tarballs; typically the most common source.
- Unstable orderings. File globs, hash map iteration, and parallel build outputs can all produce different orders from run to run.
- Dependency versions at build time. Unpinned dependencies resolve differently as upstream changes; pin everything via lockfiles or hermetic builds.
- Locale and environment leaks. LC_ALL, TZ, and the hostname all leak into builds in subtle ways, for example through locale-dependent sort order or timezone-formatted dates.
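The timestamp leak is easy to reproduce. The sketch below uses gzip, whose header stores a modification time, to show the same payload hashing differently when only the embedded time changes; the payload and the two epoch values are arbitrary illustrations.

```python
import gzip
import hashlib

payload = b"identical build output\n"


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


# Same payload, two different embedded mtimes: the compressed bytes differ.
first_build = gzip.compress(payload, mtime=1_700_000_000)
second_build = gzip.compress(payload, mtime=1_700_086_400)
timestamps_leak = sha256(first_build) != sha256(second_build)

# Pinning the embedded mtime to a fixed value removes the difference.
pinned_a = gzip.compress(payload, mtime=0)
pinned_b = gzip.compress(payload, mtime=0)
pinned_matches = sha256(pinned_a) == sha256(pinned_b)
```

Both archives decompress to the same content, which is why this kind of drift goes unnoticed until hashes are compared.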
How to fix non-determinism
Fixes are mechanical. Set the standard env vars, sort lists explicitly, pin dependencies to hashes rather than versions, and treat the toolchain as part of the input.
- SOURCE_DATE_EPOCH. A standard environment variable that carries the build timestamp as seconds since the Unix epoch; many modern build tools honour it for embedded timestamps.
- Sort file lists explicitly. A find ... | sort | xargs ... pattern in each build removes filesystem-order leakage.
- Pin to hash, not version. Hash-locked references per dependency; Bazel MODULE.bazel.lock, Nix flake.lock, Cargo.lock with frozen mode.
- Hermetic toolchain. Pin explicit compiler and linker versions for every build; without that, toolchain drift silently changes outputs.
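Several of these fixes can be combined when packaging. The following is a sketch, not a definitive implementation, of a tar archive built with sorted entries, a timestamp taken from SOURCE_DATE_EPOCH, and normalized ownership and permissions. The helper names and the fallback of 0 when the variable is unset are assumptions made for illustration.

```python
import io
import os
import tarfile
from pathlib import Path


def source_date_epoch(default: int = 0) -> int:
    """Read SOURCE_DATE_EPOCH, falling back to `default` when it is unset."""
    return int(os.environ.get("SOURCE_DATE_EPOCH", default))


def deterministic_tar(root: Path) -> bytes:
    """Archive everything under `root` with machine-specific metadata removed."""
    mtime = source_date_epoch()

    def normalize(info: tarfile.TarInfo) -> tarfile.TarInfo:
        info.mtime = mtime                 # fixed timestamp instead of file mtime
        info.uid = info.gid = 0            # drop the builder's user and group ids
        info.uname = info.gname = ""
        # Keep only the executable distinction; drop umask-dependent bits.
        info.mode = 0o755 if info.isdir() or info.mode & 0o100 else 0o644
        return info

    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as tar:
        # Sort explicitly so filesystem enumeration order cannot leak in.
        for path in sorted(root.rglob("*")):
            arcname = path.relative_to(root).as_posix()
            tar.add(path, arcname=arcname, recursive=False, filter=normalize)
    return buf.getvalue()
```

The archive is left uncompressed here on purpose: a compression layer such as gzip carries its own header timestamp and would need the same treatment.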
Validate determinism
Validation is the proof. Build twice on different machines, compare hashes, and investigate any drift with diffoscope. A CI job catches latent regressions.
- Build twice on different machines. Run the same build on two machines and compare the resulting hashes.
- diffoscope shows differences. The standard diff tool for binary artifacts; usually reveals one stray timestamp or sort order.
- CI job for drift detection. A build, hash, rebuild, compare flow on every CI run flags drift before it ships.
- Alarm on drift without source change. A hash change with no corresponding source change is a latent regression worth alerting on.
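The comparison step of such a CI job can be sketched as follows, assuming each build machine writes its artifacts into a directory that the job can read. The functions build_manifest and drifted are hypothetical names, and SHA-256 is an assumed choice of digest.

```python
import hashlib
from pathlib import Path


def build_manifest(artifact_dir: Path) -> dict[str, str]:
    """Map each file's path relative to `artifact_dir` to its SHA-256 hex digest."""
    return {
        path.relative_to(artifact_dir).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(artifact_dir.rglob("*"))
        if path.is_file()
    }


def drifted(first: dict[str, str], second: dict[str, str]) -> list[str]:
    """Return artifact names that are missing from one manifest or hash differently."""
    names = first.keys() | second.keys()
    return sorted(name for name in names if first.get(name) != second.get(name))
```

A CI step could fail whenever drifted(...) returns a non-empty list, and the listed artifacts are then the ones to hand to diffoscope.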
When to invest
Investment is selective. It is required for external artifacts that customers will hash-verify, recommended for production deploys, and skippable for internal-only tools where reproducibility is not load-bearing.
- External artifacts. Anything published to npm, Docker Hub, or another container registry; required because customers verify hashes.
- Production deploys. Recommended for supply-chain risk reduction; pays back at the next audit.
- Internal-only tools. Skip when reproducibility is not load-bearing; spend engineering hours where they pay back.
- Documented driver per investment. A named "why now" rationale guards against premature investment in determinism for tools that do not need it.