Secret Leak Detection
Secrets leak into code, logs, and configs. This is how to detect them.
Repo scanning
Secret leak detection is the security discipline of finding credentials that have escaped into places they should not be: source repositories, log archives, public web pages, and public chat channels. Each is a possible leak path, and each requires its own detection mechanism. Layered scanning catches what any single mechanism misses.
What repository scanning provides:
- GitHub secret scanning native.: GitHub scans public repositories, and private repositories where secret scanning is enabled, for known secret patterns. AWS keys, GitHub tokens, and API keys for major SaaS providers are all detected. The scan runs continuously; detections notify the repository's administrators and, for providers that partner with GitHub, the credential issuer.
- trufflehog for deep history scanning.: TruffleHog scans the full git history of a repo, not just its current state. Secrets that were committed once and then deleted stay in git history unless that history is rewritten; TruffleHog finds them. It is a routine step in repository migrations and audits.
- gitleaks for CI integration.: Gitleaks runs as a pre-commit hook and in CI. Each commit is scanned before merge; detected secrets block the merge. The catch happens at the earliest point; the leak never reaches the published repo.
- Catches commits.: The scanning catches secrets when they are committed. With pre-commit, the catch is local. With CI, the catch is at PR time. With repository-level scanning, the catch is post-commit. Each layer catches what the earlier ones missed.
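- Pattern-based detection.: Most scanners use regex patterns plus entropy analysis. Patterns catch known-shape secrets (long-term AWS access key IDs start with AKIA); entropy catches high-randomness strings that look like credentials. Patterns are precise but cover only known formats, while entropy checks cover unknown formats at the cost of more false positives, so the two are tuned together for good coverage with a tolerable false-positive rate.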
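As a rough illustration of the pattern-plus-entropy approach, the Python sketch below pairs one known-shape regex with a Shannon entropy check. It is not how any particular scanner is implemented: the function names, the candidate-token regex, and the 4.0 bits-per-character threshold are assumptions chosen for illustration. The sample key is the placeholder value AWS uses in its own documentation.

```python
import math
import re
from collections import Counter

# Known-shape pattern: "AKIA" followed by 16 uppercase letters or digits.
AWS_KEY_ID = re.compile(r"\b(AKIA[0-9A-Z]{16})\b")
# Candidate tokens for the entropy check (assumed shape: 20+ base64-like chars).
CANDIDATE = re.compile(r"[A-Za-z0-9+/]{20,}")


def shannon_entropy(s: str) -> float:
    """Bits of entropy per character, from the string's character frequencies."""
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def find_secrets(line: str, entropy_threshold: float = 4.0) -> list:
    """Return (rule, value) pairs for one line of text."""
    findings = [("aws-access-key-id", m.group(1)) for m in AWS_KEY_ID.finditer(line)]
    known = {value for _, value in findings}
    for m in CANDIDATE.finditer(line):
        token = m.group(0)
        # Skip tokens the pattern rule already reported; flag the rest
        # only if they look random enough (threshold is an assumption).
        if token not in known and shannon_entropy(token) >= entropy_threshold:
            findings.append(("high-entropy", token))
    return findings


if __name__ == "__main__":
    print(find_secrets('aws_key = "AKIAIOSFODNN7EXAMPLE"'))
```

Raising the threshold trades missed secrets for fewer false positives; real scanners typically ship many patterns and tune the entropy check per rule.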
Repo scanning is the primary detection layer. Most accidental commits get caught here; the cases that escape go to the secondary layers.
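To make the commit-time catch concrete, here is a minimal sketch that scans only the lines a change adds, reading a unified diff. It illustrates the idea and does not reproduce how gitleaks or any other tool works internally; `scan_diff` and the single AWS key-ID pattern are hypothetical stand-ins for a full rule set.

```python
import re

SECRET_PATTERN = re.compile(r"\bAKIA[0-9A-Z]{16}\b")
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def scan_diff(diff_text: str) -> list:
    """Return (file, line number in the new file) for added lines that match."""
    findings = []
    current_file = None
    new_lineno = 0
    in_hunk = False
    for raw in diff_text.splitlines():
        if raw.startswith("diff --git "):
            in_hunk = False  # a new file section starts; headers follow
        elif not in_hunk and raw.startswith("+++ "):
            current_file = raw[4:].removeprefix("b/")
        elif (m := HUNK_HEADER.match(raw)):
            new_lineno = int(m.group(1))  # first new-file line of this hunk
            in_hunk = True
        elif in_hunk and raw.startswith("+"):
            if SECRET_PATTERN.search(raw[1:]):
                findings.append((current_file, new_lineno))
            new_lineno += 1
        elif in_hunk and raw.startswith(" "):
            new_lineno += 1  # context line: present in the new file, not added
        # Lines starting with "-" are removals and do not exist in the new file.
    return findings
```

In a pre-commit hook, the input could be the output of `git diff --cached`; in CI, the diff between the pull request branch and its base. A non-empty result would fail the hook or the job.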
Log scanning
Even when a secret is not committed to source control, it can leak into log archives. An application logs a request that contains an API key; an error log includes a stack trace with a credential; a debug statement prints a sensitive value. Log scanning is the discipline of catching these.
- Periodic scan of logs for secret patterns.: A scanner runs against the log archives looking for the same patterns that repo scanners look for. Logs containing matches are flagged; the team investigates and remediates.
- Catches accidental logging.: The most common log leak is an application that logged a value it should have redacted. The redaction layer had a gap; the value made it to logs. The scan catches this; the redaction layer gets fixed.
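- Sample-based for performance.: Scanning every byte of every log archive is expensive. Sample-based scanning (for example, a 1% random sample) catches most recurring leaks at a fraction of the cost, because a log statement that leaks usually fires many times. A one-off leak can slip through a single pass; coverage extends over time as samples rotate.
- Cloud-native log scanning.: Amazon Macie (for data in S3), Google Cloud Sensitive Data Protection (formerly Cloud DLP), and Microsoft Purview scan cloud-stored logs for sensitive patterns. These are managed services: the team configures rules and receives findings.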
- Findings feed remediation.: Each finding triggers two actions. Rotate the leaked credential. Fix the redaction layer that allowed the leak. The remediation is structured; the same kind of leak does not recur.
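The sampling trade-off can be made concrete with a short sketch. The function names are hypothetical, and a single AWS key-ID pattern stands in for a full rule set. Each line is scanned with probability `sample_rate`; assuming lines are sampled independently, a leak that appears in n lines is seen with probability 1 - (1 - p)^n.

```python
import random
import re

# Same known-shape pattern a repo scanner would use.
SECRET_PATTERN = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def scan_sample(lines, sample_rate=0.01, rng=None):
    """Scan a random sample of lines; return 1-based numbers of lines that match."""
    rng = rng or random.Random()
    hits = []
    for lineno, line in enumerate(lines, start=1):
        if rng.random() >= sample_rate:
            continue  # line not selected for this pass
        if SECRET_PATTERN.search(line):
            hits.append(lineno)
    return hits


def detection_probability(occurrences, sample_rate=0.01):
    """Chance that at least one of `occurrences` leaking lines lands in the sample."""
    return 1 - (1 - sample_rate) ** occurrences
```

Under these assumptions, a 1% sample will usually miss a secret that was logged once, while a log statement that has fired 500 times is found with better than 99% probability. This is why sampling works for recurring leaks and why rotating samples matters for rare ones.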
Log scanning is the catch-up layer. It does not prevent leaks; it surfaces them after the fact so the team can rotate and harden.
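Hardening the redaction layer can be sketched with Python's standard `logging` module. This is a minimal example under stated assumptions: `RedactSecrets` is a hypothetical name, and it covers only one pattern, whereas a real redaction layer would cover many patterns as well as structured fields.

```python
import logging
import re

SECRET_PATTERN = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


class RedactSecrets(logging.Filter):
    """Replace matching secrets in the formatted message before it is emitted."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # msg with its args already applied
        redacted = SECRET_PATTERN.sub("[REDACTED]", message)
        if redacted != message:
            record.msg = redacted
            record.args = ()  # args are already folded into the message
        return True  # keep the record; only its content changes


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.addFilter(RedactSecrets())
    logger = logging.getLogger("app")
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.info("calling API with key %s", "AKIAIOSFODNN7EXAMPLE")
```

Attaching the filter to the handler rather than to one logger means it also applies to records that reach the handler from other loggers.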
Respond
Detection is the input to response; the response decides what to do about the detected leak. The common instinct, to delete the leaked value and move on, is wrong: deleting the leak is not enough. The credential has been compromised and must be treated as such.
- Found leak: rotate immediately.: The leaked credential is rotated as soon as the leak is confirmed. The new value is generated; consumers are updated; the old value is invalidated. The window during which the leaked credential is usable should be measured in minutes.
- Investigate scope.: Where else might the credential have leaked? When did the leak start? Has the credential been used by anyone other than authorized parties since the leak began? The investigation is structured; the audit log is the foundation.
- Don't just delete the leak.: Removing the commit that contained the secret does not remove the secret from the world. Anyone who pulled the repo before deletion has it. Anyone who indexed the commit (GitHub's API, search engines, automated scrapers) has it. The credential is compromised regardless of whether you deleted the commit.
- Document the incident.: Each detected leak produces a record: which credential, when leaked, when detected, when rotated, what was investigated, what remediation was done. The record feeds the postmortem and the audit trail.
- Postmortem if sensitive.: Some leaks warrant postmortems: leaked production credentials, leaked customer data, leaked encryption keys. The postmortem produces structural improvements (better redaction, better scanning, better access control) that prevent recurrence.
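The incident record described above maps naturally onto a small data structure. The sketch below is one possible shape, not a prescribed schema; the class and field names are assumptions. Keeping the timestamps lets the record answer the two questions that matter for the rotation target: how long the credential was exposed, and how quickly it was rotated after detection.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class LeakIncident:
    credential_id: str            # which credential
    leaked_at: datetime           # when leaked
    detected_at: datetime         # when detected
    rotated_at: Optional[datetime] = None            # when rotated
    investigation_notes: list = field(default_factory=list)
    remediation: list = field(default_factory=list)

    def exposure_window(self) -> Optional[timedelta]:
        """Time the leaked credential stayed valid: leak to rotation."""
        if self.rotated_at is None:
            return None
        return self.rotated_at - self.leaked_at

    def time_to_rotate(self) -> Optional[timedelta]:
        """Response time: detection to rotation."""
        if self.rotated_at is None:
            return None
        return self.rotated_at - self.detected_at


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)  # illustrative timestamps
    incident = LeakIncident(
        credential_id="ci-deploy-key",  # hypothetical identifier
        leaked_at=t0,
        detected_at=t0 + timedelta(minutes=3),
        rotated_at=t0 + timedelta(minutes=15),
    )
    print(incident.exposure_window(), incident.time_to_rotate())
```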
Secret leak detection is one of those security disciplines where the cost of detection is small compared to the cost of an undetected leak. Nova AI Ops integrates with secret scanning across repos, logs, and other sources, raises leak detections as security incidents with a structured response workflow, and produces the audit artifacts that compliance frameworks expect.