Watermarking AI Output
AI-generated text and images increasingly need to be detectable. Watermarking embeds invisible signatures that tie a piece of content to the model that produced it: not cryptographic proof, but strong statistical evidence.
Text watermarking
The dominant approach: at each generation step, partition the vocabulary into a “green” and a “red” set using a hash of the prior tokens, then bias the logits toward green tokens. Detection: re-derive the partition at each position, count how often the emitted token lands in the green set, and compare that count to the chance rate with a z-test.
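A minimal sketch of the detection side, using only the standard library. The helper names (`green_set`, `detect`) and the hashing details are illustrative assumptions: real schemes hash a window of prior tokens with a secret key and bias logits during sampling; here a single previous token and a public SHA-256 hash stand in for both.

```python
import hashlib
import math

def green_set(prev_token: int, vocab_size: int, fraction: float = 0.5) -> set[int]:
    """Pseudo-randomly partition the vocabulary, keyed on the previous token.

    Hypothetical helper: real watermarks hash several prior tokens plus a
    secret key; a public hash of one token stands in for that here.
    """
    seed = int.from_bytes(hashlib.sha256(str(prev_token).encode()).digest()[:8], "big")
    # Deterministic permutation: rank token ids by a hash of (seed, id),
    # then take the first `fraction` of the ranking as the green set.
    ranked = sorted(range(vocab_size),
                    key=lambda t: hashlib.sha256(f"{seed}:{t}".encode()).digest())
    return set(ranked[:int(vocab_size * fraction)])

def detect(tokens: list[int], vocab_size: int, fraction: float = 0.5) -> float:
    """Z-score: how far the green-token count sits above the chance rate."""
    hits = sum(1 for prev, tok in zip(tokens, tokens[1:])
               if tok in green_set(prev, vocab_size, fraction))
    n = len(tokens) - 1
    expected = n * fraction
    std = math.sqrt(n * fraction * (1 - fraction))
    return (hits - expected) / std
```

Unwatermarked text lands near z = 0; text sampled almost entirely from green sets scores several standard deviations above chance, which is the detection signal paraphrasing attacks erode.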
Image watermarking
Embed a low-amplitude pattern across the image during diffusion. Detection: a model trained to recognise the pattern. Robust to mild compression and cropping; brittle to substantial transformation.
Removal attacks
- Paraphrasing strips text watermarks reliably.
- Image regeneration via a different model removes image watermarks.
- Substantial editing degrades detection signal in both cases.
Practical reality
Watermarking raises the cost of fraud but doesn’t prevent it. Combined with content credentials (cryptographic signatures attached to media), it deters casual misuse but loses to motivated attackers. The 2026 regulatory tilt is toward provenance signatures, not bulletproof detection.
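A toy sketch of the provenance-signature half of that combination. Real content credentials (e.g. C2PA manifests) use asymmetric signatures over structured metadata; here a standard-library HMAC over the raw media bytes stands in, and the key name is hypothetical.

```python
import hashlib
import hmac

SECRET_KEY = b"demo-signing-key"  # hypothetical; real systems use asymmetric key pairs

def sign(media: bytes) -> str:
    """Produce a provenance tag to attach alongside the media."""
    return hmac.new(SECRET_KEY, media, hashlib.sha256).hexdigest()

def verify(media: bytes, tag: str) -> bool:
    """Check that the media matches its tag, in constant time."""
    return hmac.compare_digest(sign(media), tag)
```

Note the trade-off this makes concrete: the signature breaks on any edit at all, so it establishes provenance for intact files rather than detecting AI content after transformation, which is exactly why it complements rather than replaces watermarking.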