Watermarking AI Output
AI-generated text and images need to be detectable. Watermarking embeds invisible signatures that indicate a piece of content came from a specific model.
Text watermarking
Text watermarking embeds a detectable signal in AI-generated text without changing the text's apparent meaning. The signal is statistical: certain token sequences are slightly more likely than others, in ways a watermark detector can find but humans can't notice. State-of-the-art text watermarking reports 90-95% detection accuracy with low false-positive rates on un-watermarked text.
The mechanism. The model's sampling is biased: at each token, the next-token distribution is slightly modified to prefer tokens in a "green list" (a pseudorandom subset of the vocabulary). The detector counts green-list tokens in the text; if the count is substantially above chance, the text is flagged as watermarked.
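A minimal sketch of green-list biased sampling, under stated assumptions: the green list is seeded from a secret key plus the previous token (one common construction, not necessarily what any provider ships), and the vocabulary, key, GAMMA, DELTA, and the random "model" logits are all hypothetical stand-ins.

```python
import hashlib
import math
import random

VOCAB = [f"tok{i}" for i in range(50)]  # toy vocabulary (hypothetical)
GAMMA = 0.5        # fraction of the vocabulary placed on the green list (assumed)
DELTA = 2.0        # logit bonus given to green tokens (assumed)
KEY = b"demo-key"  # secret shared by generator and detector (hypothetical)


def green_list(prev_token: str) -> set:
    """Derive a pseudorandom green list from the key and the previous token."""
    digest = hashlib.sha256(KEY + prev_token.encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return set(rng.sample(VOCAB, int(GAMMA * len(VOCAB))))


def sample_next(prev_token: str, logits: dict, rng: random.Random) -> str:
    """Add DELTA to green-token logits, then sample from the softmax."""
    green = green_list(prev_token)
    biased = {t: logit + (DELTA if t in green else 0.0) for t, logit in logits.items()}
    top = max(biased.values())
    weights = [math.exp(biased[t] - top) for t in VOCAB]
    return rng.choices(VOCAB, weights=weights, k=1)[0]


def generate(n: int, seed: int = 0) -> list:
    """Generate n tokens; random Gaussian logits stand in for a real model."""
    rng = random.Random(seed)
    tokens = ["tok0"]
    for _ in range(n):
        logits = {t: rng.gauss(0.0, 1.0) for t in VOCAB}
        tokens.append(sample_next(tokens[-1], logits, rng))
    return tokens
```

A larger DELTA makes the signal easier to detect but distorts the output more; a smaller one does the opposite.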
The "without changing meaning" claim. The bias is small enough that the text reads naturally. Single sentences may show no statistical signal; longer texts (a paragraph or more) show the signal clearly.
The detection threshold. Detector computes a z-score: how many standard deviations the observed green-list count sits above the count expected by chance. Above threshold = watermarked. Threshold tuning trades false positives against false negatives. Production systems aim for <1% false positive rate, accepting some false negatives.
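The z-score test can be sketched as follows. The `is_green` callback, the default gamma of 0.5, and the default threshold of 4.0 are assumptions for illustration; the false-positive estimate uses the normal approximation, which is only reasonable for longer texts.

```python
import math
from typing import Callable, Sequence


def watermark_z_score(tokens: Sequence[str],
                      is_green: Callable[[str, str], bool],
                      gamma: float = 0.5) -> float:
    """z-score of the green-token count under the null hypothesis that
    each scored token is green with probability gamma."""
    scored = len(tokens) - 1  # the first token has no predecessor
    if scored <= 0:
        return 0.0
    hits = sum(is_green(tokens[i - 1], tokens[i]) for i in range(1, len(tokens)))
    expected = gamma * scored
    std = math.sqrt(scored * gamma * (1.0 - gamma))
    return (hits - expected) / std


def false_positive_rate(z_threshold: float) -> float:
    """One-sided normal tail P(Z > z_threshold) for un-watermarked text."""
    return 0.5 * math.erfc(z_threshold / math.sqrt(2.0))


def is_watermarked(z: float, z_threshold: float = 4.0) -> bool:
    """Flag text whose z-score exceeds the chosen threshold."""
    return z > z_threshold
```

For example, 140 green tokens out of 200 scored with gamma = 0.5 gives z = (140 - 100) / sqrt(50), about 5.66. Under the normal approximation, a threshold near z = 2.33 corresponds to roughly a 1% false positive rate; raising the threshold lowers false positives and raises false negatives.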
The state of deployment. Several major model providers have deployed or tested text watermarking in 2024-2025: Google with SynthID Text, and OpenAI and Anthropic at an experimental stage. None applies watermarking by default to all outputs; the deployment is partial and contested.
Image watermarking
Image watermarking is more mature than text watermarking. Approaches: invisible perturbations in pixel patterns, metadata embedded in image files, and neural-network-based watermarking that survives compression and editing. Adobe's Content Credentials and Google's SynthID for images are deployed in production; many AI image generators add watermarks by default.
The pixel-perturbation approach. Modify pixel values in patterns invisible to humans. The perturbations are designed to survive compression, cropping, and color adjustments. The detector reads the pattern and flags the image as AI-generated. SynthID Image embeds its signal in the pixels in this way; it survives most casual edits.
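To make the idea concrete, here is a toy additive watermark with a correlation detector. It is an illustrative sketch only, not how SynthID or any production system works: the keyed +1/-1 pattern, the strength of 2.0, and the flat list of grayscale values are all assumptions, and this toy version does not survive cropping or resizing because the pattern loses alignment.

```python
import random


def pattern(key: int, n: int) -> list:
    """Keyed pseudorandom pattern of +1/-1 values, one per pixel."""
    rng = random.Random(key)
    return [rng.choice((-1, 1)) for _ in range(n)]


def embed(pixels: list, key: int, strength: float = 2.0) -> list:
    """Nudge each pixel up or down by `strength`, clamped to 0..255."""
    signs = pattern(key, len(pixels))
    return [min(255.0, max(0.0, v + strength * s)) for v, s in zip(pixels, signs)]


def detect_score(pixels: list, key: int) -> float:
    """Average correlation between mean-centred pixels and the keyed pattern.

    Close to 0 for unmarked images, close to `strength` for marked ones.
    """
    signs = pattern(key, len(pixels))
    mean = sum(pixels) / len(pixels)
    return sum((v - mean) * s for v, s in zip(pixels, signs)) / len(pixels)
```

A detector would compare `detect_score` against a threshold chosen between 0 and the embedding strength. Only someone holding the key can compute the score, which is why the pattern is derived from a secret.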
The C2PA standard. C2PA stands for Coalition for Content Provenance and Authenticity. The standard defines cryptographically signed metadata embedded in image files. It tracks provenance: who made it, when, with what tools. It doesn't survive metadata stripping but provides a strong signal when present. Adobe, Microsoft, and the BBC are members.
The neural-network approach. Train a model to embed watermarks that survive common edits. Robust to crops, rotations, color changes, even mild edits. The state of the art for media that will be edited heavily.
The deployment reality. Many AI image generators add watermarks or provenance metadata by default (DALL-E 3, Imagen, some Stable Diffusion variants). Open-source models often don't. Determined adversaries can strip watermarks; casual sharing typically preserves them. The deterrent is real for casual misuse.
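The signed-provenance idea can be sketched as below. This is not the C2PA manifest format or API: real C2PA uses certificate-based public-key signatures and its own container, while this sketch substitutes an HMAC with a hypothetical shared key so that it runs on the standard library alone. The field names and status strings are likewise made up for illustration.

```python
import hashlib
import hmac
import json
from typing import Optional

SIGNING_KEY = b"demo-signing-key"  # hypothetical; stands in for a private key


def make_manifest(image_bytes: bytes, creator: str, tool: str, created: str) -> dict:
    """Build a provenance claim bound to the image hash, then sign it."""
    claim = {
        "creator": creator,
        "tool": tool,
        "created": created,
        "content_sha256": hashlib.sha256(image_bytes).hexdigest(),
    }
    payload = json.dumps(claim, sort_keys=True).encode()
    signature = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return {"claim": claim, "signature": signature}


def verify_manifest(image_bytes: bytes, manifest: Optional[dict]) -> str:
    """Return a status string; a missing manifest says nothing about origin."""
    if manifest is None:
        return "no-provenance"
    payload = json.dumps(manifest["claim"], sort_keys=True).encode()
    expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, manifest["signature"]):
        return "invalid-signature"
    if manifest["claim"]["content_sha256"] != hashlib.sha256(image_bytes).hexdigest():
        return "content-modified"
    return "verified"
```

The three failure states mirror the trade-off in the text: stripped metadata yields no information, while a present and valid signature is a strong signal because both the claim and the content hash are covered by it.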
Removal attacks
Watermarks can be defeated. Paraphrasing AI text often removes watermarks. Image watermarks survive most casual edits but can be defeated with deliberate processing. Watermarks raise the cost of misuse rather than preventing it. The honest framing: watermarks are useful but not bulletproof.
The text-paraphrase attack. Run watermarked text through another LLM with "paraphrase this". The output preserves meaning but loses most of the green-list bias. Detection can drop toward chance. The attack is cheap and effective; it shows the limits of what watermarks can do.
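A back-of-the-envelope model shows why paraphrasing is so damaging. The model is an assumption, not a measurement: a fraction `kept` of scored tokens still carries the watermark's green rate after paraphrasing, and the rest are green only at the chance rate gamma. The numbers in the loop (200 tokens, gamma 0.5, green rate 0.88) are hypothetical.

```python
import math


def expected_z(scored_tokens: int, gamma: float, green_rate: float, kept: float) -> float:
    """Expected z-score when only a fraction `kept` of tokens keeps the bias.

    Surviving tokens are green with probability green_rate; replaced tokens
    are green with probability gamma, so they add no excess hits.
    """
    excess_hits = scored_tokens * kept * (green_rate - gamma)
    std = math.sqrt(scored_tokens * gamma * (1.0 - gamma))
    return excess_hits / std


# Hypothetical scenario: how the expected z-score decays as less text survives.
for kept in (1.0, 0.5, 0.2, 0.0):
    print(f"kept={kept:.1f}  expected z={expected_z(200, 0.5, 0.88, kept):.2f}")
```

Under these assumed numbers the expected z-score falls from about 10.7 with no paraphrasing to about 2.1 when 20% of the tokens survive, which is below a threshold of 4. The same formula shows why longer texts are easier to detect: for a fixed green rate, z grows with the square root of the token count.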
The image-edit attack. Heavy crop, re-encode, color shift, JPEG compression at low quality. Some watermarks survive each individually; few survive the combination. Adversaries with technical skill can defeat any current image watermark.
The defense ladder. No single watermark is sufficient. Combinations (multiple independent watermarks, cryptographic signatures, content fingerprints) raise the bar. Detection of "ANY" watermark or fingerprint provides better coverage than detection of one specific scheme.
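The "any of several schemes" logic can be sketched as a small combiner. The `Signal` type, the scheme names, and the verdict strings are hypothetical; the one firm point is the design cost, which is that any-of detection also adds up false positives, bounded above by the sum of the individual rates.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Signal:
    name: str                  # e.g. "text-watermark", "signed-metadata" (hypothetical)
    detected: Optional[bool]   # None means the detector could not run
    false_positive_rate: float


def combine(signals: List[Signal]) -> str:
    """Report which schemes fired; no hit is NOT evidence of human origin."""
    hits = [s.name for s in signals if s.detected]
    if hits:
        return "ai-signal:" + ",".join(hits)
    return "no-signal"


def combined_fpr_upper_bound(signals: List[Signal]) -> float:
    """Union bound on the false positive rate of any-of detection."""
    usable = [s for s in signals if s.detected is not None]
    return min(1.0, sum(s.false_positive_rate for s in usable))
```

Each added scheme widens coverage and raises the ceiling on false positives, so per-scheme thresholds may need tightening as the list grows.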
The threat-model framing. Watermarks deter casual misuse (most users). They don't stop motivated, technically-skilled adversaries. Choose your investment based on which threats matter; deterring casual misuse and resisting skilled adversaries are both valid goals, for different products.
Practical reality
Don't promise users that AI content will be detected. Watermarks help; they don't solve. Build detection into pipelines as one signal among many; combine with other heuristics (writing style, factual patterns, behavioural signals). The "detect AI content" feature is overpromised across the industry.
The honest UX. "We add watermarks to outputs. Other AI providers may not. Watermarks may be removed by editing. Detection is a signal, not a guarantee." Users who understand the limitations make better decisions; users who think AI is reliably detectable make worse ones.
The platform-level perspective. Major platforms (social media, news) increasingly accept C2PA-signed content with provenance. Unsigned content gets less trust. The "show your work" credentialing is more useful than detection-after-the-fact.
The regulatory direction. The EU AI Act and similar regulations push for watermarking of AI-generated content. The push is toward providers watermarking by default, so that consumers can detect AI content and inform themselves. The regulatory direction supports broad watermarking deployment.
The "labelling" vs "watermarking" distinction. Labelling is visible disclosure ("this is AI-generated"). Watermarking is a hidden, detectable signal. Labelling is honest and easy to read but easy to remove; watermarking still works after the label is gone. Production systems should do both.
Common antipatterns
Marketing AI detection as reliable. Watermarks are removable. Don't promise detection that you can't deliver.
Single-watermark deployments. One scheme is one defense to defeat. Layer multiple.
Skipping cryptographic signatures (C2PA). Signatures provide strong provenance when preserved. Easy to add at generation time.
Treating "no watermark detected" as proof of human origin. Many AI outputs aren't watermarked (open-source models, watermark stripping). Absence of watermark doesn't prove absence of AI.
What to do this week
Three moves. (1) If you generate AI content, audit which of your outputs are watermarked. The gap between "we plan to watermark" and "we watermark" is often substantial. (2) Add C2PA signing to image generation if you can. The provenance signal is strong when preserved. (3) Don't promise users that AI content can be reliably detected. Be honest about what watermarking can and can't do.