AI & ML Advanced · By Samson Tanimawo, PhD · Published May 5, 2026 · 6 min read

Model Theft and Extraction Attacks

An attacker queries your model API and reconstructs your model from the responses. Model extraction is real, demonstrated, and harder to prevent than most teams expect.

What extraction is

An attacker queries your deployed model with carefully chosen inputs, observes the outputs, and trains a replica on those input-output pairs. With enough queries, the replica matches the original closely enough to use commercially.

Demonstrations in 2024-2025 showed GPT-4-class models extracted for roughly $50K in API calls. The replica isn't identical, but it is capability-comparable on most tasks.
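The core loop can be sketched in a few lines. This is a toy illustration, not a reproduction of any real attack: the "victim" is a small scikit-learn classifier standing in for a model behind an API, and `query_api` is a hypothetical stand-in for the prediction endpoint.

```python
# Toy sketch of model extraction via distillation: the attacker never sees
# the victim's training data or weights, only the API's answers.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# "Victim": a model the attacker can query but not inspect (assumption:
# stands in for a production model behind an API).
X_private = rng.normal(size=(2000, 5))
y_private = (X_private[:, 0] + X_private[:, 1] > 0).astype(int)
victim = LogisticRegression().fit(X_private, y_private)

def query_api(X):
    """Hypothetical stand-in for the victim's prediction endpoint."""
    return victim.predict(X)

# Attacker: sample synthetic queries, record the API's answers, and fit
# a replica on the (query, answer) pairs -- standard distillation.
X_queries = rng.normal(size=(5000, 5))
y_answers = query_api(X_queries)
replica = DecisionTreeClassifier(max_depth=6).fit(X_queries, y_answers)

# Measure how often the replica agrees with the victim on fresh inputs.
X_test = rng.normal(size=(1000, 5))
agreement = (replica.predict(X_test) == victim.predict(X_test)).mean()
print(f"replica/victim agreement: {agreement:.2%}")
```

Even this crude version reaches high agreement, because the attacker only needs to match the decision boundary, not recover the weights.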

How attackers do it

The loop is always the same: sample queries, record outputs, train a replica. Cost depends on what the attacker wants: a fully working clone is expensive; a model that matches on one narrow task is cheap.
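The cost gap is back-of-envelope arithmetic. All the numbers below are illustrative assumptions, not measured figures from any real attack:

```python
# Illustrative extraction budget: spend = queries x tokens x price.
def extraction_cost(n_queries, tokens_per_query, price_per_million_tokens):
    """Total API spend in dollars for an extraction run (assumed pricing)."""
    return n_queries * tokens_per_query * price_per_million_tokens / 1_000_000

# Full-capability clone: broad coverage needs many long queries.
full_clone = extraction_cost(10_000_000, 1_000, 10.0)  # -> 100000.0
# Narrow task match: a few thousand in-distribution queries suffice.
task_clone = extraction_cost(50_000, 500, 10.0)        # -> 250.0
print(full_clone, task_clone)
```

Under these assumed prices the narrow clone costs hundreds of dollars while the full clone costs six figures, which is why task-specific extraction is the common case.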

Defences

Model extraction sits in unsettled IP territory. Terms of service typically prohibit it, but enforcement is hard. Several major lawsuits in 2024-2025 are testing whether trained model weights count as protectable trade secrets, copyrightable works, or neither.

For now, the practical defence is layered: rate limit, watermark, monitor, sue when you catch a bad actor. The combination doesn’t prevent extraction; it raises the cost enough that most attackers go elsewhere.
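The rate-limiting layer is the most mechanical piece and easy to sketch. This is a minimal per-key token-bucket limiter, one possible shape for the first layer, not a prescription; the monitoring and watermarking layers would sit behind it.

```python
import time

class RateLimiter:
    """Per-API-key token bucket: one layer of the defence stack.
    Each key refills at `rate_per_sec` tokens up to a `burst` ceiling;
    a request is allowed only if a full token is available."""

    def __init__(self, rate_per_sec, burst):
        self.rate = rate_per_sec
        self.burst = burst
        self.buckets = {}  # key -> (tokens, last_seen_time)

    def allow(self, key, now=None):
        now = time.monotonic() if now is None else now
        tokens, last = self.buckets.get(key, (self.burst, now))
        # Refill proportionally to elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self.buckets[key] = (tokens - 1.0, now)
            return True
        self.buckets[key] = (tokens, now)
        return False

limiter = RateLimiter(rate_per_sec=10, burst=20)
# 25 back-to-back requests at the same instant: the burst of 20 passes,
# the remaining 5 are denied until tokens refill.
results = [limiter.allow("attacker-key", now=100.0) for _ in range(25)]
print(sum(results))  # -> 20
```

A limiter like this caps the query rate per key, so an extractor needs more keys or more time, which is exactly what the monitoring layer then looks for.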