Differential Privacy in ML
Differential privacy gives a mathematical guarantee that any single individual's data has only a bounded, quantifiable effect on the model's output. The cost is accuracy; the benefit is provable privacy.
What DP guarantees
Differential privacy is a mathematical condition on a training procedure: the inclusion or exclusion of any single individual's data must change the model's output distribution by at most a small multiplicative factor, e^epsilon for some chosen epsilon (often relaxed with a small additive slack delta).
The implication: an attacker observing the trained model can’t determine, with high confidence, whether a specific person’s data was in training.
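This guarantee is usually stated as (epsilon, delta)-differential privacy. A sketch of the standard definition, writing M for the training procedure:

```latex
% (epsilon, delta)-DP: for all neighboring datasets D, D' differing in
% one individual's record, and for every set of possible outputs S,
\Pr[M(D) \in S] \;\le\; e^{\epsilon}\,\Pr[M(D') \in S] + \delta
% Setting delta = 0 recovers the pure e^epsilon guarantee above.
```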
The epsilon parameter
- epsilon < 1: strong privacy. Significant accuracy cost.
- epsilon = 1-3: moderate. Common in research deployments.
- epsilon = 8-10+: weak. Often the only level achievable in production. Some critics consider this barely meaningful privacy.
Epsilon is a budget consumed across queries; it doesn’t reset.
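Budget consumption can be sketched with basic sequential composition, where the epsilons of successive queries simply add. This is a simplified illustration (the `PrivacyBudget` class is hypothetical; real accountants, such as the moments accountant used with DP-SGD, give much tighter bounds):

```python
class PrivacyBudget:
    """Tracks cumulative epsilon under basic sequential composition.

    Hypothetical helper for illustration only; epsilons of successive
    private queries add up and the budget never resets.
    """

    def __init__(self, total_epsilon):
        self.total = total_epsilon
        self.spent = 0.0

    def charge(self, epsilon):
        """Record one private query; refuse it if the budget would be exceeded."""
        if self.spent + epsilon > self.total:
            raise RuntimeError("privacy budget exhausted")
        self.spent += epsilon


budget = PrivacyBudget(total_epsilon=3.0)
budget.charge(1.0)  # first private query
budget.charge(1.0)  # second query: spent epsilon is now 2.0 of 3.0
```

A third query costing more than the remaining 1.0 would be refused rather than silently weakening the guarantee.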
How DP-SGD works
The standard implementation:
- Compute per-example gradients (not the average over a batch).
- Clip each gradient to a fixed L2 norm C.
- Sum the clipped gradients.
- Add Gaussian noise with standard deviation proportional to C, calibrated to the target epsilon.
- Average and take an optimizer step.
Clipping bounds how much any single example can contribute; the noise then masks whatever influence remains, so no single example dominates the gradient signal.
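The steps above can be sketched as a single update function. This is a minimal NumPy illustration, assuming per-example gradients are already materialized as a [batch, dim] array (real implementations compute them efficiently inside the framework):

```python
import numpy as np


def dp_sgd_step(params, per_example_grads, clip_norm, noise_multiplier,
                lr, rng):
    """One DP-SGD step (sketch): clip per-example, sum, noise, average, step."""
    # 1. Clip each example's gradient to L2 norm <= clip_norm.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    factors = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * factors
    # 2. Sum the clipped gradients.
    grad_sum = clipped.sum(axis=0)
    # 3. Add Gaussian noise scaled to the clipping norm; the
    #    noise_multiplier is what an accountant calibrates to epsilon.
    noise = rng.normal(0.0, noise_multiplier * clip_norm,
                       size=grad_sum.shape)
    # 4. Average the noised sum and take a gradient step.
    noisy_mean = (grad_sum + noise) / per_example_grads.shape[0]
    return params - lr * noisy_mean


rng = np.random.default_rng(0)
params = np.zeros(3)
grads = np.array([[3.0, 0.0, 0.0],
                  [0.0, 3.0, 0.0]])
# With noise_multiplier=0 this reduces to clipped, averaged SGD.
new_params = dp_sgd_step(params, grads, clip_norm=1.0,
                         noise_multiplier=0.0, lr=1.0, rng=rng)
```

With the noise switched off, each gradient of norm 3 is scaled down to norm 1, so the update is the average of the clipped gradients; turning the noise multiplier up trades accuracy for privacy.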
Accuracy cost
Empirical pattern: at epsilon = 8 with millions of training examples, modern DP-SGD reaches within 5-10% of non-private accuracy. At epsilon = 1, the gap can be 20-30%. Smaller datasets pay more.
Real uses
Apple's keyboard prediction, Google's location statistics, the US Census Bureau's 2020 Census releases, and several healthcare consortia. The combination of regulatory pressure and improved DP-SGD techniques is making DP increasingly standard for sensitive-data ML.