Federated Learning: Training Without Data Movement
Send the model to the data instead of the data to the model. Federated learning is the architecture for training when data can’t leave its origin.
The core idea
Instead of collecting all data centrally and training there, send a model to each data source. Each source trains locally on its own data and sends back only the gradient update. Aggregate the updates centrally; the raw data never moves.
Mechanics
Each round:
- Server sends current model weights to N participating clients.
- Each client trains for a few steps on its local data.
- Each client sends back the gradient update (or new weights).
- Server averages updates (FedAvg) and produces the new global model.
- Repeat.
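The round above can be sketched in a few lines. This is a single-process toy, not a real deployment: the helper names are hypothetical, the "model" is a one-dimensional weight, and each client's "training" is gradient descent on a toy quadratic loss.

```python
def local_update(weights, data, lr=0.1, steps=5):
    """Simulate a client's local SGD: each step moves the weight toward
    the mean of the client's data, i.e. descent on 0.5*(w - mean(data))^2."""
    w = list(weights)
    target = sum(data) / len(data)
    for _ in range(steps):
        w = [wi - lr * (wi - target) for wi in w]
    return w

def fedavg_round(global_weights, client_datasets):
    """One FedAvg round: broadcast weights, train locally, average results."""
    client_weights = [local_update(global_weights, d) for d in client_datasets]
    n = len(client_weights)
    return [sum(ws[i] for ws in client_weights) / n
            for i in range(len(global_weights))]

clients = [[1.0] * 5, [3.0] * 5, [5.0] * 5]  # toy local datasets
w = [0.0]
for _ in range(10):
    w = fedavg_round(w, clients)
# w converges toward 3.0, the average of the three clients' local optima
```

Real FedAvg also weights each client's contribution by its dataset size; the plain mean here assumes equal-sized clients.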
Variations: FedProx (adds a proximal term that regularises local training against the global model), FedNova (normalises updates to correct for heterogeneous local step counts across clients), and many more.
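To make the FedProx idea concrete, here is a hedged sketch of the proximal term on the same toy quadratic loss as before (parameter values are illustrative): the client minimises its local loss plus (mu/2)·||w − w_global||², which penalises drifting far from the global model.

```python
def fedprox_local_update(global_weights, data, mu=0.0, lr=0.1, steps=5):
    """Local SGD on a toy loss plus FedProx's proximal term:
    loss = 0.5*(w - mean(data))^2 + (mu/2)*||w - w_global||^2.
    mu = 0 recovers plain FedAvg-style local training."""
    w = list(global_weights)
    target = sum(data) / len(data)
    for _ in range(steps):
        w = [wi - lr * ((wi - target) + mu * (wi - gi))
             for wi, gi in zip(w, global_weights)]
    return w

# Larger mu keeps the client's weights closer to the global model,
# limiting client drift when local data is skewed.
free = fedprox_local_update([0.0], [10.0] * 5, mu=0.0)
anchored = fedprox_local_update([0.0], [10.0] * 5, mu=4.0)
```

The anchored client ends much closer to the global starting point than the unregularised one, which is the point of the proximal term.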
The privacy reality
Federated learning is privacy-preserving in spirit, not by default. Gradient updates can leak training data via reconstruction attacks. To get real guarantees, you combine federated learning with differential privacy (clip gradients, add noise) and secure aggregation (cryptographic protocols so even the server can’t see individual updates).
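The clip-and-noise step can be sketched as follows. This is the core of the DP-SGD recipe applied to a client update; the function name and parameter values are illustrative, and a real deployment also needs a privacy accountant to track the cumulative privacy budget.

```python
import math
import random

def privatize_update(update, clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """Clip the update's L2 norm to clip_norm, then add Gaussian noise
    scaled to the clip norm, so any single client's contribution is
    bounded regardless of its data."""
    rng = rng or random.Random(0)
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, clip_norm / max(norm, 1e-12))
    clipped = [x * scale for x in update]
    sigma = noise_multiplier * clip_norm
    return [x + rng.gauss(0.0, sigma) for x in clipped]

noisy = privatize_update([3.0, 4.0], clip_norm=1.0)  # raw norm 5.0, clipped to 1.0
```

Clipping bounds the sensitivity of the aggregate to any one client; the noise scale is tied to that bound, which is what turns the mechanism into a differential-privacy guarantee.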
Plain federated learning without those additions is “data minimisation, not privacy.”
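The secure-aggregation idea can be illustrated with pairwise additive masks that cancel in the sum. This is a toy sketch: real protocols derive the masks from pairwise key agreement and handle client dropouts, neither of which is modelled here.

```python
import random

def pairwise_masks(n_clients, dim, seed=0):
    """Each unordered pair (i, j) shares a random mask; client i adds it
    and client j subtracts it, so all masks cancel in the sum.
    Stands in for masks derived from pairwise key agreement."""
    rng = random.Random(seed)
    masks = [[0.0] * dim for _ in range(n_clients)]
    for i in range(n_clients):
        for j in range(i + 1, n_clients):
            m = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
            for k in range(dim):
                masks[i][k] += m[k]
                masks[j][k] -= m[k]
    return masks

updates = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
masks = pairwise_masks(3, 2)
masked = [[u + m for u, m in zip(upd, msk)] for upd, msk in zip(updates, masks)]
# The server sees only the masked updates, yet their sum is the true sum.
total = [sum(col) for col in zip(*masked)]
```

Each individual masked update looks like noise to the server, but the aggregate it computes is exact, which is the property secure aggregation is after.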
Where it fits
- Mobile keyboards (Google Gboard).
- Healthcare consortia training across hospitals.
- Financial fraud models across institutions.
- Multi-tenant SaaS where customers won’t share raw data but accept aggregate learning.
Where centralised wins: most other cases. Federated training is slower, more complex, and limited in statistical power, since non-IID client data slows convergence. Use it when data sovereignty is the binding constraint.