Feature Stores: What, Why, When
A feature store is a data system optimised for serving the same feature values at training time and at serving time. If those two paths diverge, your model breaks. The store is the fix.
The training/serving skew problem
Training data is computed offline, often in a data warehouse. Serving data is computed online, often in milliseconds, against live state. If the computation differs in any way (rounding, time zones, null handling, aggregation windows), the model sees different distributions at training and serving time. Accuracy collapses.
This is “training/serving skew”: one of the biggest sources of silent ML failures in production.
A feature store solves it by being the canonical computation path for every feature, used by both training pipelines and serving pipelines. Same code, same definition, same result.
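The idea can be sketched in a few lines. Here, one canonical function computes the feature, and both pipelines call it. The event schema, the 7-day window, and the function name are illustrative assumptions, not from any particular system:

```python
from datetime import datetime, timedelta, timezone

def purchases_last_7d(events: list[dict], as_of: datetime) -> int:
    """Count purchase events in the 7 days before `as_of`.

    Called verbatim by the offline backfill (replaying history) and by
    the online path (latest events), so both see one definition: same
    window, same boundary handling, same null handling.
    """
    cutoff = as_of - timedelta(days=7)
    return sum(
        1
        for e in events
        if e.get("type") == "purchase" and cutoff <= e["ts"] < as_of
    )

# Toy event log: one purchase inside the window, one outside, one non-purchase.
events = [
    {"type": "purchase", "ts": datetime(2025, 1, 8, tzinfo=timezone.utc)},
    {"type": "view", "ts": datetime(2025, 1, 9, tzinfo=timezone.utc)},
    {"type": "purchase", "ts": datetime(2025, 1, 1, tzinfo=timezone.utc)},
]
now = datetime(2025, 1, 10, tzinfo=timezone.utc)
print(purchases_last_7d(events, now))  # → 1 (the Jan 1 purchase is outside the window)
```

The point is not the function itself but the single ownership: there is exactly one place where “purchases in the last 7 days” is defined, so there is nothing to drift.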
Online vs offline
Feature stores split into two storage paths:
- Offline store: large historical feature values, used for training. Typically a data warehouse or columnar lake (BigQuery, Snowflake, Iceberg).
- Online store: latest feature values per entity, served at low latency. Typically a key-value store (Redis, DynamoDB, Cassandra).
The feature store handles synchronisation: when feature values are computed offline, the latest value per entity is materialised (batch-loaded or streamed) into the online store, so serving reads the same definition the training data was built from.
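A minimal sketch of that materialisation step, with a plain dict standing in for the key-value online store (Redis or DynamoDB in practice). The row shapes and entity IDs are hypothetical:

```python
# Offline rows: (entity_id, feature_value, computed_at). Newest wins per entity.
offline_rows = [
    ("user_1", 3, "2025-01-09"),
    ("user_1", 5, "2025-01-10"),
    ("user_2", 1, "2025-01-10"),
]

# Online store: entity_id -> (latest_value, timestamp). A dict here;
# a key-value database in any real deployment.
online_store: dict[str, tuple[int, str]] = {}

def materialize(rows) -> None:
    """Push the latest offline value per entity into the online store."""
    for entity_id, value, ts in rows:
        current = online_store.get(entity_id)
        if current is None or ts > current[1]:
            online_store[entity_id] = (value, ts)

materialize(offline_rows)
print(online_store["user_1"])  # → (5, '2025-01-10'): serving sees the latest value
```

Real stores also handle the reverse concern: point-in-time correct reads from the offline store, so training rows never leak feature values from after the label’s timestamp.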
Popular options in 2025
- Feast: open-source, lightweight, the “default starting point”. Works on top of your existing infra. Less heavyweight than enterprise alternatives.
- Tecton: commercial, full-featured, best-in-class for streaming features and complex DAGs. Enterprise-priced.
- Hopsworks: open-source + commercial, strong on European compliance, ML-native data platform.
- Vertex AI Feature Store / SageMaker Feature Store: cloud-native managed offerings. Good if you’re already on Google or AWS.
For most teams: Feast self-hosted is the first stop. Migrate when team or feature complexity demands it.
When you don’t need a feature store
Three signs you can skip it:
- You have one or two ML models in production. Direct SQL feature computation is fine.
- Your features are simple aggregates over a single source table. The skew risk is low.
- Your team is small and the operational cost of running a feature store dominates the benefit.
Most teams below 5 ML engineers don’t need one. Most teams above 15 do. The middle is judgement.
How to start
- Identify your top 5 features by impact. Document their definitions.
- Migrate those into Feast first. Confirm offline and online values match.
- Update one model’s training and serving paths to use the feature store. Measure: did skew drop?
- If yes, expand. If no, keep iterating; the feature store isn’t the bottleneck.
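The “confirm offline and online values match” step can be as simple as a parity check over sampled entities. A hedged sketch, where the two fetch functions are stand-ins for your warehouse query and your online lookup, and the data is invented:

```python
def check_parity(entity_ids, fetch_offline, fetch_online, tol=1e-9):
    """Return the entities whose offline and online values disagree."""
    mismatches = []
    for eid in entity_ids:
        off, on = fetch_offline(eid), fetch_online(eid)
        if off is None or on is None or abs(off - on) > tol:
            mismatches.append((eid, off, on))
    return mismatches

# Toy data: user_3's online value is stale, so it should be flagged.
offline = {"user_1": 5.0, "user_2": 1.0, "user_3": 2.0}
online = {"user_1": 5.0, "user_2": 1.0, "user_3": 7.0}

bad = check_parity(offline, offline.get, online.get)
print(bad)  # → [('user_3', 2.0, 7.0)]
```

Run this on a daily sample and the count of mismatches becomes your skew metric: the number you watch before and after migrating a model onto the store.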
The mistake to avoid: a six-month feature-store project with no measurable improvement. Start small, measure, expand. The investment compounds when it works and is salvageable when it doesn’t.