Service Mesh: When and When Not
The trade-offs of a service mesh, and when a mesh is overkill.
When
Whether to adopt a service mesh is a foundational decision. The task is to recognize when the mesh's value justifies its complexity; many teams adopt a mesh without asking whether they actually need one.
A mesh is worth adopting in these cases:
- mTLS between services is required: When the team needs encrypted, mutually authenticated service-to-service traffic, a mesh provides it. Implementing mTLS in every service without a mesh is complex; the mesh handles it outside the application.
- Many services need traffic policies: Complex traffic management (canary releases, traffic splitting, fault injection) at scale benefits from a mesh. The investment in mesh tooling pays off across many services.
- Compliance or security need: Some compliance regimes expect specific traffic controls. Mesh policies can serve as the implementation of those controls.
- Cross-cluster traffic: When traffic spans clusters, the mesh's federation patterns help by providing consistent identity and policy across clusters.
- Mature platform team: A mesh requires platform team capacity to operate. Teams that have the capacity benefit; teams without it struggle.
The cases for a mesh are specific. Its value is real but bounded.
When not
Many teams should not adopt a mesh. For them the complexity exceeds the benefit, and simpler patterns work.
- Small clusters (fewer than 10 services): Small clusters do not have the complexity that justifies a mesh. The cost-benefit favors simpler tools.
- Limited operational capacity: A mesh requires platform team capacity to operate. Teams without dedicated platform engineers struggle, so the decision should fit the capacity the team actually has.
- Mesh overhead exceeds benefit: The mesh's resource consumption (sidecars, control plane) is real. For small clusters or simple needs, that overhead exceeds the benefit.
- Application-level patterns work: Many mesh features can be implemented at the application level. Service discovery, retries, and circuit breakers can all be handled in libraries, and the team should consider that option first.
- Recognize the gap: A team that recognizes a mesh is overkill saves significant cost. Saying no to a mesh when simpler patterns suffice is part of the decision.
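As an illustration of the application-level patterns mentioned in the list, the following is a minimal sketch of retries with exponential backoff and a simple circuit breaker in plain Python. The names `CircuitBreaker`, `CircuitOpenError`, and `retry`, along with the default thresholds and delays, are assumptions made for this example; a production service would more likely use an established resilience library.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when the breaker is open and the call is rejected."""


class CircuitBreaker:
    """Illustrative breaker: opens after N consecutive failures."""

    def __init__(
        self,
        failure_threshold: int = 3,
        reset_after: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; call rejected")
            # Cool-down elapsed: allow one trial call (half-open state).
            # A single failure during the trial reopens the circuit.
            self._opened_at = None
            self._failures = self.failure_threshold - 1
        try:
            result = fn()
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        self._failures = 0
        return result


def retry(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn up to `attempts` times with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except CircuitOpenError:
            raise  # The breaker is rejecting calls; retrying would not help.
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

A caller would typically wrap the breaker inside the retry, for example `retry(lambda: breaker.call(fetch_profile))`, where `fetch_profile` stands in for any outbound request. The trade-off against a mesh is that each service, in each language, has to carry and configure this logic itself.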
The cases against a mesh are broader than the cases for one. Most teams should not adopt a mesh.
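The criteria from both lists can be written down as a rough checklist. The sketch below is one possible encoding, not a definitive rule: `ClusterProfile`, its field names, and the order in which the checks are applied are assumptions made for illustration. Only the 10-service threshold comes from the text, and even that is best read as a rule of thumb.

```python
from dataclasses import dataclass

# Threshold taken from the text (small clusters of fewer than 10 services).
SMALL_CLUSTER_SERVICES = 10


@dataclass
class ClusterProfile:
    """Hypothetical summary of a team's situation; field names are illustrative."""

    service_count: int
    has_platform_team: bool
    needs_mtls: bool = False
    needs_traffic_policies: bool = False
    has_compliance_requirement: bool = False
    spans_clusters: bool = False


def mesh_recommendation(profile: ClusterProfile) -> str:
    """Return 'skip mesh', 'use simpler tools', or 'consider mesh'."""
    security_need = profile.needs_mtls or profile.has_compliance_requirement
    traffic_need = profile.needs_traffic_policies or profile.spans_clusters

    # No mesh-justifying requirement at all.
    if not (security_need or traffic_need):
        return "skip mesh"
    # Small cluster or no capacity to operate a mesh.
    if profile.service_count < SMALL_CLUSTER_SERVICES or not profile.has_platform_team:
        return "use simpler tools"
    # Security is the only driver: simpler tools can cover it.
    if security_need and not traffic_need:
        return "use simpler tools"
    return "consider mesh"


if __name__ == "__main__":
    small_team = ClusterProfile(service_count=6, has_platform_team=False, needs_mtls=True)
    print(mesh_recommendation(small_team))  # prints "use simpler tools"
```

The value of such a checklist lies less in the output than in forcing each input to be stated explicitly before the decision is made.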
Alternatives
Without a mesh, simpler patterns cover many use cases. Network policies plus cert-manager handle security, and the platform's built-in service discovery handles discovery. The goal is to match the tool to the actual need.
- Network policies plus cert-manager for security only: When security is the only need that would justify a mesh, simpler tools cover it: network policies for isolation and cert-manager for certificates.
- Lighter: The simpler tools carry less operational overhead, so the team gets working security without the mesh's weight and keeps its capacity.
- Covers many use cases: Most teams' needs are covered by the simpler tools. Recognizing when simpler is enough frees the team's resources for other priorities.
- Migrate to mesh later if needed: Teams that start without a mesh can adopt one later as they grow. The path stays open.
- Document the decision: Write down the choice and its rationale. New engineers can then understand why it was made, and a future re-evaluation has context.
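To give a sense of the application-side work when certificates come from cert-manager rather than a mesh, here is a minimal sketch of a server-side TLS context that requires client certificates, using Python's standard `ssl` module. The function name `build_mtls_server_context` and the mount path in the usage note are assumptions; the sketch also leaves out certificate rotation and client-side configuration, which each service would still need to handle.

```python
import ssl
from typing import Optional


def build_mtls_server_context(
    cert_file: Optional[str] = None,
    key_file: Optional[str] = None,
    ca_file: Optional[str] = None,
) -> ssl.SSLContext:
    """Build a server context that rejects clients without a valid certificate.

    The file arguments are optional only so that the sketch can be exercised
    without certificate files; a real server needs all three.
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # Require the peer to present a certificate (the "mutual" part of mTLS).
    ctx.verify_mode = ssl.CERT_REQUIRED
    if cert_file and key_file:
        # This service's own certificate and private key.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    if ca_file:
        # CA bundle used to verify client certificates.
        ctx.load_verify_locations(cafile=ca_file)
    return ctx
```

With a cert-manager secret mounted at a hypothetical path such as `/etc/tls`, the call might look like `build_mtls_server_context("/etc/tls/tls.crt", "/etc/tls/tls.key", "/etc/tls/ca.crt")`. The code is short, but every service has to include and maintain it, which is the cost to weigh against the mesh's overhead.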
Whether to adopt a service mesh is an architectural decision that benefits from clear thinking. Nova AI Ops integrates with cluster networking, surfaces patterns, and supports the team's mesh-or-not decision.