Measuring SLOs on Mobile and Edge
If your users are on mobile, your server-side SLO shows only part of the picture; client-side measurement supplies the rest.
Why server SLOs miss mobile
Server SLOs can show 100% success while every mobile user sees a frozen UI. The gap between server and user is full of failure modes a server-only metric cannot see.
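- Server-side blindness. The server counts only the requests that reach it and the responses it sends; a request that never arrives or a screen that never renders does not count against it, so the metric flatters reality.
- Network gap. Mobile networks drop connections, throttle bandwidth, and hand off between towers; a request lost in transit never shows up in server logs.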
- Device gap. Old phones, slow CPUs, low memory; client-side rendering and JS execution dominate the experience.
- App gap. Crashes, OOM, render bugs; the server is healthy, the app is broken.
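To make the gap concrete, here is a minimal Python sketch that scores the same set of requests two ways. The `ClientRequest` record and its outcome labels are hypothetical; it assumes the app records every attempt and knows whether the server logged it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ClientRequest:
    # Hypothetical outcome labels as the app saw them:
    # "ok", "http_error", "timeout", "network_error".
    outcome: str
    # Whether the request reached the server and appears in its logs.
    reached_server: bool
    # Whether the server logged it as a success (irrelevant if not reached).
    server_ok: bool = False


def server_view_success(requests: list[ClientRequest]) -> Optional[float]:
    """Success rate the server can compute: it only sees logged requests."""
    logged = [r for r in requests if r.reached_server]
    if not logged:
        return None
    return sum(r.server_ok for r in logged) / len(logged)


def client_view_success(requests: list[ClientRequest]) -> Optional[float]:
    """Success rate as users experienced it: every attempt counts."""
    if not requests:
        return None
    return sum(r.outcome == "ok" for r in requests) / len(requests)


reqs = (
    [ClientRequest("ok", reached_server=True, server_ok=True)] * 6
    # Server answered successfully, but the response was lost on the network.
    + [ClientRequest("timeout", reached_server=True, server_ok=True)] * 2
    # Never reached the server at all.
    + [ClientRequest("network_error", reached_server=False)] * 2
)
print(server_view_success(reqs))  # 1.0
print(client_view_success(reqs))  # 0.6
```

The server reports a perfect score because the four failed attempts either never arrived or failed after the server had already logged a success.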
Four client-side patterns
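1. Cold-start time on app launch, from process start until the first screen is usable.
2. Time-to-interactive on key screens.
3. Crash-free sessions: the share of sessions that end without a crash.
4. API call success rate measured client-side, so timeouts and network errors the server never logs still count as failures.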
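A minimal sketch of computing patterns 1, 3, and 4 from per-session records, assuming the app reports one summary record per session. The `Session` fields and the choice of p90 are illustrative assumptions; time-to-interactive (pattern 2) would reuse the same percentile helper over per-screen timings.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    cold_start_ms: float  # launch until the first screen is usable
    crashed: bool         # session ended in a crash or OOM kill
    api_calls: int        # API calls the app attempted
    api_failures: int     # failures as the app saw them, incl. timeouts


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile for 0 < p <= 100."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def client_slis(sessions: list[Session]) -> dict[str, float]:
    if not sessions:
        raise ValueError("no sessions in window")
    calls = sum(s.api_calls for s in sessions)
    failures = sum(s.api_failures for s in sessions)
    if calls == 0:
        raise ValueError("no API calls in window")
    return {
        # Pattern 1: a high percentile, so slow launches are not averaged away.
        "cold_start_p90_ms": percentile([s.cold_start_ms for s in sessions], 90),
        # Pattern 3: share of sessions that did not crash.
        "crash_free_sessions": sum(not s.crashed for s in sessions) / len(sessions),
        # Pattern 4: API success measured on the device.
        "client_api_success": (calls - failures) / calls,
    }


sessions = [
    Session(900, False, 20, 0),
    Session(1200, False, 15, 1),
    Session(3100, True, 5, 2),
    Session(1000, False, 10, 0),
]
print(client_slis(sessions))
# {'cold_start_p90_ms': 3100, 'crash_free_sessions': 0.75, 'client_api_success': 0.94}
```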
RUM integration
Real-User Monitoring (RUM) closes the gap by collecting measurements from real devices. Sentry, Datadog RUM, and Honeycomb's mobile SDK all do this; each has its own data model, but the principle is the same.
- Sentry. Strong on crash and error reporting; mature SDKs for iOS and Android, plus web.
- Datadog RUM. Performance metrics and session replay, correlated with server-side telemetry.
- Honeycomb mobile SDK. Its wide-event model fits mobile traces and supports high-cardinality queries (by device model, OS version, or app build, for example).
- Sample, but never to zero. Control cost with sampling, but keep a nonzero rate for every segment: long-tail issues (rare devices, old OS versions) only show up in long-tail data.
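The vendor SDKs expose their own sampling settings; this sketch shows only the underlying logic, not any SDK's API. It assumes per-segment rates you choose (the segment names and the 1% floor are hypothetical), hashes the session ID so a whole session is kept or dropped together, and scales sampled counts back up by 1 / rate.

```python
import hashlib

# Hypothetical floor: no segment is ever sampled below 1%.
MIN_RATE = 0.01


def sample_rate(rates: dict[str, float], segment: str) -> float:
    """Configured rate for a segment, clamped to [MIN_RATE, 1.0]."""
    return min(1.0, max(rates.get(segment, MIN_RATE), MIN_RATE))


def keep_session(session_id: str, rate: float) -> bool:
    """Deterministic per-session decision: hash the ID into [0, 2**64)
    so all events from one session are kept or dropped together."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") < rate * 2**64


def estimate_total(samples: list[tuple[int, float]]) -> float:
    """Scale each sampled count by 1 / rate to estimate the true total."""
    return sum(count / rate for count, rate in samples)


rates = {"ios": 0.05, "android": 0.05, "android_legacy": 0.0}
rate = sample_rate(rates, "android_legacy")  # floored to MIN_RATE, never zero
print(keep_session("session-42", rate))
```

Keeping whole sessions matters for crash-free-session counts: sampling individual events would split sessions and distort the ratio.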
Budget allocation
A customer-facing SLO needs its error budget explicitly split between server-side and client-side. The split is a judgment call; document it so the team does not reargue it at every breach.
- Default split. 60% of the error budget to server-side, 40% to client-side; tune it to your traffic mix.
- Owners per side. The server team owns the server slice and the mobile team owns the client slice, so every breach has a clear owner.
- Cross-team debugging. Some incidents span both sides; keep a pre-written playbook for joint investigation so the handoff is deliberate rather than ad hoc.
- Annual review. Mobile's share of traffic tends to grow; revisit the split as the user mix shifts.
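The split is simple arithmetic: the budget is (1 − target) × total events, divided by the agreed shares. A minimal sketch, assuming a 99.5% target over 1,000,000 sessions (illustrative numbers) and hypothetical function names; "bad events" means whatever the customer SLO counts.

```python
def error_budget(slo_target: float, total_events: int) -> float:
    """Bad events the SLO allows in the window: (1 - target) * total."""
    return (1 - slo_target) * total_events


def split_budget(budget: float, server_share: float = 0.6) -> dict[str, float]:
    """Split the budget between server and client slices (default 60/40)."""
    return {"server": budget * server_share, "client": budget * (1 - server_share)}


def over_slice(spent: dict[str, float], slices: dict[str, float]) -> list[str]:
    """Sides that consumed more than their slice; that side's owner acts."""
    return [side for side, allowed in slices.items() if spent.get(side, 0) > allowed]


budget = error_budget(0.995, 1_000_000)  # about 5,000 bad sessions allowed
slices = split_budget(budget)            # about 3,000 server, 2,000 client
print(over_slice({"server": 1_200, "client": 2_600}, slices))  # ['client']
```

Here the service as a whole is still inside its 5,000-event budget, but the client slice is overspent, so the mobile team owns the response.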
Antipatterns
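- Server-only SLO for a mobile-heavy product. The metric shows a healthier service than users experience.
- Client SLOs without sampling control. Ingestion cost scales with every session and event.
- No correlation between server and client failures. Without a shared request ID, you cannot tell whether a failure started on the server, in the network, or in the app.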
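Correlation works only if both sides record the same identifier. A minimal sketch, assuming the app generates a request ID per call, sends it to the server (the header name is your choice), and both RUM events and server access logs keep it. The function and bucket names are hypothetical, and treating a failed call that got a 4xx as "after_server" is a judgment call.

```python
from collections import Counter


def attribute_failures(
    client_events: list[tuple[str, bool]],
    server_status: dict[str, int],
) -> Counter:
    """Join client outcomes with server logs on a shared request ID.

    client_events: (request_id, succeeded_on_device) pairs from RUM.
    server_status: request_id -> HTTP status from server access logs.
    """
    counts: Counter = Counter()
    for request_id, ok in client_events:
        if ok:
            counts["ok"] += 1
        elif request_id not in server_status:
            # The server never saw it: dropped on the network or never sent.
            counts["before_server"] += 1
        elif server_status[request_id] >= 500:
            # The server failed; its own SLO should already reflect this.
            counts["server"] += 1
        else:
            # The server answered without a 5xx, yet the user saw a failure:
            # lost response, client timeout, or an app-side bug.
            counts["after_server"] += 1
    return counts


events = [("a", True), ("b", False), ("c", False), ("d", False)]
logs = {"a": 200, "c": 503, "d": 200}
print(attribute_failures(events, logs))
```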
What to do this week
Three moves. (1) Pick your highest-impact mobile-facing service and add one of the four client-side SLIs. (2) Track it alongside the server-side SLO for 30 days. (3) If the gap between the two views persists, rewrite the budget policy or the SLO.
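Steps (2) and (3) can be made mechanical. A minimal sketch, assuming daily (good, total) counts for one server-side SLI and its client-side counterpart; the function names and the "at least 20 of the last 30 days" threshold for a durable gap are arbitrary illustrations, not a recommendation.

```python
Day = tuple[int, int]  # (good, total) events for one day


def window_sli(days: list[Day]) -> float:
    """SLI over a whole window: total good over total events."""
    good = sum(g for g, _ in days)
    total = sum(t for _, t in days)
    return good / total


def durable_gap(server_days: list[Day], client_days: list[Day],
                target: float, window: int = 30, min_days: int = 20) -> bool:
    """Hypothetical rule: the gap is durable if, on at least `min_days` of the
    last `window` days, the server met the target while the client missed it."""
    paired = list(zip(server_days, client_days))[-window:]
    misses = sum(
        1 for (sg, st), (cg, ct) in paired
        if st and ct and sg / st >= target and cg / ct < target
    )
    return misses >= min_days


print(durable_gap([(999, 1000)] * 30, [(990, 1000)] * 30, target=0.995))  # True
```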