SLOs for Async and Batch Workloads

Most SLO advice assumes request-response traffic. Async and batch workloads need differently shaped SLIs; the patterns are well known but rarely written down.

Why request SLOs do not fit

Async services process messages eventually; batch jobs run on a schedule. A "request success rate" describes neither. For both, the question is "did the work get done in time?"
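To make "did the work get done in time?" concrete, here is a minimal sketch of a deadline-based SLI for an async queue. It assumes each message records when it was enqueued and when (if ever) it completed; the names `Message` and `timeliness_sli` and the `deadline` parameter are hypothetical. One design choice is also an assumption: a message still in flight and younger than its deadline cannot be judged yet, so it is excluded from both numerator and denominator.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Message:
    enqueued_at: datetime
    completed_at: Optional[datetime]  # None while still queued or in flight


def timeliness_sli(messages, deadline: timedelta, now: datetime) -> Optional[float]:
    """Fraction of judgeable messages that finished within `deadline` of enqueue."""
    good = bad = 0
    for m in messages:
        if m.completed_at is not None:
            if m.completed_at - m.enqueued_at <= deadline:
                good += 1
            else:
                bad += 1  # finished, but late: a plain success rate counts this as good
        elif now - m.enqueued_at > deadline:
            bad += 1  # never finished and already past its deadline
        # otherwise: still pending and not yet judgeable, so skip it
    total = good + bad
    return None if total == 0 else good / total


t0 = datetime(2024, 1, 1, 12, 0)
msgs = [
    Message(t0, t0 + timedelta(minutes=2)),    # on time
    Message(t0, t0 + timedelta(minutes=20)),   # completed, but late
    Message(t0, None),                         # stuck past its deadline
    Message(t0 + timedelta(minutes=28), None), # still pending, not judged
]
print(timeliness_sli(msgs, timedelta(minutes=5), t0 + timedelta(minutes=30)))
```

With a 5-minute deadline evaluated at 12:30, only one of the three judgeable messages counts as good (an SLI of 1/3), even though both messages that completed "succeeded".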

Four async/batch SLI shapes

Examples per shape

Each SLI shape maps cleanly to a workload pattern, which makes SLO definition mechanical: pick the workload, then pick the matching shape.
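As a hedged illustration of one such pairing, assume a scheduled batch job is judged on freshness (this mapping is an assumption, not one the text spells out). A minimal sketch samples the measurement window at fixed points and asks, at each point, whether the newest successful run is recent enough. The function name `freshness_sli`, the fixed sampling `step`, and the `max_staleness` threshold are all hypothetical choices.

```python
import bisect
from datetime import datetime, timedelta


def freshness_sli(completions, start, end, step, max_staleness):
    """Fraction of sample points in [start, end) where the newest successful
    run finished at most `max_staleness` earlier."""
    runs = sorted(completions)
    fresh = total = 0
    t = start
    while t < end:
        i = bisect.bisect_right(runs, t)  # runs[:i] finished at or before t
        if i > 0 and t - runs[i - 1] <= max_staleness:
            fresh += 1
        total += 1
        t += step
    return fresh / total if total else None


# Hypothetical hourly job that runs at :10 past the hour, then misses 03:10-05:10.
day = datetime(2024, 1, 1)
runs = [day + timedelta(hours=h, minutes=10) for h in (0, 1, 2, 6)]
print(freshness_sli(runs, day, day + timedelta(hours=8),
                    timedelta(hours=1), timedelta(hours=2)))
```

Every run that did happen succeeded, yet the data was stale at 3 of the 8 hourly sample points, which is exactly the failure a per-run success rate hides.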

Combining shapes per service

Most async services need two or three SLI types at once. Latency alone misses the failures users actually see; combining freshness, completeness, and correctness covers the full shape of the workload.
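One way to combine shapes, sketched under stated assumptions: treat a batch run as good only if it was on time and also met completeness and correctness thresholds. The `BatchRun` fields, the helper names, and the default thresholds (99.9% completeness, 99.99% correctness) are hypothetical placeholders, not recommended values.

```python
from dataclasses import dataclass


@dataclass
class BatchRun:
    on_time: bool           # finished before its deadline (the latency/freshness view)
    expected_records: int   # records the run should have processed
    processed_records: int  # records it actually processed
    invalid_records: int    # processed records that failed a correctness check


def run_is_good(run, min_completeness=0.999, min_correctness=0.9999):
    # An empty expected set counts as fully complete; zero processed counts as vacuously correct.
    completeness = run.processed_records / run.expected_records if run.expected_records else 1.0
    valid = run.processed_records - run.invalid_records
    correctness = valid / run.processed_records if run.processed_records else 1.0
    return run.on_time and completeness >= min_completeness and correctness >= min_correctness


def combined_sli(runs, **thresholds):
    """Fraction of runs that pass every SLI at once."""
    if not runs:
        return None
    return sum(run_is_good(r, **thresholds) for r in runs) / len(runs)


runs = [
    BatchRun(True, 1000, 1000, 0),   # good on every axis
    BatchRun(True, 1000, 900, 0),    # on time, but dropped 10% of records
    BatchRun(True, 1000, 1000, 50),  # on time and complete, but 5% wrong
    BatchRun(False, 1000, 1000, 0),  # complete and correct, but late
]
latency_only = sum(r.on_time for r in runs) / len(runs)
print(latency_only, combined_sli(runs))
```

A latency-only view scores these runs at 0.75; requiring all three shapes scores them at 0.25, because two of the "on time" runs delivered incomplete or wrong data.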

Antipatterns

What to do this week

Three moves: (1) apply the pattern to your highest-impact service; (2) measure adherence for 30 days; (3) if the gap is durable, rewrite the policy or the SLO.
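For step (2), a minimal sketch of a 30-day adherence check, assuming the SLI is recorded as daily (good, total) event counts and the target is below 1.0. The name `adherence_report` and its output fields are hypothetical. Reading "durable" as "the error budget is overspent across the whole window, not just on one bad day" is likewise an assumption.

```python
def adherence_report(daily, target):
    """daily: list of (good, total) counts, one per day; target: e.g. 0.99 (must be < 1)."""
    good = sum(g for g, _ in daily)
    total = sum(t for _, t in daily)
    if total == 0:
        return None
    allowed_bad = (1 - target) * total  # error budget for the window, in events
    actual_bad = total - good
    return {
        "attained": good / total,
        "budget_used": actual_bad / allowed_bad,  # > 1.0 means the budget was overspent
        "days_below_target": sum(1 for g, t in daily if t and g / t < target),
    }


# Hypothetical month: 28 days at 99.5%, 2 days at 90%, against a 99% target.
daily = [(995, 1000)] * 28 + [(900, 1000)] * 2
print(adherence_report(daily, 0.99))
```

Here the service attains about 98.9% over the window and spends roughly 113% of its budget. Whether that calls for fixing the service, changing the policy, or relaxing the SLO is the judgment step (3) asks for.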