Toil

The manual, repetitive, automatable work that scales linearly with service growth: the operational debt SRE was invented to eliminate.

Definition

Toil, in Google's SRE definition, is operational work that is manual, repetitive, automatable, tactical, devoid of enduring value, and that scales linearly as the service grows. Examples: manually rotating IAM keys monthly, hand-editing a runbook every time a service is added, paging the on-call to scale replicas every Black Friday. Google's SRE book recommends keeping toil under 50% of an SRE's time; teams above that threshold cannot keep up with growth and burn out.
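
To make the 50% guideline concrete, here is a minimal sketch of how a team might measure its toil share from a time log. The `WorkEntry` record, the `toil_share` and `within_budget` helpers, and the sample week are all hypothetical illustrations, not part of any Google or Nova tooling; deciding which entries count as toil is left to whoever logs the work.

```python
from dataclasses import dataclass

# Ceiling recommended in Google's SRE book: toil should stay under 50%.
TOIL_BUDGET = 0.50


@dataclass
class WorkEntry:
    """One logged block of work (hypothetical record format)."""
    description: str
    hours: float
    is_toil: bool


def toil_share(entries: list[WorkEntry]) -> float:
    """Return the fraction of logged hours that were classified as toil."""
    total = sum(e.hours for e in entries)
    if total == 0:
        return 0.0
    return sum(e.hours for e in entries if e.is_toil) / total


def within_budget(entries: list[WorkEntry], budget: float = TOIL_BUDGET) -> bool:
    """True when the toil share is strictly under the budget."""
    return toil_share(entries) < budget


# Illustrative 40-hour week built from the examples in the definition.
week = [
    WorkEntry("rotate IAM keys by hand", 6, True),
    WorkEntry("hand-edit runbook for a new service", 4, True),
    WorkEntry("manually scale replicas during a sale", 14, True),
    WorkEntry("write key-rotation automation", 10, False),
    WorkEntry("capacity planning", 6, False),
]

if __name__ == "__main__":
    print(f"toil share: {toil_share(week):.0%}")    # toil share: 60%
    print(f"within budget: {within_budget(week)}")  # within budget: False
```

In this made-up week, 24 of 40 hours are toil, so the team sits at 60% and is over the guideline.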

Why it matters

Every hour an SRE spends on toil is an hour not spent on work that compounds: automation, postmortem fixes, capacity planning. A toil budget of 50% sounds generous, but in practice most teams are closer to 80%. Shrinking the share by even 10 points, from 80% to 70%, grows the time left for engineering work from 20% to 30%, which is half again as much reliability investment per quarter.
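
To see how that payoff depends on the starting point, here is a small sketch of the arithmetic. The function name `engineering_gain` is a hypothetical helper, and it assumes all time not spent on toil goes to engineering work.

```python
def engineering_gain(toil_before: float, toil_after: float) -> float:
    """Relative increase in non-toil time when the toil share drops.

    Both arguments are fractions of total working time, e.g. 0.80 for 80%.
    """
    free_before = 1.0 - toil_before
    free_after = 1.0 - toil_after
    if free_before <= 0:
        raise ValueError("no engineering time to compare against")
    return free_after / free_before - 1.0


if __name__ == "__main__":
    # The same 10-point cut is worth more the deeper a team is in toil.
    print(f"{engineering_gain(0.80, 0.70):.0%}")  # 50%
    print(f"{engineering_gain(0.50, 0.40):.0%}")  # 20%
```

Under that assumption, a team at 80% toil gains proportionally more from a 10-point reduction than a team already at the 50% budget.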

How Nova handles it

See the part of the platform that handles toil in production.

Nova autonomous remediation