Instrumenting Your SRE Agent: What to Log
Token usage. Tool calls. Decisions. Failures. The structured log schema that makes debugging tractable, with the field-by-field rationale.
The log schema
The log schema is fixed and enforced. Every step emits a structured log with 14 fields: timestamp, run_id, step_index, agent_role, action_type, action_args, tool_name, tool_args, model_name, tokens_in, tokens_out, latency_ms, cost_usd, status. A validator checks every record, and records that do not match are dropped or repaired. Add fields conservatively, because each field is a maintenance burden.
- 14-field structured log. Timestamp through status; the canonical schema.
- Validator-enforced. Logs that don’t match are dropped or repaired; schema does not bend.
- Conservative addition. Each field is a maintenance burden; only add one when a known consumer needs it.
- Per-field documented purpose. Every field has a documented consumer, which keeps later additions disciplined.
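As a minimal sketch of validator enforcement, the Python below checks a record against the 14 fields listed above. The field names come from the schema; the Python types, the repair policy (strip unknown fields, widen integers to floats) and the drop policy (missing field or wrong type) are assumptions chosen for illustration, not a prescribed implementation.

```python
from typing import Any, Optional

# Field names are the 14 from the schema above.
# The types are assumptions made for this sketch.
SCHEMA: dict[str, type] = {
    "timestamp": str,      # assumed to be an ISO 8601 string
    "run_id": str,
    "step_index": int,
    "agent_role": str,
    "action_type": str,
    "action_args": dict,
    "tool_name": str,
    "tool_args": dict,
    "model_name": str,
    "tokens_in": int,
    "tokens_out": int,
    "latency_ms": float,
    "cost_usd": float,
    "status": str,
}


def validate(record: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return a schema-conformant copy of record, or None if it must be dropped."""
    repaired: dict[str, Any] = {}
    # Iterating over SCHEMA (not the record) strips unknown fields: a repair.
    for name, expected in SCHEMA.items():
        if name not in record:
            return None  # missing field: drop
        value = record[name]
        if isinstance(value, bool):
            return None  # no field is boolean; bool would otherwise pass as int
        if expected is float and isinstance(value, int):
            value = float(value)  # repair: widen int to float
        if not isinstance(value, expected):
            return None  # wrong type: drop
        repaired[name] = value
    return repaired
```

A caller would emit the returned dict when it is not None and count the dropped records, so that a rising drop rate is visible rather than silent.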
Cardinality discipline
Cardinality discipline keeps the bill in check. The run_id field is high cardinality (one per run, fine); tool_name is low cardinality (a handful per agent, fine); action_args, if logged raw, can explode cardinality. Hash high-cardinality string fields when they are used as labels and keep the raw value in the body for debugging. Watch the cardinality of the labels you use in dashboards, because blown-up cardinality breaks the dashboard and inflates your bill.
- run_id and tool_name fine. High and low cardinality respectively; both natural.
- action_args risk. Raw values explode cardinality; hash for labels.
- Hash for labels, raw for body. Hash supports query cardinality control; raw supports debugging.
- Per-dashboard cardinality watch. Blown-up cardinality breaks the dashboard and inflates the bill.
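The "hash for labels, raw for body" split might look like the sketch below. One caveat worth stating: a plain hash has as many distinct values as its input, so this sketch additionally folds the hash into a fixed number of buckets to bound the label's cardinality. The bucket count, the label names and the helper names are all hypothetical.

```python
import hashlib
import json
from typing import Any

NUM_BUCKETS = 64  # hypothetical cap on distinct values for the label


def stable_hash(value: Any) -> str:
    """Hash a JSON-serialisable value; key order does not affect the result."""
    canonical = json.dumps(value, sort_keys=True, default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def label_bucket(value: Any, buckets: int = NUM_BUCKETS) -> str:
    """Fold the hash into a bounded set of label values."""
    return f"b{int(stable_hash(value), 16) % buckets:02d}"


def to_labels_and_body(record: dict[str, Any]) -> tuple[dict[str, str], dict[str, Any]]:
    """Split a record into low-cardinality labels and a full-detail body."""
    labels = {
        "tool_name": record["tool_name"],                           # naturally low cardinality
        "action_args_bucket": label_bucket(record["action_args"]),  # bounded by NUM_BUCKETS
    }
    body = dict(record)  # raw action_args stay in the body for debugging
    body["action_args_hash"] = stable_hash(record["action_args"])[:16]
    return labels, body
```

The truncated hash kept in the body lets you find identical argument sets with an exact-match search, while the dashboard only ever sees at most NUM_BUCKETS label values.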
Retention policy
Three retention tiers cover the lifecycle. Hot logs (last 7 days) carry full detail and are queryable from the dashboard; warm logs (7-90 days) are downsampled and sometimes anonymised, queryable but slower; cold logs (90+ days) are archived to object storage and used only for compliance audits or deep historical analysis.
- Hot: 0-7 days. Full detail; queryable from dashboard; the active surface.
- Warm: 7-90 days. Downsampled, sometimes anonymised; queryable but slower.
- Cold: 90+ days. Archived to object storage; compliance audits and deep history.
- Per-tier query cost. Hot is fast and cheap to query; cold is slow and expensive to query; the tiers match usage patterns.
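The tier boundaries above can be sketched as a small age-based classifier. The text leaves the exact boundary days ambiguous (day 7 appears in both hot and warm), so this sketch assumes a log that is exactly 7 or 90 days old moves to the older tier; the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

HOT_MAX_AGE = timedelta(days=7)    # hot: younger than 7 days
WARM_MAX_AGE = timedelta(days=90)  # warm: 7 days up to, not including, 90 days


def retention_tier(logged_at: datetime, now: datetime) -> str:
    """Classify a log record as hot, warm or cold by its age."""
    age = now - logged_at
    if age < HOT_MAX_AGE:
        return "hot"
    if age < WARM_MAX_AGE:
        return "warm"
    return "cold"


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(retention_tier(now - timedelta(days=30), now))  # a 30-day-old log
```

A scheduled job could run this over each partition's newest timestamp to decide when to downsample or archive it.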
What to redact
Three categories of data deserve redaction: customer identifiers (hash before logging), internal hostnames (hash if they identify private services), and secrets (never log). Compliance frameworks (SOC 2, HIPAA, PCI) constrain what can be logged, so build the redaction layer to be pluggable, because redactors are updated frequently. Test redaction periodically by auditing logs for fields that should have been redacted.
- Customer IDs hashed. Before logging; the standard PII protection.
- Internal hostnames hashed. If they identify private services; protect topology.
- Secrets never logged. No exception; the absolute rule.
- Pluggable redactors. Compliance frameworks update; the layer must too.
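One way to make the redaction layer pluggable is to treat each redactor as a small function and apply a configurable list of them to every record. The sketch below makes several assumptions: the key names (customer_id, hostname, api_key, password) are hypothetical, the sensitive values are assumed to live under those keys inside nested dicts such as tool_args, and a keyed hash (HMAC) is swapped in for a plain hash because low-entropy identifiers are otherwise easy to guess.

```python
import hashlib
import hmac
from typing import Any, Callable

# Hypothetical key; in practice it would come from a secret store, not source code.
HASH_KEY = b"example-key-not-for-production"

DROP = object()  # sentinel: remove this key from the log entirely

Redactor = Callable[[str, Any], Any]


def keyed_hash(value: str) -> str:
    digest = hmac.new(HASH_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "h:" + digest[:16]


def hash_keys(*keys: str) -> Redactor:
    """Build a redactor that replaces the values of the given keys with a hash."""
    def redact(key: str, value: Any) -> Any:
        return keyed_hash(str(value)) if key in keys else value
    return redact


def drop_keys(*keys: str) -> Redactor:
    """Build a redactor that removes the given keys altogether."""
    def redact(key: str, value: Any) -> Any:
        return DROP if key in keys else value
    return redact


def apply_redactors(data: dict[str, Any], redactors: list[Redactor]) -> dict[str, Any]:
    """Return a redacted copy of data, descending into nested dicts."""
    out: dict[str, Any] = {}
    for key, value in data.items():
        for redactor in redactors:
            value = redactor(key, value)
            if value is DROP:
                break
        if value is DROP:
            continue
        if isinstance(value, dict):
            value = apply_redactors(value, redactors)
        out[key] = value
    return out


def find_leaks(data: dict[str, Any], forbidden: set[str], path: str = "") -> list[str]:
    """Audit helper: list the paths of keys that should never appear in a log."""
    leaks: list[str] = []
    for key, value in data.items():
        here = f"{path}.{key}" if path else key
        if key in forbidden:
            leaks.append(here)
        if isinstance(value, dict):
            leaks.extend(find_leaks(value, forbidden, here))
    return leaks


# Swapping or extending this list is the "pluggable" part.
REDACTORS: list[Redactor] = [
    drop_keys("api_key", "password"),
    hash_keys("customer_id", "hostname"),
]
```

Running find_leaks over a sample of stored logs is one form of the periodic audit: any non-empty result means a redactor is missing or misconfigured. Key-name matching will not catch a secret embedded in free text, so a real audit would likely need pattern checks as well.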
Common queries you should optimise for
Three queries dominate agent debugging. “All steps in this run” needs an index on run_id; “cost by agent role over the last day” needs an aggregate index on agent_role plus timestamp; “tool call failures by tool name” needs an index on tool_name plus status, and it is the single most useful query for debugging agent behaviour.
- All steps in run. Index on run_id; the basic replay query.
- Cost by agent role over time. Aggregate index on agent_role + timestamp; the cost view.
- Tool failures by tool name. Index on tool_name + status; the most useful debugging query.
- Per-query index design. Each common query has a supporting index, so routine debugging queries stay fast.
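The three queries and their supporting indexes can be sketched with Python's built-in sqlite3 module. This is an illustration under assumptions, not a recommendation of SQLite as the log store: the table name, the index names and the "error" status value are hypothetical, only a subset of the 14 fields is stored, "aggregate index" is read here as a composite index, and step_index is added to the run_id index so the replay comes back in order.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory store with one index per common query."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE agent_steps (
            timestamp  TEXT    NOT NULL,  -- ISO 8601 in UTC, so string order is time order
            run_id     TEXT    NOT NULL,
            step_index INTEGER NOT NULL,
            agent_role TEXT    NOT NULL,
            tool_name  TEXT    NOT NULL,
            cost_usd   REAL    NOT NULL,
            status     TEXT    NOT NULL
        );
        CREATE INDEX idx_steps_run         ON agent_steps (run_id, step_index);
        CREATE INDEX idx_steps_role_time   ON agent_steps (agent_role, timestamp);
        CREATE INDEX idx_steps_tool_status ON agent_steps (tool_name, status);
        """
    )
    return conn


def steps_in_run(conn: sqlite3.Connection, run_id: str) -> list:
    """All steps in this run, in order: the basic replay query."""
    return conn.execute(
        "SELECT step_index, tool_name, status FROM agent_steps "
        "WHERE run_id = ? ORDER BY step_index",
        (run_id,),
    ).fetchall()


def cost_by_role(conn: sqlite3.Connection, since: str) -> list:
    """Total cost per agent role for records at or after the given timestamp."""
    return conn.execute(
        "SELECT agent_role, SUM(cost_usd) FROM agent_steps "
        "WHERE timestamp >= ? GROUP BY agent_role ORDER BY agent_role",
        (since,),
    ).fetchall()


def failures_by_tool(conn: sqlite3.Connection, failed_status: str = "error") -> list:
    """Failure counts per tool, most failures first."""
    return conn.execute(
        "SELECT tool_name, COUNT(*) AS n FROM agent_steps "
        "WHERE status = ? GROUP BY tool_name ORDER BY n DESC",
        (failed_status,),
    ).fetchall()
```

In a real store, the query planner's output (EXPLAIN QUERY PLAN in SQLite) is the way to confirm that each query actually uses the index intended for it.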