kubectl Cheats for Incident Response

20 kubectl one-liners for incident response, each with a real use case and what it catches.

Describe and get

describe is the first investigation tool because it surfaces events that kubectl get quietly hides. CrashLoopBackOff, ImagePullBackOff, FailedScheduling: the explanation is in the event stream, not the pod status. Reach for describe before logs, before exec, before anything else.

kubectl describe pod $p -n prod. Full event log plus current state per pod. Reveals image-pull and scheduling failures the status field omits.
Describe-first order on CrashLoop or Pending. Diagnostic order matters. Skip describe and the FailedScheduling event goes unnoticed five minutes longer.
kubectl get events -n prod --sort-by=.lastTimestamp. Namespace-wide event stream sorted newest-last (use -A in place of -n prod for the whole cluster). Catches signals across pods that per-pod describe misses.
Structured -o yaml dump. kubectl get pod $p -n prod -o yaml prints the full spec and status as YAML. Captures the snapshot for postmortem reconstruction.
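
The describe-first order above can be wrapped in a small shell function so the sequence stays the same under pressure. This is a minimal sketch, not a standard tool: the function name triage_pod, the default namespace prod, and the snapshot file name are all assumptions made for illustration.

```shell
# triage_pod POD [NAMESPACE]
# Describe-first triage: pod events and state, then namespace events,
# then a YAML snapshot for the postmortem.
# The function name, default namespace, and file name are illustrative.
triage_pod() {
  pod="$1"
  ns="${2:-prod}"
  snapshot="${pod}-snapshot.yaml"

  # 1. Per-pod events and current state.
  kubectl describe pod "$pod" -n "$ns"

  # 2. Namespace-wide events, newest last.
  kubectl get events -n "$ns" --sort-by=.lastTimestamp

  # 3. Full spec and status, saved to a file in the current directory.
  kubectl get pod "$pod" -n "$ns" -o yaml > "$snapshot"
  echo "snapshot written to $snapshot"
}
```

Usage would look like triage_pod "$p" prod. Writing the snapshot to a file rather than the terminal keeps the state as it was at triage time, even if the pod is later deleted or replaced.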

Logs

Logs is the daily debugging surface and the four switches below cover most incident-response use. Follow, scope by time, read the previous container after a crash, aggregate across replicas.

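kubectl logs -f deployment/api -n prod. Live tail from the deployment. Given a deployment, kubectl picks a single pod to stream by default, so use a label selector when every replica matters. Standard incident view.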
--previous. Prior-container logs after a crash. Captures the moment-of-crash output that the live container has already lost.
--since=1h. Time-bounded scope per deployment. Cuts noise during long incidents.
kubectl logs -l app=api -n prod --tail=100. Multi-pod aggregate by label. Supports multi-replica investigation in one command.
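
The switches above combine naturally into two helpers: one for the crashed container, one for all replicas. The sketch below assumes hypothetical function names (crash_logs, replica_logs) and default values chosen only for illustration. It also adds --prefix, a kubectl logs flag not mentioned above that labels each line with its source pod and container.

```shell
# crash_logs POD [NAMESPACE] [LINES]
# Logs from the previous container instance, i.e. the one that crashed.
# Function name and defaults (prod, 200 lines) are illustrative.
crash_logs() {
  kubectl logs "$1" -n "${2:-prod}" --previous --tail="${3:-200}"
}

# replica_logs SELECTOR [NAMESPACE] [SINCE]
# Recent logs from every pod matching a label selector.
# --prefix tags each line with the pod and container it came from.
replica_logs() {
  kubectl logs -l "$1" -n "${2:-prod}" --since="${3:-1h}" --tail=100 --prefix
}
```

For example, crash_logs "$p" reads the last 200 lines of the crashed container, and replica_logs app=api prod 15m narrows the aggregate view to the last fifteen minutes.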

Exec carefully

exec is the last-resort tool. Useful when nothing else gives a clear answer, but it leaves no audit trail by default and any side effects are invisible to the next responder. Use sparingly and document every invocation.

kubectl exec -it pod/$p -n prod -- /bin/sh. Interactive shell into the pod. Investigation of last resort.
Exec-cost rule per incident. Production exec leaves no audit trail without extra setup. Default-deny in your head before reaching for it.
Document every exec. Capture "I ran X for reason Y" in the incident channel. Compliance posture and future investigation both need the breadcrumb.
Cluster-level audit-policy hook. Kubernetes API server audit logging, configured through an audit policy at the cluster level. Catches "who exec'd into prod when" after the fact even if the operator forgot to document.
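
Both habits above, documenting each exec and checking the audit log afterwards, can be sketched in shell. Everything named here is an assumption for illustration: the function names, the EXEC_LOG variable, and the local notes file. The second function further assumes that the API server writes audit events as compact JSON, one event per line, to a file you can read, and that the audit policy records pods/exec requests. Managed clusters often ship audit events to a logging service instead.

```shell
# logged_exec POD NAMESPACE REASON
# Records who ran exec, when, against which pod, and why, then opens the shell.
# EXEC_LOG and the notes file are illustrative; posting the same line in the
# incident channel remains a manual step.
logged_exec() {
  pod="$1"; ns="$2"; reason="$3"
  log="${EXEC_LOG:-./exec-notes.log}"
  if [ -z "$reason" ]; then
    echo "usage: logged_exec POD NAMESPACE REASON" >&2
    return 1
  fi
  printf '%s user=%s exec pod/%s ns=%s reason="%s"\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$(id -un)" "$pod" "$ns" "$reason" >> "$log"
  kubectl exec -it "pod/$pod" -n "$ns" -- /bin/sh
}

# exec_audit_events FILE
# Prints audit events that target the exec subresource.
# Assumes FILE is a JSON-lines audit log with compact JSON (no spaces after colons).
exec_audit_events() {
  grep '"subresource":"exec"' "$1"
}
```

Refusing to run without a reason is the point of the wrapper: it turns "document every exec" from a habit into a precondition.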