# kubectl Debug Cheat Sheet
Every kubectl command an on-call reaches for under a 3am page. Grouped by use case so you don't have to think, just type.
## Triage in 30 seconds
Start wide, then narrow. The first three commands tell you whether the cluster is actually broken or just one pod misbehaving.
- `kubectl get pods -A --field-selector=status.phase!=Running`, every non-Running pod across all namespaces
- `kubectl get events -A --sort-by=.lastTimestamp | tail -40`, last 40 events cluster-wide, freshest at the bottom
- `kubectl get nodes -o wide`, node Ready state, kubelet version, IPs
- `kubectl describe pod <pod> -n <ns>`, events, conditions, container restart count
- `kubectl get pod <pod> -n <ns> -o yaml`, full spec when describe isn't enough
- `kubectl rollout status deploy/<name> -n <ns>`, is the rollout actually finished?
- `kubectl rollout history deploy/<name> -n <ns>`, revisions you can roll back to
- `kubectl rollout undo deploy/<name> -n <ns>`, instant rollback to the previous revision
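The three "start wide" commands paste cleanly as one block; a minimal sketch, assuming `kubectl` is on PATH and your current context points at the affected cluster:

```shell
#!/usr/bin/env sh
# First-pass triage: cluster-wide view before narrowing to a single pod.

echo '== Non-Running pods =='
kubectl get pods -A --field-selector=status.phase!=Running

echo '== Last 40 events, freshest at the bottom =='
kubectl get events -A --sort-by=.lastTimestamp | tail -40

echo '== Node health =='
kubectl get nodes -o wide
```

If the first block is empty and the nodes are all Ready, it's probably one workload misbehaving, not the cluster.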
## Logs
Most incidents resolve in this section. Tail the right container, the right time window, and you're 80% done.
- `kubectl logs <pod> -n <ns>`, current container logs
- `kubectl logs <pod> -n <ns> -c <container>`, specific container in a multi-container pod
- `kubectl logs <pod> -n <ns> -p`, previous container's logs (after a crash/restart)
- `kubectl logs <pod> -n <ns> -f --tail=200`, follow with the last 200 lines as context
- `kubectl logs <pod> -n <ns> --since=15m`, last 15 minutes only
- `kubectl logs -l app=api -n <ns> --max-log-requests=10 --tail=100`, fan out across all matching pods
- `kubectl logs deploy/<name> -n <ns>`, picks one ready pod from the deployment
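The flags above compose. A sketch of a typical combination during an incident, answering "did any api pod log an error in the last 15 minutes?" (`app=api` and the `prod` namespace are placeholder names):

```shell
# Fan out across every matching pod, scoped to the incident window,
# with --prefix labelling each line with its source pod/container.
kubectl logs -l app=api -n prod --since=15m --tail=200 \
  --max-log-requests=10 --prefix \
  | grep -iE 'error|panic|timeout'
```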
## Exec and debug

When logs don't tell you enough, get inside. `kubectl debug` is criminally underused.
- `kubectl exec -it <pod> -n <ns> -- sh`, shell in (try `bash` first, fall back to `sh`)
- `kubectl exec <pod> -n <ns> -- env`, print the environment without an interactive shell
- `kubectl exec <pod> -n <ns> -- cat /etc/resolv.conf`, DNS sanity check
- `kubectl debug -it <pod> -n <ns> --image=nicolaka/netshoot --target=<container>`, ephemeral debug container sharing the target's process namespace; works on distroless images
- `kubectl debug node/<node> -it --image=nicolaka/netshoot`, root shell on a node via a debug pod
- `kubectl cp <ns>/<pod>:/path/file ./file`, pull a file out for offline analysis
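A sketch of the distroless case, where `exec` has no shell to land in (`api-7c9f` and `app` are hypothetical pod and container names). Because the ephemeral container shares the target's process namespace, the target's filesystem is usually reachable via procfs:

```shell
# Attach an ephemeral debug container alongside the shell-less target.
kubectl debug -it api-7c9f -n prod --image=nicolaka/netshoot --target=app

# Inside the debug container:
ps aux                              # find the app's PID, often 1
cat /proc/1/root/etc/resolv.conf    # read the target's files through procfs
```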
## Networking
Half of "the app is down" pages are actually networking. Verify the service routes, the endpoints exist, and DNS resolves before you blame the app.
- `kubectl get svc -A`, services, ClusterIPs, ports
- `kubectl get endpoints <svc> -n <ns>`, empty endpoints means a selector mismatch
- `kubectl describe svc <svc> -n <ns>`, selector, ports, session affinity
- `kubectl port-forward svc/<svc> 8080:80 -n <ns>`, bypass ingress to test the service directly
- `kubectl port-forward pod/<pod> 8080:8080 -n <ns>`, bypass the service to test the pod directly
- `kubectl get netpol -A`, NetworkPolicies that might be blocking traffic
- `kubectl get ingress -A`, ingress rules and host mappings
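A sketch of chasing the empty-endpoints case down to its root cause, a selector that matches no pod labels (`web` and `prod` are placeholder names):

```shell
# 1. What does the service select?
kubectl get svc web -n prod -o jsonpath='{.spec.selector}'; echo

# 2. Do any pods actually carry those labels?
kubectl get pods -n prod --show-labels

# 3. An empty ADDRESSES column here confirms the mismatch.
kubectl get endpoints web -n prod
```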
## Resource pressure
OOMKills, throttling, evictions. Look here when pods restart on their own or latency spikes for no reason.
- `kubectl top pods -A --sort-by=memory`, heaviest memory consumers
- `kubectl top pods -A --sort-by=cpu`, heaviest CPU consumers
- `kubectl top nodes`, node-level pressure
- `kubectl describe node <node>`, Allocatable, Allocated resources, conditions, taints
- `kubectl get pods -A -o jsonpath='{range .items[*]}{.metadata.namespace}{"/"}{.metadata.name}{"\t"}{.status.containerStatuses[*].restartCount}{"\n"}{end}' | sort -k2 -nr | head`, restart leaderboard
- `kubectl get events -A --field-selector reason=OOMKilling`, recent OOMKills
- `kubectl get events -A --field-selector reason=Evicted`, recent evictions
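The leaderboard pipeline is just a numeric sort on a tab-separated stream; you can see its shape with canned data (hypothetical pod names) before pointing it at a cluster:

```shell
# The sort/head half of the restart leaderboard, fed canned data in place
# of the kubectl jsonpath output: namespace/pod, TAB, restartCount.
printf 'prod/api-7c9\t14\nkube-system/coredns-x\t0\nprod/worker-b2\t3\n' \
  | sort -k2 -nr | head
# Highest restart count first: api-7c9 (14), worker-b2 (3), coredns-x (0).
```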
## Flags worth memorising
The flags that turn a 5-command investigation into a 1-command answer.
- `-o wide`, adds node, IP, and image columns to `get`
- `-o yaml` / `-o json`, full object for grep/jq
- `-o jsonpath='{.spec.containers[*].image}'`, surgical field extraction
- `--watch` / `-w`, stream changes in place
- `--field-selector`, server-side filter (faster than `grep`)
- `-l key=value`, label selector; chain with commas for AND
- `--context` / `--namespace`, set explicitly when juggling clusters; `kubens` + `kubectx` are worth installing
- `--dry-run=client -o yaml`, generate manifests without applying
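The dry-run flag is the quickest way to bootstrap a manifest you then edit by hand; a sketch (`web` and the nginx tag are example values):

```shell
# --dry-run=client renders the object locally and never touches the API
# server; -o yaml turns it into a manifest you can check into git.
kubectl create deployment web --image=nginx:1.27 \
  --dry-run=client -o yaml > deployment.yaml
```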