Reliability Engineering

Every container, every dependency,
one graph that updates as you deploy

Container Graph is the live dependency map of your container fleet. Every pod, every service, every call edge. Use it to see what would break if you drained a node, which pods will be impacted by upcoming maintenance, and which services have no redundancy. The graph updates as deploys happen, so there are no manual edges to maintain.


app.novaaiops.com / container-graph

● LIVE

Cluster · prod-us-east

pods: 412

services

edges: 1.4k

single-replica

nodes: 28 ec2 (m5.2xl)

top fan-out service: payments-api (54 callers)

most replicated: cart-api (24 replicas)

Auto-Discovery

No manual edges

Container Graph discovers edges automatically from two sources: eBPF probes on the nodes, which see every TCP flow regardless of mesh, and service-mesh telemetry (Linkerd, Istio, Consul) when present. The two sources are reconciled, and an edge appears in the graph as long as at least one source observes it. New deploys show up within 30 seconds of their first traffic.

✓ eBPF + service mesh: two independent sources that reconcile; an edge is real only if at least one observes it
✓ 30-second propagation: new pods and new edges appear in the graph within 30s of the first traffic
✓ No manual config: no annotations, no IaC declarations, discovery is observational

app.novaaiops.com / container-graph · discovery

Discovery · sources

ebpf probes: 28 nodes

linkerd: connected

edges (ebpf only): 142

edges (mesh only): 28

edges (both): 1,210
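
To make the reconciliation rule concrete, here is a minimal Python sketch of it, not Container Graph's actual implementation. It assumes each source yields a set of (caller, callee) pairs; the function names, the `ReconciledEdge` type, and the sample edges are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass

Edge = tuple[str, str]  # (caller, callee)


@dataclass(frozen=True)
class ReconciledEdge:
    edge: Edge
    sources: frozenset  # which of "ebpf" / "mesh" observed the edge


def reconcile(ebpf_edges, mesh_edges):
    """Keep every edge seen by at least one source and record who saw it."""
    reconciled = []
    for edge in sorted(ebpf_edges | mesh_edges):
        sources = set()
        if edge in ebpf_edges:
            sources.add("ebpf")
        if edge in mesh_edges:
            sources.add("mesh")
        reconciled.append(ReconciledEdge(edge, frozenset(sources)))
    return reconciled


def count_by_source(reconciled):
    """Tally edges the way the discovery panel does."""
    counts = {"ebpf only": 0, "mesh only": 0, "both": 0}
    for item in reconciled:
        if len(item.sources) == 2:
            counts["both"] += 1
        elif "ebpf" in item.sources:
            counts["ebpf only"] += 1
        else:
            counts["mesh only"] += 1
    return counts


# Hypothetical observations; the edges themselves are made up.
ebpf = {("cart-api", "payments-api"), ("fulfillment", "payments-api"),
        ("rare-job-runner", "cart-api")}
mesh = {("cart-api", "payments-api"), ("analytics", "cart-api")}

graph = reconcile(ebpf, mesh)
counts = count_by_source(graph)
print(counts)  # {'ebpf only': 2, 'mesh only': 1, 'both': 1}
```

Keeping the source set on each edge is what lets a panel like the one above split the total into eBPF-only, mesh-only, and doubly confirmed edges.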

What-If Drain

Simulate a node drain before you click drain

Hover any node and the graph highlights every pod that would have to reschedule. Hover any pod and the graph highlights every service that would lose a replica. Use it before maintenance: see what is about to break, decide whether you have enough headroom, then drain.

✓ Hover-to-highlight: hover a node → highlights affected pods; hover a pod → highlights services that lose replicas
✓ Replica count overlay: each service shows current replicas / minimum required so headroom is visible
✓ No-redundancy badge: services with one replica get a red badge so you do not drain their host without thinking

app.novaaiops.com / container-graph · drain

What-if · drain ip-10-0-2-14

pods to reschedule: 14

services losing 1 replica: 9 (all redundant)

services losing only replica: 1 (rare-job-runner)

recommendation: scale rare-job-runner to 2 first
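
The what-if logic can be approximated in a few lines. The Python sketch below is an illustration under stated assumptions, not the product's algorithm: it assumes a flat list of (pod, service, node) placements, and `simulate_drain`, the pod names, and the second node `ip-10-0-2-15` are all hypothetical.

```python
from collections import Counter


def simulate_drain(node, placements):
    """Report what draining `node` would do.

    placements: list of (pod_name, service, node_name) tuples.
    """
    replicas = Counter(service for _, service, _ in placements)
    on_node = [p for p in placements if p[2] == node]
    lost = Counter(service for _, service, _ in on_node)

    # A service that loses every replica it has goes dark during the drain.
    losing_all = sorted(s for s, n in lost.items() if n >= replicas[s])
    losing_some = sorted(s for s, n in lost.items() if n < replicas[s])

    return {
        "pods_to_reschedule": len(on_node),
        "services_losing_a_replica": losing_some,
        "services_losing_all_replicas": losing_all,
        "recommendations": [
            f"scale {s} to {replicas[s] + 1} first" for s in losing_all
        ],
    }


# Hypothetical placements for two nodes.
placements = [
    ("cart-api-1", "cart-api", "ip-10-0-2-14"),
    ("cart-api-2", "cart-api", "ip-10-0-2-15"),
    ("payments-api-1", "payments-api", "ip-10-0-2-14"),
    ("payments-api-2", "payments-api", "ip-10-0-2-15"),
    ("rare-job-runner-1", "rare-job-runner", "ip-10-0-2-14"),
]

report = simulate_drain("ip-10-0-2-14", placements)
print(report["recommendations"])  # ['scale rare-job-runner to 2 first']
```

The recommendation here is deliberately naive: it only adds one replica and does not check where the scheduler would place it, which a real pre-drain check would need to confirm.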

Service Health Overlay

Color the graph by SLO compliance

Toggle the service-health overlay and every service node is colored by its SLO compliance: green for healthy, yellow for fast-burning, red for over budget. The graph becomes a map of where reliability work is concentrated. This is useful in weekly reviews to see whether the unhealthy services are clustered (one team) or scattered (platform-wide).

✓ SLO color per node: pulls live from Service Health Matrix; same color encoding everywhere
✓ Cluster pattern detection: when unhealthy nodes cluster around one service, that service is a likely root cause
✓ Alternate overlays: cost per service, traffic share, or replica count; same graph, different colors

app.novaaiops.com / container-graph · overlay

Overlay · SLO compliance

green: 88 services

yellow: 6 (payments + 5 downstream)

red: 2 (fulfillment, analytics)

cluster: payments-graph cluster
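
As a rough illustration of the color encoding, the Python sketch below maps two common SLO signals to the three colors. The inputs (fraction of error budget consumed and current burn rate), the `fast_burn_threshold` default of 2.0, and the sample numbers are assumptions made for this example; the page does not state how Service Health Matrix defines "fast-burning".

```python
from collections import Counter


def slo_color(budget_consumed, burn_rate, fast_burn_threshold=2.0):
    """Map SLO state to an overlay color.

    budget_consumed: fraction of the error budget already spent (1.0 = all of it).
    burn_rate: current spend rate relative to the rate that would use exactly
               the whole budget over the SLO window.
    fast_burn_threshold: assumed cutoff for "fast-burning" (hypothetical value).
    """
    if budget_consumed >= 1.0:
        return "red"      # over budget
    if burn_rate >= fast_burn_threshold:
        return "yellow"   # within budget, but burning fast
    return "green"        # healthy


# Hypothetical readings: (budget_consumed, burn_rate) per service.
readings = {
    "cart-api": (0.20, 0.5),
    "payments-api": (0.60, 3.0),
    "fulfillment": (1.10, 1.2),
    "analytics": (1.40, 0.8),
}

colors = {name: slo_color(*values) for name, values in readings.items()}
tally = Counter(colors.values())
print(colors)
```

Checking the over-budget condition first means a service that has already exhausted its budget stays red even if its burn rate has since calmed down.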

Time Travel

Replay the graph at the time of any incident

For postmortems, replay the graph at the moment of an incident: the pods that were running, the edges that were active, and the service-health overlay at that minute. The replay is sourced from the same eBPF/mesh data, so it shows what was actually happening, not a reconstruction.

✓ Per-minute replay: rewind to any minute in the last 30 days; longer windows on request
✓ True state: replay reads the same observation source, so no reconstruction artifacts
✓ Cite from postmortems: every incident page links to the graph replay for the time the incident was opened

app.novaaiops.com / container-graph · replay

Replay · 14:22 inc-4821

pods running: 408

edges active: 1.4k

health overlay: payments yellow, cart turning

cite from: postmortem · inc-4821
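
One way to picture per-minute replay is a time-ordered store of observed snapshots with a lookup by timestamp. The Python sketch below assumes that model; the `GraphHistory` class, the snapshot fields, the calendar date, and the 14:21 values are hypothetical, and the page does not describe how replay data is actually stored.

```python
from bisect import bisect_right
from datetime import datetime, timedelta

RETENTION = timedelta(days=30)  # default replay window stated on the page


class GraphHistory:
    """Append-only store of per-minute graph snapshots (illustrative only)."""

    def __init__(self):
        self._minutes = []    # sorted datetimes, truncated to the minute
        self._snapshots = []  # snapshot recorded for each minute

    def record(self, at, snapshot):
        minute = at.replace(second=0, microsecond=0)
        if self._minutes and minute <= self._minutes[-1]:
            raise ValueError("snapshots must be recorded in time order")
        self._minutes.append(minute)
        self._snapshots.append(snapshot)

    def replay(self, at, now):
        """Return the snapshot for the minute containing `at`, or the latest earlier one."""
        if now - at > RETENTION:
            raise LookupError("outside the 30-day replay window")
        index = bisect_right(self._minutes, at)
        if index == 0:
            raise LookupError("no snapshot at or before that time")
        return self._snapshots[index - 1]


history = GraphHistory()
# The date and the 14:21 numbers are invented for the example.
history.record(datetime(2024, 1, 15, 14, 21), {"pods_running": 410, "edges_active": 1390})
history.record(datetime(2024, 1, 15, 14, 22), {"pods_running": 408, "edges_active": 1380})

snapshot = history.replay(datetime(2024, 1, 15, 14, 22, 30), now=datetime(2024, 1, 16))
print(snapshot["pods_running"])  # 408
```

Because lookups return stored observations rather than recomputing state, a replay built this way shows what was recorded at that minute and nothing inferred after the fact.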


Stop discovering dependencies during incidents

The graph is the answer to "what depends on this?" before you take it down, not after.

