Reliability Engineering

Network problems are service problems, caught at the link layer first

Network Monitoring is the network-layer slice of observability: per-link bandwidth and packet loss, per-flow retransmits, per-VPC traffic splits, and per-DNS-zone failure rates. Network issues often manifest first as confusing service degradations; this page surfaces them as the network problems they are.

Get Started · Talk to Sales
  • 4 network signal types
  • eBPF or flow-log source
  • Auto-correlates with services
  • < 60s detection latency
Four Signal Types

Bandwidth, packet loss, DNS, NAT

Four primary signals: per-link bandwidth (and saturation), per-flow retransmits and packet loss, per-DNS-zone failure rates, and per-NAT-gateway port-allocation saturation. Each signal is correlated with the services that use it, so a network issue shows up as "payments-api is degraded because its NAT gateway is full." A minimal sketch of the signal model follows the list below.

  • Bandwidth + saturation: per-link and per-VPC bandwidth with saturation thresholds tied to alerts
  • Packet loss + retransmits: per-flow detection of TCP-level network distress before app errors appear
  • DNS failures: per-zone NXDOMAIN, SERVFAIL, slow-response detection
  • NAT saturation: per-NAT-gateway port allocation; saturation here breaks outbound everywhere
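
Here's a minimal sketch of how the four signal types and their alert thresholds might be modeled. Every name (NetworkSignal, SIGNAL_THRESHOLDS, evaluate) and every threshold value is an illustrative assumption, not the product's actual API or defaults:

```python
# Hypothetical sketch: four signal types, each checked against a
# saturation threshold and correlated to the services on its scope.
from dataclasses import dataclass

# Illustrative thresholds, normalized to 0..1; not real product defaults.
SIGNAL_THRESHOLDS = {
    "bandwidth_utilization": 0.90,  # fraction of link capacity in use
    "packet_loss":           0.01,  # 1% loss on a flow
    "dns_failure_rate":      0.05,  # 5% of a zone's queries failing
    "nat_port_utilization":  0.85,  # fraction of NAT ports allocated
}

@dataclass
class NetworkSignal:
    kind: str            # one of the four types above
    scope: str           # e.g. "link:us-east-1a/peer" or "natgw:prod-egress-1"
    value: float         # current measurement, normalized to 0..1
    services: list[str]  # services correlated with this scope

def evaluate(signal: NetworkSignal) -> str | None:
    """Return an alert line naming the affected services, or None."""
    if signal.value < SIGNAL_THRESHOLDS[signal.kind]:
        return None
    impacted = ", ".join(signal.services) or "no mapped services"
    return (f"{signal.kind} at {signal.value:.0%} on {signal.scope}; "
            f"affects: {impacted}")

# A saturated NAT gateway surfaces as a payments-api problem:
print(evaluate(NetworkSignal("nat_port_utilization", "natgw:prod-egress-1",
                             0.97, ["payments-api", "billing-worker"])))
```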
Service Correlation

Network signals tied to services

Every network signal is automatically tied to the services it affects. A packet-loss spike on a peer link surfaces on the affected services' incident pages as "network: 2.4% loss on this peer." The agent fleet sees the network signal alongside service signals, so runbooks consider both. A sketch of the mapping follows the list below.

  • Tied to services: network signals appear on service-specific views, not just on a generic network page
  • Visible in incidents: incident pages show network signals if relevant; agents read both layers
  • Cross-signal correlation: feeds into Cross-Signal Correlation as a first-class signal type
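
A minimal sketch of that mapping, assuming a scope-to-services topology table; the TOPOLOGY map and annotate_services helper are hypothetical names, not the product's API:

```python
# Hypothetical sketch: fan a network annotation out to every service
# that uses the affected network scope.

# Which services use which network scope (in practice, derived from flows).
TOPOLOGY = {
    "peer:us-east-1<->us-west-2": ["payments-api", "ledger"],
    "natgw:prod-egress-1": ["payments-api"],
}

def annotate_services(incidents: dict[str, list[str]],
                      scope: str, annotation: str) -> None:
    """Surface a network annotation on each affected service's incident view."""
    for service in TOPOLOGY.get(scope, []):
        incidents.setdefault(service, []).append(annotation)

incidents: dict[str, list[str]] = {}
annotate_services(incidents, "peer:us-east-1<->us-west-2",
                  "network: 2.4% loss on this peer")
print(incidents)
# {'payments-api': ['network: 2.4% loss on this peer'],
#  'ledger': ['network: 2.4% loss on this peer']}
```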
eBPF or Flow Logs

Two source paths, same data

There are two source options. eBPF: a kernel probe on each host captures every flow, with the best fidelity. Flow logs: AWS / GCP / Azure flow-log ingestion, less precise but workable without host agents. Pick one or both; when both are present, reconciliation catches gaps in either (a sketch follows the list below).

  • eBPF: kernel-level capture; full fidelity; requires host agent (already deployed)
  • Flow logs: cloud-native; no host agent; coarser granularity
  • Reconciled when both: gaps in one source surface as missing edges; catches collection failures
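
A minimal sketch of the reconciliation idea, reducing each flow to a (src, dst, port) edge; the reconcile function and edge shape are illustrative assumptions, not the actual pipeline:

```python
# Hypothetical sketch: edges seen by one source but not the other
# indicate gaps: a missing flow-log export or a broken host probe.

def reconcile(ebpf_edges: set[tuple], flowlog_edges: set[tuple]) -> dict:
    """Diff the two views of the network; either difference is a gap."""
    return {
        "missing_from_flow_logs": ebpf_edges - flowlog_edges,
        "missing_from_ebpf": flowlog_edges - ebpf_edges,
    }

ebpf = {("payments-api", "10.0.3.7", 5432), ("payments-api", "10.0.9.2", 443)}
logs = {("payments-api", "10.0.3.7", 5432)}

gaps = reconcile(ebpf, logs)
print(gaps["missing_from_flow_logs"])  # the edge flow logs never exported
```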
DNS-Specific View

A whole subtab for DNS

DNS gets its own subtab because DNS issues are uniquely painful. It tracks per-zone failure rates, per-resolver latency, and recent NXDOMAIN and SERVFAIL spikes. When DNS misbehaves, this view tells you which zone, which resolver, and which downstream service is feeling it. A sketch of the spike detection follows the list below.

  • Per-zone failure rate: baseline and spikes per DNS zone; sub-minute resolution
  • Per-resolver latency: tracking p50/p95/p99 per resolver; useful for split-horizon issues
  • Downstream impact: each DNS issue lists the services that depend on the affected zone
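
A minimal sketch of per-zone spike detection against a rolling baseline; the window length and the 3x spike factor are illustrative assumptions, not the product's tuning:

```python
# Hypothetical sketch: track one failure-rate sample per interval per
# zone and flag intervals well above the rolling baseline.
from collections import deque

class ZoneFailureTracker:
    def __init__(self, window: int = 60, spike_factor: float = 3.0):
        self.samples: deque[float] = deque(maxlen=window)  # rolling baseline
        self.spike_factor = spike_factor

    def observe(self, failed: int, total: int) -> bool:
        """Record one interval; return True if it spikes above baseline."""
        rate = failed / total if total else 0.0
        baseline = sum(self.samples) / len(self.samples) if self.samples else 0.0
        self.samples.append(rate)
        # Require an established baseline so a zone's first samples don't alert.
        return baseline > 0 and rate > self.spike_factor * baseline

tracker = ZoneFailureTracker()
for failed in [2, 3, 2, 3, 2]:                 # steady NXDOMAIN background
    tracker.observe(failed, total=1000)
print(tracker.observe(failed=40, total=1000))  # True: a SERVFAIL spike
```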
Video walkthrough coming soon

Subscribe to Nova AI Ops on YouTube for demos, tutorials, and feature deep-dives.

Catch the network before the service

Network monitoring stops the "we spent two hours debugging the app, it was DNS" pattern.

Get Started · Request a Demo