VPC Flow Log Anomaly Detection
VPC flow logs reveal security events. These are the detection patterns that surface the meaningful anomalies.
Patterns
VPC flow logs record metadata about the IP traffic moving through the network interfaces in your VPC: source and destination addresses, ports, protocol, packet and byte counts, and whether the traffic was accepted or rejected. The volume is high; the value is in the patterns. Anomaly detection on flow logs surfaces the connections that should not be happening: data exfiltration, lateral movement, command-and-control beacons.
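To make the record structure concrete, here is a minimal sketch of parsing one record, assuming the AWS default (version 2) space-separated format. The `FlowRecord` class and `parse_flow_record` function are illustrative names, not part of any AWS SDK, and custom log formats would need a different field list.

```python
from dataclasses import dataclass
from typing import Optional

# Field order of the AWS default (version 2) flow log format.
FIELDS = [
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
]


@dataclass(frozen=True)
class FlowRecord:
    """Hypothetical container for the fields detection logic usually needs."""
    interface_id: str
    srcaddr: str
    dstaddr: str
    srcport: int
    dstport: int
    protocol: int
    packets: int
    bytes: int
    start: int
    end: int
    action: str


def parse_flow_record(line: str) -> Optional[FlowRecord]:
    """Parse one default-format record; return None if it carries no flow data."""
    parts = line.split()
    if len(parts) != len(FIELDS):
        return None
    row = dict(zip(FIELDS, parts))
    # NODATA and SKIPDATA records use "-" placeholders instead of values.
    if row["log_status"] != "OK":
        return None
    return FlowRecord(
        interface_id=row["interface_id"],
        srcaddr=row["srcaddr"],
        dstaddr=row["dstaddr"],
        srcport=int(row["srcport"]),
        dstport=int(row["dstport"]),
        protocol=int(row["protocol"]),
        packets=int(row["packets"]),
        bytes=int(row["bytes"]),
        start=int(row["start"]),
        end=int(row["end"]),
        action=row["action"],
    )
```

Every pattern discussed here reduces to grouping and comparing these fields over time.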
What patterns matter most:
- Outbound to new external IPs: A workload that suddenly starts connecting to external IPs it has never connected to before is a strong signal. The signal could be compromise (the attacker is calling home), a configuration change, or a new feature. The investigation determines which.
- Sudden traffic spike between unusual service pairs: Service A talking to Service B is normal; Service A talking to Service C for the first time is unusual. The graph of normal communication paths is well-defined; deviations from the graph deserve attention.
- Failed connection attempts at unusual rate: A spike in REJECT actions in flow logs indicates either probing (someone scanning) or misconfiguration (a service trying to reach what it cannot). Both are worth investigating; one is a security signal, the other is operational debt.
- Egress to known-bad IPs: Connections to IP addresses on threat intel feeds (botnet C2 servers, known malicious infrastructure) are immediate red flags. The match is high-confidence; the response is fast.
- Unusual data volumes: A workload that normally egresses 10 MB per hour and suddenly egresses 10 GB has either a feature change or a data exfiltration event. The volume change is the signal; the investigation is the response.
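The first-seen and known-bad patterns above can be sketched with plain set lookups. This is an illustration under stated assumptions, not a production detector: `DestinationBaseline` is a hypothetical name, "internal" is assumed to mean the RFC 1918 private ranges, and the threat intel feed is assumed to be a simple set of IP strings.

```python
import ipaddress
from collections import defaultdict

# Assumption: internal traffic is anything inside the RFC 1918 ranges.
# Replace with your own VPC and peered-network CIDRs.
INTERNAL_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_external(ip: str) -> bool:
    """True when the address falls outside every configured internal network."""
    addr = ipaddress.ip_address(ip)
    return not any(addr in net for net in INTERNAL_NETWORKS)


class DestinationBaseline:
    """Tracks which external destinations each workload has contacted."""

    def __init__(self, known_bad=()):
        self._seen = defaultdict(set)      # workload -> external IPs seen so far
        self._known_bad = set(known_bad)   # hypothetical threat intel set

    def learn(self, workload: str, dstaddr: str) -> None:
        """Record a destination during the baseline period without alerting."""
        if is_external(dstaddr):
            self._seen[workload].add(dstaddr)

    def observe(self, workload: str, dstaddr: str) -> list:
        """Return finding labels for one flow, then fold it into the baseline."""
        findings = []
        if not is_external(dstaddr):
            return findings
        if dstaddr in self._known_bad:
            findings.append("known_bad_destination")
        if dstaddr not in self._seen[workload]:
            findings.append("new_external_destination")
            self._seen[workload].add(dstaddr)
        return findings
```

The same structure covers unusual service pairs if the key becomes a (source service, destination service) tuple instead of a workload and an IP. Mapping addresses to workload or service names is left out here because it depends on your inventory.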
The patterns are well-known; the discipline is in detecting them at scale across a high-volume log stream.
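For the rate and volume patterns, one way to keep state small on a high-volume stream is to aggregate per workload per hour and compare each new total against a rolling median. The sketch below assumes that aggregation already happened upstream; `HourlyVolumeDetector` and its default parameters are hypothetical choices, not recommended values.

```python
from collections import defaultdict, deque
from statistics import median


class HourlyVolumeDetector:
    """Flags an hourly total that far exceeds a workload's recent median."""

    def __init__(self, history_hours=24, min_history=6, ratio_threshold=10.0):
        # One bounded window per workload keeps memory use predictable.
        self._history = defaultdict(lambda: deque(maxlen=history_hours))
        self._min_history = min_history
        self._ratio = ratio_threshold

    def check(self, workload: str, hourly_total: int) -> bool:
        """Return True if this hour's total looks anomalous for the workload."""
        history = self._history[workload]
        anomalous = False
        if len(history) >= self._min_history:
            # max(..., 1) avoids a zero baseline flagging every nonzero value.
            baseline = max(median(history), 1)
            anomalous = hourly_total > self._ratio * baseline
        # The value is always appended; the median tolerates isolated spikes,
        # and a sustained change gradually becomes the new normal.
        history.append(hourly_total)
        return anomalous
```

Feeding it egress bytes per hour approximates the unusual-volume pattern; feeding it the count of REJECT records per interval approximates the failed-connection pattern. A ratio against the median is only one option, and teams with seasonal traffic may prefer a time-of-day baseline.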
Tools
The tools for flow log anomaly detection range from AWS-native managed services to custom-built SIEM pipelines. The right choice depends on team size, security maturity, and existing tooling investments.
- GuardDuty for AWS-native detection: Amazon GuardDuty consumes flow logs, CloudTrail, and DNS logs to produce findings. The findings are pre-built; the team does not have to write detection rules. The trade-off is broad built-in coverage at the cost of limited customization.
- Custom SIEM ingesting flow logs: Mature security teams pipe flow logs into a SIEM (Splunk, Elastic, Sumo Logic) and write custom detection rules. The custom approach catches organization-specific patterns that GuardDuty does not know about. The cost is the SIEM and the rule-engineering effort.
- Cloud-native equivalents: GCP has VPC Flow Logs and Security Command Center; Azure has NSG flow logs and Sentinel. Each cloud has its own native option; the choice usually follows the existing security tooling.
- Open-source detection rules: Sigma rules and similar community-shared detection patterns provide a starting point for custom detection. The rules adapt to local context; the maintenance is shared with the community.
- Hybrid approaches: Many teams run GuardDuty for the breadth and add custom detection on top for the specifics. The hybrid catches both the well-known patterns and the organization-specific ones.
The tooling decision has real consequences, and both managed and custom approaches have merit. The wrong answer is doing nothing because the choice feels overwhelming.
Act
Detection without response is wasted detection. The action layer is what turns flow log anomalies into security outcomes. The discipline is responding fast enough to matter without paging on noise.
- Alert sec ops on confirmed patterns: Confirmed anomalies (high-confidence detections) page security operations. The on-call investigates within minutes; the response timeline matches the threat speed. Faster response means less attacker dwell time.
- Investigate within minutes: The investigation is structured. What workload is generating the traffic? What is the destination? Is this a known pattern? Is this a new feature deployment that explains the change? The investigation produces a verdict.
- False positives are real: Anomaly detection produces false positives. New features look like anomalies until they become normal; legitimate operational changes trigger alerts; threat intel feeds have stale data. The team accepts the false positive load and tunes over time.
- Tune the detection: Detection that fires constantly produces alert fatigue. Detection that never fires produces complacency. The team tunes thresholds, suppression rules, and confidence levels based on operational experience. The tuning is continuous.
- Document the response: Each investigation is documented. Was it real? What was the response? What did the team learn? The accumulated documentation makes future investigations faster.
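The split between paging, queueing, and suppressing can be expressed as a small routing function. This is a sketch with made-up names (`Finding`, `Suppression`, `route`) and an arbitrary default threshold; it assumes each detection already carries a confidence score between 0 and 1.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    workload: str
    pattern: str       # e.g. "new_external_destination"
    dstaddr: str
    confidence: float  # assumed to be in the range 0.0 to 1.0


@dataclass(frozen=True)
class Suppression:
    workload: str
    pattern: str
    dst_network: str   # CIDR the suppression applies to
    reason: str        # documents why the rule exists


def route(finding: Finding, suppressions, page_threshold: float = 0.8) -> str:
    """Return "suppress", "page", or "queue" for a single finding."""
    addr = ipaddress.ip_address(finding.dstaddr)
    for rule in suppressions:
        if (
            rule.workload == finding.workload
            and rule.pattern == finding.pattern
            and addr in ipaddress.ip_network(rule.dst_network)
        ):
            return "suppress"
    # High-confidence findings page the on-call; the rest wait in a queue.
    return "page" if finding.confidence >= page_threshold else "queue"
```

Tuning then means editing data rather than code: adjusting `page_threshold` and adding or retiring `Suppression` entries as investigations close. Keeping a `reason` on each rule ties the suppression back to the documented investigation that justified it.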
VPC flow log anomaly detection is one of the highest-leverage security disciplines for cloud environments. Nova AI Ops integrates with flow log feeds, surfaces anomalous patterns, and produces the structured investigation queue that security operations teams work from.