Flow-based network visibility has become one of the foundational capabilities of mature enterprise security programs. If you're evaluating it for the first time or trying to make the case internally, this guide covers the full scope: what flow data is, what it enables, how to deploy it, and how to get the most from it.
What Is Flow Data?
Flow data is a record of network conversations. Every time two systems communicate over a network, they exchange packets. A "flow" is the summary of that exchange: who talked to whom, over what protocol and ports, for how long, and how much data was transferred.
Routers and switches generate flow records automatically when flow export is configured. The device maintains a flow cache that tracks active conversations and exports a summary record when a conversation ends or a cache timeout expires (long-lived flows are exported periodically rather than held until they close). These records — typically a few hundred bytes each — are sent over UDP to a collector for storage and analysis.
A typical flow record contains:
- Source and destination IP addresses
- Source and destination ports
- IP protocol (TCP, UDP, ICMP, etc.)
- Byte and packet counts
- Flow start and end timestamps
- Input and output interface indexes
- TCP flags (SYN, ACK, FIN, RST)
- DSCP/QoS markings
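In code, a record like this maps naturally onto a small structure. Below is an illustrative Python sketch; the field names are chosen for readability here and are not any exporter's actual wire format:

```python
from dataclasses import dataclass

@dataclass
class FlowRecord:
    """One exported flow record (illustrative field names, not a wire format)."""
    src_addr: str     # source IP address
    dst_addr: str     # destination IP address
    src_port: int
    dst_port: int
    protocol: int     # IP protocol number: 6=TCP, 17=UDP, 1=ICMP
    bytes: int        # total bytes transferred in the flow
    packets: int      # total packets in the flow
    start_ts: float   # flow start, epoch seconds
    end_ts: float     # flow end, epoch seconds
    input_if: int     # SNMP ifIndex of ingress interface
    output_if: int    # SNMP ifIndex of egress interface
    tcp_flags: int    # cumulative OR of observed TCP flags (SYN=0x02, ACK=0x10, ...)
    dscp: int         # DSCP/QoS marking

# Example: a short HTTPS exchange summarized as a single flow
rec = FlowRecord("10.1.2.3", "203.0.113.7", 51514, 443, 6,
                 48_200, 62, 1700000000.0, 1700000004.2, 3, 7, 0x1b, 0)
```

At a few hundred bytes serialized, millions of these records per day remain cheap to ship, store, and query.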
This is not packet capture. Flow data doesn't contain the payload — it's the metadata of network conversations. This distinction matters for both privacy/compliance and scalability: flow data is roughly 1/1000th the volume of full packet data, making it practical to collect and retain at enterprise scale.
What Flow Data Enables
Network Topology Discovery
By analyzing which systems communicate with which, flow data reveals the actual topology of your network — including dependencies that aren't in any documentation. When a new host appears in the flow data talking to your authentication servers, you know about it. When an application starts using a database it didn't previously connect to, you see it.
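As a sketch of the idea: the observed communication graph falls out of nothing more than (source, destination) pairs. The tuple schema here is illustrative:

```python
from collections import defaultdict

def build_talk_graph(flows):
    """Derive the observed communication graph from flow records:
    an adjacency map of who talks to whom, with a flow count per edge.
    `flows` is an iterable of (src, dst) pairs (illustrative schema)."""
    graph = defaultdict(lambda: defaultdict(int))
    for src, dst in flows:
        graph[src][dst] += 1
    return graph

# Two hosts talking to an auth server; a second edge appearing here
# that isn't in your documentation is a discovered dependency.
g = build_talk_graph([
    ("10.0.0.5", "10.0.0.10"),
    ("10.0.0.5", "10.0.0.10"),
    ("10.0.0.7", "10.0.0.10"),
])
```

Diffing today's graph against yesterday's is the mechanism behind "a new host appeared" and "a new dependency appeared" alerts.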
Traffic Baselining and Anomaly Detection
Flow data provides the raw material for building behavioral baselines. By collecting flow records over time, you build a model of what "normal" looks like — which systems talk to which, at what volumes, on what schedules. Deviations from this baseline are the signal that anomaly detection systems use to identify threats.
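A minimal illustration of that baselining logic, assuming flows have already been reduced to (src, dst, bytes, timestamp) tuples and using a simple z-score test; production systems use far richer models, but the shape is the same:

```python
import statistics
from collections import defaultdict

def hourly_volumes(flows):
    """Aggregate bytes per (src, dst) pair per hour.
    `flows` is an iterable of (src, dst, bytes, epoch_ts) tuples
    (illustrative schema)."""
    buckets = defaultdict(int)
    for src, dst, nbytes, ts in flows:
        buckets[(src, dst, int(ts // 3600))] += nbytes
    return buckets

def is_anomalous(history, observed, z_threshold=3.0):
    """Flag `observed` if it deviates more than z_threshold standard
    deviations from the historical baseline for this pair/hour."""
    if len(history) < 2:
        return False  # not enough history to baseline yet
    mean = statistics.mean(history)
    stdev = statistics.pstdev(history)
    if stdev == 0:
        return observed != mean
    return abs(observed - mean) / stdev > z_threshold

# A pair that normally moves ~100 bytes/hour suddenly moving 200x that:
is_anomalous([100, 110, 90, 105, 95], 20_000)  # flagged
```

The "warmup period" discussed later in this guide is exactly the time needed to populate `history` with enough samples per pair to make such tests meaningful.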
Capacity Planning
Aggregate flow data gives you per-link, per-segment utilization histories. You can identify which circuits are approaching saturation, which segments have headroom, and which application classes are driving growth. This is essential input for capacity planning conversations.
Incident Response
During an incident, flow data answers the key questions: what other systems did the compromised host communicate with? What data volumes were involved? What was the timing of the lateral movement? Historical flow records — retained for 30, 60, or 90 days — allow incident responders to reconstruct the attack timeline even after the fact.
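The "what did the compromised host talk to" question reduces to a pivot query over stored records. A minimal sketch, assuming records are available as dicts with src/dst/bytes/start_ts keys (an illustrative schema, not a specific product's):

```python
def pivot_on_host(flows, host_ip, window_start, window_end):
    """Return every peer the host communicated with during the incident
    window, with total bytes exchanged in either direction per peer."""
    peers = {}
    for f in flows:
        if not (window_start <= f["start_ts"] <= window_end):
            continue  # outside the incident window
        if f["src"] == host_ip:
            peer = f["dst"]
        elif f["dst"] == host_ip:
            peer = f["src"]
        else:
            continue  # flow doesn't involve the host
        peers[peer] = peers.get(peer, 0) + f["bytes"]
    return peers
```

Running the same pivot on each newly discovered peer, ordered by flow start time, is how the lateral-movement timeline gets reconstructed.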
Segmentation Validation
After implementing or modifying segmentation policy, flow data provides the proof. Query the flow records for connections between segments that should be blocked: if they appear, the policy has a gap. If they don't, the segmentation is working as intended.
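That validation query can be sketched in a few lines, assuming flow records carry plain src/dst addresses and policy is expressed as pairs of CIDR blocks that should never communicate (both assumptions are illustrative):

```python
import ipaddress

def segmentation_violations(flows, blocked_pairs):
    """Return the flows that cross segment pairs policy says should be
    blocked. `blocked_pairs` is a list of (src_cidr, dst_cidr) strings."""
    nets = [(ipaddress.ip_network(a), ipaddress.ip_network(b))
            for a, b in blocked_pairs]
    hits = []
    for f in flows:
        src = ipaddress.ip_address(f["src"])
        dst = ipaddress.ip_address(f["dst"])
        if any(src in a and dst in b for a, b in nets):
            hits.append(f)  # observed traffic that policy should have blocked
    return hits
```

An empty result is the evidence that the segmentation is working; any hit points at a specific gap to close.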
Architecture of a Flow-Based Visibility System
A complete flow-based visibility deployment has four components:
1. Flow Exporters
Flow exporters are the network devices that generate and send flow records. This includes:
- Routers and L3 switches: The primary source of flow data in most environments. Enable NetFlow v9 or IPFIX export toward your collector.
- Firewalls: Next-generation firewalls increasingly support flow export in addition to log-based telemetry.
- Virtual switches: VMware's vSphere Distributed Switch supports NetFlow export from virtual infrastructure.
- Cloud gateways: AWS VPC Flow Logs, Azure NSG Flow Logs, and GCP VPC Flow Logs are the cloud-native equivalents.
2. The Collector
The collector receives UDP flow export packets, parses the flow records, and writes them to storage. Key characteristics:
- Must handle peak flow export rates without dropping records
- Typically receives on UDP port 2055 (the conventional NetFlow port; IPFIX's IANA-assigned port is 4739) or 6343 (sFlow)
- Normalizes records from different exporter vendors into a consistent schema
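The transport side of a collector is simple; the hard parts are parsing the template-based wire format and keeping up with peak export rates. A minimal Python sketch of just the receive path (port numbers and buffer size are conventional choices here, not requirements):

```python
import socket

def open_collector_socket(bind_addr="0.0.0.0", port=2055):
    """Bind the UDP socket a collector listens on. 2055 is the
    conventional NetFlow port; sFlow uses 6343."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # A generous receive buffer helps ride out export bursts without drops.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 4 * 1024 * 1024)
    sock.bind((bind_addr, port))
    return sock

def drain_export_packets(sock, max_packets):
    """Read raw export datagrams. A real collector parses the template-based
    NetFlow v9/IPFIX format and normalizes records into its schema; this
    sketch only notes which exporter sent each datagram and its size."""
    received = []
    for _ in range(max_packets):
        data, (exporter_ip, _) = sock.recvfrom(65535)
        received.append((exporter_ip, len(data)))
    return received
```

Counting datagrams and bytes per exporter, as above, is also the basis of the "verify you're receiving records" check described later in this guide.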
3. Storage
Flow records need to be retained for enough historical depth to support both anomaly detection (requires weeks to months of history) and incident response (requires at least 90 days, often longer for compliance). Storage options range from purpose-built time-series databases to columnar stores optimized for flow queries.
4. Analysis and Visualization
The analysis layer applies structure to the flow data: topology mapping, baseline modeling, anomaly detection, and query interfaces for ad-hoc investigation. This is where the operational value of flow data is realized.
Deployment Checklist
Before deploying flow-based visibility, address these key decisions:
Coverage planning. Identify all flow exporters in scope. Map which segments are covered by each exporter and where gaps exist (e.g., intra-hypervisor traffic that never crosses a physical device).
Sampling vs. full-fidelity. Most routers support both sampled and full-fidelity flow export. For security and anomaly detection use cases, full-fidelity is strongly preferred. Sampling introduces statistical uncertainty that degrades baseline accuracy.
Collection infrastructure sizing. Flow export volume is roughly proportional to connection rate, not traffic volume. A heavily utilized link with a few large flows generates fewer flow records than a lightly utilized link with many small connections (e.g., a DNS server).
Retention policy. Define how long flow records are retained. Security investigations typically require 90 days minimum; compliance requirements may mandate longer.
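A back-of-envelope calculation makes the sizing and retention decisions concrete. The per-record size and overhead factor below are assumptions for illustration, not measured values; substitute your own flow rate from a pilot collection:

```python
def daily_flow_storage_bytes(flows_per_second, record_bytes=100, overhead=1.3):
    """Rough collector storage sizing. Assumes ~100 bytes per stored
    record and 30% indexing/schema overhead (both illustrative)."""
    return int(flows_per_second * 86_400 * record_bytes * overhead)

# Example: 5,000 flows/sec sustained across all exporters
per_day = daily_flow_storage_bytes(5_000)
print(f"{per_day / 1e9:.1f} GB/day, {per_day * 90 / 1e12:.1f} TB for 90 days")
# prints "56.2 GB/day, 5.1 TB for 90 days"
```

Note that `flows_per_second` is a connection rate, not a bandwidth figure, which is why the DNS-server example above can dominate the sizing despite moving little traffic.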
Export timing. Flow cache timeout settings control how quickly flow records are exported after a conversation ends. Shorter timeouts (60 seconds active, 15 seconds inactive) provide more timely data but increase export volume.
Getting Started in Practice
For most environments, the quickest path to value is:
- Enable NetFlow v9 or IPFIX on your core/border routers
- Stand up a collector (FlowSight, or an open-source option for initial testing)
- Verify you're receiving records with a basic volume check
- Let the system run for 2 weeks to build initial baselines
- Enable anomaly detection after the warmup period
The time from "first flow record received" to "first anomaly detection" is typically under two weeks. With FlowSight, the collection, storage, baseline modeling, and anomaly detection pipeline is pre-integrated — there's no assembly required. Configure your exporters, point them at the collector, and the analysis pipeline starts automatically.
Common Deployment Mistakes
Under-scoping the flow exporter list. Organizations often start with border routers only and miss the majority of east-west traffic. Map your internal routing infrastructure and enable flow export on distribution and core switches.
Using sampled flow for security use cases. sFlow or sampled NetFlow at 1:1000 or 1:4096 ratios is appropriate for capacity planning, but it misses short flows and underestimates volumes, which are exactly the signals anomaly detection depends on.
Insufficient retention. 7-day flow retention is common in capacity-planning deployments but insufficient for security use cases. Size storage for at least 90 days.
Not monitoring the collector itself. If the collector drops records, your baselines and historical data have gaps. Monitor collector health as part of your operational discipline.
Flow-based network visibility is one of the highest-leverage investments available to enterprise security programs. It covers the interior of the network that perimeter tools miss, it scales to enterprise environments without agent deployment, and it provides the historical record that incident response needs. The technology is mature, the deployment path is well-understood, and the operational benefits are measurable.