Auto-Discovering Cloud Topology from Flow Data

Cloud infrastructure has a documentation problem. Not because cloud teams don't try to maintain it, but because the fundamental economics of cloud make accurate documentation structurally impossible.

In an on-premises data center, topology changes are slow, deliberate, and require physical work. Infrastructure is provisioned in weeks. The change velocity is low enough that documentation has a fighting chance of keeping up.

In a cloud environment, infrastructure is provisioned in minutes. Auto-scaling groups spin up and tear down instances dynamically. Blue-green deployments swap entire application tiers. Feature flags activate new services on a schedule. By the time a topology diagram is drawn and reviewed, the environment it depicts has changed three times.

The answer isn't better documentation — it's topology discovery that derives the actual state from network behavior, continuously.

What Topology Discovery from Flow Data Means

Flow-driven topology discovery works by analyzing the communication patterns recorded in flow data to infer the structure and relationships of the network:

What nodes exist: Every IP address that appears in flow records as a source or destination is a candidate node in the topology, to be mapped to a workload once metadata is available.
How nodes are organized: Shared network behavior, subnet membership, VPC/VNet context, and tagging data create groupings that map to architectural tiers.
What relationships exist: Flows observed between a pair of nodes define an edge of the topology graph. Frequent, high-volume connections represent primary relationships; infrequent connections represent secondary dependencies.
What's changed: Comparing topology snapshots over time reveals new nodes, decommissioned nodes, and relationship changes.

This topology is observed, not documented. It reflects the actual operational state of the environment at the time the analysis runs.
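
As a minimal sketch of the idea, the following Python builds a node set and a directed edge map from flow records and labels each edge as primary or secondary. The `Flow` record shape, the `build_topology` name, and both thresholds are assumptions chosen for illustration, not values taken from any particular product.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    src: str
    dst: str
    dst_port: int
    byte_count: int


def build_topology(flows, primary_min_flows=3, primary_min_bytes=10_000):
    """Return (nodes, edges) inferred from an iterable of Flow records.

    The two thresholds are illustrative assumptions; real values depend
    on the environment and the length of the observation window.
    """
    nodes = set()
    stats = defaultdict(lambda: {"flows": 0, "bytes": 0, "ports": set()})
    for f in flows:
        nodes.update((f.src, f.dst))
        edge = stats[(f.src, f.dst)]  # directed edge: source -> destination
        edge["flows"] += 1
        edge["bytes"] += f.byte_count
        edge["ports"].add(f.dst_port)

    edges = {}
    for pair, s in stats.items():
        is_primary = (
            s["flows"] >= primary_min_flows and s["bytes"] >= primary_min_bytes
        )
        edges[pair] = {**s, "kind": "primary" if is_primary else "secondary"}
    return nodes, edges


flows = [
    Flow("10.0.1.10", "10.0.2.20", 5432, 40_000),
    Flow("10.0.1.10", "10.0.2.20", 5432, 55_000),
    Flow("10.0.1.10", "10.0.2.20", 5432, 61_000),
    Flow("10.0.1.10", "10.0.3.30", 443, 1_200),
]
nodes, edges = build_topology(flows)
for (src, dst), e in sorted(edges.items()):
    print(f"{src} -> {dst}: {e['kind']} ({e['flows']} flows, {e['bytes']} bytes)")
```

Keeping the per-edge counters alongside the label means the primary/secondary cut-off can be tuned later without re-reading the flow data.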

The Cloud-Specific Challenges

Cloud environments introduce several challenges that make topology discovery more complex than in on-premises networks:

Ephemeral IP Addresses

Cloud instances frequently receive new IP addresses when they're stopped and restarted, or when auto-scaling groups replace instances. A topology built purely on IP addresses will fragment the view of workloads that are conceptually stable but operationally ephemeral.

The solution is to anchor topology nodes to stable identifiers: cloud resource IDs (AWS instance IDs, Azure VM resource IDs), workload labels/tags, or DNS names for load-balanced services. Flow records enriched with this metadata produce a topology that's stable even when the underlying IPs change.

# Azure: List VMs with their private IPs for enrichment
az vm list-ip-addresses \
  --output table \
  --query "[].{Name:virtualMachine.name, IP:virtualMachine.network.privateIpAddresses[0]}"
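
Because an IP can be reassigned to a different workload, an enrichment lookup generally needs to consider when a flow was observed, not only which address it used. The sketch below assumes a hypothetical `IpAssignment` history (for example, built by polling an inventory listing like the one above on a schedule) and resolves each address to the workload that held it at the flow's timestamp.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IpAssignment:
    ip: str
    resource_id: str     # stable identifier, e.g. an instance or VM resource ID
    workload: str        # e.g. the value of an "app" tag (assumed tagging scheme)
    start: int           # epoch seconds, inclusive
    end: Optional[int]   # epoch seconds, exclusive; None means still assigned


def resolve(ip, ts, assignments):
    """Return the assignment that held `ip` at time `ts`, or None."""
    for a in assignments:
        if a.ip == ip and a.start <= ts and (a.end is None or ts < a.end):
            return a
    return None


# Hypothetical history: 10.0.1.10 is reused by a different workload at t=2000.
assignments = [
    IpAssignment("10.0.1.10", "vm-web-001", "web", 1_000, 2_000),
    IpAssignment("10.0.1.10", "vm-batch-007", "batch", 2_000, None),
    IpAssignment("10.0.1.11", "vm-web-002", "web", 1_500, None),
]


def node_key(ip, ts):
    """Topology node key: the workload name if known, else the raw IP."""
    a = resolve(ip, ts, assignments)
    return a.workload if a else f"unknown:{ip}"


print(node_key("10.0.1.10", 1_500))
print(node_key("10.0.1.10", 2_500))
print(node_key("192.0.2.9", 2_500))
```

The linear scan is only for readability; a production lookup would more likely index assignments by IP and sort them by start time.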

NAT and Private Endpoint Complexity

Cloud-to-on-premises traffic and cross-region traffic often traverse NAT devices, load balancers, and private endpoints. A flow record that shows a connection from on-premises to an Azure Private Endpoint IP doesn't directly reveal which backend service the connection is reaching; the mapping requires additional context from the cloud provider's resource metadata.

Effective topology discovery resolves these translations: private endpoint IPs are mapped to their backing services, load balancer IPs are mapped to their target pools, and NAT translations are tracked so that the topology graph shows logical service relationships rather than network infrastructure hops.
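
One way to picture this resolution step is as a set of lookups applied to each flow before it becomes an edge. In the sketch below, the three tables and every name in them are hypothetical; in practice they would be populated from provider metadata and NAT logs rather than hard-coded.

```python
# Hypothetical lookup tables; real ones would come from provider metadata.
PRIVATE_ENDPOINTS = {"10.1.0.5": "storage:orders-archive"}
LOAD_BALANCERS = {"10.1.0.10": "service:checkout-api"}
# (translated IP, translated port) -> original private source address
NAT_TRANSLATIONS = {("203.0.113.7", 40001): "10.2.0.15"}


def logical_source(ip, port):
    """Undo a known NAT translation; otherwise keep the observed address."""
    return NAT_TRANSLATIONS.get((ip, port), ip)


def logical_destination(ip):
    """Map infrastructure IPs to the service they front, if known."""
    return PRIVATE_ENDPOINTS.get(ip) or LOAD_BALANCERS.get(ip) or ip


def logical_edge(src_ip, src_port, dst_ip):
    return logical_source(src_ip, src_port), logical_destination(dst_ip)


observed = [
    ("203.0.113.7", 40001, "10.1.0.5"),   # NAT'd source -> private endpoint
    ("10.2.0.15", 51000, "10.1.0.10"),    # direct source -> load balancer
    ("10.2.0.15", 51001, "10.1.0.10"),    # same logical relationship again
]
logical_edges = {logical_edge(*flow) for flow in observed}
for edge in sorted(logical_edges):
    print(edge)
```

The three observed flows collapse into two logical edges, both originating from the same private host, which is the view an analyst usually wants.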

Multi-Account and Multi-VNet Architecture

Large cloud deployments often span multiple accounts (AWS) or subscriptions (Azure), with traffic routing through transit VPCs/VNets or Azure Virtual WAN. Flow telemetry is collected separately in each account or subscription, and correlating topology across these administrative boundaries requires aggregating telemetry from multiple sources.

The topology discovery system needs to ingest from all relevant accounts and subscriptions, normalize the data into a single schema, and correlate cross-boundary flows even when they appear in separate flow log streams.
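
The sketch below illustrates normalization and correlation under two assumptions that should be checked against your own logs: that AWS records use the default version 2 space-separated layout, and that Azure records arrive as version 2 comma-separated flow tuples. `NormalizedFlow`, the parser names, and the account and subscription labels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NormalizedFlow:
    scope: str        # account ID or subscription label the record came from
    src: str
    dst: str
    src_port: int
    dst_port: int
    proto: str        # "tcp", "udp", or the raw protocol value as text
    byte_count: int
    accepted: bool


_PROTO_BY_NUMBER = {"6": "tcp", "17": "udp"}
_PROTO_BY_LETTER = {"T": "tcp", "U": "udp"}


def from_aws_v2(line):
    """Parse an AWS VPC flow log line (assumed default v2 field order)."""
    f = line.split()
    if f[13] != "OK":          # NODATA / SKIPDATA records carry no flow fields
        return None
    return NormalizedFlow(f[1], f[3], f[4], int(f[5]), int(f[6]),
                          _PROTO_BY_NUMBER.get(f[7], f[7]),
                          int(f[9]), f[12] == "ACCEPT")


def from_azure_tuple(flow_tuple, subscription):
    """Parse an Azure flow tuple (assumed v2 layout; byte fields may be empty)."""
    f = flow_tuple.split(",")
    sent = int(f[10]) if f[10] else 0
    received = int(f[12]) if f[12] else 0
    return NormalizedFlow(subscription, f[1], f[2], int(f[3]), int(f[4]),
                          _PROTO_BY_LETTER.get(f[5], f[5]),
                          sent + received, f[7] == "A")


def correlate(flows):
    """Group records by 5-tuple and collect the scopes that observed each."""
    seen = {}
    for fl in flows:
        if fl is None:
            continue
        key = (fl.src, fl.dst, fl.src_port, fl.dst_port, fl.proto)
        seen.setdefault(key, set()).add(fl.scope)
    return seen


aws_a = "2 111111111111 eni-0a1 10.0.1.10 10.8.0.20 49152 5432 6 20 4249 1418530010 1418530070 ACCEPT OK"
aws_b = "2 222222222222 eni-0b2 10.0.1.10 10.8.0.20 49152 5432 6 20 4249 1418530010 1418530070 ACCEPT OK"
azure = "1542110377,10.0.0.4,13.67.143.118,44931,443,T,O,A,B,,,,"

flows = [from_aws_v2(aws_a), from_aws_v2(aws_b), from_azure_tuple(azure, "sub-prod")]
cross_boundary = {k: v for k, v in correlate(flows).items() if len(v) > 1}
print(cross_boundary)
```

Matching on the 5-tuple alone is a simplification: where accounts reuse overlapping private address ranges, the key would also need a routing-domain identifier and a time window to avoid false matches.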

What Discovered Topology Enables

Security Segmentation Validation

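Once the topology is discovered, you can validate whether your intended segmentation is working as designed. Query the topology for communication paths that should be blocked: if accepted flows appear on those paths, the segmentation has a gap. If none appear, the policy is consistent with the observed traffic, although a quiet path within the observation window does not by itself prove that the rule would block an attempt.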

This is particularly valuable in cloud environments where security groups and NSG rules can be complex and interdependent. The observed flow topology is the ground truth — it tells you what's actually happening, regardless of what the rules are supposed to be doing.
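
A segmentation check of this kind can be sketched as a filter over observed flows. The tier mapping, the forbidden tier pairs, and the `(src, dst, accepted)` flow shape below are illustrative assumptions; the intended policy would normally come from your own design documents.

```python
# Hypothetical tier assignments and intended policy.
TIER_OF = {
    "10.0.1.10": "web",
    "10.0.1.11": "web",
    "10.0.2.20": "app",
    "10.0.3.30": "db",
}
FORBIDDEN = {("web", "db")}  # web tier must never reach the database directly


def segmentation_gaps(flows, tier_of, forbidden):
    """Return accepted flows whose (source tier, destination tier) is forbidden."""
    gaps = []
    for src, dst, accepted in flows:
        pair = (tier_of.get(src), tier_of.get(dst))
        if accepted and pair in forbidden:
            gaps.append((src, dst, pair))
    return gaps


flows = [
    ("10.0.1.10", "10.0.2.20", True),    # web -> app, allowed by design
    ("10.0.2.20", "10.0.3.30", True),    # app -> db, allowed by design
    ("10.0.1.10", "10.0.3.30", False),   # web -> db, rejected: rule worked
    ("10.0.1.11", "10.0.3.30", True),    # web -> db, accepted: a gap
]
gaps = segmentation_gaps(flows, TIER_OF, FORBIDDEN)
for src, dst, pair in gaps:
    print(f"gap: {src} -> {dst} crosses {pair[0]} -> {pair[1]}")
```

Rejected flows are deliberately excluded from the result, since they show a rule doing its job, though they can still be worth reviewing as evidence that something attempted the path.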

Anomaly Context

When an anomaly is detected in a cloud workload, the discovered topology provides the context needed to investigate it quickly. Rather than asking "what is this IP address?", the analyst can immediately see: what workload this host is part of, what it normally communicates with, what tier it belongs to, and when it was provisioned.

This context compression is significant in environments where IP addresses are meaningless identifiers and workload identity requires cross-referencing cloud provider metadata.
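
As a rough sketch, that context can be assembled with a single lookup against the topology graph and the enrichment metadata. The `workload_context` function, the edge set, and the metadata fields shown here are hypothetical stand-ins for whatever the discovery system actually stores.

```python
def workload_context(node, edges, metadata):
    """Summarize what is known about `node` for an analyst."""
    meta = metadata.get(node, {})
    return {
        "workload": node,
        "tier": meta.get("tier", "unknown"),
        "provisioned_at": meta.get("provisioned_at", "unknown"),
        "talks_to": sorted(dst for src, dst in edges if src == node),
        "contacted_by": sorted(src for src, dst in edges if dst == node),
    }


# Hypothetical discovered edges (workload names, not raw IPs) and metadata.
edges = {("web", "app"), ("app", "db"), ("app", "cache")}
metadata = {"app": {"tier": "application", "provisioned_at": "2024-05-01T09:30:00Z"}}

context = workload_context("app", edges, metadata)
for field, value in context.items():
    print(f"{field}: {value}")
```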

Change Tracking

By comparing topology snapshots over time, you can answer questions like:

What new workloads appeared in the last 24 hours?
What communication relationships are new vs. established?
Which workloads have gone quiet, and which supposedly decommissioned workloads are still generating traffic?

This change tracking is valuable for both security (detecting unexpected new assets or relationships) and operations (tracking deployment effects on the communication graph).
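
With snapshots stored as sets of nodes and edges, these questions reduce to set differences. The sketch below assumes a hypothetical snapshot shape (a dict with `nodes` and `edges`) and a separately maintained list of workloads believed to be decommissioned.

```python
def diff_snapshots(previous, current):
    """Compare two topology snapshots, each a dict with 'nodes' and 'edges' sets."""
    return {
        "new_nodes": current["nodes"] - previous["nodes"],
        "gone_nodes": previous["nodes"] - current["nodes"],
        "new_edges": current["edges"] - previous["edges"],
        "gone_edges": previous["edges"] - current["edges"],
    }


previous = {
    "nodes": {"web", "app", "db", "legacy-batch", "old-cron"},
    "edges": {("web", "app"), ("app", "db"), ("legacy-batch", "db"), ("old-cron", "db")},
}
current = {
    "nodes": {"web", "app", "db", "legacy-batch", "report"},
    "edges": {("web", "app"), ("app", "db"), ("legacy-batch", "db"), ("report", "db")},
}

changes = diff_snapshots(previous, current)

# Workloads recorded as decommissioned in an inventory (assumed input)
# that nevertheless still appear in the current snapshot.
decommissioned = {"legacy-batch", "old-cron"}
still_active = decommissioned & current["nodes"]

print(changes)
print(still_active)
```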

The Auto-Discovery Workflow

A practical auto-discovery system works as a continuous background process:

Collect flow telemetry from VPC flow logs, NSG flow logs, and on-premises exporters
Enrich with cloud metadata (instance tags, resource IDs, service mappings)
Build the topology graph from the enriched flow data
Compare to previous topology to identify changes
Surface significant changes as events for review
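
The five steps above can be sketched as one cycle of a loop that carries the previous graph forward. This is an illustration under stated assumptions, not a description of how any particular product implements it: `run_cycle`, the stubbed `collect` and `enrich` callables, and the rule for what counts as significant are all hypothetical.

```python
def run_cycle(collect, enrich, previous_edges, is_significant):
    """Run one discovery cycle and return (current_edges, events)."""
    flows = collect()                                      # 1. collect telemetry
    enriched = [enrich(f) for f in flows]                  # 2. enrich with metadata
    edges = {(f["src"], f["dst"]) for f in enriched}       # 3. build the graph
    new_edges = edges - previous_edges                     # 4. compare to previous
    events = sorted(e for e in new_edges if is_significant(e))  # 5. surface changes
    return edges, events


# Stubs standing in for real collectors and metadata lookups.
IP_TO_WORKLOAD = {"10.0.1.10": "web", "10.0.2.20": "db", "10.0.9.99": "unknown-host"}
batches = iter([
    [{"src": "10.0.1.10", "dst": "10.0.2.20"}],
    [{"src": "10.0.1.10", "dst": "10.0.2.20"},
     {"src": "10.0.9.99", "dst": "10.0.2.20"}],
])


def collect():
    return next(batches)


def enrich(flow):
    return {k: IP_TO_WORKLOAD.get(v, v) for k, v in flow.items()}


def is_significant(edge):
    return edge[1] == "db"   # assumed rule: any new path into the database


edges, first_events = run_cycle(collect, enrich, set(), is_significant)
edges, second_events = run_cycle(collect, enrich, edges, is_significant)
print(first_events)
print(second_events)
```

Passing the collectors and the significance rule in as callables keeps the cycle itself independent of any one telemetry source or alerting policy.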

FlowSight runs this process continuously across both on-premises and cloud telemetry sources, maintaining a live topology graph that reflects the actual communication state of the environment. When new workloads appear — from auto-scaling, from deployment pipelines, or from unexpected sources — they surface automatically in the topology view, along with the relationships they've established.

In cloud environments where the infrastructure changes faster than anyone can document, continuous auto-discovery isn't an optimization. It's the only realistic path to knowing what you're operating.