Analyze network dependencies

This page describes how to generate and analyze the network dependencies report in Migration Center.

Overview

The network dependencies report provides daily aggregated data about the connections to your servers and databases. It lists every connection to the assets in your infrastructure, along with the number of connections per day.

To collect the network dependencies data, you let the discovery client run for several days and enable syncing the data with Migration Center. The discovery client then identifies all the network connections from the scanned assets. The target assets in the connection can be any asset in your Migration Center inventory that you discovered with the discovery client or that you manually imported, or even an unknown asset.

The network dependencies report is useful in the following scenarios:

Collecting data about connections to servers and databases, to identify assets that belong to the same application
Identifying network connections of interest within a group of assets, such as all the servers using the MySQL standard port
Identifying missing assets in your inventory

You can download the network dependencies report as a CSV file from Migration Center. You can then perform your analysis using BigQuery and the sample queries provided by Migration Center, or use any other third-party tool.

Limitations

To collect connection data in your infrastructure, you must use the discovery client.
Network connection data is collected only with the OS scan method. The vSphere scan doesn't support network data collection.

Before you begin

Before you create a network dependencies report, make sure that performance collection with the discovery client is working.
Before you analyze the network dependencies report with BigQuery, do the following:
- Learn how to import local data to BigQuery.
- Learn how to run queries.

Generate the network dependencies report

To generate a network dependencies report, follow these steps:

In the Google Cloud console, go to the Create reports page.
Click Network dependencies exports.
From the list of groups, select the groups for which you want to generate the report, then click Export.
In the dialog that appears, select the number of days for which you want to export the data, from a minimum of 10 to a maximum of 90, then click Export.
After your file is generated, click Download.

Analyze the network dependencies report in BigQuery

The following sections provide you with some sample queries to analyze common scenarios in BigQuery. Before you can run a query, you must upload your CSV file to BigQuery.

Your use of BigQuery is billed according to BigQuery pricing.
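
If you prefer to load the file with a query instead of uploading it from your computer, one possible approach is the BigQuery LOAD DATA statement. The following is a minimal sketch, not the documented Migration Center procedure. It assumes that you first copied the downloaded CSV file to a Cloud Storage bucket. BUCKET and FILE_NAME are hypothetical placeholders, and the sketch relies on schema auto-detection rather than an explicit column list.

```sql
-- Sketch only: BUCKET and FILE_NAME are hypothetical placeholders for a
-- Cloud Storage location where you staged the downloaded CSV file.
LOAD DATA INTO PROJECT.DATASET.TABLE
FROM FILES (
  format = 'CSV',
  uris = ['gs://BUCKET/FILE_NAME.csv'],
  -- The first row of the report is assumed to be the header row.
  skip_leading_rows = 1
);
```

After the load completes, check that the detected column names and types match the report columns, for example that the port and ConnectionCount columns are numeric, before you run the sample queries.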

Identify assets with most connections

The following query is useful to identify the assets that have the largest number of connections in the group.

SELECT
 LocalVMName, SUM(ConnectionCount) as TotalCount
FROM
 PROJECT.DATASET.TABLE
GROUP BY ALL
ORDER BY TotalCount DESC

Replace the following:

PROJECT: The Google Cloud project where you uploaded the CSV file.
DATASET: The BigQuery dataset.
TABLE: The BigQuery table.

The following is a sample output from this query:

LocalVMName	TotalCount
VM-x5ua3o2w	9970
VM-glg5np3w	9763
VM-q3z4zfp8	9557
VM-2nnsrt37	9372
VM-1oah56hn	9350
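
Because the report is aggregated by day, you might also want to see how the connection volume of each asset changes over time. The following sketch is a variation of the previous query that additionally groups by the Day column. The DailyCount alias is an illustrative name, not part of the report.

```sql
-- Connections per asset per day, with the busiest assets first within each day.
SELECT
 Day,
 LocalVMName,
 SUM(ConnectionCount) AS DailyCount
FROM
 PROJECT.DATASET.TABLE
GROUP BY
 Day,
 LocalVMName
ORDER BY
 Day,
 DailyCount DESC
```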

Identify connections by graph depth

The following query identifies all the assets that are connected to a given asset, either directly or through a specific number of intermediate connections. The query follows connections in both directions and ignores unscanned devices. For example:

With graph depth equal to 1, you find all the assets directly connected to the main asset.
With graph depth equal to 2, you also find all the assets directly connected to those assets, which are therefore one step removed from the main asset.

DECLARE
 local_vm_name STRING DEFAULT MAIN_ASSET;
DECLARE
 depth INT64 DEFAULT DEPTH;
CREATE TEMP FUNCTION
 recursiveConnections(localVmName STRING,
   connectionsArray ARRAY<STRING>,
   depth INT64)
 RETURNS STRING
 LANGUAGE js AS r"""
 const connections = connectionsArray.map(connection => connection.split('|||'))
   .filter(connectionTuple => connectionTuple[1] !== 'Unscanned Device');
 const connectedAssets = new Set([localVmName]);
 for (let i = 0; i < depth; i++) {
   const currentSet = new Set(connectedAssets);
   for (const connection of connections) {
     /* Look for connections where the asset is the local asset */
     if (currentSet.has(connection[0])) {
       connectedAssets.add(connection[1]);
     }
     /* Look for connections where the asset is the remote asset */
     if (currentSet.has(connection[1])) {
       connectedAssets.add(connection[0]);
     }
   }
 }
 connectedAssets.delete(localVmName);
 return Array.from(connectedAssets).sort().join(', ');
""";
SELECT
 local_vm_name AS LocalVMName,
 recursiveConnections(local_vm_name,
   ARRAY_AGG(CONCAT(LocalVMName, '|||', RemoteVMName)),
   depth) AS Connections
FROM
 PROJECT.DATASET.TABLE

Replace the following:

MAIN_ASSET: The name of the asset for which you want to identify the connections, as a quoted string, for example "VM-lv8s148f".
DEPTH: The depth of the graph, for example 2.

The following is a sample output from this query:

LocalVMName	Connections
VM-lv8s148f	VM-2z8wp3ey, VM-66rq2x2y, VM-94uwyy8h, VM-ccgmqqmb, VM-ctqddf0u, VM-og4n77lb, ...
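
For a graph depth of 1, you can get a similar result without the JavaScript function, and also see how many connections each neighbor accounts for. The following is a minimal sketch under the same assumptions as the query above. The ConnectedAsset and TotalCount aliases are illustrative names.

```sql
DECLARE
 local_vm_name STRING DEFAULT MAIN_ASSET;
SELECT
 -- Report the other endpoint of each connection that involves the main asset.
 IF(LocalVMName = local_vm_name, RemoteVMName, LocalVMName) AS ConnectedAsset,
 SUM(ConnectionCount) AS TotalCount
FROM
 PROJECT.DATASET.TABLE
WHERE
 local_vm_name IN (LocalVMName, RemoteVMName)
 -- Skip self-connections and unscanned remote endpoints, as the function above does.
 AND LocalVMName != RemoteVMName
 AND RemoteVMName != 'Unscanned Device'
GROUP BY
 ConnectedAsset
ORDER BY
 TotalCount DESC
```

The assets returned by this sketch should match the Connections list that the recursive query returns for a depth of 1, which makes it a quick cross-check.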

Filter connections by IP and port ranges

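The following query returns the connections in which at least one endpoint uses a port in the port range that you define, and at least one endpoint uses an IP address in the IP range that you define. The ipBetween function compares IPv4 addresses only.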

CREATE TEMP FUNCTION
 ipBetween(value STRING,
   low STRING,
   high STRING) AS ( NET.IPV4_TO_INT64(NET.IP_FROM_STRING(value)) BETWEEN NET.IPV4_TO_INT64(NET.IP_FROM_STRING(low))
   AND NET.IPV4_TO_INT64(NET.IP_FROM_STRING(high)) );
SELECT
 *
FROM
 PROJECT.DATASET.TABLE
WHERE
 ((LocalPort BETWEEN PORT_START
     AND PORT_END)
   OR (RemotePort BETWEEN PORT_START
     AND PORT_END))
 AND (ipBetween(LocalIP,
     IP_START,
     IP_END)
   OR ipBetween(RemoteIP,
     IP_START,
     IP_END))

Replace the following:

PORT_START: The initial port of the port range, for example 0.
PORT_END: The final port of the port range, for example 1024.
IP_START: The initial IP address of the range, for example "10.26.0.0".
IP_END: The final IP address of the range, for example "10.26.255.255".

The following is a sample output from this query:

Day	LocalVMName	LocalAssetID	LocalGroups	LocalIP	LocalPort	Protocol	LocalProcessName	RemoteVMName	RemoteAssetID	RemoteIP	RemotePort	ConnectionCount
2024-04-18	VM-0lf60off	projects/982941055174/locations/us-central1/assets/0lf60off	Group 1	10.0.45.138	272	tcp	bash	VM-0spdofr9	projects/982941055174/locations/us-central1/assets/0spdofr9	144.35.88.1	272	499
2024-04-18	VM-goa5uxhi	projects/982941055174/locations/us-central1/assets/goa5uxhi	Group 3	10.187.175.82	781	tcp	bash	VM-27i5d2uj	projects/982941055174/locations/us-central1/assets/27i5d2uj	22.99.72.109	781	980
2024-04-19	VM-7vwy31hg	projects/982941055174/locations/us-central1/assets/7vwy31hg	Group 1	10.58.166.132	21	tcp	bash	VM-2gq0fl37	projects/982941055174/locations/us-central1/assets/2gq0fl37	147.19.84.135	21	514
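
When you are interested in a single well-known port rather than a range, a simpler query might be enough. The following sketch lists the assets that have an endpoint on port 3306, the standard MySQL port. It assumes that the port columns were loaded as numeric values. The VMName and IP aliases are illustrative names.

```sql
-- 3306 is the standard MySQL port. Check both endpoints, because the
-- listening side can appear as either the local or the remote asset.
SELECT
 LocalVMName AS VMName,
 LocalIP AS IP
FROM
 PROJECT.DATASET.TABLE
WHERE
 LocalPort = 3306
UNION DISTINCT
SELECT
 RemoteVMName,
 RemoteIP
FROM
 PROJECT.DATASET.TABLE
WHERE
 RemotePort = 3306
```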

Identify unscanned assets in the network

The following query lets you identify any unscanned asset in your network. An unscanned asset is a remote IP address that appears in a connection but isn't associated with any asset in your Migration Center inventory. This lets you identify potentially missing assets that you need to scan for your assessment.

CREATE TEMP FUNCTION
 ipBetween(value STRING,
   low STRING,
   high STRING) AS ( NET.IPV4_TO_INT64(NET.IP_FROM_STRING(value)) BETWEEN NET.IPV4_TO_INT64(NET.IP_FROM_STRING(low))
   AND NET.IPV4_TO_INT64(NET.IP_FROM_STRING(high)) );
SELECT
 STRING_AGG(DISTINCT LocalIP, ', ') AS LocalIPs,
 RemoteIP
FROM
 PROJECT.DATASET.TABLE
WHERE
 RemoteVMName = 'Unscanned Device'
 AND ipBetween(LocalIP,
   IP_START,
   IP_END)
 AND ipBetween(RemoteIP,
   IP_START,
   IP_END)
GROUP BY
 RemoteIP

Replace the following:

IP_START: The initial IP address of the range, for example "10.26.0.0".
IP_END: The final IP address of the range, for example "10.26.255.255".
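
To decide which unscanned addresses to investigate first, it can help to rank them by how heavily they are used. The following sketch, which skips the IP range filter, counts the distinct scanned assets and the total connections for each unscanned remote IP address. The DistinctLocalAssets and TotalCount aliases are illustrative names.

```sql
-- Rank unscanned remote IP addresses by connection volume.
SELECT
 RemoteIP,
 COUNT(DISTINCT LocalVMName) AS DistinctLocalAssets,
 SUM(ConnectionCount) AS TotalCount
FROM
 PROJECT.DATASET.TABLE
WHERE
 RemoteVMName = 'Unscanned Device'
GROUP BY
 RemoteIP
ORDER BY
 TotalCount DESC
```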