You can use Sensitive Data Protection to compute numerical and categorical statistics for individual columns in BigQuery tables. Sensitive Data Protection can calculate the following:
- The column's minimum value
- The column's maximum value
- Quantile values for the column
- A histogram of value frequencies in the column
Compute numerical statistics
You can determine minimum, maximum, and quantile values for an individual BigQuery column. To calculate these values, you configure a DlpJob, setting the NumericalStatsConfig privacy metric to the name of the column to scan. When you run the job, Sensitive Data Protection computes statistics for the given column, returning its results in the NumericalStatsResult object. Sensitive Data Protection can compute statistics for the following number types:
- integer
- float
- date
- datetime
- timestamp
- time
The statistics that a scan returns include the minimum value, the maximum value, and 99 quantile values that partition the set of field values into 100 equal-sized buckets.
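To make the shape of these results concrete, here is a purely local sketch that computes the same three kinds of statistics with Python's standard library. It does not call Sensitive Data Protection, the function name numerical_stats is hypothetical, and the service may interpolate quantile values differently from statistics.quantiles.

```python
import statistics


def numerical_stats(values):
    """Return the minimum, maximum, and 99 cut points for a list of numbers.

    Local illustration only: it mirrors the shape of the result described
    above, not the service's actual algorithm.
    """
    ordered = sorted(values)
    # n=100 asks for the 99 cut points that split the data into 100 buckets.
    cut_points = statistics.quantiles(ordered, n=100)
    return {"min": ordered[0], "max": ordered[-1], "quantiles": cut_points}


stats = numerical_stats(range(1, 1001))
print(stats["min"], stats["max"], len(stats["quantiles"]))
```

Requesting 100 buckets yields 99 boundaries because the minimum and maximum are reported separately rather than as cut points.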
Code examples
Client libraries for C#, Go, Java, Node.js, PHP, and Python can be used to calculate numerical statistics with Sensitive Data Protection.
To learn how to install and use the client library for Sensitive Data Protection, see Sensitive Data Protection client libraries.
To authenticate to Sensitive Data Protection, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
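As a minimal sketch of the job configuration described earlier, the following Python code builds a risk job payload whose privacy metric is a NumericalStatsConfig and, optionally, submits it. It assumes the google-cloud-dlp package is installed and Application Default Credentials are configured. The helper names, the polling loop, and the timeout values are illustrative assumptions, not part of the original samples.

```python
import time


def build_numerical_stats_risk_job(table_project_id, dataset_id, table_id, column_name):
    """Build a risk job payload that scans one BigQuery column (hypothetical helper)."""
    return {
        "privacy_metric": {
            # The privacy metric names the column to scan.
            "numerical_stats_config": {"field": {"name": column_name}}
        },
        "source_table": {
            "project_id": table_project_id,
            "dataset_id": dataset_id,
            "table_id": table_id,
        },
    }


def run_numerical_stats_job(project_id, risk_job, poll_seconds=5, timeout_seconds=300):
    """Submit the job and poll until it finishes. Requires credentials."""
    # Imported lazily so the payload builder works without the library installed.
    from google.cloud import dlp_v2

    client = dlp_v2.DlpServiceClient()
    parent = f"projects/{project_id}/locations/global"
    job = client.create_dlp_job(request={"parent": parent, "risk_job": risk_job})

    # Polling is a simplification chosen for this sketch.
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        job = client.get_dlp_job(request={"name": job.name})
        if job.state == dlp_v2.DlpJob.JobState.DONE:
            return job.risk_details.numerical_stats_result
        if job.state in (dlp_v2.DlpJob.JobState.FAILED, dlp_v2.DlpJob.JobState.CANCELED):
            raise RuntimeError(f"Job {job.name} ended in state {job.state.name}")
        time.sleep(poll_seconds)
    raise TimeoutError(f"Job {job.name} did not finish within {timeout_seconds} seconds")


# Placeholder identifiers; replace them with real project, dataset, and table names.
numerical_job = build_numerical_stats_risk_job("my-project", "my_dataset", "my_table", "age")
```

Calling run_numerical_stats_job("my-project", numerical_job) would return the NumericalStatsResult, whose min_value, max_value, and quantile_values fields hold the computed statistics.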
Compute categorical statistics
You can compute categorical statistics for the individual histogram buckets within a BigQuery column, including:
- Upper bound on value frequency within a given bucket
- Lower bound on value frequency within a given bucket
- Size of a given bucket
- A sample of value frequencies within a given bucket (maximum 20)
To calculate these values, you configure a DlpJob, setting the CategoricalStatsConfig privacy metric to the name of the column to scan. When you run the job, Sensitive Data Protection computes statistics for the given column, returning its results in the CategoricalStatsResult object.
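The per-bucket values listed above can be illustrated with a small local sketch. It does not call Sensitive Data Protection. The function name, the dictionary keys, and the frequency ranges passed in as bounds are all hypothetical, and treating bucket size as the total number of occurrences in the bucket is an assumption made for this example.

```python
from collections import Counter


def frequency_buckets(values, bounds, sample_limit=20):
    """Group distinct values into buckets by how often each one occurs.

    bounds is a list of inclusive (lower, upper) frequency ranges. These
    ranges are an assumption; the service chooses its own buckets.
    """
    frequencies = Counter(values)
    buckets = []
    for lower, upper in bounds:
        members = {v: c for v, c in frequencies.items() if lower <= c <= upper}
        if not members:
            continue  # Skip frequency ranges that no value falls into.
        # Most frequent values first, ties broken alphabetically.
        ranked = sorted(members.items(), key=lambda item: (-item[1], str(item[0])))
        buckets.append({
            "lower_bound": min(members.values()),
            "upper_bound": max(members.values()),
            "size": sum(members.values()),
            "sample": ranked[:sample_limit],  # Capped like the sample described above.
        })
    return buckets


example_buckets = frequency_buckets(["a", "a", "a", "b", "b", "c", "d"], [(1, 1), (2, 3)])
for bucket in example_buckets:
    print(bucket)
```

Here "c" and "d" each occur once and land in the first bucket, while "a" (three occurrences) and "b" (two occurrences) land in the second.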
Code examples
Client libraries for C#, Go, Java, Node.js, PHP, and Python can be used to calculate categorical statistics with Sensitive Data Protection.
To learn how to install and use the client library for Sensitive Data Protection, see Sensitive Data Protection client libraries.
To authenticate to Sensitive Data Protection, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
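A categorical statistics job can be sketched in Python along the following lines, assuming the google-cloud-dlp package. The payload sets the CategoricalStatsConfig privacy metric to the column to scan, and the summarizer reads the histogram buckets from a CategoricalStatsResult. Both helper names are hypothetical, and the stand-in result at the end is fabricated sample input used only to show the shape of the output.

```python
from types import SimpleNamespace


def build_categorical_stats_risk_job(table_project_id, dataset_id, table_id, column_name):
    """Build a risk job payload for a categorical statistics scan (hypothetical helper)."""
    return {
        "privacy_metric": {
            "categorical_stats_config": {"field": {"name": column_name}}
        },
        "source_table": {
            "project_id": table_project_id,
            "dataset_id": dataset_id,
            "table_id": table_id,
        },
    }


def summarize_buckets(result):
    """Reduce each histogram bucket of a CategoricalStatsResult to a plain dict."""
    return [
        {
            "lower_bound": bucket.value_frequency_lower_bound,
            "upper_bound": bucket.value_frequency_upper_bound,
            "size": bucket.bucket_size,
            "sample_count": len(bucket.bucket_values),
        }
        for bucket in result.value_frequency_histogram_buckets
    ]


# Placeholder identifiers; replace them with real project, dataset, and table names.
categorical_job = build_categorical_stats_risk_job("my-project", "my_dataset", "my_table", "city")

# Stand-in object with the same attribute names as a real result, for a dry run.
fake_result = SimpleNamespace(
    value_frequency_histogram_buckets=[
        SimpleNamespace(
            value_frequency_lower_bound=1,
            value_frequency_upper_bound=4,
            bucket_size=7,
            bucket_values=["v1", "v2", "v3"],
        )
    ]
)
print(summarize_buckets(fake_result))
```

With a real job, the result would come from job.risk_details.categorical_stats_result after the job reaches the DONE state, and the payload would be passed as the risk_job field of a create_dlp_job request.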