Sensitive data discovery for Vertex AI

This page describes Sensitive Data Protection discovery for use with Vertex AI.

Sensitive Data Protection discovery helps you learn about the training data included in your Vertex AI datasets. Discovery generates data profiles that provide insights like the information types (infoTypes) detected and the sensitivity level of your training data.

To join this Preview, send an email to cloud-dlp-feedback@google.com.

Benefits

This feature offers the following benefits:

  • You can monitor your Vertex AI datasets—at the organization, folder, or project level—for sensitive data, and report on the results.
  • You can send discovery results to Security Command Center so that your workloads with potentially sensitive data are taken into account when you evaluate your organization's security posture.
  • If discovery detects sensitive training data, you can use the data profiles to identify which resources need further investigation. You can then perform a deep inspection to find all sensitive instances in a resource.

Supported data sources

This feature can profile the following types of training data referenced in your Vertex AI datasets:

  • Training data in Cloud Storage buckets. For information about the supported file types, see File clusters.
  • Training data in BigQuery tables.

How it works

When you profile a Vertex AI dataset, Sensitive Data Protection generates either a file store data profile or a table data profile, depending on whether the training data is stored in a Cloud Storage bucket or a BigQuery table. A data profile provides insights and metadata about the training data associated with your dataset. For each Vertex AI dataset, the generated data profile includes the following information:

  • The sensitivity and data risk levels of the training data
  • The types of sensitive information found in the training data—for example, driver's license IDs and email addresses

For a full list of insights and metadata in each file store data profile, see File store data profiles.

For a full list of insights and metadata in each table data profile, see Table data profiles.

For more information about the discovery service, see Data profiles.
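As an informal illustration of the rule described in this section, the following Python sketch maps a training data location to the kind of data profile that is generated for it. It is not part of any Sensitive Data Protection or Vertex AI client library: the function name `profile_kind`, the mapping `PROFILE_BY_SCHEME`, and the assumption that training data is referenced by a `gs://` or `bq://` URI are all hypothetical and used only for this example.

```python
from urllib.parse import urlparse

# Hypothetical mapping from the URI scheme of the training data to the
# kind of data profile described on this page.
PROFILE_BY_SCHEME = {
    "gs": "file store data profile",  # training data in a Cloud Storage bucket
    "bq": "table data profile",       # training data in a BigQuery table
}


def profile_kind(data_uri: str) -> str:
    """Return the kind of data profile expected for a training data URI.

    Raises ValueError for any location that is not one of the two
    supported data sources.
    """
    scheme = urlparse(data_uri).scheme
    if scheme not in PROFILE_BY_SCHEME:
        raise ValueError(f"unsupported training data source: {data_uri!r}")
    return PROFILE_BY_SCHEME[scheme]


if __name__ == "__main__":
    # Example URIs are placeholders, not real resources.
    for uri in ("gs://example-bucket/train.csv", "bq://example-project.ds.table"):
        print(uri, "->", profile_kind(uri))
```

Any other location raises a `ValueError` in this sketch, which mirrors the fact that only the two listed data sources are profiled.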

Pricing

When you profile Vertex AI data, you incur Sensitive Data Protection charges according to your chosen discovery pricing mode.

In addition, if your training data is in a Cloud Storage bucket, then Cloud Storage charges you for requests that Sensitive Data Protection makes to profile the training data. The following sections describe the associated Cloud Storage charges that you incur.

Class B operations

You are charged for the Class B (storage.buckets.get) operations that Sensitive Data Protection performs in the process of profiling training data in your Cloud Storage buckets.

For information about how much Cloud Storage charges for Class B operations, see Operation charges in the Cloud Storage documentation.

Retrieval fees

For objects that have a non-Standard storage class, you are charged retrieval fees. For information about how much Cloud Storage charges for data retrieval, see Retrieval fees in the Cloud Storage documentation.
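To make the two Cloud Storage charges concrete, the following Python sketch adds them up for a single bucket. It is a rough estimate under stated assumptions, not a billing calculation: the names `BucketScan` and `estimate_storage_charges` are hypothetical, the number of Class B operations and the volume of data read are inputs that you would have to estimate yourself, and every rate is a placeholder that you supply from the Cloud Storage pricing page.

```python
from dataclasses import dataclass


@dataclass
class BucketScan:
    """Hypothetical summary of the profiling activity for one bucket."""
    storage_class: str        # for example "STANDARD" or "NEARLINE"
    class_b_operations: int   # estimated number of Class B operations
    gb_read: float            # estimated volume of data retrieved, in GB


def estimate_storage_charges(scan, class_b_rate_per_op, retrieval_rate_per_gb):
    """Estimate the Cloud Storage charges for profiling one bucket.

    class_b_rate_per_op: price of a single Class B operation (caller-supplied).
    retrieval_rate_per_gb: dict of storage class -> retrieval price per GB
        (caller-supplied). Only consulted for non-Standard classes.
    """
    class_b = scan.class_b_operations * class_b_rate_per_op

    # Retrieval fees apply only to objects with a non-Standard storage class.
    retrieval = 0.0
    storage_class = scan.storage_class.upper()
    if storage_class != "STANDARD":
        retrieval = scan.gb_read * retrieval_rate_per_gb[storage_class]

    return {"class_b": class_b, "retrieval": retrieval, "total": class_b + retrieval}


if __name__ == "__main__":
    # All numbers below are placeholders, not real prices or measurements.
    scan = BucketScan(storage_class="NEARLINE", class_b_operations=1000, gb_read=10.0)
    print(estimate_storage_charges(scan, 0.5, {"NEARLINE": 2.0}))
```

For a bucket in the Standard storage class, the retrieval term in this sketch is zero and only the Class B operations contribute to the total.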

What's next