DataDiscoverySpec

Spec for a data discovery scan.

JSON representation
{
  "bigqueryPublishingConfig": {
    object (BigQueryPublishingConfig)
  },

  // Union field resource_config can be only one of the following:
  "storageConfig": {
    object (StorageConfig)
  }
  // End of list of possible types for union field resource_config.
}
Fields
bigqueryPublishingConfig

object (BigQueryPublishingConfig)

Optional. Configuration for metadata publishing.

Union field resource_config. The configurations of the data discovery scan resource. resource_config can be only one of the following:
storageConfig

object (StorageConfig)

Cloud Storage related configurations.
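As a rough illustration of how these fields fit together, the following Python sketch assembles a DataDiscoverySpec as a plain dictionary and serializes it to JSON. Only the field names come from this reference; the project, connection, location, and pattern values are hypothetical placeholders.

```python
import json

# Hypothetical values for illustration; only the field names are taken
# from the reference above.
data_discovery_spec = {
    "bigqueryPublishingConfig": {
        "tableType": "BIGLAKE",
        "connection": "projects/my-project/locations/us-central1/connections/my-connection",
        "location": "us-central1",
    },
    # Union field resource_config: storageConfig is the only member listed.
    "storageConfig": {
        "includePatterns": ["sales/*.csv"],
        "csvOptions": {"headerRows": 1, "delimiter": ","},
    },
}

# Serialize to the JSON shape shown in the "JSON representation" block.
spec_json = json.dumps(data_discovery_spec, indent=2)
print(spec_json)
```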

BigQueryPublishingConfig

Describes BigQuery publishing configurations.

JSON representation
{
  "tableType": enum (TableType),
  "connection": string,
  "location": string
}
Fields
tableType

enum (TableType)

Optional. Determines whether to publish discovered tables as BigLake external tables or non-BigLake external tables.

connection

string

Optional. The BigQuery connection used to create BigLake tables. Must be in the form projects/{projectId}/locations/{locationId}/connections/{connection_id}.
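The required form of the connection name can be checked on the client before a request is sent. The sketch below is a minimal example, assuming that each ID segment may contain any character except "/"; the service may apply stricter rules. The helper name parse_connection_name is hypothetical.

```python
import re

# Assumption: each ID segment is any non-empty run of characters other
# than "/". The service's real validation may be stricter.
_CONNECTION_RE = re.compile(
    r"projects/(?P<project>[^/]+)"
    r"/locations/(?P<location>[^/]+)"
    r"/connections/(?P<connection>[^/]+)"
)


def parse_connection_name(name: str) -> dict:
    """Split a connection resource name into its three ID segments."""
    match = _CONNECTION_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"not a valid connection resource name: {name!r}")
    return match.groupdict()
```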

location

string

Optional. The location of the BigQuery dataset to publish BigLake external or non-BigLake external tables to. The following rules apply:
1. If the Cloud Storage bucket is in a multi-region, the BigQuery dataset can be in the same multi-region or in any single region included in that multi-region. The datascan can be created in any single region included in that multi-region.
2. If the Cloud Storage bucket is in a dual-region, the BigQuery dataset can be in a region included in that dual-region, or in a multi-region that includes the dual-region. The datascan can be created in any single region included in that dual-region.
3. If the Cloud Storage bucket is in a single region, the BigQuery dataset can be in the same single region or in any multi-region that includes that region. The datascan is created in the same single region as the bucket.
4. If the BigQuery dataset is in a single region, it must be in the same single region as the datascan.

For supported values, refer to https://cloud.google.com/bigquery/docs/locations#supportedLocations.
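Rule 4 above is the only one that can be checked without knowing which regions belong to which multi-region or dual-region. The following sketch covers that rule only. It assumes that the BigQuery multi-region locations are US and EU and that location names compare case-insensitively; the function name is hypothetical, and rules 1 to 3 still need to be checked separately.

```python
# Assumption: BigQuery's multi-region locations are "US" and "EU".
BIGQUERY_MULTI_REGIONS = {"us", "eu"}


def satisfies_single_region_rule(dataset_location: str, datascan_region: str) -> bool:
    """Check rule 4 only: a single-region dataset must match the datascan region."""
    dataset = dataset_location.lower()
    if dataset in BIGQUERY_MULTI_REGIONS:
        # Rule 4 does not apply to multi-region datasets; rules 1 to 3
        # still govern them and are not checked here.
        return True
    return dataset == datascan_region.lower()
```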

TableType

Determines how discovered tables are published.

Enums
TABLE_TYPE_UNSPECIFIED Table type unspecified.
EXTERNAL Default. Discovered tables are published as BigQuery external tables whose data is accessed using the credentials of the user querying the table.
BIGLAKE Discovered tables are published as BigLake external tables whose data is accessed using the credentials of the associated BigQuery connection.
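A client can mirror these enum values to avoid typos in the tableType field. The sketch below is one possible shape, not part of any official client library. The publishing_config helper is hypothetical, and its check that BIGLAKE is paired with a connection is an assumption drawn from the description above; the reference marks connection as optional, so the service may behave differently.

```python
from enum import Enum
from typing import Optional


class TableType(Enum):
    TABLE_TYPE_UNSPECIFIED = "TABLE_TYPE_UNSPECIFIED"
    EXTERNAL = "EXTERNAL"
    BIGLAKE = "BIGLAKE"


def publishing_config(
    table_type: TableType,
    connection: Optional[str] = None,
    location: Optional[str] = None,
) -> dict:
    """Build a BigQueryPublishingConfig dictionary, omitting unset fields."""
    # Assumption: BigLake tables rely on a connection's credentials, so a
    # missing connection is flagged locally. The API itself lists the
    # field as optional.
    if table_type is TableType.BIGLAKE and not connection:
        raise ValueError("BIGLAKE table type expects a BigQuery connection")
    config = {"tableType": table_type.value}
    if connection:
        config["connection"] = connection
    if location:
        config["location"] = location
    return config
```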

StorageConfig

Configurations related to Cloud Storage as the data source.

JSON representation
{
  "includePatterns": [
    string
  ],
  "excludePatterns": [
    string
  ],
  "csvOptions": {
    object (CsvOptions)
  },
  "jsonOptions": {
    object (JsonOptions)
  }
}
Fields
includePatterns[]

string

Optional. Defines the data to include during discovery when only a subset of the data should be considered. Provide a list of patterns that identify the data to include. For Cloud Storage bucket assets, these patterns are interpreted as glob patterns used to match object names. For BigQuery dataset assets, these patterns are interpreted as patterns to match table names.

excludePatterns[]

string

Optional. Defines the data to exclude during discovery. Provide a list of patterns that identify the data to exclude. For Cloud Storage bucket assets, these patterns are interpreted as glob patterns used to match object names. For BigQuery dataset assets, these patterns are interpreted as patterns to match table names.
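To preview which object names a pair of pattern lists might select, the patterns can be approximated locally with Python's fnmatch module. This is a sketch under several assumptions that this reference does not state: an empty includePatterns list selects everything, excludePatterns take precedence over includePatterns, and matching is case-sensitive. Note also that fnmatch lets * match "/" characters, which may differ from the service's glob semantics.

```python
from fnmatch import fnmatchcase
from typing import Sequence


def is_included(
    object_name: str,
    include_patterns: Sequence[str],
    exclude_patterns: Sequence[str],
) -> bool:
    """Approximate include/exclude filtering of a Cloud Storage object name."""
    # Assumption: exclusion wins over inclusion.
    if any(fnmatchcase(object_name, p) for p in exclude_patterns):
        return False
    # Assumption: with no include patterns, every object is a candidate.
    if not include_patterns:
        return True
    return any(fnmatchcase(object_name, p) for p in include_patterns)
```

For example, with include pattern sales/* and exclude pattern *.tmp, the name sales/2024/a.csv is selected and sales/x.tmp is not.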

csvOptions

object (CsvOptions)

Optional. Configuration for CSV data.

jsonOptions

object (JsonOptions)

Optional. Configuration for JSON data.

CsvOptions

Describes CSV and similar semi-structured data formats.

JSON representation
{
  "headerRows": integer,
  "delimiter": string,
  "encoding": string,
  "typeInferenceDisabled": boolean,
  "quote": string
}
Fields
headerRows

integer

Optional. The number of rows to interpret as header rows that should be skipped when reading data rows.

delimiter

string

Optional. The delimiter that is used to separate values. The default is , (comma).

encoding

string

Optional. The character encoding of the data. The default is UTF-8.

typeInferenceDisabled

boolean

Optional. Whether to disable the inference of data types for CSV data. If true, all columns are registered as strings.

quote

string

Optional. The character used to quote column values. Accepts " (double quotation mark) or ' (single quotation mark). If unspecified, defaults to " (double quotation mark).
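The effect of these options can be illustrated by mapping them onto Python's csv module. The sketch below is a local analogy, not the service's parser: header_rows, delimiter, encoding, and quote correspond to headerRows, delimiter, encoding, and quote, and every value is returned as a string, which resembles the result of setting typeInferenceDisabled to true. The function name is hypothetical.

```python
import csv
import io
from typing import List


def read_data_rows(
    raw: bytes,
    header_rows: int = 0,
    delimiter: str = ",",
    encoding: str = "utf-8",
    quote: str = '"',
) -> List[List[str]]:
    """Parse CSV bytes and return the data rows with header rows skipped."""
    text = io.StringIO(raw.decode(encoding), newline="")
    rows = list(csv.reader(text, delimiter=delimiter, quotechar=quote))
    # Skip the leading header rows; the remaining values stay as strings.
    return rows[header_rows:]
```

For example, read_data_rows(b"id;name\n1;'a;b'\n", header_rows=1, delimiter=";", quote="'") skips the header row and keeps the quoted delimiter inside the second value.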

JsonOptions

Describes JSON data format.

JSON representation
{
  "encoding": string,
  "typeInferenceDisabled": boolean
}
Fields
encoding

string

Optional. The character encoding of the data. The default is UTF-8.

typeInferenceDisabled

boolean

Optional. Whether to disable the inference of data types for JSON data. If true, all columns are registered as their primitive types (string, number, or boolean).
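The three primitive types named above can be illustrated by classifying the top-level values of a single JSON record. This sketch only shows the classification; it does not reproduce how the service derives a schema, and the handling of nested objects, arrays, and null (returned here as None) is an assumption, since this reference does not describe it. Both function names are hypothetical.

```python
import json
from typing import Any, Dict, Optional


def primitive_type(value: Any) -> Optional[str]:
    """Name the JSON primitive type of a value, or None if it is not primitive."""
    # bool must be tested before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    # Assumption: objects, arrays, and null are left unclassified.
    return None


def primitive_schema(record_json: str) -> Dict[str, Optional[str]]:
    """Map each top-level key of a JSON object to its primitive type name."""
    record = json.loads(record_json)
    return {key: primitive_type(value) for key, value in record.items()}
```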