public static final class InputDataConfig.Builder extends GeneratedMessageV3.Builder<InputDataConfig.Builder> implements InputDataConfigOrBuilder
Specifies Vertex AI owned input data to be used for training, and possibly evaluating, the Model.
Protobuf type google.cloud.aiplatform.v1beta1.InputDataConfig
Inheritance
Object > AbstractMessageLite.Builder<MessageType,BuilderType> > AbstractMessage.Builder<BuilderType> > GeneratedMessageV3.Builder > InputDataConfig.Builder
Implements
InputDataConfigOrBuilder
Static Methods
getDescriptor()
public static final Descriptors.Descriptor getDescriptor()
Type | Description |
Descriptor |
Methods
addRepeatedField(Descriptors.FieldDescriptor field, Object value)
public InputDataConfig.Builder addRepeatedField(Descriptors.FieldDescriptor field, Object value)
Name | Description |
field | FieldDescriptor |
value | Object |
Type | Description |
InputDataConfig.Builder |
build()
public InputDataConfig build()
Type | Description |
InputDataConfig |
buildPartial()
public InputDataConfig buildPartial()
Type | Description |
InputDataConfig |
clear()
public InputDataConfig.Builder clear()
Type | Description |
InputDataConfig.Builder |
clearAnnotationSchemaUri()
public InputDataConfig.Builder clearAnnotationSchemaUri()
Applicable only to custom training with Datasets that have DataItems and Annotations. Cloud Storage URI that points to a YAML file describing the annotation schema. The schema is defined as an OpenAPI 3.0.2 Schema Object. The schema files that can be used here are found in gs://google-cloud-aiplatform/schema/dataset/annotation/; note that the chosen schema must be consistent with the metadata of the Dataset specified by dataset_id. Only Annotations that both match this schema and belong to DataItems not ignored by the split method are used, in the training, validation, or test role according to the role of the DataItem they are on. When used in conjunction with annotations_filter, the Annotations used for training are filtered by both annotations_filter and annotation_schema_uri.
string annotation_schema_uri = 9;
Type | Description |
InputDataConfig.Builder | This builder for chaining. |
clearAnnotationsFilter()
public InputDataConfig.Builder clearAnnotationsFilter()
Applicable only to Datasets that have DataItems and Annotations. A filter on Annotations of the Dataset. Only Annotations that both match this filter and belong to DataItems not ignored by the split method are used, in the training, validation, or test role according to the role of the DataItem they are on (for auto-assigned DataItems, that role is decided by Vertex AI). A filter with the same syntax as the one used in ListAnnotations may be used, but note that here it filters across all Annotations of the Dataset, not just within a single DataItem.
string annotations_filter = 6;
Type | Description |
InputDataConfig.Builder | This builder for chaining. |
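The selection rule shared by annotation_schema_uri and annotations_filter (an Annotation is used only if it matches the schema, passes the filter, and sits on a DataItem the split did not ignore, and it takes that DataItem's role) can be modeled as below. This is a conceptual sketch, not how Vertex AI evaluates filters: the `AnnotationSelectionSketch` class, its `Annotation` holder, the `Role` enum, schema matching by string equality, and the Java predicate standing in for the filter string are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Conceptual model only (hypothetical names): Vertex AI evaluates the real filter and schema server-side.
final class AnnotationSelectionSketch {
    // Hypothetical roles; IGNORED stands for DataItems that the split method leaves out.
    enum Role { TRAINING, VALIDATION, TEST, IGNORED }

    // Hypothetical stand-in for an Annotation and the schema URI it conforms to.
    static final class Annotation {
        final String id;
        final String dataItemId;
        final String schemaUri;
        final String label;

        Annotation(String id, String dataItemId, String schemaUri, String label) {
            this.id = id;
            this.dataItemId = dataItemId;
            this.schemaUri = schemaUri;
            this.label = label;
        }
    }

    // Returns the IDs of Annotations used in the wanted role: each must match the schema,
    // pass the filter, and belong to a DataItem that the split assigned to that role.
    static List<String> selectFor(Role wanted, List<Annotation> annotations,
                                  Map<String, Role> dataItemRoles, String annotationSchemaUri,
                                  Predicate<Annotation> annotationsFilter) {
        List<String> selected = new ArrayList<>();
        for (Annotation a : annotations) {
            Role role = dataItemRoles.getOrDefault(a.dataItemId, Role.IGNORED);
            if (role != Role.IGNORED
                    && role == wanted
                    && a.schemaUri.equals(annotationSchemaUri)
                    && annotationsFilter.test(a)) {
                selected.add(a.id);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        String schema = "gs://example-bucket/hypothetical-schema.yaml";
        List<Annotation> annotations = Arrays.asList(
                new Annotation("a1", "item1", schema, "cat"),
                new Annotation("a2", "item1", schema, "dog"),                   // fails the filter
                new Annotation("a3", "item2", "gs://other/schema.yaml", "cat"), // wrong schema
                new Annotation("a4", "item3", schema, "cat"));                 // DataItem ignored
        Map<String, Role> roles = Map.of(
                "item1", Role.TRAINING, "item2", Role.TRAINING, "item3", Role.IGNORED);
        System.out.println(selectFor(Role.TRAINING, annotations, roles, schema,
                a -> a.label.equals("cat")));
    }
}
```

Only `a1` survives for training in the usage above, because each of the other Annotations fails exactly one of the three conditions.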
clearBigqueryDestination()
public InputDataConfig.Builder clearBigqueryDestination()
Only applicable to custom training with tabular Dataset with BigQuery
source.
The BigQuery project location where the training data is to be written
to. In the given project a new dataset is created with name
dataset_<dataset-id>_<annotation-type>_<timestamp-of-training-call>
where timestamp is in YYYY_MM_DDThh_mm_ss_sssZ format. All training
input data is written into that dataset. In the dataset three
tables are created: training, validation and test.
The Vertex AI environment variables are set as follows:
- AIP_DATA_FORMAT = "bigquery".
- AIP_TRAINING_DATA_URI = "bigquery_destination.dataset_<dataset-id>_<annotation-type>_<time>.training"
- AIP_VALIDATION_DATA_URI = "bigquery_destination.dataset_<dataset-id>_<annotation-type>_<time>.validation"
- AIP_TEST_DATA_URI = "bigquery_destination.dataset_<dataset-id>_<annotation-type>_<time>.test"
.google.cloud.aiplatform.v1beta1.BigQueryDestination bigquery_destination = 10;
Type | Description |
InputDataConfig.Builder |
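The BigQuery naming scheme for bigquery_destination can be expressed as a small helper that turns a dataset ID, an annotation type, and the training-call time into the dataset name and the AIP_* values. This is a hedged sketch: `BigQueryUriSketch`, its methods, the assumption that the name parts are joined with underscores, and treating the destination as a plain string prefix such as "bq://my-project" are illustrative. The service derives these values itself, so training code should read the AIP_* environment variables rather than rebuild them.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only (hypothetical class): Vertex AI computes these names; read AIP_* in real code.
final class BigQueryUriSketch {
    // YYYY_MM_DDThh_mm_ss_sssZ, always rendered in UTC; '_' is emitted literally.
    private static final DateTimeFormatter TS =
            DateTimeFormatter.ofPattern("yyyy_MM_dd'T'HH_mm_ss_SSS'Z'").withZone(ZoneOffset.UTC);

    // Assumes the parts are joined with underscores.
    static String datasetName(String datasetId, String annotationType, Instant trainingCall) {
        return "dataset_" + datasetId + "_" + annotationType + "_" + TS.format(trainingCall);
    }

    // destination is a hypothetical plain-string stand-in for the BigQueryDestination value.
    static Map<String, String> aipVariables(String destination, String datasetId,
                                            String annotationType, Instant trainingCall) {
        String dataset = destination + "." + datasetName(datasetId, annotationType, trainingCall);
        Map<String, String> env = new LinkedHashMap<>();
        env.put("AIP_DATA_FORMAT", "bigquery");
        env.put("AIP_TRAINING_DATA_URI", dataset + ".training");
        env.put("AIP_VALIDATION_DATA_URI", dataset + ".validation");
        env.put("AIP_TEST_DATA_URI", dataset + ".test");
        return env;
    }

    public static void main(String[] args) {
        Instant call = Instant.parse("2024-01-02T03:04:05.678Z");
        aipVariables("bq://my-project", "1234", "label", call)
                .forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```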
clearDatasetId()
public InputDataConfig.Builder clearDatasetId()
Required. The ID of the Dataset, in the same Project and Location, whose data will be used to train the Model. The Dataset must use a schema compatible with the Model being trained; what is compatible should be described in the used TrainingPipeline's training_task_definition (google.cloud.aiplatform.v1beta1.TrainingPipeline.training_task_definition). For tabular Datasets, all of their data is exported to training, to pick and choose from.
string dataset_id = 1 [(.google.api.field_behavior) = REQUIRED];
Type | Description |
InputDataConfig.Builder | This builder for chaining. |
clearDestination()
public InputDataConfig.Builder clearDestination()
Type | Description |
InputDataConfig.Builder |
clearField(Descriptors.FieldDescriptor field)
public InputDataConfig.Builder clearField(Descriptors.FieldDescriptor field)
Name | Description |
field | FieldDescriptor |
Type | Description |
InputDataConfig.Builder |
clearFilterSplit()
public InputDataConfig.Builder clearFilterSplit()
Split based on the provided filters for each set.
.google.cloud.aiplatform.v1beta1.FilterSplit filter_split = 3;
Type | Description |
InputDataConfig.Builder |
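A filter-based split gives each set its own filter over DataItems. The sketch below is a rough model of that assignment. It assumes, based on the FilterSplit message documentation as we understand it, that an item matching several filters goes to the first applicable set in training, validation, test order, and that an item matching none is ignored by the split. `FilterSplitSketch` and its `Role` enum are hypothetical, and Java predicates stand in for Vertex AI filter strings.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Conceptual model (hypothetical class); predicates stand in for Vertex AI filter strings.
final class FilterSplitSketch {
    enum Role { TRAINING, VALIDATION, TEST }

    static <T> Map<Role, List<T>> assign(List<T> items, Predicate<T> trainingFilter,
                                         Predicate<T> validationFilter, Predicate<T> testFilter) {
        Map<Role, List<T>> sets = new EnumMap<>(Role.class);
        for (Role r : Role.values()) {
            sets.put(r, new ArrayList<>());
        }
        for (T item : items) {
            // Assumed precedence: first matching filter wins, in training, validation, test order.
            if (trainingFilter.test(item)) {
                sets.get(Role.TRAINING).add(item);
            } else if (validationFilter.test(item)) {
                sets.get(Role.VALIDATION).add(item);
            } else if (testFilter.test(item)) {
                sets.get(Role.TEST).add(item);
            }
            // Items that match no filter are ignored by the split.
        }
        return sets;
    }

    public static void main(String[] args) {
        List<Integer> ids = List.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        // Items 1 and 2 match both the training and validation filters; 10 matches none.
        System.out.println(assign(ids, id -> id <= 7, id -> id == 8 || id <= 2, id -> id == 9));
    }
}
```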
clearFractionSplit()
public InputDataConfig.Builder clearFractionSplit()
Split based on fractions defining the size of each set.
.google.cloud.aiplatform.v1beta1.FractionSplit fraction_split = 2;
Type | Description |
InputDataConfig.Builder |
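To make "fractions defining the size of each set" concrete, here is a sketch that converts three fractions into set sizes for n items. The rounding rule (round each size down) and the checks that every fraction lies in [0, 1] and that they sum to at most 1 are assumptions for illustration; Vertex AI's exact rounding is not specified on this page, and `FractionSplitSketch` is hypothetical.

```java
import java.util.Arrays;

// Hypothetical helper; rounding and validation rules are assumptions, not Vertex AI behavior.
final class FractionSplitSketch {
    // Small tolerance so that e.g. 0.1 * 10 still floors to 1 despite binary floating point.
    private static final double EPS = 1e-9;

    static int[] sizes(int n, double training, double validation, double test) {
        if (n < 0) {
            throw new IllegalArgumentException("n must be >= 0");
        }
        for (double f : new double[] {training, validation, test}) {
            if (f < 0 || f > 1) {
                throw new IllegalArgumentException("fraction out of [0, 1]: " + f);
            }
        }
        if (training + validation + test > 1 + EPS) {
            throw new IllegalArgumentException("fractions sum to more than 1");
        }
        // Each set size is the fraction of n, rounded down.
        return new int[] {
            (int) Math.floor(training * n + EPS),
            (int) Math.floor(validation * n + EPS),
            (int) Math.floor(test * n + EPS)
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sizes(10, 0.8, 0.1, 0.1)));
    }
}
```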
clearGcsDestination()
public InputDataConfig.Builder clearGcsDestination()
The Cloud Storage location where the training data is to be
written to. In the given directory a new directory is created with
name:
dataset-<dataset-id>-<annotation-type>-<timestamp-of-training-call>
where timestamp is in YYYY-MM-DDThh:mm:ss.sssZ ISO-8601 format.
All training input data is written into that directory.
The Vertex AI environment variables that hold Cloud Storage
data URIs use the Cloud Storage wildcard format to support
sharded data, for example "gs://.../training-*.jsonl".
- AIP_DATA_FORMAT = "jsonl" for non-tabular data, "csv" for tabular data
- AIP_TRAINING_DATA_URI = "gcs_destination/dataset-<dataset-id>-<annotation-type>-<time>/training-*.${AIP_DATA_FORMAT}"
- AIP_VALIDATION_DATA_URI = "gcs_destination/dataset-<dataset-id>-<annotation-type>-<time>/validation-*.${AIP_DATA_FORMAT}"
- AIP_TEST_DATA_URI = "gcs_destination/dataset-<dataset-id>-<annotation-type>-<time>/test-*.${AIP_DATA_FORMAT}"
.google.cloud.aiplatform.v1beta1.GcsDestination gcs_destination = 8;
Type | Description |
InputDataConfig.Builder |
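The gcs_destination layout can be sketched the same way: build the directory name from the ISO-8601 timestamp, derive the wildcard AIP_* URIs, and check which shard objects a wildcard covers. `GcsUriSketch`, its methods, the "gs://my-bucket/out" prefix, and the rule that `*` matches any run of characters except `/` are illustrative assumptions; in a real training container, read the AIP_* environment variables instead of reconstructing them.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Illustrative only (hypothetical class): Vertex AI computes these values; read AIP_* in real code.
final class GcsUriSketch {
    // YYYY-MM-DDThh:mm:ss.sssZ (ISO-8601 with milliseconds), rendered in UTC.
    private static final DateTimeFormatter TS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'").withZone(ZoneOffset.UTC);

    static String directoryName(String datasetId, String annotationType, Instant trainingCall) {
        return "dataset-" + datasetId + "-" + annotationType + "-" + TS.format(trainingCall);
    }

    static Map<String, String> aipVariables(String outputUriPrefix, String datasetId,
                                            String annotationType, Instant trainingCall,
                                            boolean tabular) {
        String format = tabular ? "csv" : "jsonl";
        String prefix = outputUriPrefix.endsWith("/") ? outputUriPrefix : outputUriPrefix + "/";
        String dir = prefix + directoryName(datasetId, annotationType, trainingCall);
        Map<String, String> env = new LinkedHashMap<>();
        env.put("AIP_DATA_FORMAT", format);
        env.put("AIP_TRAINING_DATA_URI", dir + "/training-*." + format);
        env.put("AIP_VALIDATION_DATA_URI", dir + "/validation-*." + format);
        env.put("AIP_TEST_DATA_URI", dir + "/test-*." + format);
        return env;
    }

    // True if objectUri matches the wildcard URI; '*' matches any run of characters except '/'.
    static boolean matchesWildcard(String wildcardUri, String objectUri) {
        String regex = Arrays.stream(wildcardUri.split("\\*", -1))
                .map(Pattern::quote)
                .collect(Collectors.joining("[^/]*"));
        return objectUri.matches(regex);
    }

    public static void main(String[] args) {
        Instant call = Instant.parse("2024-01-02T03:04:05.678Z");
        aipVariables("gs://my-bucket/out", "1234", "label", call, false)
                .forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

Quoting each literal segment with `Pattern.quote` keeps characters such as `.` and `:` in the URI from being read as regex syntax.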
clearOneof(Descriptors.OneofDescriptor oneof)
public InputDataConfig.Builder clearOneof(Descriptors.OneofDescriptor oneof)
Name | Description |
oneof | OneofDescriptor |
Type | Description |
InputDataConfig.Builder |
clearPredefinedSplit()
public InputDataConfig.Builder clearPredefinedSplit()
Supported only for tabular Datasets. Split based on a predefined key.
.google.cloud.aiplatform.v1beta1.PredefinedSplit predefined_split = 4;
Type | Description |
InputDataConfig.Builder |
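A split on a predefined key lets each tabular row carry its own assignment. The sketch assumes, per the PredefinedSplit message documentation as we understand it, that the key names a column whose value must be `training`, `validation`, or `test`, and that rows with a missing or different value are ignored. Rows are modeled as `Map<String, String>`; `PredefinedSplitSketch` and the "split" column name are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Conceptual model (hypothetical class); rows are column-name to value maps.
final class PredefinedSplitSketch {
    static Map<String, List<Map<String, String>>> assign(List<Map<String, String>> rows, String key) {
        Map<String, List<Map<String, String>>> sets = new LinkedHashMap<>();
        sets.put("training", new ArrayList<>());
        sets.put("validation", new ArrayList<>());
        sets.put("test", new ArrayList<>());
        for (Map<String, String> row : rows) {
            // null when the key column is absent or holds an unrecognized value: the row is ignored.
            List<Map<String, String>> target = sets.get(row.get(key));
            if (target != null) {
                target.add(row);
            }
        }
        return sets;
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = List.of(
                Map.of("id", "1", "split", "training"),
                Map.of("id", "2", "split", "test"),
                Map.of("id", "3", "split", "holdout"));  // not a valid value: ignored
        System.out.println(assign(rows, "split"));
    }
}
```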
clearSplit()
public InputDataConfig.Builder clearSplit()
Type | Description |
InputDataConfig.Builder |
clearStratifiedSplit()
public InputDataConfig.Builder clearStratifiedSplit()
Supported only for tabular Datasets. Split based on the distribution of the specified column.
.google.cloud.aiplatform.v1beta1.StratifiedSplit stratified_split = 12;
Type | Description |
InputDataConfig.Builder |
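Splitting "based on the distribution of the specified column" can be pictured as applying the same fractions inside each group of rows that share a value in that column, so every set mirrors the column's distribution. This is a hedged sketch: `StratifiedSplitSketch` is hypothetical, it rounds training and validation counts down and sends the rest of each group to test, and it keeps input order instead of shuffling so the result is deterministic. The service's actual sampling may differ.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Conceptual model (hypothetical class); rounding and ordering are illustrative assumptions.
final class StratifiedSplitSketch {
    enum Role { TRAINING, VALIDATION, TEST }

    private static final double EPS = 1e-9;

    static Map<Role, List<Map<String, String>>> assign(List<Map<String, String>> rows, String key,
                                                       double trainingFraction,
                                                       double validationFraction) {
        // Group rows by their value in the stratification column, keeping first-seen order.
        Map<String, List<Map<String, String>>> strata = new LinkedHashMap<>();
        for (Map<String, String> row : rows) {
            strata.computeIfAbsent(row.get(key), k -> new ArrayList<>()).add(row);
        }
        Map<Role, List<Map<String, String>>> out = new EnumMap<>(Role.class);
        for (Role r : Role.values()) {
            out.put(r, new ArrayList<>());
        }
        // Apply the same fractions within every group; leftover rows go to test.
        for (List<Map<String, String>> group : strata.values()) {
            int nTrain = (int) Math.floor(trainingFraction * group.size() + EPS);
            int nVal = (int) Math.floor(validationFraction * group.size() + EPS);
            for (int i = 0; i < group.size(); i++) {
                Role role = i < nTrain ? Role.TRAINING
                        : i < nTrain + nVal ? Role.VALIDATION
                        : Role.TEST;
                out.get(role).add(group.get(i));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            rows.add(Map.of("label", "a"));
            rows.add(Map.of("label", "b"));
        }
        assign(rows, "label", 0.8, 0.1)
                .forEach((role, set) -> System.out.println(role + ": " + set.size()));
    }
}
```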
clearTimestampSplit()
public InputDataConfig.Builder clearTimestampSplit()
Supported only for tabular Datasets. Split based on the timestamp of the input data pieces.
.google.cloud.aiplatform.v1beta1.TimestampSplit timestamp_split = 5;
Type | Description |
InputDataConfig.Builder |
clone()
public InputDataConfig.Builder clone()
Type | Description |
InputDataConfig.Builder |
getAnnotationSchemaUri()
public String getAnnotationSchemaUri()
Applicable only to custom training with Datasets that have DataItems and Annotations. Cloud Storage URI that points to a YAML file describing the annotation schema. The schema is defined as an OpenAPI 3.0.2 Schema Object. The schema files that can be used here are found in gs://google-cloud-aiplatform/schema/dataset/annotation/; note that the chosen schema must be consistent with the metadata of the Dataset specified by dataset_id. Only Annotations that both match this schema and belong to DataItems not ignored by the split method are used, in the training, validation, or test role according to the role of the DataItem they are on. When used in conjunction with annotations_filter, the Annotations used for training are filtered by both annotations_filter and annotation_schema_uri.
string annotation_schema_uri = 9;
Type | Description |
String | The annotationSchemaUri. |
getAnnotationSchemaUriBytes()
public ByteString getAnnotationSchemaUriBytes()
Applicable only to custom training with Datasets that have DataItems and Annotations. Cloud Storage URI that points to a YAML file describing the annotation schema. The schema is defined as an OpenAPI 3.0.2 Schema Object. The schema files that can be used here are found in gs://google-cloud-aiplatform/schema/dataset/annotation/; note that the chosen schema must be consistent with the metadata of the Dataset specified by dataset_id. Only Annotations that both match this schema and belong to DataItems not ignored by the split method are used, in the training, validation, or test role according to the role of the DataItem they are on. When used in conjunction with annotations_filter, the Annotations used for training are filtered by both annotations_filter and annotation_schema_uri.
string annotation_schema_uri = 9;
Type | Description |
ByteString | The bytes for annotationSchemaUri. |
getAnnotationsFilter()
public String getAnnotationsFilter()
Applicable only to Datasets that have DataItems and Annotations. A filter on Annotations of the Dataset. Only Annotations that both match this filter and belong to DataItems not ignored by the split method are used, in the training, validation, or test role according to the role of the DataItem they are on (for auto-assigned DataItems, that role is decided by Vertex AI). A filter with the same syntax as the one used in ListAnnotations may be used, but note that here it filters across all Annotations of the Dataset, not just within a single DataItem.
string annotations_filter = 6;
Type | Description |
String | The annotationsFilter. |
getAnnotationsFilterBytes()
public ByteString getAnnotationsFilterBytes()
Applicable only to Datasets that have DataItems and Annotations. A filter on Annotations of the Dataset. Only Annotations that both match this filter and belong to DataItems not ignored by the split method are used, in the training, validation, or test role according to the role of the DataItem they are on (for auto-assigned DataItems, that role is decided by Vertex AI). A filter with the same syntax as the one used in ListAnnotations may be used, but note that here it filters across all Annotations of the Dataset, not just within a single DataItem.
string annotations_filter = 6;
Type | Description |
ByteString | The bytes for annotationsFilter. |
getBigqueryDestination()
public BigQueryDestination getBigqueryDestination()
Only applicable to custom training with tabular Dataset with BigQuery
source.
The BigQuery project location where the training data is to be written
to. In the given project a new dataset is created with name
dataset_<dataset-id>_<annotation-type>_<timestamp-of-training-call>
where timestamp is in YYYY_MM_DDThh_mm_ss_sssZ format. All training
input data is written into that dataset. In the dataset three
tables are created: training, validation and test.
The Vertex AI environment variables are set as follows:
- AIP_DATA_FORMAT = "bigquery".
- AIP_TRAINING_DATA_URI = "bigquery_destination.dataset_<dataset-id>_<annotation-type>_<time>.training"
- AIP_VALIDATION_DATA_URI = "bigquery_destination.dataset_<dataset-id>_<annotation-type>_<time>.validation"
- AIP_TEST_DATA_URI = "bigquery_destination.dataset_<dataset-id>_<annotation-type>_<time>.test"
.google.cloud.aiplatform.v1beta1.BigQueryDestination bigquery_destination = 10;
Type | Description |
BigQueryDestination | The bigqueryDestination. |
getBigqueryDestinationBuilder()
public BigQueryDestination.Builder getBigqueryDestinationBuilder()
Only applicable to custom training with tabular Dataset with BigQuery
source.
The BigQuery project location where the training data is to be written
to. In the given project a new dataset is created with name
dataset_<dataset-id>_<annotation-type>_<timestamp-of-training-call>
where timestamp is in YYYY_MM_DDThh_mm_ss_sssZ format. All training
input data is written into that dataset. In the dataset three
tables are created: training, validation and test.
The Vertex AI environment variables are set as follows:
- AIP_DATA_FORMAT = "bigquery".
- AIP_TRAINING_DATA_URI = "bigquery_destination.dataset_<dataset-id>_<annotation-type>_<time>.training"
- AIP_VALIDATION_DATA_URI = "bigquery_destination.dataset_<dataset-id>_<annotation-type>_<time>.validation"
- AIP_TEST_DATA_URI = "bigquery_destination.dataset_<dataset-id>_<annotation-type>_<time>.test"
.google.cloud.aiplatform.v1beta1.BigQueryDestination bigquery_destination = 10;
Type | Description |
BigQueryDestination.Builder |
getBigqueryDestinationOrBuilder()
public BigQueryDestinationOrBuilder getBigqueryDestinationOrBuilder()
Only applicable to custom training with tabular Dataset with BigQuery
source.
The BigQuery project location where the training data is to be written
to. In the given project a new dataset is created with name
dataset_<dataset-id>_<annotation-type>_<timestamp-of-training-call>
where timestamp is in YYYY_MM_DDThh_mm_ss_sssZ format. All training
input data is written into that dataset. In the dataset three
tables are created: training, validation and test.
The Vertex AI environment variables are set as follows:
- AIP_DATA_FORMAT = "bigquery".
- AIP_TRAINING_DATA_URI = "bigquery_destination.dataset_<dataset-id>_<annotation-type>_<time>.training"
- AIP_VALIDATION_DATA_URI = "bigquery_destination.dataset_<dataset-id>_<annotation-type>_<time>.validation"
- AIP_TEST_DATA_URI = "bigquery_destination.dataset_<dataset-id>_<annotation-type>_<time>.test"
.google.cloud.aiplatform.v1beta1.BigQueryDestination bigquery_destination = 10;
Type | Description |
BigQueryDestinationOrBuilder |
getDatasetId()
public String getDatasetId()
Required. The ID of the Dataset, in the same Project and Location, whose data will be used to train the Model. The Dataset must use a schema compatible with the Model being trained; what is compatible should be described in the used TrainingPipeline's training_task_definition (google.cloud.aiplatform.v1beta1.TrainingPipeline.training_task_definition). For tabular Datasets, all of their data is exported to training, to pick and choose from.
string dataset_id = 1 [(.google.api.field_behavior) = REQUIRED];
Type | Description |
String | The datasetId. |
getDatasetIdBytes()
public ByteString getDatasetIdBytes()
Required. The ID of the Dataset, in the same Project and Location, whose data will be used to train the Model. The Dataset must use a schema compatible with the Model being trained; what is compatible should be described in the used TrainingPipeline's training_task_definition (google.cloud.aiplatform.v1beta1.TrainingPipeline.training_task_definition). For tabular Datasets, all of their data is exported to training, to pick and choose from.
string dataset_id = 1 [(.google.api.field_behavior) = REQUIRED];
Type | Description |
ByteString | The bytes for datasetId. |
getDefaultInstanceForType()
public InputDataConfig getDefaultInstanceForType()
Type | Description |
InputDataConfig |
getDescriptorForType()
public Descriptors.Descriptor getDescriptorForType()
Type | Description |
Descriptor |
getDestinationCase()
public InputDataConfig.DestinationCase getDestinationCase()
Type | Description |
InputDataConfig.DestinationCase |
getFilterSplit()
public FilterSplit getFilterSplit()
Split based on the provided filters for each set.
.google.cloud.aiplatform.v1beta1.FilterSplit filter_split = 3;
Type | Description |
FilterSplit | The filterSplit. |
getFilterSplitBuilder()
public FilterSplit.Builder getFilterSplitBuilder()
Split based on the provided filters for each set.
.google.cloud.aiplatform.v1beta1.FilterSplit filter_split = 3;
Type | Description |
FilterSplit.Builder |
getFilterSplitOrBuilder()
public FilterSplitOrBuilder getFilterSplitOrBuilder()
Split based on the provided filters for each set.
.google.cloud.aiplatform.v1beta1.FilterSplit filter_split = 3;
Type | Description |
FilterSplitOrBuilder |
getFractionSplit()
public FractionSplit getFractionSplit()
Split based on fractions defining the size of each set.
.google.cloud.aiplatform.v1beta1.FractionSplit fraction_split = 2;
Type | Description |
FractionSplit | The fractionSplit. |
getFractionSplitBuilder()
public FractionSplit.Builder getFractionSplitBuilder()
Split based on fractions defining the size of each set.
.google.cloud.aiplatform.v1beta1.FractionSplit fraction_split = 2;
Type | Description |
FractionSplit.Builder |
getFractionSplitOrBuilder()
public FractionSplitOrBuilder getFractionSplitOrBuilder()
Split based on fractions defining the size of each set.
.google.cloud.aiplatform.v1beta1.FractionSplit fraction_split = 2;
Type | Description |
FractionSplitOrBuilder |
getGcsDestination()
public GcsDestination getGcsDestination()
The Cloud Storage location where the training data is to be
written to. In the given directory a new directory is created with
name:
dataset-<dataset-id>-<annotation-type>-<timestamp-of-training-call>
where timestamp is in YYYY-MM-DDThh:mm:ss.sssZ ISO-8601 format.
All training input data is written into that directory.
The Vertex AI environment variables that hold Cloud Storage
data URIs use the Cloud Storage wildcard format to support
sharded data, for example "gs://.../training-*.jsonl".
- AIP_DATA_FORMAT = "jsonl" for non-tabular data, "csv" for tabular data
- AIP_TRAINING_DATA_URI = "gcs_destination/dataset-<dataset-id>-<annotation-type>-<time>/training-*.${AIP_DATA_FORMAT}"
- AIP_VALIDATION_DATA_URI = "gcs_destination/dataset-<dataset-id>-<annotation-type>-<time>/validation-*.${AIP_DATA_FORMAT}"
- AIP_TEST_DATA_URI = "gcs_destination/dataset-<dataset-id>-<annotation-type>-<time>/test-*.${AIP_DATA_FORMAT}"
.google.cloud.aiplatform.v1beta1.GcsDestination gcs_destination = 8;
Type | Description |
GcsDestination | The gcsDestination. |
getGcsDestinationBuilder()
public GcsDestination.Builder getGcsDestinationBuilder()
The Cloud Storage location where the training data is to be
written to. In the given directory a new directory is created with
name:
dataset-<dataset-id>-<annotation-type>-<timestamp-of-training-call>
where timestamp is in YYYY-MM-DDThh:mm:ss.sssZ ISO-8601 format.
All training input data is written into that directory.
The Vertex AI environment variables that hold Cloud Storage
data URIs use the Cloud Storage wildcard format to support
sharded data, for example "gs://.../training-*.jsonl".
- AIP_DATA_FORMAT = "jsonl" for non-tabular data, "csv" for tabular data
- AIP_TRAINING_DATA_URI = "gcs_destination/dataset-<dataset-id>-<annotation-type>-<time>/training-*.${AIP_DATA_FORMAT}"
- AIP_VALIDATION_DATA_URI = "gcs_destination/dataset-<dataset-id>-<annotation-type>-<time>/validation-*.${AIP_DATA_FORMAT}"
- AIP_TEST_DATA_URI = "gcs_destination/dataset-<dataset-id>-<annotation-type>-<time>/test-*.${AIP_DATA_FORMAT}"
.google.cloud.aiplatform.v1beta1.GcsDestination gcs_destination = 8;
Type | Description |
GcsDestination.Builder |
getGcsDestinationOrBuilder()
public GcsDestinationOrBuilder getGcsDestinationOrBuilder()
The Cloud Storage location where the training data is to be
written to. In the given directory a new directory is created with
name:
dataset-<dataset-id>-<annotation-type>-<timestamp-of-training-call>
where timestamp is in YYYY-MM-DDThh:mm:ss.sssZ ISO-8601 format.
All training input data is written into that directory.
The Vertex AI environment variables that hold Cloud Storage
data URIs use the Cloud Storage wildcard format to support
sharded data, for example "gs://.../training-*.jsonl".
- AIP_DATA_FORMAT = "jsonl" for non-tabular data, "csv" for tabular data
- AIP_TRAINING_DATA_URI = "gcs_destination/dataset-<dataset-id>-<annotation-type>-<time>/training-*.${AIP_DATA_FORMAT}"
- AIP_VALIDATION_DATA_URI = "gcs_destination/dataset-<dataset-id>-<annotation-type>-<time>/validation-*.${AIP_DATA_FORMAT}"
- AIP_TEST_DATA_URI = "gcs_destination/dataset-<dataset-id>-<annotation-type>-<time>/test-*.${AIP_DATA_FORMAT}"
.google.cloud.aiplatform.v1beta1.GcsDestination gcs_destination = 8;
Type | Description |
GcsDestinationOrBuilder |
getPredefinedSplit()
public PredefinedSplit getPredefinedSplit()
Supported only for tabular Datasets. Split based on a predefined key.
.google.cloud.aiplatform.v1beta1.PredefinedSplit predefined_split = 4;
Type | Description |
PredefinedSplit | The predefinedSplit. |
getPredefinedSplitBuilder()
public PredefinedSplit.Builder getPredefinedSplitBuilder()
Supported only for tabular Datasets. Split based on a predefined key.
.google.cloud.aiplatform.v1beta1.PredefinedSplit predefined_split = 4;
Type | Description |
PredefinedSplit.Builder |
getPredefinedSplitOrBuilder()
public PredefinedSplitOrBuilder getPredefinedSplitOrBuilder()
Supported only for tabular Datasets. Split based on a predefined key.
.google.cloud.aiplatform.v1beta1.PredefinedSplit predefined_split = 4;
Type | Description |
PredefinedSplitOrBuilder |
getSplitCase()
public InputDataConfig.SplitCase getSplitCase()
Type | Description |
InputDataConfig.SplitCase |
getStratifiedSplit()
public StratifiedSplit getStratifiedSplit()
Supported only for tabular Datasets. Split based on the distribution of the specified column.
.google.cloud.aiplatform.v1beta1.StratifiedSplit stratified_split = 12;
Type | Description |
StratifiedSplit | The stratifiedSplit. |
getStratifiedSplitBuilder()
public StratifiedSplit.Builder getStratifiedSplitBuilder()
Supported only for tabular Datasets. Split based on the distribution of the specified column.
.google.cloud.aiplatform.v1beta1.StratifiedSplit stratified_split = 12;
Type | Description |
StratifiedSplit.Builder |
getStratifiedSplitOrBuilder()
public StratifiedSplitOrBuilder getStratifiedSplitOrBuilder()
Supported only for tabular Datasets. Split based on the distribution of the specified column.
.google.cloud.aiplatform.v1beta1.StratifiedSplit stratified_split = 12;
Type | Description |
StratifiedSplitOrBuilder |
getTimestampSplit()
public TimestampSplit getTimestampSplit()
Supported only for tabular Datasets. Split based on the timestamp of the input data pieces.
.google.cloud.aiplatform.v1beta1.TimestampSplit timestamp_split = 5;
Type | Description |
TimestampSplit | The timestampSplit. |
getTimestampSplitBuilder()
public TimestampSplit.Builder getTimestampSplitBuilder()
Supported only for tabular Datasets. Split based on the timestamp of the input data pieces.
.google.cloud.aiplatform.v1beta1.TimestampSplit timestamp_split = 5;
Type | Description |
TimestampSplit.Builder |
getTimestampSplitOrBuilder()
public TimestampSplitOrBuilder getTimestampSplitOrBuilder()
Supported only for tabular Datasets. Split based on the timestamp of the input data pieces.
.google.cloud.aiplatform.v1beta1.TimestampSplit timestamp_split = 5;
Type | Description |
TimestampSplitOrBuilder |
hasBigqueryDestination()
public boolean hasBigqueryDestination()
Only applicable to custom training with tabular Dataset with BigQuery
source.
The BigQuery project location where the training data is to be written
to. In the given project a new dataset is created with name
dataset_<dataset-id>_<annotation-type>_<timestamp-of-training-call>
where timestamp is in YYYY_MM_DDThh_mm_ss_sssZ format. All training
input data is written into that dataset. In the dataset three
tables are created: training, validation and test.
The Vertex AI environment variables are set as follows:
- AIP_DATA_FORMAT = "bigquery".
- AIP_TRAINING_DATA_URI = "bigquery_destination.dataset_<dataset-id>_<annotation-type>_<time>.training"
- AIP_VALIDATION_DATA_URI = "bigquery_destination.dataset_<dataset-id>_<annotation-type>_<time>.validation"
- AIP_TEST_DATA_URI = "bigquery_destination.dataset_<dataset-id>_<annotation-type>_<time>.test"
.google.cloud.aiplatform.v1beta1.BigQueryDestination bigquery_destination = 10;
Type | Description |
boolean | Whether the bigqueryDestination field is set. |
hasFilterSplit()
public boolean hasFilterSplit()
Split based on the provided filters for each set.
.google.cloud.aiplatform.v1beta1.FilterSplit filter_split = 3;
Type | Description |
boolean | Whether the filterSplit field is set. |
hasFractionSplit()
public boolean hasFractionSplit()
Split based on fractions defining the size of each set.
.google.cloud.aiplatform.v1beta1.FractionSplit fraction_split = 2;
Type | Description |
boolean | Whether the fractionSplit field is set. |
hasGcsDestination()
public boolean hasGcsDestination()
The Cloud Storage location where the training data is to be
written to. In the given directory a new directory is created with
name:
dataset-<dataset-id>-<annotation-type>-<timestamp-of-training-call>
where timestamp is in YYYY-MM-DDThh:mm:ss.sssZ ISO-8601 format.
All training input data is written into that directory.
The Vertex AI environment variables that hold Cloud Storage
data URIs use the Cloud Storage wildcard format to support
sharded data, for example "gs://.../training-*.jsonl".
- AIP_DATA_FORMAT = "jsonl" for non-tabular data, "csv" for tabular data
- AIP_TRAINING_DATA_URI = "gcs_destination/dataset-<dataset-id>-<annotation-type>-<time>/training-*.${AIP_DATA_FORMAT}"
- AIP_VALIDATION_DATA_URI = "gcs_destination/dataset-<dataset-id>-<annotation-type>-<time>/validation-*.${AIP_DATA_FORMAT}"
- AIP_TEST_DATA_URI = "gcs_destination/dataset-<dataset-id>-<annotation-type>-<time>/test-*.${AIP_DATA_FORMAT}"
.google.cloud.aiplatform.v1beta1.GcsDestination gcs_destination = 8;
Type | Description |
boolean | Whether the gcsDestination field is set. |
hasPredefinedSplit()
public boolean hasPredefinedSplit()
Supported only for tabular Datasets. Split based on a predefined key.
.google.cloud.aiplatform.v1beta1.PredefinedSplit predefined_split = 4;
Type | Description |
boolean | Whether the predefinedSplit field is set. |
hasStratifiedSplit()
public boolean hasStratifiedSplit()
Supported only for tabular Datasets. Split based on the distribution of the specified column.
.google.cloud.aiplatform.v1beta1.StratifiedSplit stratified_split = 12;
Type | Description |
boolean | Whether the stratifiedSplit field is set. |
hasTimestampSplit()
public boolean hasTimestampSplit()
Supported only for tabular Datasets. Split based on the timestamp of the input data pieces.
.google.cloud.aiplatform.v1beta1.TimestampSplit timestamp_split = 5;
Type | Description |
boolean | Whether the timestampSplit field is set. |
internalGetFieldAccessorTable()
protected GeneratedMessageV3.FieldAccessorTable internalGetFieldAccessorTable()
Type | Description |
FieldAccessorTable |
isInitialized()
public final boolean isInitialized()
Type | Description |
boolean |
mergeBigqueryDestination(BigQueryDestination value)
public InputDataConfig.Builder mergeBigqueryDestination(BigQueryDestination value)
Only applicable to custom training with tabular Dataset with BigQuery
source.
The BigQuery project location where the training data is to be written
to. In the given project a new dataset is created with name
dataset_<dataset-id>_<annotation-type>_<timestamp-of-training-call>
where timestamp is in YYYY_MM_DDThh_mm_ss_sssZ format. All training
input data is written into that dataset. In the dataset three
tables are created: training, validation and test.
The Vertex AI environment variables are set as follows:
- AIP_DATA_FORMAT = "bigquery".
- AIP_TRAINING_DATA_URI = "bigquery_destination.dataset_<dataset-id>_<annotation-type>_<time>.training"
- AIP_VALIDATION_DATA_URI = "bigquery_destination.dataset_<dataset-id>_<annotation-type>_<time>.validation"
- AIP_TEST_DATA_URI = "bigquery_destination.dataset_<dataset-id>_<annotation-type>_<time>.test"
.google.cloud.aiplatform.v1beta1.BigQueryDestination bigquery_destination = 10;
Name | Description |
value | BigQueryDestination |
Type | Description |
InputDataConfig.Builder |
mergeFilterSplit(FilterSplit value)
public InputDataConfig.Builder mergeFilterSplit(FilterSplit value)
Split based on the provided filters for each set.
.google.cloud.aiplatform.v1beta1.FilterSplit filter_split = 3;
Name | Description |
value | FilterSplit |
Type | Description |
InputDataConfig.Builder |
mergeFractionSplit(FractionSplit value)
public InputDataConfig.Builder mergeFractionSplit(FractionSplit value)
Split based on fractions defining the size of each set.
.google.cloud.aiplatform.v1beta1.FractionSplit fraction_split = 2;
Name | Description |
value | FractionSplit |
Type | Description |
InputDataConfig.Builder |
mergeFrom(InputDataConfig other)
public InputDataConfig.Builder mergeFrom(InputDataConfig other)
Name | Description |
other | InputDataConfig |
Type | Description |
InputDataConfig.Builder |
mergeFrom(CodedInputStream input, ExtensionRegistryLite extensionRegistry)
public InputDataConfig.Builder mergeFrom(CodedInputStream input, ExtensionRegistryLite extensionRegistry)
Name | Description |
input | CodedInputStream |
extensionRegistry | ExtensionRegistryLite |
Type | Description |
InputDataConfig.Builder |
Exceptions
Type | Description |
IOException |
mergeFrom(Message other)
public InputDataConfig.Builder mergeFrom(Message other)
Name | Description |
other | Message |
Type | Description |
InputDataConfig.Builder |
mergeGcsDestination(GcsDestination value)
public InputDataConfig.Builder mergeGcsDestination(GcsDestination value)
The Cloud Storage location where the training data is to be
written to. In the given directory a new directory is created with
name:
dataset-<dataset-id>-<annotation-type>-<timestamp-of-training-call>
where timestamp is in YYYY-MM-DDThh:mm:ss.sssZ ISO-8601 format.
All training input data is written into that directory.
The Vertex AI environment variables that hold Cloud Storage
data URIs use the Cloud Storage wildcard format to support
sharded data, for example "gs://.../training-*.jsonl".
- AIP_DATA_FORMAT = "jsonl" for non-tabular data, "csv" for tabular data
- AIP_TRAINING_DATA_URI = "gcs_destination/dataset-<dataset-id>-<annotation-type>-<time>/training-*.${AIP_DATA_FORMAT}"
- AIP_VALIDATION_DATA_URI = "gcs_destination/dataset-<dataset-id>-<annotation-type>-<time>/validation-*.${AIP_DATA_FORMAT}"
- AIP_TEST_DATA_URI = "gcs_destination/dataset-<dataset-id>-<annotation-type>-<time>/test-*.${AIP_DATA_FORMAT}"
.google.cloud.aiplatform.v1beta1.GcsDestination gcs_destination = 8;
Name | Description |
value | GcsDestination |
Type | Description |
InputDataConfig.Builder |
mergePredefinedSplit(PredefinedSplit value)
public InputDataConfig.Builder mergePredefinedSplit(PredefinedSplit value)
Supported only for tabular Datasets. Split based on a predefined key.
.google.cloud.aiplatform.v1beta1.PredefinedSplit predefined_split = 4;
Name | Description |
value | PredefinedSplit |
Type | Description |
InputDataConfig.Builder |
mergeStratifiedSplit(StratifiedSplit value)
public InputDataConfig.Builder mergeStratifiedSplit(StratifiedSplit value)
Supported only for tabular Datasets. Split based on the distribution of the specified column.
.google.cloud.aiplatform.v1beta1.StratifiedSplit stratified_split = 12;
Name | Description |
value | StratifiedSplit |
Type | Description |
InputDataConfig.Builder |
mergeTimestampSplit(TimestampSplit value)
public InputDataConfig.Builder mergeTimestampSplit(TimestampSplit value)
Supported only for tabular Datasets. Split based on the timestamp of the input data pieces.
.google.cloud.aiplatform.v1beta1.TimestampSplit timestamp_split = 5;
Name | Description |
value | TimestampSplit |
Type | Description |
InputDataConfig.Builder |
mergeUnknownFields(UnknownFieldSet unknownFields)
public final InputDataConfig.Builder mergeUnknownFields(UnknownFieldSet unknownFields)
Name | Description |
unknownFields | UnknownFieldSet |
Type | Description |
InputDataConfig.Builder |
setAnnotationSchemaUri(String value)
public InputDataConfig.Builder setAnnotationSchemaUri(String value)
Applicable only to custom training with Datasets that have DataItems and Annotations. Cloud Storage URI that points to a YAML file describing the annotation schema. The schema is defined as an OpenAPI 3.0.2 Schema Object. The schema files that can be used here are found in the gs://google-cloud-aiplatform/schema/dataset/annotation/ directory. Note that the chosen schema must be consistent with the metadata of the Dataset specified by dataset_id. Only Annotations that both match this schema and belong to DataItems not ignored by the split method are used; each takes the training, validation or test role of the DataItem it is on. When used in conjunction with annotations_filter, the Annotations used for training are filtered by both annotations_filter and annotation_schema_uri.
string annotation_schema_uri = 9;
Name | Description |
value | String The annotationSchemaUri to set. |
Type | Description |
InputDataConfig.Builder | This builder for chaining. |
setAnnotationSchemaUriBytes(ByteString value)
public InputDataConfig.Builder setAnnotationSchemaUriBytes(ByteString value)
Applicable only to custom training with Datasets that have DataItems and Annotations. Cloud Storage URI that points to a YAML file describing the annotation schema. The schema is defined as an OpenAPI 3.0.2 Schema Object. The schema files that can be used here are found in the gs://google-cloud-aiplatform/schema/dataset/annotation/ directory. Note that the chosen schema must be consistent with the metadata of the Dataset specified by dataset_id. Only Annotations that both match this schema and belong to DataItems not ignored by the split method are used; each takes the training, validation or test role of the DataItem it is on. When used in conjunction with annotations_filter, the Annotations used for training are filtered by both annotations_filter and annotation_schema_uri.
string annotation_schema_uri = 9;
Name | Description |
value | ByteString The bytes for annotationSchemaUri to set. |
Type | Description |
InputDataConfig.Builder | This builder for chaining. |
setAnnotationsFilter(String value)
public InputDataConfig.Builder setAnnotationsFilter(String value)
Applicable only to Datasets that have DataItems and Annotations. A filter on Annotations of the Dataset. Only Annotations that both match this filter and belong to DataItems not ignored by the split method are used; each takes the training, validation or test role of the DataItem it is on (for auto-assigned DataItems, that role is decided by Vertex AI). A filter with the same syntax as the one used in ListAnnotations may be used, but note that here it filters across all Annotations of the Dataset, not just within a single DataItem.
string annotations_filter = 6;
Name | Description |
value | String The annotationsFilter to set. |
Type | Description |
InputDataConfig.Builder | This builder for chaining. |
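The selection rule described for annotations_filter and annotation_schema_uri can be modeled in a few lines. The sketch below is a hypothetical, simplified model, not Vertex AI client code: Annotation is an illustrative class, the filter is a Java Predicate rather than ListAnnotations filter syntax, schema matching is reduced to string equality, and roleByDataItem stands in for the split method's assignment of DataItems to roles (ignored DataItems are simply absent).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical, simplified model; these are not Vertex AI client types.
final class AnnotationSelection {
    private AnnotationSelection() {}

    static final class Annotation {
        final String dataItemId;
        final String schemaUri;
        final String label;

        Annotation(String dataItemId, String schemaUri, String label) {
            this.dataItemId = dataItemId;
            this.schemaUri = schemaUri;
            this.label = label;
        }
    }

    // Returns the annotations used in the given role ("training", "validation"
    // or "test"). roleByDataItem holds the split method's decision for each
    // DataItem; DataItems ignored by the split are absent from the map.
    // A null annotationSchemaUri means no schema restriction.
    static List<Annotation> select(List<Annotation> all,
                                   Map<String, String> roleByDataItem,
                                   Predicate<Annotation> annotationsFilter,
                                   String annotationSchemaUri,
                                   String role) {
        List<Annotation> selected = new ArrayList<>();
        for (Annotation a : all) {
            boolean matchesFilter = annotationsFilter.test(a);
            boolean matchesSchema = annotationSchemaUri == null || annotationSchemaUri.equals(a.schemaUri);
            boolean inRole = role.equals(roleByDataItem.get(a.dataItemId));
            if (matchesFilter && matchesSchema && inRole) {
                selected.add(a);
            }
        }
        return selected;
    }
}
```

The point of the model is that the filter, the schema and the DataItem's role all have to agree before an Annotation is used.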
setAnnotationsFilterBytes(ByteString value)
public InputDataConfig.Builder setAnnotationsFilterBytes(ByteString value)
Applicable only to Datasets that have DataItems and Annotations. A filter on Annotations of the Dataset. Only Annotations that both match this filter and belong to DataItems not ignored by the split method are used; each takes the training, validation or test role of the DataItem it is on (for auto-assigned DataItems, that role is decided by Vertex AI). A filter with the same syntax as the one used in ListAnnotations may be used, but note that here it filters across all Annotations of the Dataset, not just within a single DataItem.
string annotations_filter = 6;
Name | Description |
value | ByteString The bytes for annotationsFilter to set. |
Type | Description |
InputDataConfig.Builder | This builder for chaining. |
setBigqueryDestination(BigQueryDestination value)
public InputDataConfig.Builder setBigqueryDestination(BigQueryDestination value)
Only applicable to custom training with a tabular Dataset that has a
BigQuery source.
The BigQuery project location where the training data is to be written.
In the given project a new dataset is created with the name
dataset_<dataset-id>_<annotation-type>_<timestamp-of-training-call>,
where the timestamp is in YYYY_MM_DDThh_mm_ss_sssZ format. All training
input data is written into that dataset. Three tables are created in
that dataset: training, validation and test.
- AIP_DATA_FORMAT = "bigquery".
- AIP_TRAINING_DATA_URI = "bigquery_destination.dataset_<dataset-id>_<annotation-type>_<time>.training"
- AIP_VALIDATION_DATA_URI = "bigquery_destination.dataset_<dataset-id>_<annotation-type>_<time>.validation"
- AIP_TEST_DATA_URI = "bigquery_destination.dataset_<dataset-id>_<annotation-type>_<time>.test"
.google.cloud.aiplatform.v1beta1.BigQueryDestination bigquery_destination = 10;
Name | Description |
value | BigQueryDestination |
Type | Description |
InputDataConfig.Builder |
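With a BigQuery destination, training code receives table references rather than file patterns. The hypothetical helper below splits a value such as AIP_TRAINING_DATA_URI into project, dataset and table. It assumes the value has the form <project>.<dataset>.<table>, optionally prefixed with bq://; that prefix handling is an assumption, not something this reference states. Splitting from the right keeps project IDs that themselves contain a dot (such as domain-scoped project IDs) intact.

```java
// Hypothetical helper, not part of the Vertex AI client library. Assumes the
// value has the form "[bq://]<project>.<dataset>.<table>".
final class BigQueryTableRef {
    final String project;
    final String dataset;
    final String table;

    private BigQueryTableRef(String project, String dataset, String table) {
        this.project = project;
        this.dataset = dataset;
        this.table = table;
    }

    static BigQueryTableRef parse(String uri) {
        String ref = uri.startsWith("bq://") ? uri.substring("bq://".length()) : uri;
        // Split from the right so a project ID containing '.' stays whole.
        int tableDot = ref.lastIndexOf('.');
        int datasetDot = tableDot > 0 ? ref.lastIndexOf('.', tableDot - 1) : -1;
        if (datasetDot <= 0 || tableDot == datasetDot + 1 || tableDot == ref.length() - 1) {
            throw new IllegalArgumentException("Expected <project>.<dataset>.<table>: " + uri);
        }
        return new BigQueryTableRef(
            ref.substring(0, datasetDot),
            ref.substring(datasetDot + 1, tableDot),
            ref.substring(tableDot + 1));
    }
}
```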
setBigqueryDestination(BigQueryDestination.Builder builderForValue)
public InputDataConfig.Builder setBigqueryDestination(BigQueryDestination.Builder builderForValue)
Only applicable to custom training with a tabular Dataset that has a
BigQuery source.
The BigQuery project location where the training data is to be written.
In the given project a new dataset is created with the name
dataset_<dataset-id>_<annotation-type>_<timestamp-of-training-call>,
where the timestamp is in YYYY_MM_DDThh_mm_ss_sssZ format. All training
input data is written into that dataset. Three tables are created in
that dataset: training, validation and test.
- AIP_DATA_FORMAT = "bigquery".
- AIP_TRAINING_DATA_URI = "bigquery_destination.dataset_<dataset-id>_<annotation-type>_<time>.training"
- AIP_VALIDATION_DATA_URI = "bigquery_destination.dataset_<dataset-id>_<annotation-type>_<time>.validation"
- AIP_TEST_DATA_URI = "bigquery_destination.dataset_<dataset-id>_<annotation-type>_<time>.test"
.google.cloud.aiplatform.v1beta1.BigQueryDestination bigquery_destination = 10;
Name | Description |
builderForValue | BigQueryDestination.Builder |
Type | Description |
InputDataConfig.Builder |
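The BigQuery dataset naming scheme can be reproduced with java.time, for example to predict where a run's data will land. This is a minimal sketch: BigQueryExportNames is a hypothetical helper, Vertex AI creates the dataset itself, and the timestamp is rendered in UTC, as the trailing Z in the documented format indicates.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper that mirrors the documented naming scheme; Vertex AI
// creates the dataset and tables itself.
final class BigQueryExportNames {
    // YYYY_MM_DDThh_mm_ss_sssZ, rendered in UTC.
    private static final DateTimeFormatter TIMESTAMP =
        DateTimeFormatter.ofPattern("yyyy_MM_dd'T'HH_mm_ss_SSS'Z'").withZone(ZoneOffset.UTC);

    private BigQueryExportNames() {}

    static String datasetName(String datasetId, String annotationType, Instant trainingCall) {
        return "dataset_" + datasetId + "_" + annotationType + "_" + TIMESTAMP.format(trainingCall);
    }

    // table is "training", "validation" or "test".
    static String tableUri(String bigqueryDestination, String datasetName, String table) {
        return bigqueryDestination + "." + datasetName + "." + table;
    }
}
```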
setDatasetId(String value)
public InputDataConfig.Builder setDatasetId(String value)
Required. The ID of the Dataset in the same Project and Location whose data will be used to train the Model. The Dataset must use a schema compatible with the Model being trained; what is compatible should be described in the used TrainingPipeline's training_task_definition (google.cloud.aiplatform.v1beta1.TrainingPipeline.training_task_definition). For tabular Datasets, all of their data is exported to training, to pick and choose from.
string dataset_id = 1 [(.google.api.field_behavior) = REQUIRED];
Name | Description |
value | String The datasetId to set. |
Type | Description |
InputDataConfig.Builder | This builder for chaining. |
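dataset_id takes the Dataset's ID, not its full resource name. If your code holds a full name of the form projects/<project>/locations/<location>/datasets/<id>, a hypothetical helper like the one below can extract the ID. It does not check that the project and location match those of the training pipeline, which this field requires.

```java
// Hypothetical helper, not part of the Vertex AI client library.
final class DatasetIds {
    private static final String SEGMENT = "/datasets/";

    private DatasetIds() {}

    // Accepts either a bare ID or a name such as
    // "projects/<project>/locations/<location>/datasets/<id>" and returns the ID.
    // It does not check that the project and location match the pipeline's.
    static String fromResourceName(String nameOrId) {
        int idx = nameOrId.lastIndexOf(SEGMENT);
        String id = idx >= 0 ? nameOrId.substring(idx + SEGMENT.length()) : nameOrId;
        if (id.isEmpty() || id.contains("/")) {
            throw new IllegalArgumentException("Not a Dataset resource name or ID: " + nameOrId);
        }
        return id;
    }
}
```

The returned value is what would then be passed to setDatasetId.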
setDatasetIdBytes(ByteString value)
public InputDataConfig.Builder setDatasetIdBytes(ByteString value)
Required. The ID of the Dataset in the same Project and Location whose data will be used to train the Model. The Dataset must use a schema compatible with the Model being trained; what is compatible should be described in the used TrainingPipeline's training_task_definition (google.cloud.aiplatform.v1beta1.TrainingPipeline.training_task_definition). For tabular Datasets, all of their data is exported to training, to pick and choose from.
string dataset_id = 1 [(.google.api.field_behavior) = REQUIRED];
Name | Description |
value | ByteString The bytes for datasetId to set. |
Type | Description |
InputDataConfig.Builder | This builder for chaining. |
setField(Descriptors.FieldDescriptor field, Object value)
public InputDataConfig.Builder setField(Descriptors.FieldDescriptor field, Object value)
Name | Description |
field | FieldDescriptor |
value | Object |
Type | Description |
InputDataConfig.Builder |
setFilterSplit(FilterSplit value)
public InputDataConfig.Builder setFilterSplit(FilterSplit value)
Split based on the provided filters for each set.
.google.cloud.aiplatform.v1beta1.FilterSplit filter_split = 3;
Name | Description |
value | FilterSplit |
Type | Description |
InputDataConfig.Builder |
setFilterSplit(FilterSplit.Builder builderForValue)
public InputDataConfig.Builder setFilterSplit(FilterSplit.Builder builderForValue)
Split based on the provided filters for each set.
.google.cloud.aiplatform.v1beta1.FilterSplit filter_split = 3;
Name | Description |
builderForValue | FilterSplit.Builder |
Type | Description |
InputDataConfig.Builder |
setFractionSplit(FractionSplit value)
public InputDataConfig.Builder setFractionSplit(FractionSplit value)
Split based on fractions defining the size of each set.
.google.cloud.aiplatform.v1beta1.FractionSplit fraction_split = 2;
Name | Description |
value | FractionSplit |
Type | Description |
InputDataConfig.Builder |
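To make the idea of size-defining fractions concrete, the sketch below estimates how many items each set would receive. It is illustrative only: FractionSizes is hypothetical, and rounding down, leaving any remainder unassigned, and requiring the fractions to sum to at most 1 are assumptions of this sketch. The exact assignment Vertex AI performs is not described in this reference.

```java
// Illustrative only; FractionSizes is hypothetical. Rounding down and leaving
// any remainder unassigned are simplifications of this sketch.
final class FractionSizes {
    private FractionSizes() {}

    // Returns {training, validation, test} item counts.
    static long[] estimate(long totalItems, double training, double validation, double test) {
        double[] fractions = {training, validation, test};
        double sum = 0;
        for (double f : fractions) {
            // Also rejects NaN, since NaN fails both comparisons.
            if (!(f >= 0 && f <= 1)) {
                throw new IllegalArgumentException("Fraction outside [0, 1]: " + f);
            }
            sum += f;
        }
        // Assumed constraint: the fractions may not add up to more than 1.
        if (sum > 1 + 1e-9) {
            throw new IllegalArgumentException("Fractions sum to more than 1: " + sum);
        }
        long[] sizes = new long[fractions.length];
        for (int i = 0; i < fractions.length; i++) {
            sizes[i] = (long) Math.floor(totalItems * fractions[i]);
        }
        return sizes;
    }
}
```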
setFractionSplit(FractionSplit.Builder builderForValue)
public InputDataConfig.Builder setFractionSplit(FractionSplit.Builder builderForValue)
Split based on fractions defining the size of each set.
.google.cloud.aiplatform.v1beta1.FractionSplit fraction_split = 2;
Name | Description |
builderForValue | FractionSplit.Builder |
Type | Description |
InputDataConfig.Builder |
setGcsDestination(GcsDestination value)
public InputDataConfig.Builder setGcsDestination(GcsDestination value)
The Cloud Storage location where the training data is to be written.
In the given directory a new directory is created with the name
dataset-<dataset-id>-<annotation-type>-<timestamp-of-training-call>,
where the timestamp is in YYYY-MM-DDThh:mm:ss.sssZ ISO-8601 format.
All training input data is written into that directory.
The Vertex AI environment variables that represent Cloud Storage
data URIs use the Cloud Storage wildcard format to support sharded
data, e.g. "gs://.../training-*.jsonl".
- AIP_DATA_FORMAT = "jsonl" for non-tabular data, "csv" for tabular data
- AIP_TRAINING_DATA_URI = "gcs_destination/dataset-<dataset-id>-<annotation-type>-<time>/training-*.${AIP_DATA_FORMAT}"
- AIP_VALIDATION_DATA_URI = "gcs_destination/dataset-<dataset-id>-<annotation-type>-<time>/validation-*.${AIP_DATA_FORMAT}"
- AIP_TEST_DATA_URI = "gcs_destination/dataset-<dataset-id>-<annotation-type>-<time>/test-*.${AIP_DATA_FORMAT}"
.google.cloud.aiplatform.v1beta1.GcsDestination gcs_destination = 8;
Name | Description |
value | GcsDestination |
Type | Description |
InputDataConfig.Builder |
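The sketch below reproduces the Cloud Storage directory naming scheme and the wildcard data URIs listed above, which can help when predicting where a run's data will be written. GcsExportNames is a hypothetical helper; gcsDestination stands for the destination directory URI (assumed to be a gs:// prefix), and the timestamp is formatted in UTC, as the trailing Z indicates.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper that mirrors the documented naming scheme; Vertex AI
// creates the directory and writes the files itself.
final class GcsExportNames {
    // YYYY-MM-DDThh:mm:ss.sssZ (ISO-8601 with milliseconds), rendered in UTC.
    private static final DateTimeFormatter TIMESTAMP =
        DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'").withZone(ZoneOffset.UTC);

    private GcsExportNames() {}

    static String directoryName(String datasetId, String annotationType, Instant trainingCall) {
        return "dataset-" + datasetId + "-" + annotationType + "-" + TIMESTAMP.format(trainingCall);
    }

    // role is "training", "validation" or "test"; dataFormat is the
    // AIP_DATA_FORMAT value ("jsonl" or "csv").
    static String dataUri(String gcsDestination, String directoryName, String role, String dataFormat) {
        String base = gcsDestination.endsWith("/") ? gcsDestination : gcsDestination + "/";
        return base + directoryName + "/" + role + "-*." + dataFormat;
    }
}
```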
setGcsDestination(GcsDestination.Builder builderForValue)
public InputDataConfig.Builder setGcsDestination(GcsDestination.Builder builderForValue)
The Cloud Storage location where the training data is to be written.
In the given directory a new directory is created with the name
dataset-<dataset-id>-<annotation-type>-<timestamp-of-training-call>,
where the timestamp is in YYYY-MM-DDThh:mm:ss.sssZ ISO-8601 format.
All training input data is written into that directory.
The Vertex AI environment variables that represent Cloud Storage
data URIs use the Cloud Storage wildcard format to support sharded
data, e.g. "gs://.../training-*.jsonl".
- AIP_DATA_FORMAT = "jsonl" for non-tabular data, "csv" for tabular data
- AIP_TRAINING_DATA_URI = "gcs_destination/dataset-<dataset-id>-<annotation-type>-<time>/training-*.${AIP_DATA_FORMAT}"
- AIP_VALIDATION_DATA_URI = "gcs_destination/dataset-<dataset-id>-<annotation-type>-<time>/validation-*.${AIP_DATA_FORMAT}"
- AIP_TEST_DATA_URI = "gcs_destination/dataset-<dataset-id>-<annotation-type>-<time>/test-*.${AIP_DATA_FORMAT}"
.google.cloud.aiplatform.v1beta1.GcsDestination gcs_destination = 8;
Name | Description |
builderForValue | GcsDestination.Builder |
Type | Description |
InputDataConfig.Builder |
setPredefinedSplit(PredefinedSplit value)
public InputDataConfig.Builder setPredefinedSplit(PredefinedSplit value)
Supported only for tabular Datasets. Split based on a predefined key.
.google.cloud.aiplatform.v1beta1.PredefinedSplit predefined_split = 4;
Name | Description |
value | PredefinedSplit |
Type | Description |
InputDataConfig.Builder |
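A predefined split lets each row carry its own set assignment. The sketch below assumes that the key names a column whose value is training, validation or test, and that rows with a missing or unrecognized value are ignored; treat those rules as assumptions here and check the PredefinedSplit message documentation for the exact behavior. Rows are modeled as plain maps, and PredefinedSplitSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; rows are modeled as column-name -> value maps.
final class PredefinedSplitSketch {
    private PredefinedSplitSketch() {}

    // Groups rows by the value of the key column. Assumed rule: only the values
    // "training", "validation" and "test" are recognized; other rows are ignored.
    static Map<String, List<Map<String, String>>> assign(List<Map<String, String>> rows, String key) {
        Map<String, List<Map<String, String>>> sets = new LinkedHashMap<>();
        sets.put("training", new ArrayList<>());
        sets.put("validation", new ArrayList<>());
        sets.put("test", new ArrayList<>());
        for (Map<String, String> row : rows) {
            List<Map<String, String>> target = sets.get(row.get(key));
            if (target != null) {
                target.add(row);
            }
        }
        return sets;
    }
}
```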
setPredefinedSplit(PredefinedSplit.Builder builderForValue)
public InputDataConfig.Builder setPredefinedSplit(PredefinedSplit.Builder builderForValue)
Supported only for tabular Datasets. Split based on a predefined key.
.google.cloud.aiplatform.v1beta1.PredefinedSplit predefined_split = 4;
Name | Description |
builderForValue | PredefinedSplit.Builder |
Type | Description |
InputDataConfig.Builder |
setRepeatedField(Descriptors.FieldDescriptor field, int index, Object value)
public InputDataConfig.Builder setRepeatedField(Descriptors.FieldDescriptor field, int index, Object value)
Name | Description |
field | FieldDescriptor |
index | int |
value | Object |
Type | Description |
InputDataConfig.Builder |
setStratifiedSplit(StratifiedSplit value)
public InputDataConfig.Builder setStratifiedSplit(StratifiedSplit value)
Supported only for tabular Datasets. Split based on the distribution of the specified column.
.google.cloud.aiplatform.v1beta1.StratifiedSplit stratified_split = 12;
Name | Description |
value | StratifiedSplit |
Type | Description |
InputDataConfig.Builder |
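Splitting based on a column's distribution means each value of that column is split with the same fractions, so every set roughly keeps the column's overall distribution. The sketch below shows that idea deterministically; StratifiedSplitSketch is hypothetical, a real split would presumably also shuffle rows within each stratum (this sketch does not), and whatever remains after the training and validation shares goes to test.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, deterministic sketch of stratification; rows are modeled as
// column-name -> value maps. A real split would likely also shuffle rows.
final class StratifiedSplitSketch {
    private StratifiedSplitSketch() {}

    // Returns [training, validation, test]. Rows sharing a value in the key
    // column form one stratum, and each stratum is cut with the same fractions.
    static List<List<Map<String, String>>> split(List<Map<String, String>> rows, String key,
                                                 double trainingFraction, double validationFraction) {
        if (trainingFraction < 0 || validationFraction < 0 || trainingFraction + validationFraction > 1) {
            throw new IllegalArgumentException("Invalid fractions");
        }
        Map<String, List<Map<String, String>>> strata = new LinkedHashMap<>();
        for (Map<String, String> row : rows) {
            strata.computeIfAbsent(row.get(key), k -> new ArrayList<>()).add(row);
        }
        List<List<Map<String, String>>> sets = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            sets.add(new ArrayList<>());
        }
        for (List<Map<String, String>> stratum : strata.values()) {
            int n = stratum.size();
            int nTrain = (int) Math.floor(n * trainingFraction);
            // Clamp so floating-point rounding can never overrun the stratum.
            int nVal = Math.min((int) Math.floor(n * validationFraction), n - nTrain);
            sets.get(0).addAll(stratum.subList(0, nTrain));
            sets.get(1).addAll(stratum.subList(nTrain, nTrain + nVal));
            sets.get(2).addAll(stratum.subList(nTrain + nVal, n));
        }
        return sets;
    }
}
```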
setStratifiedSplit(StratifiedSplit.Builder builderForValue)
public InputDataConfig.Builder setStratifiedSplit(StratifiedSplit.Builder builderForValue)
Supported only for tabular Datasets. Split based on the distribution of the specified column.
.google.cloud.aiplatform.v1beta1.StratifiedSplit stratified_split = 12;
Name | Description |
builderForValue | StratifiedSplit.Builder |
Type | Description |
InputDataConfig.Builder |
setTimestampSplit(TimestampSplit value)
public InputDataConfig.Builder setTimestampSplit(TimestampSplit value)
Supported only for tabular Datasets. Split based on the timestamp of the input data pieces.
.google.cloud.aiplatform.v1beta1.TimestampSplit timestamp_split = 5;
Name | Description |
value | TimestampSplit |
Type | Description |
InputDataConfig.Builder |
setTimestampSplit(TimestampSplit.Builder builderForValue)
public InputDataConfig.Builder setTimestampSplit(TimestampSplit.Builder builderForValue)
Supported only for tabular Datasets. Split based on the timestamp of the input data pieces.
.google.cloud.aiplatform.v1beta1.TimestampSplit timestamp_split = 5;
Name | Description |
builderForValue | TimestampSplit.Builder |
Type | Description |
InputDataConfig.Builder |
setUnknownFields(UnknownFieldSet unknownFields)
public final InputDataConfig.Builder setUnknownFields(UnknownFieldSet unknownFields)
Name | Description |
unknownFields | UnknownFieldSet |
Type | Description |
InputDataConfig.Builder |