API documentation for dataproc_v1.types package.
Classes
AcceleratorConfig
Specifies the type and number of accelerator cards attached to the
instances of an instance group. See GPUs on Compute
Engine <https://cloud.google.com/compute/docs/gpus/>.
AutoscalingConfig
Autoscaling Policy config associated with the cluster. .. attribute:: policy_uri
Optional. The autoscaling policy used by the cluster.
Only resource names that include the project ID and location (region) are valid. Examples:
https://www.googleapis.com/compute/v1/projects/[project_id]/locations/[dataproc_region]/autoscalingPolicies/[policy_id]
projects/[project_id]/locations/[dataproc_region]/autoscalingPolicies/[policy_id]
Note that the policy must be in the same project and Dataproc region.
:type: str
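A minimal sketch of assembling the two policy_uri forms listed in the examples above. The helper name autoscaling_policy_uri and the sample IDs are hypothetical; only the path layout is taken from the examples.

```python
def autoscaling_policy_uri(project_id: str, region: str, policy_id: str,
                           full_url: bool = False) -> str:
    """Build a policy_uri string in either of the two documented forms."""
    relative = (
        f"projects/{project_id}/locations/{region}"
        f"/autoscalingPolicies/{policy_id}"
    )
    if full_url:
        # Host and version prefix are copied from the documented example.
        return f"https://www.googleapis.com/compute/v1/{relative}"
    return relative


# Hypothetical IDs, for illustration only.
print(autoscaling_policy_uri("my-project", "us-central1", "my-policy"))
```

The project and region passed in should be those of the cluster, since the policy must be in the same project and Dataproc region.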
AutoscalingPolicy
Describes an autoscaling policy for Dataproc cluster autoscaler.
BasicAutoscalingAlgorithm
Basic algorithm for autoscaling. .. attribute:: yarn_config
Required. YARN autoscaling configuration.
:type: google.cloud.dataproc_v1.types.BasicYarnAutoscalingConfig
BasicYarnAutoscalingConfig
Basic autoscaling configurations for YARN. .. attribute:: graceful_decommission_timeout
Required. Timeout for YARN graceful decommissioning of Node Managers. Specifies the duration to wait for jobs to complete before forcefully removing workers (and potentially interrupting jobs). Only applicable to downscaling operations.
Bounds: [0s, 1d].
:type: google.protobuf.duration_pb2.Duration
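The bounds [0s, 1d] can be checked on the client before a policy is sent. This is a sketch using the standard library's timedelta as a stand-in for the Duration value; the function name is hypothetical.

```python
from datetime import timedelta

MIN_TIMEOUT = timedelta(seconds=0)
MAX_TIMEOUT = timedelta(days=1)


def check_graceful_decommission_timeout(timeout: timedelta) -> timedelta:
    """Return the timeout unchanged if it lies within [0s, 1d]."""
    if not (MIN_TIMEOUT <= timeout <= MAX_TIMEOUT):
        raise ValueError(f"timeout {timeout} is outside [0s, 1d]")
    return timeout


print(check_graceful_decommission_timeout(timedelta(hours=1)))
```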
CancelJobRequest
A request to cancel a job. .. attribute:: project_id
Required. The ID of the Google Cloud Platform project that the job belongs to.
:type: str
Cluster
Describes the identifying information, config, and status of a cluster of Compute Engine instances.
ClusterConfig
The cluster config. .. attribute:: config_bucket
Optional. A Cloud Storage bucket used to stage job
dependencies, config files, and job driver console output.
If you do not specify a staging bucket, Cloud Dataproc will
determine a Cloud Storage location (US, ASIA, or EU) for
your cluster's staging bucket according to the Compute
Engine zone where your cluster is deployed, and then create
and manage this project-level, per-location bucket (see
Dataproc staging bucket <https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/staging-bucket>).
This field requires a Cloud Storage bucket name, not a URI
to a Cloud Storage bucket.
:type: str
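Because config_bucket takes a bucket name rather than a URI, a caller holding a gs:// URI has to reduce it to the bare name first. The sketch below shows one way to do that; the helper to_bucket_name is hypothetical and is not part of the client library.

```python
def to_bucket_name(value: str) -> str:
    """Return a bare bucket name from either 'bucket' or 'gs://bucket'."""
    prefix = "gs://"
    name = value[len(prefix):] if value.startswith(prefix) else value
    name = name.rstrip("/")
    # Anything with a remaining slash points inside a bucket, not at a bucket.
    if not name or "/" in name:
        raise ValueError(f"expected a bucket name, got {value!r}")
    return name


print(to_bucket_name("gs://my-staging-bucket/"))
```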
ClusterMetrics
Contains cluster daemon metrics, such as HDFS and YARN stats.
Beta Feature: This report is available for testing purposes only. It may be changed before final release.
ClusterOperation
The cluster operation triggered by a workflow. .. attribute:: operation_id
Output only. The id of the cluster operation.
:type: str
ClusterOperationMetadata
Metadata describing the operation. .. attribute:: cluster_name
Output only. Name of the cluster for the operation.
:type: str
ClusterOperationStatus
The status of the operation. .. attribute:: state
Output only. A message containing the operation state.
:type: google.cloud.dataproc_v1.types.ClusterOperationStatus.State
ClusterSelector
A selector that chooses target cluster for jobs based on metadata.
ClusterStatus
The status of a cluster and its instances. .. attribute:: state
Output only. The cluster's state.
CreateAutoscalingPolicyRequest
A request to create an autoscaling policy. .. attribute:: parent
Required. The "resource name" of the region or location, as described in https://cloud.google.com/apis/design/resource_names.
For projects.regions.autoscalingPolicies.create, the resource name of the region has the following format: projects/{project_id}/regions/{region}
For projects.locations.autoscalingPolicies.create, the resource name of the location has the following format: projects/{project_id}/locations/{location}
:type: str
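The two parent formats differ only in the collection segment. A small sketch of building either one follows; the function name parent_name and the sample IDs are hypothetical.

```python
def parent_name(project_id: str, place: str, collection: str = "regions") -> str:
    """Build a parent resource name for the regions or locations collection."""
    if collection not in ("regions", "locations"):
        raise ValueError(f"unknown collection: {collection!r}")
    return f"projects/{project_id}/{collection}/{place}"


# Hypothetical project and region.
print(parent_name("my-project", "us-central1"))
print(parent_name("my-project", "us-central1", collection="locations"))
```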
CreateClusterRequest
A request to create a cluster. .. attribute:: project_id
Required. The ID of the Google Cloud Platform project that the cluster belongs to.
:type: str
CreateWorkflowTemplateRequest
A request to create a workflow template. .. attribute:: parent
Required. The resource name of the region or location, as described in https://cloud.google.com/apis/design/resource_names.
For projects.regions.workflowTemplates.create, the resource name of the region has the following format: projects/{project_id}/regions/{region}
For projects.locations.workflowTemplates.create, the resource name of the location has the following format: projects/{project_id}/locations/{location}
:type: str
DeleteAutoscalingPolicyRequest
A request to delete an autoscaling policy. Autoscaling policies in use by one or more clusters will not be deleted.
DeleteClusterRequest
A request to delete a cluster. .. attribute:: project_id
Required. The ID of the Google Cloud Platform project that the cluster belongs to.
:type: str
DeleteJobRequest
A request to delete a job. .. attribute:: project_id
Required. The ID of the Google Cloud Platform project that the job belongs to.
:type: str
DeleteWorkflowTemplateRequest
A request to delete a workflow template. Currently started workflows will remain running.
DiagnoseClusterRequest
A request to collect cluster diagnostic information. .. attribute:: project_id
Required. The ID of the Google Cloud Platform project that the cluster belongs to.
:type: str
DiagnoseClusterResults
The location of diagnostic output. .. attribute:: output_uri
Output only. The Cloud Storage URI of the diagnostic output. The output report is a plain text file with a summary of collected diagnostics.
:type: str
DiskConfig
Specifies the config of disk options for a group of VM instances.
EncryptionConfig
Encryption settings for the cluster. .. attribute:: gce_pd_kms_key_name
Optional. The Cloud KMS key name to use for PD disk encryption for all instances in the cluster.
:type: str
EndpointConfig
Endpoint config for this cluster. .. attribute:: http_ports
Output only. The map of port descriptions to URLs. Will only be populated if enable_http_port_access is true.
:type: Sequence[google.cloud.dataproc_v1.types.EndpointConfig.HttpPortsEntry]
GceClusterConfig
Common config settings for resources of Compute Engine cluster instances, applicable to all instances in the cluster.
GetAutoscalingPolicyRequest
A request to fetch an autoscaling policy. .. attribute:: name
Required. The "resource name" of the autoscaling policy, as described in https://cloud.google.com/apis/design/resource_names.
For projects.regions.autoscalingPolicies.get, the resource name of the policy has the following format: projects/{project_id}/regions/{region}/autoscalingPolicies/{policy_id}
For projects.locations.autoscalingPolicies.get, the resource name of the policy has the following format: projects/{project_id}/locations/{location}/autoscalingPolicies/{policy_id}
:type: str
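Going the other way, a policy resource name in either format can be split back into its parts. The sketch below uses a regular expression derived from the two formats above; the function name and the returned keys are hypothetical.

```python
import re

_POLICY_NAME = re.compile(
    r"projects/(?P<project_id>[^/]+)"
    r"/(?P<collection>regions|locations)/(?P<place>[^/]+)"
    r"/autoscalingPolicies/(?P<policy_id>[^/]+)"
)


def parse_policy_name(name: str) -> dict:
    """Split an autoscaling policy resource name into its components."""
    match = _POLICY_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"not an autoscaling policy resource name: {name!r}")
    return match.groupdict()


print(parse_policy_name("projects/my-project/regions/us-east1/autoscalingPolicies/my-policy"))
```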
GetClusterRequest
Request to get the resource representation for a cluster in a project.
GetJobRequest
A request to get the resource representation for a job in a project.
GetWorkflowTemplateRequest
A request to fetch a workflow template. .. attribute:: name
Required. The resource name of the workflow template, as described in https://cloud.google.com/apis/design/resource_names.
For projects.regions.workflowTemplates.get, the resource name of the template has the following format: projects/{project_id}/regions/{region}/workflowTemplates/{template_id}
For projects.locations.workflowTemplates.get, the resource name of the template has the following format: projects/{project_id}/locations/{location}/workflowTemplates/{template_id}
:type: str
GkeClusterConfig
The GKE config for this cluster. .. attribute:: namespaced_gke_deployment_target
Optional. A target for the deployment.
:type: google.cloud.dataproc_v1.types.GkeClusterConfig.NamespacedGkeDeploymentTarget
HadoopJob
A Dataproc job for running Apache Hadoop MapReduce <https://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html> jobs on Apache Hadoop YARN <https://hadoop.apache.org/docs/r2.7.1/hadoop-yarn/hadoop-yarn-site/YARN.html>.
HiveJob
A Dataproc job for running Apache Hive <https://hive.apache.org/> queries on YARN.
IdentityConfig
Identity related configuration, including service account based secure multi-tenancy user mappings.
InstanceGroupAutoscalingPolicyConfig
Configuration for the size bounds of an instance group, including its proportional size to other groups.
InstanceGroupConfig
The config settings for Compute Engine resources in an instance group, such as a master or worker group.
InstantiateInlineWorkflowTemplateRequest
A request to instantiate an inline workflow template. .. attribute:: parent
Required. The resource name of the region or location, as described in https://cloud.google.com/apis/design/resource_names.
For projects.regions.workflowTemplates.instantiateinline, the resource name of the region has the following format: projects/{project_id}/regions/{region}
For projects.locations.workflowTemplates.instantiateinline, the resource name of the location has the following format: projects/{project_id}/locations/{location}
:type: str
InstantiateWorkflowTemplateRequest
A request to instantiate a workflow template. .. attribute:: name
Required. The resource name of the workflow template, as described in https://cloud.google.com/apis/design/resource_names.
For projects.regions.workflowTemplates.instantiate, the resource name of the template has the following format: projects/{project_id}/regions/{region}/workflowTemplates/{template_id}
For projects.locations.workflowTemplates.instantiate, the resource name of the template has the following format: projects/{project_id}/locations/{location}/workflowTemplates/{template_id}
:type: str
Job
A Dataproc job resource. .. attribute:: reference
Optional. The fully qualified reference to the job, which can be used to obtain the equivalent REST path of the job resource. If this property is not specified when a job is created, the server generates a job_id.
JobMetadata
Job Operation metadata. .. attribute:: job_id
Output only. The job id.
:type: str
JobPlacement
Dataproc job config. .. attribute:: cluster_name
Required. The name of the cluster where the job will be submitted.
:type: str
JobReference
Encapsulates the full scoping used to reference a job. .. attribute:: project_id
Optional. The ID of the Google Cloud Platform project that the job belongs to. If specified, must match the request project ID.
:type: str
JobScheduling
Job scheduling options. .. attribute:: max_failures_per_hour
Optional. Maximum number of times per hour a driver may be restarted as a result of the driver exiting with a non-zero code before the job is reported failed.
A job may be reported as thrashing if the driver exits with a non-zero code 4 times within a 10-minute window.
Maximum value is 10.
:type: int
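The documented maximum of 10 can be enforced before a job is submitted. This sketch builds a plain dict keyed by the field name; the helper job_scheduling is hypothetical, and the lower bound of 0 is an assumption not stated above.

```python
MAX_FAILURES_PER_HOUR_LIMIT = 10  # documented maximum value


def job_scheduling(max_failures_per_hour: int) -> dict:
    """Return a scheduling mapping after checking the documented maximum."""
    # Lower bound of 0 is an assumption; only the maximum is documented.
    if not 0 <= max_failures_per_hour <= MAX_FAILURES_PER_HOUR_LIMIT:
        raise ValueError(
            f"max_failures_per_hour must be between 0 and "
            f"{MAX_FAILURES_PER_HOUR_LIMIT}, got {max_failures_per_hour}"
        )
    return {"max_failures_per_hour": max_failures_per_hour}


print(job_scheduling(5))
```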
JobStatus
Dataproc job status. .. attribute:: state
Output only. A state message specifying the overall job state.
KerberosConfig
Specifies Kerberos related configuration. .. attribute:: enable_kerberos
Optional. Flag to indicate whether to Kerberize the cluster (default: false). Set this field to true to enable Kerberos on a cluster.
:type: bool
LifecycleConfig
Specifies the cluster auto-delete schedule configuration. .. attribute:: idle_delete_ttl
Optional. The duration to keep the cluster alive while
idling (when no jobs are running). Passing this threshold
will cause the cluster to be deleted. Minimum value is 5
minutes; maximum value is 14 days (see JSON representation
of Duration <https://developers.google.com/protocol-buffers/docs/proto3#json>).
:type: google.protobuf.duration_pb2.Duration
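In the proto3 JSON mapping, a Duration is written as a string of seconds with an "s" suffix. The sketch below checks the 5 minute to 14 day range and produces that string from a standard-library timedelta; the function name is hypothetical.

```python
from datetime import timedelta

MIN_IDLE_TTL = timedelta(minutes=5)
MAX_IDLE_TTL = timedelta(days=14)


def idle_delete_ttl_json(ttl: timedelta) -> str:
    """Return the JSON Duration string for an idle_delete_ttl value."""
    if not (MIN_IDLE_TTL <= ttl <= MAX_IDLE_TTL):
        raise ValueError(f"idle_delete_ttl {ttl} is outside 5 minutes to 14 days")
    # Fractional seconds are dropped in this sketch.
    return f"{int(ttl.total_seconds())}s"


print(idle_delete_ttl_json(timedelta(minutes=30)))
```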
ListAutoscalingPoliciesRequest
A request to list autoscaling policies in a project. .. attribute:: parent
Required. The "resource name" of the region or location, as described in https://cloud.google.com/apis/design/resource_names.
For projects.regions.autoscalingPolicies.list, the resource name of the region has the following format: projects/{project_id}/regions/{region}
For projects.locations.autoscalingPolicies.list, the resource name of the location has the following format: projects/{project_id}/locations/{location}
:type: str
ListAutoscalingPoliciesResponse
A response to a request to list autoscaling policies in a project.
ListClustersRequest
A request to list the clusters in a project. .. attribute:: project_id
Required. The ID of the Google Cloud Platform project that the cluster belongs to.
:type: str
ListClustersResponse
The list of all clusters in a project. .. attribute:: clusters
Output only. The clusters in the project.
:type: Sequence[google.cloud.dataproc_v1.types.Cluster]
ListJobsRequest
A request to list jobs in a project. .. attribute:: project_id
Required. The ID of the Google Cloud Platform project that the job belongs to.
:type: str
ListJobsResponse
A list of jobs in a project. .. attribute:: jobs
Output only. Jobs list.
:type: Sequence[google.cloud.dataproc_v1.types.Job]
ListWorkflowTemplatesRequest
A request to list workflow templates in a project. .. attribute:: parent
Required. The resource name of the region or location, as described in https://cloud.google.com/apis/design/resource_names.
For projects.regions.workflowTemplates.list, the resource name of the region has the following format: projects/{project_id}/regions/{region}
For projects.locations.workflowTemplates.list, the resource name of the location has the following format: projects/{project_id}/locations/{location}
:type: str
ListWorkflowTemplatesResponse
A response to a request to list workflow templates in a project.
LoggingConfig
The runtime logging config of the job. .. attribute:: driver_log_levels
The per-package log levels for the driver. This may include "root" package name to configure rootLogger. Examples: 'com.google = FATAL', 'root = INFO', 'org.apache = DEBUG'
:type: Sequence[google.cloud.dataproc_v1.types.LoggingConfig.DriverLogLevelsEntry]
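The example strings above pair a package name with a level. As an illustration only, the sketch below turns strings written in that 'package = LEVEL' style into a plain package-to-level mapping; the helper name is hypothetical, and how the mapping is then passed to the client is not covered here.

```python
def parse_driver_log_levels(entries):
    """Convert 'package = LEVEL' strings into a {package: level} mapping."""
    levels = {}
    for entry in entries:
        package, sep, level = entry.partition("=")
        if not sep:
            raise ValueError(f"missing '=' in {entry!r}")
        levels[package.strip()] = level.strip()
    return levels


print(parse_driver_log_levels(["com.google = FATAL", "root = INFO", "org.apache = DEBUG"]))
```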
ManagedCluster
Cluster that is managed by the workflow. .. attribute:: cluster_name
Required. The cluster name prefix. A unique cluster name will be formed by appending a random suffix. The name must contain only lower-case letters (a-z), numbers (0-9), and hyphens (-). Must begin with a letter. Cannot begin or end with hyphen. Must consist of between 2 and 35 characters.
:type: str
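The naming rules for the cluster name prefix can be expressed as one regular expression. This is a client-side sketch of those rules as written above; the function name is hypothetical and the service remains the authority on what it accepts.

```python
import re

# A lowercase letter first, then lowercase letters, digits, or hyphens,
# not ending with a hyphen, 2 to 35 characters in total.
_CLUSTER_PREFIX = re.compile(r"[a-z][a-z0-9-]{0,33}[a-z0-9]")


def is_valid_cluster_prefix(name: str) -> bool:
    """Check a ManagedCluster cluster_name prefix against the stated rules."""
    return _CLUSTER_PREFIX.fullmatch(name) is not None


print(is_valid_cluster_prefix("my-cluster"))
```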
ManagedGroupConfig
Specifies the resources used to actively manage an instance group.
MetastoreConfig
Specifies a Metastore configuration. .. attribute:: dataproc_metastore_service
Required. Resource name of an existing Dataproc Metastore service.
Example:
projects/[project_id]/locations/[dataproc_region]/services/[service-name]
:type: str
NodeGroupAffinity
Node Group Affinity for clusters using sole-tenant node groups.
NodeInitializationAction
Specifies an executable to run on a fully configured node and a timeout period for executable completion.
OrderedJob
A job executed by the workflow. .. attribute:: step_id
Required. The step id. The id must be unique among all jobs within the template.
The step id is used as a prefix for the job id, as the job goog-dataproc-workflow-step-id label, and in the prerequisiteStepIds field from other steps.
The id must contain only letters (a-z, A-Z), numbers (0-9), underscores (_), and hyphens (-). Cannot begin or end with underscore or hyphen. Must consist of between 3 and 50 characters.
:type: str
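The step id rules can be checked the same way. The sketch below encodes the character set, the restriction on the first and last character, and the 3 to 50 character length; the function name is hypothetical.

```python
import re

# Letters or digits at both ends, with letters, digits, underscores, or
# hyphens in between, 3 to 50 characters in total.
_STEP_ID = re.compile(r"[A-Za-z0-9][A-Za-z0-9_-]{1,48}[A-Za-z0-9]")


def is_valid_step_id(step_id: str) -> bool:
    """Check an OrderedJob step_id against the stated rules."""
    return _STEP_ID.fullmatch(step_id) is not None


print(is_valid_step_id("prepare_data-1"))
```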
ParameterValidation
Configuration for parameter validation. .. attribute:: regex
Validation based on regular expressions.
PigJob
A Dataproc job for running Apache Pig <https://pig.apache.org/> queries on YARN.
PrestoJob
A Dataproc job for running Presto <https://prestosql.io/> queries. IMPORTANT: The Dataproc Presto Optional Component <https://cloud.google.com/dataproc/docs/concepts/components/presto> must be enabled when the cluster is created to submit a Presto job to the cluster.
PySparkJob
A Dataproc job for running Apache PySpark <https://spark.apache.org/docs/0.9.0/python-programming-guide.html> applications on YARN.
QueryList
A list of queries to run on a cluster. .. attribute:: queries
Required. The queries to execute. You do not need to end a query expression with a semicolon. Multiple queries can be specified in one string by separating each with a semicolon. Here is an example of a Dataproc API snippet that uses a QueryList to specify a HiveJob:
::
    "hiveJob": {
        "queryList": {
            "queries": [
                "query1",
                "query2",
                "query3;query4"
            ]
        }
    }
:type: Sequence[str]
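The same structure can be assembled in Python and serialized with the standard json module. This sketch only mirrors the snippet above; the helper name hive_job_payload is hypothetical and the query strings are the placeholders from the example.

```python
import json


def hive_job_payload(queries):
    """Wrap a sequence of query strings in the hiveJob/queryList structure."""
    return {"hiveJob": {"queryList": {"queries": list(queries)}}}


# One string may carry several queries separated by semicolons.
payload = hive_job_payload(["query1", "query2", "query3;query4"])
print(json.dumps(payload, indent=2))
```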
RegexValidation
Validation based on regular expressions. .. attribute:: regexes
Required. RE2 regular expressions used to validate the parameter's value. The value must match the regex in its entirety (substring matches are not sufficient).
:type: Sequence[str]
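The difference between a full match and a substring match is easy to see in code. The sketch below uses Python's re module as a stand-in for RE2, which is an assumption that holds only for simple patterns like this one since the two syntaxes are not identical; the pattern and values are illustrative.

```python
import re


def fully_matches(value: str, regex: str) -> bool:
    """Return True only if the whole value matches the regex."""
    return re.fullmatch(regex, value) is not None


pattern = r"us-central1-[a-f]"  # hypothetical validation pattern
print(fully_matches("us-central1-a", pattern))
print(fully_matches("us-central1-a-extra", pattern))
# A substring search would accept the second value, which is not sufficient.
print(re.search(pattern, "us-central1-a-extra") is not None)
```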
ReservationAffinity
Reservation Affinity for consuming Zonal reservation. .. attribute:: consume_reservation_type
Optional. Type of reservation to consume
:type: google.cloud.dataproc_v1.types.ReservationAffinity.Type
SecurityConfig
Security related configuration, including encryption, Kerberos, etc.
ShieldedInstanceConfig
Shielded Instance Config for clusters using Compute Engine Shielded VMs <https://cloud.google.com/security/shielded-cloud/shielded-vm>.
SoftwareConfig
Specifies the selection and config of software inside the cluster.
SparkJob
A Dataproc job for running Apache Spark <http://spark.apache.org/> applications on YARN.
SparkRJob
A Dataproc job for running Apache SparkR <https://spark.apache.org/docs/latest/sparkr.html> applications on YARN.
SparkSqlJob
A Dataproc job for running Apache Spark SQL <http://spark.apache.org/sql/> queries.
StartClusterRequest
A request to start a cluster. .. attribute:: project_id
Required. The ID of the Google Cloud Platform project the cluster belongs to.
:type: str
StopClusterRequest
A request to stop a cluster. .. attribute:: project_id
Required. The ID of the Google Cloud Platform project the cluster belongs to.
:type: str
SubmitJobRequest
A request to submit a job. .. attribute:: project_id
Required. The ID of the Google Cloud Platform project that the job belongs to.
:type: str
TemplateParameter
A configurable parameter that replaces one or more fields in the template. Parameterizable fields:
- Labels
- File uris
- Job properties
- Job arguments
- Script variables
- Main class (in HadoopJob and SparkJob)
- Zone (in ClusterSelector)
UpdateAutoscalingPolicyRequest
A request to update an autoscaling policy. .. attribute:: policy
Required. The updated autoscaling policy.
UpdateClusterRequest
A request to update a cluster. .. attribute:: project_id
Required. The ID of the Google Cloud Platform project the cluster belongs to.
:type: str
UpdateJobRequest
A request to update a job. .. attribute:: project_id
Required. The ID of the Google Cloud Platform project that the job belongs to.
:type: str
UpdateWorkflowTemplateRequest
A request to update a workflow template. .. attribute:: template
Required. The updated workflow template.
The template.version field must match the current version.
ValueValidation
Validation based on a list of allowed values. .. attribute:: values
Required. List of allowed values for the parameter.
:type: Sequence[str]
WorkflowGraph
The workflow graph. .. attribute:: nodes
Output only. The workflow nodes.
:type: Sequence[google.cloud.dataproc_v1.types.WorkflowNode]
WorkflowMetadata
A Dataproc workflow template resource. .. attribute:: template
Output only. The resource name of the workflow template as described in https://cloud.google.com/apis/design/resource_names.
For projects.regions.workflowTemplates, the resource name of the template has the following format: projects/{project_id}/regions/{region}/workflowTemplates/{template_id}
For projects.locations.workflowTemplates, the resource name of the template has the following format: projects/{project_id}/locations/{location}/workflowTemplates/{template_id}
:type: str
WorkflowNode
The workflow node. .. attribute:: step_id
Output only. The name of the node.
:type: str
WorkflowTemplate
A Dataproc workflow template resource. .. attribute:: id
:type: str
WorkflowTemplatePlacement
Specifies workflow execution target.
Either managed_cluster or cluster_selector is required.
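A placement given as a plain dict can be checked for this requirement before use. The sketch below treats the two fields as mutually exclusive, which is an assumption beyond the statement above; the function name and the sample values are hypothetical.

```python
def placement_target(placement: dict) -> str:
    """Return which of the two target fields is set on a placement mapping."""
    present = [key for key in ("managed_cluster", "cluster_selector")
               if key in placement]
    # Assumption: exactly one of the two fields should be provided.
    if len(present) != 1:
        raise ValueError(
            "expected exactly one of managed_cluster or cluster_selector, "
            f"got {present}"
        )
    return present[0]


print(placement_target({"managed_cluster": {"cluster_name": "my-prefix"}}))
```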
YarnApplication
A YARN application created by a job. Application information is a subset of org.apache.hadoop.yarn.proto.YarnProtos.ApplicationReportProto.
Beta Feature: This report is available for testing purposes only. It may be changed before final release.