ClusterConfig(mapping=None, *, ignore_unknown_fields=False, **kwargs)
The cluster config.

.. attribute:: config_bucket
Optional. A Cloud Storage bucket used to stage job
dependencies, config files, and job driver console output.
If you do not specify a staging bucket, Cloud Dataproc will
determine a Cloud Storage location (US, ASIA, or EU) for
your cluster's staging bucket according to the Compute
Engine zone where your cluster is deployed, and then create
and manage this project-level, per-location bucket (see
`Dataproc staging bucket <https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/staging-bucket>`__).
This field requires a Cloud Storage bucket name, not a URI
to a Cloud Storage bucket.
:type: str
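Because `config_bucket` expects a bare bucket name rather than a URI, a caller that holds a `gs://` URI needs to strip the scheme and any object path first. The following is a minimal sketch of that normalization. `bucket_name_from_uri` is a hypothetical helper and the bucket name is a placeholder; neither is part of the Dataproc client library.

```python
def bucket_name_from_uri(value: str) -> str:
    """Return the bare bucket name from a bucket name or a gs:// URI.

    Hypothetical helper for illustration; not part of google-cloud-dataproc.
    """
    prefix = "gs://"
    if value.startswith(prefix):
        value = value[len(prefix):]
    # Drop any object path that follows the bucket name.
    name = value.split("/", 1)[0]
    if not name:
        raise ValueError("no bucket name found")
    return name


# A plain dict like this could be passed as ClusterConfig(mapping=...).
config_mapping = {
    "config_bucket": bucket_name_from_uri("gs://my-staging-bucket/jobs/"),
}
```

The same normalization would apply to `temp_bucket`, which carries the same bucket-name requirement.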
Attributes | |
---|---|
Name | Description |
temp_bucket |
str
Optional. A Cloud Storage bucket used to store ephemeral cluster and jobs data, such as Spark and MapReduce history files. If you do not specify a temp bucket, Dataproc will determine a Cloud Storage location (US, ASIA, or EU) for your cluster's temp bucket according to the Compute Engine zone where your cluster is deployed, and then create and manage this project-level, per-location bucket. The default bucket has a TTL of 90 days, but you can use any TTL (or none) if you specify a bucket. **This field requires a Cloud Storage bucket name, not a URI to a Cloud Storage bucket.** |
gce_cluster_config |
google.cloud.dataproc_v1.types.GceClusterConfig
Optional. The shared Compute Engine config settings for all instances in a cluster. |
master_config |
google.cloud.dataproc_v1.types.InstanceGroupConfig
Optional. The Compute Engine config settings for the master instance in a cluster. |
worker_config |
google.cloud.dataproc_v1.types.InstanceGroupConfig
Optional. The Compute Engine config settings for worker instances in a cluster. |
secondary_worker_config |
google.cloud.dataproc_v1.types.InstanceGroupConfig
Optional. The Compute Engine config settings for additional worker instances in a cluster. |
software_config |
google.cloud.dataproc_v1.types.SoftwareConfig
Optional. The config settings for software inside the cluster. |
initialization_actions |
Sequence[google.cloud.dataproc_v1.types.NodeInitializationAction]
Optional. Commands to execute on each node after config is completed. By default, executables are run on master and all worker nodes. You can test a node's role metadata to run an executable on a master or worker node, as shown below using curl (you can also use wget):
::
    ROLE=$(curl -H Metadata-Flavor:Google http://metadata/computeMetadata/v1/instance/attributes/dataproc-role)
    if [[ "${ROLE}" == 'Master' ]]; then
      ... master specific actions ...
    else
      ... worker specific actions ...
    fi
|
encryption_config |
google.cloud.dataproc_v1.types.EncryptionConfig
Optional. Encryption settings for the cluster. |
autoscaling_config |
google.cloud.dataproc_v1.types.AutoscalingConfig
Optional. Autoscaling config for the policy associated with the cluster. The cluster does not autoscale if this field is unset. |
security_config |
google.cloud.dataproc_v1.types.SecurityConfig
Optional. Security settings for the cluster. |
lifecycle_config |
google.cloud.dataproc_v1.types.LifecycleConfig
Optional. Lifecycle setting for the cluster. |
endpoint_config |
google.cloud.dataproc_v1.types.EndpointConfig
Optional. Port/endpoint configuration for this cluster. |
metastore_config |
google.cloud.dataproc_v1.types.MetastoreConfig
Optional. Metastore configuration. |
gke_cluster_config |
google.cloud.dataproc_v1.types.GkeClusterConfig
Optional. BETA. The Kubernetes Engine config for Dataproc clusters deployed to Kubernetes. Setting this is considered mutually exclusive with Compute Engine-based options such as gce_cluster_config, master_config, worker_config, secondary_worker_config, and autoscaling_config.
|
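The mutual-exclusivity note on `gke_cluster_config` can be checked on the client side before a request is built. The sketch below operates on a plain dict of the kind that could be passed as `mapping` to `ClusterConfig`. `check_exclusive_fields` is a hypothetical helper, the nested field names `num_instances` and `machine_type_uri` are assumptions about `InstanceGroupConfig`, and all values are placeholders.

```python
# Fields the description lists as mutually exclusive with gke_cluster_config.
GCE_BASED_FIELDS = (
    "gce_cluster_config",
    "master_config",
    "worker_config",
    "secondary_worker_config",
    "autoscaling_config",
)


def check_exclusive_fields(config: dict) -> list:
    """Return the Compute Engine-based fields set alongside gke_cluster_config.

    Hypothetical helper for illustration; an empty list means no conflict.
    """
    if config.get("gke_cluster_config") is None:
        return []
    return [name for name in GCE_BASED_FIELDS if config.get(name) is not None]


# Example mapping; nested field names are assumed, values are placeholders.
gce_config = {
    "temp_bucket": "my-temp-bucket",
    "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-4"},
    "worker_config": {"num_instances": 2, "machine_type_uri": "n1-standard-4"},
}

conflicts = check_exclusive_fields(gce_config)
```

Returning the list of conflicting names, rather than raising, lets the caller decide whether to drop the Compute Engine settings or report the problem.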