The cluster config.
Optional. A Cloud Storage bucket used to store ephemeral cluster and job data, such as Spark and MapReduce history files. If you do not specify a temp bucket, Dataproc will determine a Cloud Storage location (US, ASIA, or EU) for your cluster’s temp bucket according to the Compute Engine zone where your cluster is deployed, and then create and manage this project-level, per-location bucket. The default bucket has a TTL of 90 days, but you can use any TTL (or none) if you specify a bucket.
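For example, the temp bucket can be pinned at cluster creation time. Below is a minimal sketch using the google-cloud-dataproc Python client; the project, region, cluster, and bucket names are placeholders::

    from google.cloud import dataproc_v1

    # Cluster operations require the regional service endpoint.
    client = dataproc_v1.ClusterControllerClient(
        client_options={"api_endpoint": "us-central1-dataproc.googleapis.com:443"}
    )

    cluster = {
        "project_id": "my-project",         # placeholder
        "cluster_name": "example-cluster",  # placeholder
        "config": {
            # Supplying temp_bucket keeps ephemeral job data in a bucket
            # (and under a TTL policy) that you manage yourself.
            "temp_bucket": "my-dataproc-temp-bucket",  # placeholder
        },
    }

    operation = client.create_cluster(
        request={
            "project_id": "my-project",
            "region": "us-central1",
            "cluster": cluster,
        }
    )
    operation.result()  # blocks until cluster creation completes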
Optional. The Compute Engine config settings for the master instance in a cluster.
Optional. The Compute Engine config settings for additional worker instances in a cluster.
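In the v1 API these two fields correspond to master_config and secondary_worker_config, each an InstanceGroupConfig. A sketch of the relevant config fragment, with illustrative machine types and instance counts::

    config = {
        "master_config": {
            "num_instances": 1,
            "machine_type_uri": "n1-standard-4",  # illustrative
            "disk_config": {"boot_disk_size_gb": 100},
        },
        # Additional workers on top of worker_config; often preemptible.
        "secondary_worker_config": {
            "num_instances": 2,
            "preemptibility": "PREEMPTIBLE",
        },
    }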
Optional. Commands to execute on each node after config is
completed. By default, executables are run on master and all
worker nodes. You can test a node’s role metadata to run an
executable on a master or worker node, as shown below using
curl (you can also use wget)::

    ROLE=$(curl -H Metadata-Flavor:Google http://metadata/computeMetadata/v1/instance/attributes/dataproc-role)
    if [[ "${ROLE}" == 'Master' ]]; then
        ... master specific actions ...
    else
        ... worker specific actions ...
    fi
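On the API side, each action is a NodeInitializationAction that points at an executable in Cloud Storage. A sketch of the matching config fragment; the script path is a placeholder::

    config = {
        "initialization_actions": [
            {
                # Runs on every node after config completes; the script
                # itself can branch on the dataproc-role metadata as above.
                "executable_file": "gs://my-bucket/scripts/setup.sh",  # placeholder
                # Fail the action (and cluster creation) after 10 minutes.
                "execution_timeout": {"seconds": 600},
            }
        ],
    }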
Optional. Autoscaling config, which associates an autoscaling policy with the cluster. The cluster does not autoscale if this field is unset.
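The policy is attached by URI. A sketch, with a placeholder policy name::

    config = {
        "autoscaling_config": {
            # The policy must already exist in the same project and region.
            "policy_uri": "projects/my-project/regions/us-central1/autoscalingPolicies/my-policy",
        },
    }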
Optional. Lifecycle setting for the cluster.
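Lifecycle settings can, for example, delete a cluster that sits idle. A sketch using the v1 LifecycleConfig fields, with illustrative durations::

    config = {
        "lifecycle_config": {
            # Delete the cluster after 30 idle minutes (no running jobs).
            "idle_delete_ttl": {"seconds": 1800},
            # Unconditionally delete the cluster 8 hours after creation.
            "auto_delete_ttl": {"seconds": 28800},
        },
    }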