Dataproc sets special metadata values for the instances that run in your cluster:
Metadata key | Value |
---|---|
dataproc-bucket | Name of the cluster's staging bucket |
dataproc-region | Region of the cluster's endpoint |
dataproc-worker-count | Number of worker nodes in the cluster. The value is 0 for single node clusters. |
dataproc-cluster-name | Name of the cluster |
dataproc-cluster-uuid | UUID of the cluster |
dataproc-role | Instance's role, either Master or Worker |
dataproc-master | Hostname of the first master node. The value is either [CLUSTER_NAME]-m in a standard or single node cluster, or [CLUSTER_NAME]-m-0 in a high-availability cluster, where [CLUSTER_NAME] is the name of your cluster. |
dataproc-master-additional | Comma-separated list of hostnames for the additional master nodes in a high-availability cluster, for example, [CLUSTER_NAME]-m-1,[CLUSTER_NAME]-m-2 in a cluster that has 3 master nodes. |
You can use these values to customize the behavior of initialization actions.
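As a minimal sketch of how an initialization action might branch on these values, the script below reads dataproc-role and dataproc-master from the Compute Engine metadata server and picks a setup step. The function names (get_attribute, setup_step_for_role, main) and the step labels are hypothetical and used only for illustration. The script assumes curl is available on the node.

```shell
#!/bin/sh
# Sketch of an initialization action that branches on Dataproc metadata.
# Function names and step labels are illustrative, not part of any Dataproc API.

# Read one metadata attribute of the current instance from the
# Compute Engine metadata server. Only works on a Compute Engine VM.
get_attribute() {
  local key
  key="$1"
  curl --silent --fail \
    --header 'Metadata-Flavor: Google' \
    "http://metadata.google.internal/computeMetadata/v1/instance/attributes/${key}"
}

# Map a dataproc-role value to the (hypothetical) setup step to run.
setup_step_for_role() {
  local role
  role="$1"
  if [ "${role}" = 'Master' ]; then
    echo 'master-setup'
  else
    echo 'worker-setup'
  fi
}

# Fetch the role and first master hostname, then report the chosen step.
main() {
  local role master
  role="$(get_attribute dataproc-role)"
  master="$(get_attribute dataproc-master)"
  echo "Role ${role}, first master ${master}: running $(setup_step_for_role "${role}")"
}

# On a cluster node, finish the script by calling: main
```

The decision logic is kept in setup_step_for_role, separate from the network call, so it can be checked off-cluster, for example `setup_step_for_role Worker`.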
You can also use the --metadata flag of the gcloud dataproc clusters create command in the gcloud CLI to provide your own custom metadata:
    gcloud dataproc clusters create cluster-name \
        --region=region \
        --metadata=name1=value1,name2=value2... \
        ... other flags ...
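When the custom metadata pairs are assembled in a script, the flag value can be built programmatically. The following is a sketch only: build_metadata_flag is a hypothetical helper, not part of the gcloud CLI, and it assumes that no value contains a comma (values with commas need the alternate delimiter syntax described in gcloud topic escaping).

```shell
#!/bin/sh
# Hypothetical helper: join KEY=VALUE arguments into one --metadata flag.
# Assumes values contain no commas.
build_metadata_flag() {
  local joined pair
  joined=''
  for pair in "$@"; do
    # Reject arguments that are not in KEY=VALUE form.
    case "${pair}" in
      *=*) ;;
      *) echo "expected KEY=VALUE, got: ${pair}" >&2; return 1 ;;
    esac
    if [ -z "${joined}" ]; then
      joined="${pair}"
    else
      joined="${joined},${pair}"
    fi
  done
  echo "--metadata=${joined}"
}

# Prints: --metadata=name1=value1,name2=value2
build_metadata_flag name1=value1 name2=value2
```

The printed string can then be passed to gcloud dataproc clusters create alongside the other flags.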