This page lists the services that run on Dataproc cluster nodes and the image versions in which each service runs.
All nodes
The following services run on all nodes in a cluster.
Node type | Service | Image versions | Description |
---|---|---|---|
All nodes | google-dataproc-agent | all | Receives jobs from Dataproc and launches job drivers |
All nodes | google-fluentd | all | Collects and pushes logs to Logging |
Standard clusters
The following services run on standard clusters.
Node type | Service | Image versions | Description |
---|---|---|---|
Master | hadoop-hdfs-namenode | all | Manages the HDFS filesystem |
Master | hadoop-hdfs-secondarynamenode | all | Checkpoints the NameNode |
Master | hadoop-mapreduce-historyserver | all | Serves MapReduce application history information |
Master | hadoop-yarn-resourcemanager | all | Schedules and manages YARN applications |
Master | hadoop-yarn-timelineserver | 1.3+ | Serves YARN application history information |
Master | hive-metastore | all | Manages Hive table metadata. By default, uses the local mariadb (image versions < 1.5) or mysql (image versions 1.5+) database on the master node as the Hive table metadata store. Using the default database is not recommended because these databases are tied to the cluster's lifecycle. Instead, use either of the following as the Hive metastore database (in recommendation order): |
Master | hive-server2 | all | Serves queries received from clients (primarily beeline shell queries) against Hive |
Master | mariadb | < 1.5 | A relational database used as the default underlying database for Hive metastore in Dataproc < 1.5 images |
Master | mysql | 1.5+ | A relational database used as the default underlying database for Hive metastore in Dataproc 1.5+ images |
Master | nfs-kernel-server | < 1.3 | NFS is the Network File System. |
Master | spark-history-server | all | Serves Spark application history information |
All Workers | hadoop-yarn-nodemanager | all | Launches and manages YARN containers |
Primary Workers only | hadoop-hdfs-datanode | all | Stores HDFS blocks |
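
The Image versions column uses three forms: `all`, a minimum such as `1.3+`, and an upper bound such as `< 1.5`. The following Python sketch shows one way those constraints could be evaluated for a given image version. It assumes image versions are written as `major.minor`, and the names `parse_version`, `runs_on_image`, `VERSIONED_SERVICES`, and `versioned_services` are hypothetical helpers for illustration, not part of any Dataproc API.

```python
# Illustrative sketch only: evaluates the "Image versions" column of the table.
# All names here are hypothetical; none of them belong to a Dataproc library.

def parse_version(text):
    """Turn a 'major.minor' image version such as '1.5' into a comparable tuple."""
    return tuple(int(part) for part in text.strip().split("."))


def runs_on_image(constraint, image_version):
    """Return True if a table constraint ('all', '1.3+', '< 1.5') admits the version."""
    constraint = constraint.strip()
    version = parse_version(image_version)
    if constraint == "all":
        return True
    if constraint.endswith("+"):
        # '1.3+' means image version 1.3 or later.
        return version >= parse_version(constraint[:-1])
    if constraint.startswith("<"):
        # '< 1.5' means image versions earlier than 1.5.
        return version < parse_version(constraint[1:])
    raise ValueError(f"unrecognized constraint: {constraint!r}")


# The version-restricted rows of the table, copied as data.
VERSIONED_SERVICES = {
    "hadoop-yarn-timelineserver": "1.3+",
    "mariadb": "< 1.5",
    "mysql": "1.5+",
    "nfs-kernel-server": "< 1.3",
}


def versioned_services(image_version):
    """List the version-restricted services that the table says run on this image."""
    return sorted(
        name
        for name, constraint in VERSIONED_SERVICES.items()
        if runs_on_image(constraint, image_version)
    )


if __name__ == "__main__":
    for image in ("1.2", "1.4", "2.0"):
        print(image, versioned_services(image))
```

Versions are compared as integer tuples rather than strings, so a version written as `1.10` would sort after `1.5` instead of before it.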
HA clusters
In Dataproc High Availability (HA) clusters, different services run on different master nodes, as shown below. HA cluster worker node services are the same as those listed for standard clusters.
Node type | Service | Image versions | Description |
---|---|---|---|
All masters | hadoop-hdfs-journalnode | all | A quorum of journal nodes maintains an edit log of HDFS namespace modifications. If a failover occurs, the Standby NameNode reads the edit log and takes control from the Active NameNode. |
All masters | hadoop-yarn-resourcemanager | all | Schedules and manages YARN applications |
All masters | hive-metastore | all | Manages Hive table metadata. By default, uses the local mariadb (image versions < 1.5) or mysql (image versions 1.5+) database on the master node as the Hive table metadata store. Using the default database is not recommended because these databases are tied to the cluster's lifecycle. Instead, use either of the following as the Hive metastore database (in recommendation order): |
All masters | hive-server2 | all | Serves queries received from clients (primarily beeline shell queries) against Hive |
All masters | zookeeper-server | all | A ZooKeeper quorum is used for distributed coordination. In High Availability (HA) clusters, it is used for leader election among HDFS NameNodes and YARN ResourceManagers. |
Masters 0 and 1 only | hadoop-hdfs-namenode | all | Manages the HDFS filesystem |
Masters 0 and 1 only | hadoop-hdfs-zkfc | all | ZKFC is the ZKFailoverController process, which runs with the HDFS NameNode. It monitors the health of the NameNode and manages leader election via ZooKeeper in the event of a failover. |
Master 0 only | hadoop-mapreduce-historyserver | all | Serves MapReduce application history information |
Master 0 only | hadoop-yarn-timelineserver | 1.3+ | Serves YARN application history information |
Master 0 only | mariadb | < 1.5 | A relational database used as the default underlying database for Hive metastore in Dataproc < 1.5 images |
Master 0 only | mysql | 1.5+ | A relational database used as the default underlying database for Hive metastore in Dataproc 1.5+ images |
Master 0 only | nfs-kernel-server | < 1.3 | NFS is the Network File System. |
Master 0 only | spark-history-server | all | Serves Spark application history information |
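
The placement of services across HA masters can be summarized as a small lookup. The Python sketch below encodes only the rows of the HA table marked `all` image versions (version-restricted rows are left out to keep it short) and assumes an HA cluster has three masters indexed 0 to 2. The names `ALL_MASTERS`, `MASTERS_0_AND_1`, `MASTER_0_ONLY`, and `ha_master_services` are hypothetical and not part of any Dataproc API.

```python
# Illustrative sketch only: the HA table above, restricted to rows whose
# image-version column is "all". Names are hypothetical, not a Dataproc API.

ALL_MASTERS = [
    "hadoop-hdfs-journalnode",
    "hadoop-yarn-resourcemanager",
    "hive-metastore",
    "hive-server2",
    "zookeeper-server",
]

MASTERS_0_AND_1 = [
    "hadoop-hdfs-namenode",
    "hadoop-hdfs-zkfc",
]

MASTER_0_ONLY = [
    "hadoop-mapreduce-historyserver",
    "spark-history-server",
]


def ha_master_services(master_index):
    """Return the HA-table services expected on the master with this index."""
    # Assumption: three masters, indexed 0, 1, and 2.
    if master_index not in (0, 1, 2):
        raise ValueError(f"expected a master index of 0, 1, or 2, got {master_index!r}")
    services = list(ALL_MASTERS)
    if master_index in (0, 1):
        services += MASTERS_0_AND_1
    if master_index == 0:
        services += MASTER_0_ONLY
    return services


if __name__ == "__main__":
    for index in (0, 1, 2):
        print(f"master {index}:", ", ".join(ha_master_services(index)))
```

Under these assumptions, master 2 runs only the services listed for all masters, while master 0 also hosts the NameNode, ZKFC, and the history servers.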