Dataproc Serverless pricing

Dataproc Serverless for Spark pricing is based on the number of Data Compute Units (DCUs), the number of accelerators used, and the amount of shuffle storage used. DCUs, accelerators, and shuffle storage are billed per second, with a 1-minute minimum charge for DCUs and shuffle storage, and a 5-minute minimum charge for accelerators.

Each Dataproc vCPU counts as 0.6 DCU. RAM is charged at different rates below and above 8GB per vCPU: each gigabyte of RAM up to 8GB per vCPU counts as 0.1 DCU, and each gigabyte of RAM above 8GB per vCPU counts as 0.2 DCU. Memory used by Spark drivers and executors, as well as system memory usage, counts toward DCU usage.

By default, each Dataproc Serverless for Spark batch and interactive workload consumes a minimum of 12 DCUs for the duration of the workload: the driver uses 4 vCPUs and 16GB of RAM and consumes 4 DCUs, and each of the 2 executors uses 4 vCPUs and 16GB of RAM and consumes 4 DCUs. You can customize the number of vCPUs and the amount of memory per vCPU by setting Spark properties. No additional Compute Engine VM or Persistent Disk charges apply.
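As a concrete illustration, the following Python sketch applies the DCU accounting described above to the default workload shape. The dcus helper is illustrative only (it is not part of any Dataproc API); the 0.6, 0.1, and 0.2 rates are the ones stated on this page.

def dcus(vcpus: int, ram_gb: float) -> float:
    """Approximate DCUs for a single Spark driver or executor.

    Rates from this page: 0.6 DCU per vCPU, 0.1 DCU per GB of RAM
    up to 8GB per vCPU, and 0.2 DCU per GB of RAM above that.
    """
    threshold_gb = 8 * vcpus
    low_gb = min(ram_gb, threshold_gb)
    high_gb = max(ram_gb - threshold_gb, 0)
    return 0.6 * vcpus + 0.1 * low_gb + 0.2 * high_gb

# Default workload shape: a 4-vCPU/16GB driver and two 4-vCPU/16GB executors.
driver_dcus = dcus(4, 16)              # 2.4 + 1.6 = 4.0
executor_dcus = 2 * dcus(4, 16)        # 8.0
print(driver_dcus + executor_dcus)     # 12.0, the default minimum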

Data Compute Unit (DCU) pricing

The DCU rate shown below is an hourly rate. It is prorated and billed per second, with a 1-minute minimum charge. If you pay in a currency other than USD, the prices listed in your currency on Cloud Platform SKUs apply.

Dataproc Serverless for Spark interactive workloads are charged at the Premium rate.

Shuffle storage pricing

The shuffle storage rate shown below is a monthly rate. It is prorated and billed per second, with a 1-minute minimum charge for standard shuffle storage and a 5-minute minimum charge for premium shuffle storage. Premium shuffle storage can only be used with premium compute units.

If you pay in a currency other than USD, the prices listed in your currency on Cloud Platform SKUs apply.

Accelerator pricing

The accelerator rate shown below is an hourly rate. It is prorated and billed per second, with a 5-minute minimum charge. If you pay in a currency other than USD, the prices listed in your currency on Cloud Platform SKUs apply.
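The minimum charges interact with per-second proration as follows: usage is measured in seconds, but short workloads are billed as if they ran for at least the minimum duration. A minimal Python sketch, with an illustrative helper name, makes this explicit:

def billable_hours(duration_seconds: float, minimum_seconds: float) -> float:
    """Prorate per second, but never bill less than the minimum charge."""
    return max(duration_seconds, minimum_seconds) / 3600

# A 30-second workload is billed as 1 minute of DCU and standard shuffle
# storage time, and as 5 minutes of accelerator (or premium shuffle) time.
print(billable_hours(30, 60))    # ~0.0167 hours
print(billable_hours(30, 300))   # ~0.0833 hours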

Pricing example

If the Dataproc Serverless for Spark batch workload runs with 12 DCUs (spark.driver.cores=4, spark.executor.cores=4, spark.executor.instances=2) for 24 hours in the us-central1 region and consumes 25GB of shuffle storage, the price calculation is as follows.

Total compute cost = 12 * 24 * $0.060000 = $17.28
Total storage cost = 25 * ($0.040/30) = $0.03
------------------------------------------------
Total cost = $17.28 + $0.03 = $17.31

Notes:

  1. The example assumes a 30-day month. Since the batch workload duration is one day, the monthly shuffle storage rate is divided by 30.

If the Dataproc Serverless for Spark batch workload runs with 12 DCUs and 2 L4 GPUs (spark.driver.cores=4, spark.executor.cores=4, spark.executor.instances=2, spark.dataproc.driver.compute.tier=premium, spark.dataproc.executor.compute.tier=premium, spark.dataproc.executor.disk.tier=premium, spark.dataproc.executor.resource.accelerator.type=l4) for 24 hours in the us-central1 region and consumes 25GB of shuffle storage, the price calculation is as follows.

Total compute cost = 12 * 24 * $0.089000 = $25.632
Total storage cost = 25 * ($0.1/30) = $0.083
Total accelerator cost = 2 * 24 * $0.6720 = $32.256
------------------------------------------------
Total cost = $25.632 + $0.083 + $32.256 = $57.971

Notes:

  1. The example assumes a 30-day month. Since the batch workload duration is one day, the monthly shuffle storage rate is divided by 30.

If the Dataproc Serverless for Spark interactive workload runs with 12 DCUs (spark.driver.cores=4, spark.executor.cores=4, spark.executor.instances=2) for 24 hours in the us-central1 region and consumes 25GB of shuffle storage, the price calculation is as follows:

Total compute cost = 12 * 24 * $0.089000 = $25.632
Total storage cost = 25 * ($0.040/30) = $0.03
------------------------------------------------
Total cost = $25.632 + $0.03 = $25.662

Notes:

  1. The example assumes a 30-day month. Since the batch workload duration is one day, the monthly shuffle storage rate is divided by 30.
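
The three example calculations above can be reproduced with a short script. The sketch below assumes a 30-day month, as the notes state, and hard-codes the example rates from this page; check the current SKU prices before relying on these numbers.

# Example rates from this page (confirm against current SKU prices).
DCU_STANDARD = 0.060000    # $/DCU-hour
DCU_PREMIUM = 0.089000     # $/DCU-hour
SHUFFLE_STANDARD = 0.040   # $/GB-month
SHUFFLE_PREMIUM = 0.1      # $/GB-month
L4_GPU = 0.6720            # $/GPU-hour

def workload_cost(dcu_count, hours, dcu_rate, shuffle_gb, shuffle_rate,
                  gpus=0, gpu_rate=0.0):
    """Total cost, prorating the monthly shuffle rate over a 30-day month."""
    compute = dcu_count * hours * dcu_rate
    storage = shuffle_gb * (shuffle_rate / 30) * (hours / 24)
    accelerator = gpus * hours * gpu_rate
    return compute + storage + accelerator

print(workload_cost(12, 24, DCU_STANDARD, 25, SHUFFLE_STANDARD))           # ~17.31
print(workload_cost(12, 24, DCU_PREMIUM, 25, SHUFFLE_PREMIUM, 2, L4_GPU))  # ~57.97
print(workload_cost(12, 24, DCU_PREMIUM, 25, SHUFFLE_STANDARD))            # ~25.67; the example rounds storage down to $0.03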

Pricing estimation example

When a batch workload completes, Dataproc Serverless for Spark calculates UsageMetrics, which contain an approximation of the total DCU, accelerator, and shuffle storage resources consumed by the completed workload. After running a workload, you can run the gcloud dataproc batches describe BATCH_ID command to view workload usage metrics to help you estimate the cost of running the workload.

Example:

Dataproc Serverless for Spark runs a workload on an ephemeral cluster with one master and two workers. Each node consumes 4 DCUs (the default for a 4-core node; see spark.driver.cores and spark.executor.cores) and 400 GB of shuffle storage (the default is 100GB per core; see spark.dataproc.driver.disk.size and spark.dataproc.executor.disk.size). Workload run time is 60 seconds. Also, each worker has 1 GPU, for a total of 2 across the cluster.

The user runs gcloud dataproc batches describe BATCH_ID --region REGION to obtain usage metrics. The command output includes the following snippet (milliDcuSeconds: 4 DCUs x 3 VMs x 60 seconds x 1000 = 720000, milliAcceleratorSeconds: 1 GPU x 2 VMs x 60 seconds x 1000 = 120000, and shuffleStorageGbSeconds: 400GB x 3 VMs x 60 seconds = 72000):

runtimeInfo:
  approximateUsage:
    milliDcuSeconds: '720000'
    shuffleStorageGbSeconds: '72000'
    milliAcceleratorSeconds: '120000'
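
To turn these metrics into a rough cost estimate, convert milliDcuSeconds and milliAcceleratorSeconds to hours and shuffleStorageGbSeconds to GB-months, then multiply by the applicable rates. The sketch below is illustrative; the rate arguments are placeholders for the current SKU prices.

SECONDS_PER_HOUR = 3600
SECONDS_PER_MONTH = 30 * 24 * 3600   # assuming a 30-day month, as in the examples

def estimate_cost(milli_dcu_seconds, shuffle_gb_seconds, milli_accel_seconds,
                  dcu_hourly, shuffle_monthly, accel_hourly):
    """Estimate cost from approximateUsage metrics (rates are placeholders)."""
    dcu_hours = milli_dcu_seconds / 1000 / SECONDS_PER_HOUR
    accel_hours = milli_accel_seconds / 1000 / SECONDS_PER_HOUR
    shuffle_gb_months = shuffle_gb_seconds / SECONDS_PER_MONTH
    return (dcu_hours * dcu_hourly
            + shuffle_gb_months * shuffle_monthly
            + accel_hours * accel_hourly)

# Metrics from the snippet above, priced at this page's example premium rates.
print(estimate_cost(720000, 72000, 120000, 0.089, 0.1, 0.672))   # ~$0.04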

Use of other Google Cloud resources

Your Dataproc Serverless for Spark workload can optionally use other Google Cloud resources, each billed at its own pricing.
