The right fitting feature uses Apache Beam resource hints to customize worker resources for a pipeline. The ability to target resources to specific pipeline steps provides additional pipeline flexibility and capability, and potential cost savings. You can apply more costly resources to pipeline steps that require them, and less costly resources to other pipeline steps. Use right fitting to specify resource requirements for an entire pipeline or for specific pipeline steps.
Support and limitations
- Resource hints are supported with the Apache Beam Java and Python SDKs, versions 2.31.0 and later.
- Right fitting is supported with batch pipelines. Streaming pipelines aren't supported.
- Right fitting supports Dataflow Prime.
- Right fitting doesn't support FlexRS.
- When you use right fitting, don't use the worker_accelerator service option.
Enable right fitting
To turn on right fitting, use one or more of the available resource hints in your pipeline. When you use a resource hint in your pipeline, right fitting is automatically enabled. For more information, see the Use resource hints section of this document.
Available resource hints
The following resource hints are available.
Resource hint | Description |
---|---|
min_ram | The minimum amount of RAM in gigabytes to allocate to workers. Dataflow uses this value as a lower limit when allocating memory to new workers (horizontal scaling) or to existing workers (vertical scaling). For example: min_ram=NUMBERGB |
accelerator | A user-supplied allocation of GPUs that lets you control the use and cost of GPUs in your pipeline and its steps. Specify the type and number of GPUs to attach to Dataflow workers as parameters to the flag. For example: accelerator="type:GPU_TYPE;count:GPU_COUNT;machine_type:MACHINE_TYPE;CONFIGURATION_OPTIONS" For more information about using GPUs, see GPUs with Dataflow. |
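The accelerator value shown in the table is a semicolon-separated list of key:value pairs and bare configuration options. As a rough illustration of that format, the following Python sketch splits such a string into its parts. The function name parse_accelerator_hint is hypothetical and is not part of the Apache Beam SDK; Dataflow performs its own parsing and validation.

```python
def parse_accelerator_hint(hint):
    """Split an accelerator hint string into key/value fields and bare options.

    Hypothetical helper for illustration only; not an Apache Beam API.
    """
    fields = {}
    options = []
    for part in hint.split(";"):
        part = part.strip()
        if not part:
            continue
        if ":" in part:
            # Entries such as "type:nvidia-tesla-l4" carry a value.
            key, value = part.split(":", 1)
            fields[key.strip()] = value.strip()
        else:
            # Entries such as "install-nvidia-driver" are bare options.
            options.append(part)
    return fields, options


fields, options = parse_accelerator_hint(
    "type:nvidia-tesla-l4;count:1;install-nvidia-driver")
print(fields)   # {'type': 'nvidia-tesla-l4', 'count': '1'}
print(options)  # ['install-nvidia-driver']
```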
Resource hint nesting
Resource hints are applied to the pipeline transform hierarchy as follows:
- min_ram: The value on a transform is evaluated as the largest min_ram hint value among the values that are set on the transform itself and all of its parents in the transform's hierarchy.
  - Example: If an inner transform hint sets min_ram to 16 GB, and the outer transform hint in the hierarchy sets min_ram to 32 GB, a hint of 32 GB is used for all steps in the entire transform.
  - Example: If an inner transform hint sets min_ram to 16 GB, and the outer transform hint in the hierarchy sets min_ram to 8 GB, a hint of 8 GB is used for all steps in the outer transform that are not in the inner transform, and a 16 GB hint is used for all steps in the inner transform.
- accelerator: The innermost value in the transform's hierarchy takes precedence.
  - Example: If an inner transform accelerator hint is different from an outer transform accelerator hint in a hierarchy, the inner transform accelerator hint is used for the inner transform.
Hints that are set for the entire pipeline are treated as if they are set on a separate outermost transform.
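The nesting rules above can be sketched in a few lines of Python. This is a simplified model, not Dataflow's implementation: the function names ram_gb and effective_hints are hypothetical, hints are plain dictionaries, and min_ram values are assumed to be whole numbers followed by GB.

```python
import re


def ram_gb(value):
    """Convert a string such as '16GB' to an integer number of gigabytes."""
    match = re.fullmatch(r"(\d+)GB", value)
    if match is None:
        raise ValueError(f"unsupported min_ram value: {value!r}")
    return int(match.group(1))


def effective_hints(hierarchy):
    """Resolve the hints that apply to the innermost transform.

    `hierarchy` lists hint dicts from outermost (pipeline level) to innermost.
    Hypothetical helper for illustration only; not an Apache Beam API.
    """
    min_ram = None
    accelerator = None
    for hints in hierarchy:
        if "min_ram" in hints:
            # min_ram: the largest value anywhere in the hierarchy wins.
            if min_ram is None or ram_gb(hints["min_ram"]) > ram_gb(min_ram):
                min_ram = hints["min_ram"]
        if "accelerator" in hints:
            # accelerator: inner values override outer ones.
            accelerator = hints["accelerator"]
    return {"min_ram": min_ram, "accelerator": accelerator}


outer = {"min_ram": "32GB", "accelerator": "type:nvidia-tesla-t4;count:1"}
inner = {"min_ram": "16GB", "accelerator": "type:nvidia-tesla-l4;count:1"}
print(effective_hints([outer, inner]))
# {'min_ram': '32GB', 'accelerator': 'type:nvidia-tesla-l4;count:1'}
```

Passing only the outer dictionary models a step that sits in the outer transform but not in the inner one.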
Use resource hints
You can set resource hints on the entire pipeline or on pipeline steps.
Pipeline resource hints
You can set resource hints on the entire pipeline when you run the pipeline from the command line.
To set up your Python environment, see the Python quickstart.
Example:
python my_pipeline.py \
    --runner=DataflowRunner \
    --resource_hints=min_ram=NUMBERGB \
    --resource_hints=accelerator="type:GPU_TYPE;count:GPU_COUNT;install-nvidia-driver" \
    ...
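If you launch pipelines from a script, you might assemble these flags in code instead of typing them. The following sketch builds the two --resource_hints arguments from individual values. The function resource_hint_flags and its parameters are hypothetical names chosen for illustration; only the flag format comes from the preceding example.

```python
def resource_hint_flags(min_ram_gb=None, gpu_type=None, gpu_count=1,
                        options=()):
    """Return a list of --resource_hints flags for the given values.

    Hypothetical helper for illustration only; not an Apache Beam API.
    """
    flags = []
    if min_ram_gb is not None:
        flags.append(f"--resource_hints=min_ram={min_ram_gb}GB")
    if gpu_type is not None:
        # Join the key:value pairs and any bare options with semicolons.
        parts = [f"type:{gpu_type}", f"count:{gpu_count}", *options]
        flags.append("--resource_hints=accelerator=" + ";".join(parts))
    return flags


flags = resource_hint_flags(
    min_ram_gb=16,
    gpu_type="nvidia-tesla-l4",
    gpu_count=1,
    options=["install-nvidia-driver"])
for flag in flags:
    print(flag)
# --resource_hints=min_ram=16GB
# --resource_hints=accelerator=type:nvidia-tesla-l4;count:1;install-nvidia-driver
```

When the list is passed as separate arguments, for example to subprocess.run, the shell quoting around the accelerator value isn't needed.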
Pipeline step resource hints
You can set resource hints on pipeline steps (transforms) programmatically.
Java
To install the Apache Beam SDK for Java, see Install the Apache Beam SDK.
You can set resource hints programmatically on pipeline transforms by using the ResourceHints class.
The following example demonstrates how to set resource hints programmatically on pipeline transforms.
pcoll.apply(MyCompositeTransform.of(...)
    .setResourceHints(
        ResourceHints.create()
            .withMinRam("15GB")
            .withAccelerator(
                "type:nvidia-tesla-l4;count:1;install-nvidia-driver")));

pcoll.apply(ParDo.of(new BigMemFn())
    .setResourceHints(
        ResourceHints.create().withMinRam("30GB")));
To programmatically set resource hints on the entire pipeline, use the ResourceHintsOptions interface.
Python
To install the Apache Beam SDK for Python, see Install the Apache Beam SDK.
You can set resource hints programmatically on pipeline transforms by using the PTransform.with_resource_hints method. For more information, see the ResourceHint class.
The following example demonstrates how to set resource hints programmatically on pipeline transforms.
pcoll | MyPTransform().with_resource_hints(
    min_ram="4GB",
    accelerator="type:nvidia-tesla-l4;count:1;install-nvidia-driver")

pcoll | beam.ParDo(BigMemFn()).with_resource_hints(
    min_ram="30GB")
To set resource hints on the entire pipeline, use the --resource_hints pipeline option when you run your pipeline. For an example, see Pipeline resource hints.
Go
Resource hints aren't supported in Go.
Right fitting and fusion
In some cases, transforms set with different resource hints can be executed on workers in the same worker pool, as part of the process of fusion optimization. When transforms are fused, Dataflow executes them in an environment that satisfies the union of resource hints set on the transforms.
When resource hints can't be merged, fusion doesn't occur. For example, resource hints for different GPUs aren't mergeable, so those transforms aren't fused.
You can also prevent fusion by adding an operation to your pipeline that forces Dataflow to materialize an intermediate PCollection. To learn more, see Prevent fusion.
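To make the merge behavior described in this section more concrete, the following Python sketch models how the hints of two transforms might be combined. It rests on assumptions that this document doesn't confirm: that the union of two min_ram hints is the larger value, and that a transform without an accelerator hint can share an environment with one that has an accelerator hint. The function merge_for_fusion and the min_ram_gb key are hypothetical.

```python
def merge_for_fusion(hints_a, hints_b):
    """Return merged hints for two transforms, or None if they can't be merged.

    Simplified model for illustration only; not Dataflow's actual logic.
    min_ram_gb values are integers in gigabytes.
    """
    acc_a = hints_a.get("accelerator")
    acc_b = hints_b.get("accelerator")
    if acc_a and acc_b and acc_a != acc_b:
        # Hints for different GPUs aren't mergeable, so no fusion occurs.
        return None

    merged = {}
    ram_values = [
        h["min_ram_gb"] for h in (hints_a, hints_b) if "min_ram_gb" in h
    ]
    if ram_values:
        # Assumption: the merged environment must satisfy the larger request.
        merged["min_ram_gb"] = max(ram_values)
    if acc_a or acc_b:
        merged["accelerator"] = acc_a or acc_b
    return merged


print(merge_for_fusion({"min_ram_gb": 16}, {"min_ram_gb": 30}))
# {'min_ram_gb': 30}
print(merge_for_fusion(
    {"accelerator": "type:nvidia-tesla-t4;count:1"},
    {"accelerator": "type:nvidia-tesla-l4;count:1"}))
# None
```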
Troubleshoot right fitting
This section provides instructions for troubleshooting common issues related to right fitting.
Invalid configuration
When you try to use right fitting, the following error occurs:
Workflow failed. Causes: One or more operations had an error: 'operation-OPERATION_ID':
[UNSUPPORTED_OPERATION] 'NUMBER vCpus with NUMBER MiB memory is
an invalid configuration for NUMBER count of 'GPU_TYPE' in family 'MACHINE_TYPE'.'.
This error occurs when the GPU type selected isn't compatible with the machine type selected. To resolve this error, select a compatible GPU type and machine type. For compatibility details, see GPU platforms.