Troubleshoot unsupported program type errors

This page describes how to resolve a known issue in Cloud Data Fusion versions 6.8.0 and 6.8.1 where a data pipeline fails with an unsupported program type error. This issue is resolved in version 6.8.2.

To reduce pipeline start times, Cloud Data Fusion 6.8.0 and 6.8.1 instances cache the artifacts that are required to start a pipeline on a Dataproc cluster in a Cloud Storage bucket. One of these cached artifacts is application.jar. Depending on the order in which you run your pipelines, some pipelines might fail with the following error:

Unsupported program type: Spark

For example, after you create a new 6.8.1 instance (or upgrade to 6.8.1), if the first pipeline that you run contains only actions, that run succeeds. However, later runs of pipelines that include sources or sinks might then fail with this error.

Recommendation

To resolve this issue, do either of the following:

  • Upgrade your instance to version 6.8.2 or later, where the issue is fixed.
  • Disable Cloud Storage caching at one of the following scopes:
    • For all pipelines in an instance.
    • For a given namespace.
    • For the specific Dataproc profiles that the failing pipelines use.
    • For only the failing pipelines.
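One way to picture how these scopes interact: in CDAP, on which Cloud Data Fusion is built, a value set at a more specific level is generally understood to override the same key set at a broader level, and runtime arguments override stored preferences. The following minimal Python sketch models only that assumed lookup order for the preference-based options (instance, namespace, runtime arguments). The profile option is not modeled, and the helper name resolve is hypothetical.

```python
"""Sketch of how a CDAP-style preference hierarchy might resolve one key.

Assumption: more specific scopes override less specific ones, in the order
instance (system) -> namespace -> runtime arguments. Not an official API.
"""
from typing import Mapping, Optional

CACHE_KEY = "system.profile.properties.gcsCacheEnabled"


def resolve(key: str, *scopes: Mapping[str, str]) -> Optional[str]:
    """Return the value from the most specific scope that defines `key`.

    `scopes` are ordered from least specific to most specific; a later
    scope that defines the key replaces any earlier value.
    """
    value = None
    for scope in scopes:
        if key in scope:
            value = scope[key]
    return value


# Illustrative values only: caching on at the instance, off for one run.
system_prefs = {CACHE_KEY: "true"}
namespace_prefs: dict = {}
runtime_args = {CACHE_KEY: "false"}
print(resolve(CACHE_KEY, system_prefs, namespace_prefs, runtime_args))  # false
```

Under this assumed model, a runtime argument on a single failing pipeline is the narrowest fix, while the instance-level preference affects every pipeline that does not override it.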

Disable Cloud Storage caching for all pipelines in an instance

To disable Cloud Storage caching for all pipelines in an instance, follow these steps:

Console

  1. Go to your instance:
    1. In the Google Cloud console, go to the Cloud Data Fusion page.

    2. To open the instance in the Cloud Data Fusion Studio, click Instances, and then click View instance.

  2. Click System Admin > System Preferences and set the value for system.profile.properties.gcsCacheEnabled to false.

REST API

To set system.profile.properties.gcsCacheEnabled to false, see Set preferences.
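As an illustration, the following Python sketch builds the instance-level request with only the standard library. It assumes that the instance's API endpoint (for example, the apiEndpoint value reported by gcloud for the instance) and an OAuth access token (for example, from gcloud auth print-access-token) are supplied by you, that instance-level preferences live at /v3/preferences, and that a PUT replaces the whole preferences map, which is why it reads and merges the existing values first. The function names are hypothetical; confirm the request format in Set preferences before relying on it.

```python
"""Hypothetical helper: disable Cloud Storage caching for a whole instance.

Assumptions (verify against "Set preferences"): instance-level preferences
live at <endpoint>/v3/preferences, and a PUT replaces the existing
preferences map, so existing values are read and merged first.
"""
import json
import urllib.request

CACHE_KEY = "system.profile.properties.gcsCacheEnabled"


def merged_preferences(existing: dict) -> dict:
    """Return a copy of `existing` with caching disabled; input is untouched."""
    updated = dict(existing)
    updated[CACHE_KEY] = "false"  # preference values are sent as strings
    return updated


def build_put_request(endpoint: str, token: str, prefs: dict) -> urllib.request.Request:
    """Build (but do not send) the PUT request for instance-level preferences."""
    return urllib.request.Request(
        url=f"{endpoint.rstrip('/')}/v3/preferences",
        data=json.dumps(prefs).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="PUT",
    )


def disable_instance_cache(endpoint: str, token: str) -> None:
    """Read current preferences, merge in the override, and write them back."""
    get_req = urllib.request.Request(
        f"{endpoint.rstrip('/')}/v3/preferences",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(get_req) as resp:
        existing = json.load(resp)
    put_req = build_put_request(endpoint, token, merged_preferences(existing))
    with urllib.request.urlopen(put_req) as resp:
        resp.read()  # drain the response; urlopen raises on HTTP errors
```

You would call disable_instance_cache with your endpoint and token. Keeping the request construction separate from sending makes it easy to inspect the URL and body before anything changes on the instance.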

Disable Cloud Storage caching for a given namespace

To disable Cloud Storage caching for a given namespace, follow these steps:

Console

  1. Go to your instance:
    1. In the Google Cloud console, go to the Cloud Data Fusion page.

    2. To open the instance in the Cloud Data Fusion Studio, click Instances, and then click View instance.

  2. Click System Admin > Namespaces and select your namespace.
  3. Click Preferences > Edit and set the value for system.profile.properties.gcsCacheEnabled to false.

REST API

To set system.profile.properties.gcsCacheEnabled to false for the namespace through the REST API, see Set preferences.
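For the namespace scope, a Python sketch might look like the following. It assumes that namespace preferences live at /v3/namespaces/<namespace>/preferences and that a GET with ?resolved=true returns the effective map after values are inherited from the instance, so the sketch can confirm that the override took effect. The endpoint, token, and function names are hypothetical placeholders.

```python
"""Hypothetical helpers for the namespace-level override.

Assumptions: namespace preferences live at
<endpoint>/v3/namespaces/<namespace>/preferences, and GET with
?resolved=true returns the effective map including inherited values.
"""
import json
import urllib.parse
import urllib.request

CACHE_KEY = "system.profile.properties.gcsCacheEnabled"


def namespace_prefs_url(endpoint: str, namespace: str, resolved: bool = False) -> str:
    """Return the namespace preferences URL, escaping the namespace name."""
    url = (f"{endpoint.rstrip('/')}/v3/namespaces/"
           f"{urllib.parse.quote(namespace, safe='')}/preferences")
    return url + "?resolved=true" if resolved else url


def cache_disabled(resolved_prefs: dict) -> bool:
    """True only if the effective value is the string "false" (any case)."""
    return str(resolved_prefs.get(CACHE_KEY, "")).strip().lower() == "false"


def disable_namespace_cache(endpoint: str, token: str, namespace: str) -> bool:
    """Merge the override into the namespace preferences, then verify it."""
    auth = {"Authorization": f"Bearer {token}"}
    # Read the namespace's own (unresolved) preferences so none are lost.
    with urllib.request.urlopen(urllib.request.Request(
            namespace_prefs_url(endpoint, namespace), headers=auth)) as resp:
        prefs = json.load(resp)
    prefs[CACHE_KEY] = "false"
    put = urllib.request.Request(
        namespace_prefs_url(endpoint, namespace),
        data=json.dumps(prefs).encode("utf-8"),
        headers={**auth, "Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(put) as resp:
        resp.read()
    # Check the effective value the namespace's pipelines would see.
    with urllib.request.urlopen(urllib.request.Request(
            namespace_prefs_url(endpoint, namespace, resolved=True),
            headers=auth)) as resp:
        return cache_disabled(json.load(resp))
```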

Disable Cloud Storage caching for a Dataproc profile

To disable Cloud Storage caching for the specific Dataproc profiles that the failing pipelines use, follow these steps:

Console

  • In each Dataproc compute profile that the failing pipelines use, set the gcsCacheEnabled property to false.
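If you keep compute profile definitions as JSON (for example, as returned by a GET on /v3/namespaces/<namespace>/profiles/<profile>), the same change can be expressed as a small transformation. This Python sketch assumes that the profile JSON has a "provisioner" object whose "properties" field is a list of entries with "name", "value", and "isEditable" keys; that layout, the isEditable value used for a newly added entry, and the function name are assumptions to check against the profile API documentation before you write the result back.

```python
"""Hypothetical sketch: turn off gcsCacheEnabled in a compute profile's JSON.

Assumption: the profile JSON carries a "provisioner" object whose
"properties" is a list of {"name", "value", "isEditable"} entries.
"""
import copy

PROPERTY = "gcsCacheEnabled"


def with_cache_disabled(profile: dict) -> dict:
    """Return a deep copy of `profile` with gcsCacheEnabled set to "false".

    Adds the property if the profile does not define it yet; the input
    dictionary is not modified.
    """
    updated = copy.deepcopy(profile)
    props = updated.setdefault("provisioner", {}).setdefault("properties", [])
    for prop in props:
        if prop.get("name") == PROPERTY:
            prop["value"] = "false"
            break
    else:
        # Assumed default: leave the property editable so that individual
        # pipelines could still override it with a runtime argument.
        props.append({"name": PROPERTY, "value": "false", "isEditable": True})
    return updated
```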

Disable Cloud Storage caching for only the failing pipelines

To disable Cloud Storage caching for only the failing pipelines, follow these steps:

Console

  1. Go to your instance:
    1. In the Google Cloud console, go to the Cloud Data Fusion page.

    2. To open the instance in the Cloud Data Fusion Studio, click Instances, and then click View instance.

  2. Click List and select the failing pipeline.
  3. Click Expand next to Run and set the runtime argument system.profile.properties.gcsCacheEnabled to false.
  4. Repeat for any other failing pipelines.

REST API

To disable Cloud Storage caching for a single run when you start a pipeline through the REST API, pass system.profile.properties.gcsCacheEnabled with the value false as a runtime argument in the JSON map in the request body. For more information, see Start a program.
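A Python sketch of such a start request follows. It assumes that a batch pipeline runs as the DataPipelineWorkflow program, so the call is a POST to /v3/namespaces/<namespace>/apps/<pipeline>/workflows/DataPipelineWorkflow/start, and that runtime arguments are sent as a JSON map of strings. The endpoint, token, and build_start_request name are hypothetical placeholders; check the exact path in Start a program.

```python
"""Hypothetical helper: start one batch pipeline with caching turned off.

Assumptions: batch pipelines run as the DataPipelineWorkflow program, and
the start call is POST .../apps/<pipeline>/workflows/DataPipelineWorkflow/start
with runtime arguments as a JSON string map in the request body.
"""
import json
import urllib.parse
import urllib.request

CACHE_KEY = "system.profile.properties.gcsCacheEnabled"


def build_start_request(endpoint: str, token: str, namespace: str,
                        pipeline: str, extra_args: dict = None) -> urllib.request.Request:
    """Build (but do not send) the start request for a batch pipeline."""
    args = dict(extra_args or {})
    args[CACHE_KEY] = "false"  # applies to this run only; stored preferences are untouched
    path = "/".join([
        "v3", "namespaces", urllib.parse.quote(namespace, safe=""),
        "apps", urllib.parse.quote(pipeline, safe=""),
        "workflows", "DataPipelineWorkflow", "start",
    ])
    return urllib.request.Request(
        url=f"{endpoint.rstrip('/')}/{path}",
        data=json.dumps(args).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
```

To send it, pass the returned request to urllib.request.urlopen. Because the override travels with each start call, this approach needs to be repeated for every failing pipeline that you start this way.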