Class PySparkBatch (4.0.3)

PySparkBatch(mapping=None, *, ignore_unknown_fields=False, **kwargs)

A configuration for running an Apache PySpark (https://spark.apache.org/docs/latest/api/python/getting_started/quickstart.html) batch workload.

Attributes
Name	Description
`main_python_file_uri`	`str` Required. The HCFS (Hadoop Compatible File System) URI of the main Python file to use as the Spark driver. Must be a `.py` file.
`args`	`Sequence[str]` Optional. The arguments to pass to the driver. Do not include arguments that can be set as batch properties, such as `--conf`; a collision between the two can cause an incorrect batch submission.
`python_file_uris`	`Sequence[str]` Optional. HCFS file URIs of Python files to pass to the PySpark framework. Supported file types: `.py`, `.egg`, and `.zip`.
`jar_file_uris`	`Sequence[str]` Optional. HCFS URIs of jar files to add to the classpath of the Spark driver and tasks.
`file_uris`	`Sequence[str]` Optional. HCFS URIs of files to be placed in the working directory of each executor.
`archive_uris`	`Sequence[str]` Optional. HCFS URIs of archives to be extracted into the working directory of each executor. Supported file types: `.jar`, `.tar`, `.tar.gz`, `.tgz`, and `.zip`.
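The constraints in the table above can be checked before a batch is submitted. The following is a minimal sketch, not part of the library: `build_pyspark_batch_mapping` is a hypothetical helper that validates the documented file types and the `--conf` restriction on `args`, then returns a plain dict suitable for the `mapping` argument of `PySparkBatch(mapping=None, *, ignore_unknown_fields=False, **kwargs)`. The `gs://example-bucket/...` URIs are placeholders.

```python
from typing import Dict, List, Sequence

# Supported extensions, as listed in the attribute table.
_PYTHON_FILE_EXTS = (".py", ".egg", ".zip")
_ARCHIVE_EXTS = (".jar", ".tar", ".tar.gz", ".tgz", ".zip")


def build_pyspark_batch_mapping(
    main_python_file_uri: str,
    args: Sequence[str] = (),
    python_file_uris: Sequence[str] = (),
    jar_file_uris: Sequence[str] = (),
    file_uris: Sequence[str] = (),
    archive_uris: Sequence[str] = (),
) -> Dict[str, object]:
    """Hypothetical helper: validate inputs and return a dict that can be
    passed as the `mapping` argument of PySparkBatch."""
    # The driver must be a single .py file.
    if not main_python_file_uri.endswith(".py"):
        raise ValueError("main_python_file_uri must point to a .py file")
    # Spark properties belong in the batch properties, not in driver args.
    for arg in args:
        if arg == "--conf" or arg.startswith("--conf="):
            raise ValueError("set Spark properties as batch properties, not via --conf")
    for uri in python_file_uris:
        if not uri.endswith(_PYTHON_FILE_EXTS):
            raise ValueError(f"unsupported Python file type: {uri}")
    for uri in archive_uris:
        if not uri.endswith(_ARCHIVE_EXTS):
            raise ValueError(f"unsupported archive type: {uri}")

    mapping: Dict[str, object] = {"main_python_file_uri": main_python_file_uri}
    optional: Dict[str, Sequence[str]] = {
        "args": args,
        "python_file_uris": python_file_uris,
        "jar_file_uris": jar_file_uris,
        "file_uris": file_uris,
        "archive_uris": archive_uris,
    }
    # Include only the optional repeated fields that were actually supplied.
    for key, values in optional.items():
        if values:
            mapping[key] = list(values)
    return mapping


if __name__ == "__main__":
    mapping = build_pyspark_batch_mapping(
        "gs://example-bucket/jobs/wordcount.py",
        args=["gs://example-bucket/input/", "gs://example-bucket/output/"],
        python_file_uris=["gs://example-bucket/deps/helpers.zip"],
    )
    print(mapping)
    # With google-cloud-dataproc installed (not needed to run this sketch):
    # from google.cloud import dataproc_v1
    # batch = dataproc_v1.PySparkBatch(mapping)
```

Returning a plain dict keeps the validation independent of the client library; the same fields could equally be passed as keyword arguments to the constructor.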
