.TH "GCLOUD_BETA_DATAPROC_JOBS_SUBMIT_PYSPARK" 1



.SH "NAME"
.HP
gcloud beta dataproc jobs submit pyspark \- submit a PySpark job to a cluster



.SH "SYNOPSIS"
.HP
\f5gcloud beta dataproc jobs submit pyspark\fR \fIPY_FILE\fR (\fB\-\-cluster\fR=\fICLUSTER\fR\ |\ \fB\-\-cluster\-labels\fR=[\fIKEY\fR=\fIVALUE\fR,...]) [\fB\-\-archives\fR=[\fIARCHIVE\fR,...]] [\fB\-\-async\fR] [\fB\-\-bucket\fR=\fIBUCKET\fR] [\fB\-\-driver\-log\-levels\fR=[\fIPACKAGE\fR=\fILEVEL\fR,...]] [\fB\-\-driver\-required\-memory\-mb\fR=\fIDRIVER_REQUIRED_MEMORY_MB\fR] [\fB\-\-driver\-required\-vcores\fR=\fIDRIVER_REQUIRED_VCORES\fR] [\fB\-\-files\fR=[\fIFILE\fR,...]] [\fB\-\-jars\fR=[\fIJAR\fR,...]] [\fB\-\-labels\fR=[\fIKEY\fR=\fIVALUE\fR,...]] [\fB\-\-max\-failures\-per\-hour\fR=\fIMAX_FAILURES_PER_HOUR\fR] [\fB\-\-max\-failures\-total\fR=\fIMAX_FAILURES_TOTAL\fR] [\fB\-\-properties\fR=[\fIPROPERTY\fR=\fIVALUE\fR,...]] [\fB\-\-properties\-file\fR=\fIPROPERTIES_FILE\fR] [\fB\-\-py\-files\fR=[\fIPY_FILE\fR,...]] [\fB\-\-region\fR=\fIREGION\fR] [\fIGCLOUD_WIDE_FLAG\ ...\fR] [\-\-\ \fIJOB_ARGS\fR\ ...]



.SH "DESCRIPTION"

\fB(BETA)\fR Submit a PySpark job to a cluster.



.SH "EXAMPLES"

To submit a PySpark job with a local script and custom flags, run:

.RS 2m
$ gcloud beta dataproc jobs submit pyspark \-\-cluster=my\-cluster \e
    my_script.py \-\- \-\-custom\-flag
.RE

To submit a PySpark job that runs a script that is already on the cluster, run:

.RS 2m
$ gcloud beta dataproc jobs submit pyspark \-\-cluster=my\-cluster \e
    file:///usr/lib/spark/examples/src/main/python/pi.py \-\- 100
.RE



.SH "POSITIONAL ARGUMENTS"

.RS 2m
.TP 2m
\fIPY_FILE\fR

Main .py file to run as the driver.

.TP 2m
[\-\- \fIJOB_ARGS\fR ...]

Arguments to pass to the driver.

The '\-\-' argument must be specified between gcloud specific args on the left
and JOB_ARGS on the right.


.RE
.sp

.SH "REQUIRED FLAGS"

.RS 2m
.TP 2m

Exactly one of these must be specified:


.RS 2m
.TP 2m
\fB\-\-cluster\fR=\fICLUSTER\fR

The Dataproc cluster to submit the job to.

.TP 2m
\fB\-\-cluster\-labels\fR=[\fIKEY\fR=\fIVALUE\fR,...]

List of label KEY=VALUE pairs to match.

Keys must start with a lowercase character and contain only hyphens (\f5\-\fR),
underscores (\f5_\fR), lowercase characters, and numbers. Values must contain
only hyphens (\f5\-\fR), underscores (\f5_\fR), lowercase characters, and
numbers.

The job is placed on a cluster whose labels match the specified labels.
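
For example, to submit to whichever cluster carries the label
\f5env=staging\fR (the label key and value here are illustrative), run:

.RS 2m
$ gcloud beta dataproc jobs submit pyspark my_script.py \e
    \-\-cluster\-labels=env=staging
.RE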


.RE
.RE
.sp

.SH "OPTIONAL FLAGS"

.RS 2m
.TP 2m
\fB\-\-archives\fR=[\fIARCHIVE\fR,...]

Comma\-separated list of archives to be extracted into the working directory of
each executor. Must be one of the following file formats: .zip, .tar, .tar.gz,
or .tgz.
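
For example, assuming a hypothetical archive already staged in Cloud Storage,
run:

.RS 2m
$ gcloud beta dataproc jobs submit pyspark my_script.py \e
    \-\-cluster=my\-cluster \-\-archives=gs://my\-bucket/deps.tar.gz
.RE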

.TP 2m
\fB\-\-async\fR

Return immediately, without waiting for the operation in progress to complete.

.TP 2m
\fB\-\-bucket\fR=\fIBUCKET\fR

The Cloud Storage bucket to stage files in. Defaults to the cluster's configured
bucket.

.TP 2m
\fB\-\-driver\-log\-levels\fR=[\fIPACKAGE\fR=\fILEVEL\fR,...]

List of key=value pairs to configure driver logging, where the key is a package
and the value is the log4j log level. For example: root=FATAL,com.example=INFO
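
A hypothetical invocation using those same values, quieting Spark internals
while keeping one application package verbose, might look like:

.RS 2m
$ gcloud beta dataproc jobs submit pyspark my_script.py \e
    \-\-cluster=my\-cluster \e
    \-\-driver\-log\-levels=root=FATAL,com.example=INFO
.RE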

.TP 2m
\fB\-\-driver\-required\-memory\-mb\fR=\fIDRIVER_REQUIRED_MEMORY_MB\fR

The memory allocation requested by the job driver, in megabytes (MB), for
execution on the driver node group. This flag applies only to clusters that
have a driver node group.

.TP 2m
\fB\-\-driver\-required\-vcores\fR=\fIDRIVER_REQUIRED_VCORES\fR

The vCPU allocation requested by the job driver for execution on the driver
node group. This flag applies only to clusters that have a driver node group.

.TP 2m
\fB\-\-files\fR=[\fIFILE\fR,...]

Comma\-separated list of files to be placed in the working directory of both the
app driver and executors.

.TP 2m
\fB\-\-jars\fR=[\fIJAR\fR,...]

Comma\-separated list of jar files to be provided to the executor and driver
classpaths.
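
For example, a job that ships a config file to the working directories and adds
a JAR to the classpaths (file and bucket names here are hypothetical) might be
submitted as:

.RS 2m
$ gcloud beta dataproc jobs submit pyspark my_script.py \e
    \-\-cluster=my\-cluster \-\-files=config.yaml \e
    \-\-jars=gs://my\-bucket/udfs.jar
.RE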

.TP 2m
\fB\-\-labels\fR=[\fIKEY\fR=\fIVALUE\fR,...]

List of label KEY=VALUE pairs to add.

Keys must start with a lowercase character and contain only hyphens (\f5\-\fR),
underscores (\f5_\fR), lowercase characters, and numbers. Values must contain
only hyphens (\f5\-\fR), underscores (\f5_\fR), lowercase characters, and
numbers.

.TP 2m
\fB\-\-max\-failures\-per\-hour\fR=\fIMAX_FAILURES_PER_HOUR\fR

Specifies the maximum number of times a job can be restarted per hour in the
event of failure. Default is 0 (no retries after job failure).

.TP 2m
\fB\-\-max\-failures\-total\fR=\fIMAX_FAILURES_TOTAL\fR

Specifies the maximum total number of times a job can be restarted after the job
fails. Default is 0 (no retries after job failure).
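
For example, to allow up to three restarts per hour and ten restarts in total
(the values here are illustrative), run:

.RS 2m
$ gcloud beta dataproc jobs submit pyspark my_script.py \e
    \-\-cluster=my\-cluster \-\-max\-failures\-per\-hour=3 \e
    \-\-max\-failures\-total=10
.RE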

.TP 2m
\fB\-\-properties\fR=[\fIPROPERTY\fR=\fIVALUE\fR,...]

List of key=value pairs to configure PySpark. For a list of available
properties, see:
https://spark.apache.org/docs/latest/configuration.html#available\-properties.
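
For example, to set standard Spark executor properties (the values here are
illustrative), run:

.RS 2m
$ gcloud beta dataproc jobs submit pyspark my_script.py \e
    \-\-cluster=my\-cluster \e
    \-\-properties=spark.executor.memory=4g,spark.executor.cores=2
.RE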

.TP 2m
\fB\-\-properties\-file\fR=\fIPROPERTIES_FILE\fR

Path to a local file or a file in a Cloud Storage bucket containing
configuration properties for the job. The client machine running this command
must have read permission to the file.

Specify properties in the form of property=value in the text file. For example:

.RS 2m
  # Properties to set for the job:
  key1=value1
  key2=value2
  # Comment out properties not used.
  # key3=value3
.RE

If a property is set in both \f5\-\-properties\fR and
\f5\-\-properties\-file\fR, the value defined in \f5\-\-properties\fR takes
precedence.
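
A hypothetical invocation that reads the properties file from Cloud Storage
might look like:

.RS 2m
$ gcloud beta dataproc jobs submit pyspark my_script.py \e
    \-\-cluster=my\-cluster \e
    \-\-properties\-file=gs://my\-bucket/spark.properties
.RE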

.TP 2m
\fB\-\-py\-files\fR=[\fIPY_FILE\fR,...]

Comma\-separated list of Python files to be provided to the job. Must be one of
the following file formats: .py, .zip, or .egg.
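
For example, to ship a hypothetical helper module and a zipped dependency
alongside the driver script, run:

.RS 2m
$ gcloud beta dataproc jobs submit pyspark my_script.py \e
    \-\-cluster=my\-cluster \-\-py\-files=utils.py,deps.zip
.RE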

.TP 2m
\fB\-\-region\fR=\fIREGION\fR

Dataproc region to use. Each Dataproc region constitutes an independent resource
namespace constrained to deploying instances into Compute Engine zones inside
the region. Overrides the default \fBdataproc/region\fR property value for this
command invocation.


.RE
.sp

.SH "GCLOUD WIDE FLAGS"

These flags are available to all commands: \-\-access\-token\-file, \-\-account,
\-\-billing\-project, \-\-configuration, \-\-flags\-file, \-\-flatten,
\-\-format, \-\-help, \-\-impersonate\-service\-account, \-\-log\-http,
\-\-project, \-\-quiet, \-\-trace\-token, \-\-user\-output\-enabled,
\-\-verbosity.

Run \fB$ gcloud help\fR for details.



.SH "NOTES"

This command is currently in beta and might change without notice. These
variants are also available:

.RS 2m
$ gcloud dataproc jobs submit pyspark
.RE

.RS 2m
$ gcloud alpha dataproc jobs submit pyspark
.RE