.TH "GCLOUD_BETA_DATAPROC_BATCHES_SUBMIT_SPARK" 1



.SH "NAME"
.HP
gcloud beta dataproc batches submit spark \- submit a Spark batch job



.SH "SYNOPSIS"
.HP
\f5gcloud beta dataproc batches submit spark\fR  (\fB\-\-class\fR=\fIMAIN_CLASS\fR\ |\ \fB\-\-jar\fR=\fIMAIN_JAR\fR) [\fB\-\-archives\fR=[\fIARCHIVE\fR,...]] [\fB\-\-async\fR] [\fB\-\-batch\fR=\fIBATCH\fR] [\fB\-\-container\-image\fR=\fICONTAINER_IMAGE\fR] [\fB\-\-deps\-bucket\fR=\fIDEPS_BUCKET\fR] [\fB\-\-files\fR=[\fIFILE\fR,...]] [\fB\-\-history\-server\-cluster\fR=\fIHISTORY_SERVER_CLUSTER\fR] [\fB\-\-jars\fR=[\fIJAR\fR,...]] [\fB\-\-kms\-key\fR=\fIKMS_KEY\fR] [\fB\-\-labels\fR=[\fIKEY\fR=\fIVALUE\fR,...]] [\fB\-\-metastore\-service\fR=\fIMETASTORE_SERVICE\fR] [\fB\-\-properties\fR=[\fIPROPERTY\fR=\fIVALUE\fR,...]] [\fB\-\-region\fR=\fIREGION\fR] [\fB\-\-request\-id\fR=\fIREQUEST_ID\fR] [\fB\-\-service\-account\fR=\fISERVICE_ACCOUNT\fR] [\fB\-\-staging\-bucket\fR=\fISTAGING_BUCKET\fR] [\fB\-\-tags\fR=[\fITAGS\fR,...]] [\fB\-\-ttl\fR=\fITTL\fR] [\fB\-\-user\-workload\-authentication\-type\fR=\fIUSER_WORKLOAD_AUTHENTICATION_TYPE\fR] [\fB\-\-version\fR=\fIVERSION\fR] [\fB\-\-network\fR=\fINETWORK\fR\ |\ \fB\-\-subnet\fR=\fISUBNET\fR] [\fIGCLOUD_WIDE_FLAG\ ...\fR] [\-\-\ \fIJOB_ARG\fR\ ...]



.SH "DESCRIPTION"

\fB(BETA)\fR Submit a Spark batch job.



.SH "EXAMPLES"

To submit a Spark job, run:

.RS 2m
$ gcloud beta dataproc batches submit spark \-\-region=us\-central1 \e
    \-\-jar=my_jar.jar \-\-deps\-bucket=gs://my\-bucket \-\- ARG1 ARG2
.RE

To submit a Spark job that runs a specific class of a jar, run:

.RS 2m
$ gcloud beta dataproc batches submit spark \-\-region=us\-central1 \e
    \-\-class=org.my.main.Class \-\-jars=my_jar1.jar,my_jar2.jar \e
    \-\-deps\-bucket=gs://my\-bucket \-\- ARG1 ARG2
.RE

To submit a Spark job that runs a jar installed on the cluster, run:

.RS 2m
$ gcloud beta dataproc batches submit spark \-\-region=us\-central1 \e
    \-\-class=org.apache.spark.examples.SparkPi \e
    \-\-deps\-bucket=gs://my\-bucket \e
    \-\-jars=file:///usr/lib/spark/examples/jars/spark\-examples.jar \e
    \-\- 15
.RE



.SH "POSITIONAL ARGUMENTS"

.RS 2m
.TP 2m
[\-\- \fIJOB_ARG\fR ...]

Arguments to pass to the driver.

The '\-\-' argument must be specified between gcloud\-specific args on the left
and JOB_ARG on the right.


.RE
.sp

.SH "REQUIRED FLAGS"

.RS 2m
.TP 2m

Exactly one of these must be specified:


.RS 2m
.TP 2m
\fB\-\-class\fR=\fIMAIN_CLASS\fR

The class that contains the main method of the job. The jar file that contains
the class must be in the classpath or specified in \f5jar_files\fR.

.TP 2m
\fB\-\-jar\fR=\fIMAIN_JAR\fR

URI of the main jar file.


.RE
.RE
.sp

.SH "OPTIONAL FLAGS"

.RS 2m
.TP 2m
\fB\-\-archives\fR=[\fIARCHIVE\fR,...]

Archives to be extracted into the working directory. Supported file types: .jar,
.tar, .tar.gz, .tgz, and .zip.
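
For example, with a placeholder bucket and archive name:

.RS 2m
$ gcloud beta dataproc batches submit spark \-\-region=us\-central1 \e
    \-\-jar=my_jar.jar \-\-deps\-bucket=gs://my\-bucket \e
    \-\-archives=gs://my\-bucket/my\-archive.zip
.RE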

.TP 2m
\fB\-\-async\fR

Return immediately without waiting for the operation in progress to complete.

.TP 2m
\fB\-\-batch\fR=\fIBATCH\fR

The ID of the batch job to submit. The ID must contain only lowercase letters
(a\-z), numbers (0\-9), and hyphens (\-). The length of the name must be between
4 and 63 characters. If this argument is not provided, a randomly generated UUID
will be used.
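
For example, with an illustrative batch ID:

.RS 2m
$ gcloud beta dataproc batches submit spark \-\-region=us\-central1 \e
    \-\-jar=my_jar.jar \-\-deps\-bucket=gs://my\-bucket \e
    \-\-batch=my\-batch\-0001
.RE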

.TP 2m
\fB\-\-container\-image\fR=\fICONTAINER_IMAGE\fR

Optional custom container image to use for the batch/session runtime
environment. If not specified, a default container image will be used. The value
should follow the container image naming format:
{registry}/{repository}/{name}:{tag}, for example,
gcr.io/my\-project/my\-image:1.2.3

.TP 2m
\fB\-\-deps\-bucket\fR=\fIDEPS_BUCKET\fR

A Cloud Storage bucket to which workload dependencies are uploaded.

.TP 2m
\fB\-\-files\fR=[\fIFILE\fR,...]

Files to be placed in the working directory.

.TP 2m
\fB\-\-history\-server\-cluster\fR=\fIHISTORY_SERVER_CLUSTER\fR

Spark History Server configuration for the batch/session job. Resource name of
an existing Dataproc cluster to act as a Spark History Server for the workload
in the format: "projects/{project_id}/regions/{region}/clusters/{cluster_name}".
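
For example, with placeholder project, region, and cluster names:

.RS 2m
$ gcloud beta dataproc batches submit spark \-\-region=us\-central1 \e
    \-\-jar=my_jar.jar \-\-deps\-bucket=gs://my\-bucket \e
    \-\-history\-server\-cluster=projects/my\-project/regions/us\-central1/clusters/my\-phs
.RE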

.TP 2m
\fB\-\-jars\fR=[\fIJAR\fR,...]

Comma\-separated list of jar files to be provided to the classpaths.

.TP 2m
\fB\-\-kms\-key\fR=\fIKMS_KEY\fR

Cloud KMS key to use for encryption.
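
For example, with a placeholder key following the standard KMS resource name
format:

.RS 2m
$ gcloud beta dataproc batches submit spark \-\-region=us\-central1 \e
    \-\-jar=my_jar.jar \-\-deps\-bucket=gs://my\-bucket \e
    \-\-kms\-key=projects/my\-project/locations/us\-central1/keyRings/my\-keyring/cryptoKeys/my\-key
.RE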

.TP 2m
\fB\-\-labels\fR=[\fIKEY\fR=\fIVALUE\fR,...]

List of label KEY=VALUE pairs to add.

Keys must start with a lowercase character and contain only hyphens (\f5\-\fR),
underscores (\f5_\fR), lowercase characters, and numbers. Values must contain
only hyphens (\f5\-\fR), underscores (\f5_\fR), lowercase characters, and
numbers.
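
For example, with illustrative label pairs:

.RS 2m
$ gcloud beta dataproc batches submit spark \-\-region=us\-central1 \e
    \-\-jar=my_jar.jar \-\-deps\-bucket=gs://my\-bucket \e
    \-\-labels=env=dev,team=data
.RE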

.TP 2m
\fB\-\-metastore\-service\fR=\fIMETASTORE_SERVICE\fR

Name of a Dataproc Metastore service to be used as an external metastore in the
format: "projects/{project\-id}/locations/{region}/services/{service\-name}".

.TP 2m
\fB\-\-properties\fR=[\fIPROPERTY\fR=\fIVALUE\fR,...]

Specifies configuration properties for the workload. See Dataproc Serverless for
Spark documentation
(https://cloud.google.com/dataproc\-serverless/docs/concepts/properties) for the
list of supported properties.
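
For example, setting illustrative Spark resource properties:

.RS 2m
$ gcloud beta dataproc batches submit spark \-\-region=us\-central1 \e
    \-\-jar=my_jar.jar \-\-deps\-bucket=gs://my\-bucket \e
    \-\-properties=spark.driver.cores=4,spark.executor.cores=4
.RE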

.TP 2m

Region resource \- Dataproc region to use. Each Dataproc region constitutes an
independent resource namespace constrained to deploying instances into Compute
Engine zones inside the region. This represents a Cloud resource. (NOTE) Some
attributes are not given arguments in this group but can be set in other ways.

To set the \f5project\fR attribute:
.RS 2m
.IP "\(em" 2m
provide the argument \f5\-\-region\fR on the command line with a fully specified
name;
.IP "\(em" 2m
set the property \f5dataproc/region\fR with a fully specified name;
.IP "\(em" 2m
provide the argument \f5\-\-project\fR on the command line;
.IP "\(em" 2m
set the property \f5core/project\fR.
.RE
.sp


.RS 2m
.TP 2m
\fB\-\-region\fR=\fIREGION\fR

ID of the region or fully qualified identifier for the region.

To set the \f5region\fR attribute:
.RS 2m
.IP "\(bu" 2m
provide the argument \f5\-\-region\fR on the command line;
.IP "\(bu" 2m
set the property \f5dataproc/region\fR.
.RE
.sp
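For example, a default region can be set once via the property (the region name
here is illustrative):

.RS 2m
$ gcloud config set dataproc/region us\-central1
.RE
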

.RE
.sp
.TP 2m
\fB\-\-request\-id\fR=\fIREQUEST_ID\fR

A unique ID that identifies the request. If the service receives two batch
create requests with the same request_id, the second request is ignored and the
operation that corresponds to the first batch created and stored in the backend
is returned. Recommendation: Always set this value to a UUID. The value must
contain only letters (a\-z, A\-Z), numbers (0\-9), underscores (_), and
hyphens (\-). The maximum length is 40 characters.
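
For example, with an illustrative UUID value:

.RS 2m
$ gcloud beta dataproc batches submit spark \-\-region=us\-central1 \e
    \-\-jar=my_jar.jar \-\-deps\-bucket=gs://my\-bucket \e
    \-\-request\-id=123e4567\-e89b\-12d3\-a456\-426614174000
.RE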

.TP 2m
\fB\-\-service\-account\fR=\fISERVICE_ACCOUNT\fR

The IAM service account to be used for a batch/session job.

.TP 2m
\fB\-\-staging\-bucket\fR=\fISTAGING_BUCKET\fR

The Cloud Storage bucket to use to store job dependencies, config files, and job
driver console output. If not specified, the default [staging bucket]
(https://cloud.google.com/dataproc\-serverless/docs/concepts/buckets) is used.

.TP 2m
\fB\-\-tags\fR=[\fITAGS\fR,...]

Network tags for traffic control.

.TP 2m
\fB\-\-ttl\fR=\fITTL\fR

The duration after which the workload will be unconditionally terminated, for
example, '20m' or '1h'. Run \f5gcloud topic datetimes\fR
(https://cloud.google.com/sdk/gcloud/reference/topic/datetimes) for information
on duration formats.
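
For example, to terminate the workload after two hours:

.RS 2m
$ gcloud beta dataproc batches submit spark \-\-region=us\-central1 \e
    \-\-jar=my_jar.jar \-\-deps\-bucket=gs://my\-bucket \-\-ttl=2h
.RE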

.TP 2m
\fB\-\-user\-workload\-authentication\-type\fR=\fIUSER_WORKLOAD_AUTHENTICATION_TYPE\fR

Whether to use END_USER_CREDENTIALS or SERVICE_ACCOUNT to run the workload.

.TP 2m
\fB\-\-version\fR=\fIVERSION\fR

Optional runtime version. If not specified, a default version will be used.

.TP 2m

At most one of these can be specified:


.RS 2m
.TP 2m
\fB\-\-network\fR=\fINETWORK\fR

Network URI to connect the workload to.

.TP 2m
\fB\-\-subnet\fR=\fISUBNET\fR

Subnetwork URI to connect the workload to. The subnet must have Private Google
Access enabled.
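
For example, with a placeholder subnetwork URI:

.RS 2m
$ gcloud beta dataproc batches submit spark \-\-region=us\-central1 \e
    \-\-jar=my_jar.jar \-\-deps\-bucket=gs://my\-bucket \e
    \-\-subnet=projects/my\-project/regions/us\-central1/subnetworks/my\-subnet
.RE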


\fR
.RE
.RE
.sp

.SH "GCLOUD WIDE FLAGS"

These flags are available to all commands: \-\-access\-token\-file, \-\-account,
\-\-billing\-project, \-\-configuration, \-\-flags\-file, \-\-flatten,
\-\-format, \-\-help, \-\-impersonate\-service\-account, \-\-log\-http,
\-\-project, \-\-quiet, \-\-trace\-token, \-\-user\-output\-enabled,
\-\-verbosity.

Run \fB$ gcloud help\fR for details.



.SH "NOTES"

This command is currently in beta and might change without notice. This variant
is also available:

.RS 2m
$ gcloud dataproc batches submit spark
.RE