File: //snap/google-cloud-cli/394/help/man/man1/gcloud_ai_model-garden_models_deploy.1
.TH "GCLOUD_AI_MODEL\-GARDEN_MODELS_DEPLOY" 1
.SH "NAME"
.HP
gcloud ai model\-garden models deploy \- deploy a model in Model Garden to a Vertex AI endpoint
.SH "SYNOPSIS"
.HP
\f5gcloud ai model\-garden models deploy\fR \fB\-\-model\fR=\fIMODEL\fR [\fB\-\-accelerator\-count\fR=\fIACCELERATOR_COUNT\fR] [\fB\-\-accelerator\-type\fR=\fIACCELERATOR_TYPE\fR] [\fB\-\-accept\-eula\fR] [\fB\-\-asynchronous\fR] [\fB\-\-container\-args\fR=[\fIARG\fR,...]] [\fB\-\-container\-command\fR=[\fICOMMAND\fR,...]] [\fB\-\-container\-deployment\-timeout\-seconds\fR=\fICONTAINER_DEPLOYMENT_TIMEOUT_SECONDS\fR] [\fB\-\-container\-env\-vars\fR=[\fIKEY\fR=\fIVALUE\fR,...]] [\fB\-\-container\-grpc\-ports\fR=[\fIPORT\fR,...]] [\fB\-\-container\-health\-probe\-exec\fR=[\fIHEALTH_PROBE_EXEC\fR,...]] [\fB\-\-container\-health\-probe\-period\-seconds\fR=\fICONTAINER_HEALTH_PROBE_PERIOD_SECONDS\fR] [\fB\-\-container\-health\-probe\-timeout\-seconds\fR=\fICONTAINER_HEALTH_PROBE_TIMEOUT_SECONDS\fR] [\fB\-\-container\-health\-route\fR=\fICONTAINER_HEALTH_ROUTE\fR] [\fB\-\-container\-image\-uri\fR=\fICONTAINER_IMAGE_URI\fR] [\fB\-\-container\-ports\fR=[\fIPORT\fR,...]] [\fB\-\-container\-predict\-route\fR=\fICONTAINER_PREDICT_ROUTE\fR] [\fB\-\-container\-shared\-memory\-size\-mb\fR=\fICONTAINER_SHARED_MEMORY_SIZE_MB\fR] [\fB\-\-container\-startup\-probe\-exec\fR=[\fISTARTUP_PROBE_EXEC\fR,...]] [\fB\-\-container\-startup\-probe\-period\-seconds\fR=\fICONTAINER_STARTUP_PROBE_PERIOD_SECONDS\fR] [\fB\-\-container\-startup\-probe\-timeout\-seconds\fR=\fICONTAINER_STARTUP_PROBE_TIMEOUT_SECONDS\fR] [\fB\-\-enable\-fast\-tryout\fR] [\fB\-\-endpoint\-display\-name\fR=\fIENDPOINT_DISPLAY_NAME\fR] [\fB\-\-hugging\-face\-access\-token\fR=\fIHUGGING_FACE_ACCESS_TOKEN\fR] [\fB\-\-machine\-type\fR=\fIMACHINE_TYPE\fR] [\fB\-\-region\fR=\fIREGION\fR] [\fB\-\-reservation\-affinity\fR=[\fIkey\fR=\fIKEY\fR],[\fIreservation\-affinity\-type\fR=\fIRESERVATION\-AFFINITY\-TYPE\fR],[\fIvalues\fR=\fIVALUES\fR]] [\fB\-\-spot\fR] [\fB\-\-use\-dedicated\-endpoint\fR] [\fIGCLOUD_WIDE_FLAG\ ...\fR]
.SH "EXAMPLES"
To deploy a Model Garden model \f5google/gemma2@gemma\-2\-9b\fR under project
\f5example\fR in region \f5us\-central1\fR, run:
.RS 2m
$ gcloud ai model\-garden models deploy \e
\-\-model=google/gemma2@gemma\-2\-9b \-\-project=example \e
\-\-region=us\-central1
.RE
To deploy a Hugging Face model \f5meta\-llama/Meta\-Llama\-3\-8B\fR under
project \f5example\fR in region \f5us\-central1\fR, run:
.RS 2m
$ gcloud ai model\-garden models deploy \e
\-\-model=meta\-llama/Meta\-Llama\-3\-8B \e
\-\-hugging\-face\-access\-token={hf_token} \-\-project=example \e
\-\-region=us\-central1
.RE
.SH "REQUIRED FLAGS"
.RS 2m
.TP 2m
\fB\-\-model\fR=\fIMODEL\fR
The model to be deployed. For a Model Garden model, use the format
\f5{publisher_name}/{model_name}@{model_version_name}\fR, e.g.
\f5google/gemma2@gemma\-2\-2b\fR. For a Hugging Face model, use the Hugging
Face naming convention, e.g. \f5meta\-llama/Meta\-Llama\-3\-8B\fR. For a Custom
Weights model, use the format \f5gs://{gcs_bucket_uri}\fR, e.g.
\f5gs://\-model\-garden\-public\-us/llama3.1/Meta\-Llama\-3.1\-8B\-Instruct\fR.
.RE
.sp
.SH "OPTIONAL FLAGS"
.RS 2m
.TP 2m
\fB\-\-accelerator\-count\fR=\fIACCELERATOR_COUNT\fR
The number of accelerators to use when serving the model. Must be
non\-negative.
.TP 2m
\fB\-\-accelerator\-type\fR=\fIACCELERATOR_TYPE\fR
The accelerator type to serve the model. It should be a supported accelerator
type from the verified deployment configurations of the model. Use \f5gcloud ai
model\-garden models list\-deployment\-config\fR to check the supported
accelerator types.
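For example, to list the verified deployment configurations of a model,
including its supported accelerator types, run a command along these lines (the
model name shown is illustrative):
.RS 2m
$ gcloud ai model\-garden models list\-deployment\-config \e
    \-\-model=google/gemma2@gemma\-2\-9b
.RE
.sp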
.TP 2m
\fB\-\-accept\-eula\fR
When set, the user accepts the End User License Agreement (EULA) of the model.
.TP 2m
\fB\-\-asynchronous\fR
If set, the command returns immediately after the deployment request is made
and does not poll the operation status.
.TP 2m
\fB\-\-container\-args\fR=[\fIARG\fR,...]
Comma\-separated arguments passed to the command run by the container image. If
not specified and no \f5\-\-container\-command\fR is provided, the container
image's default command is used.
.TP 2m
\fB\-\-container\-command\fR=[\fICOMMAND\fR,...]
Entrypoint for the container image. If not specified, the container image's
default entrypoint is run.
.TP 2m
\fB\-\-container\-deployment\-timeout\-seconds\fR=\fICONTAINER_DEPLOYMENT_TIMEOUT_SECONDS\fR
Deployment timeout in seconds.
.TP 2m
\fB\-\-container\-env\-vars\fR=[\fIKEY\fR=\fIVALUE\fR,...]
List of key\-value pairs to set as environment variables.
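For example, to set two environment variables on the serving container (the
variable names and values here are illustrative):
.RS 2m
$ gcloud ai model\-garden models deploy \e
    \-\-model=google/gemma2@gemma\-2\-9b \e
    \-\-container\-env\-vars=MODEL_ID=gemma\-2\-9b,LOG_LEVEL=INFO
.RE
.sp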
.TP 2m
\fB\-\-container\-grpc\-ports\fR=[\fIPORT\fR,...]
Container ports on which to receive gRPC requests. Each port must be a number
between 1 and 65535, inclusive.
.TP 2m
\fB\-\-container\-health\-probe\-exec\fR=[\fIHEALTH_PROBE_EXEC\fR,...]
Command executed inside the container by the health probe, e.g.
["cat", "/tmp/healthy"].
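For example, to run \f5cat /tmp/healthy\fR as the health probe (the model and
file path shown are illustrative), pass the command as a comma\-separated list:
.RS 2m
$ gcloud ai model\-garden models deploy \e
    \-\-model=google/gemma2@gemma\-2\-9b \e
    \-\-container\-health\-probe\-exec=cat,/tmp/healthy
.RE
.sp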
.TP 2m
\fB\-\-container\-health\-probe\-period\-seconds\fR=\fICONTAINER_HEALTH_PROBE_PERIOD_SECONDS\fR
How often (in seconds) to perform the health probe. Defaults to 10 seconds.
Minimum value is 1.
.TP 2m
\fB\-\-container\-health\-probe\-timeout\-seconds\fR=\fICONTAINER_HEALTH_PROBE_TIMEOUT_SECONDS\fR
Number of seconds after which the health probe times out. Defaults to 1 second.
Minimum value is 1.
.TP 2m
\fB\-\-container\-health\-route\fR=\fICONTAINER_HEALTH_ROUTE\fR
HTTP path to send health checks to inside the container.
.TP 2m
\fB\-\-container\-image\-uri\fR=\fICONTAINER_IMAGE_URI\fR
URI of the model serving container image in Container Registry (e.g.
gcr.io/myproject/server:latest).
.TP 2m
\fB\-\-container\-ports\fR=[\fIPORT\fR,...]
Container ports on which to receive HTTP requests. Each port must be a number
between 1 and 65535, inclusive.
.TP 2m
\fB\-\-container\-predict\-route\fR=\fICONTAINER_PREDICT_ROUTE\fR
HTTP path to send prediction requests to inside the container.
.TP 2m
\fB\-\-container\-shared\-memory\-size\-mb\fR=\fICONTAINER_SHARED_MEMORY_SIZE_MB\fR
The amount of VM memory to reserve as shared memory for the model, in
megabytes.
.TP 2m
\fB\-\-container\-startup\-probe\-exec\fR=[\fISTARTUP_PROBE_EXEC\fR,...]
Command executed inside the container by the startup probe, e.g.
["cat", "/tmp/healthy"].
.TP 2m
\fB\-\-container\-startup\-probe\-period\-seconds\fR=\fICONTAINER_STARTUP_PROBE_PERIOD_SECONDS\fR
How often (in seconds) to perform the startup probe. Defaults to 10 seconds.
Minimum value is 1.
.TP 2m
\fB\-\-container\-startup\-probe\-timeout\-seconds\fR=\fICONTAINER_STARTUP_PROBE_TIMEOUT_SECONDS\fR
Number of seconds after which the startup probe times out. Defaults to 1 second.
Minimum value is 1.
.TP 2m
\fB\-\-enable\-fast\-tryout\fR
If set, the model is deployed using a faster deployment path. Useful for quick
experiments; not intended for production workloads. Only available for the most
popular models with certain machine types.
.TP 2m
\fB\-\-endpoint\-display\-name\fR=\fIENDPOINT_DISPLAY_NAME\fR
Display name of the endpoint with the deployed model.
.TP 2m
\fB\-\-hugging\-face\-access\-token\fR=\fIHUGGING_FACE_ACCESS_TOKEN\fR
The Hugging Face access token needed to read the model artifacts of gated
models. Only required when the Hugging Face model to deploy is gated.
.TP 2m
\fB\-\-machine\-type\fR=\fIMACHINE_TYPE\fR
The machine type to deploy the model to. It should be a supported machine type
from the deployment configurations of the model. Use \f5gcloud ai model\-garden
models list\-deployment\-config\fR to check the supported machine types.
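For example, to pin the deployment to a specific machine shape (the values
shown are illustrative; use \f5list\-deployment\-config\fR to find valid
combinations for your model):
.RS 2m
$ gcloud ai model\-garden models deploy \e
    \-\-model=google/gemma2@gemma\-2\-9b \e
    \-\-machine\-type=g2\-standard\-12 \e
    \-\-accelerator\-type=NVIDIA_L4 \e
    \-\-accelerator\-count=1
.RE
.sp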
.TP 2m
Region resource \- Cloud region to deploy the model. This represents a Cloud
resource. (NOTE) Some attributes are not given arguments in this group but can
be set in other ways.
To set the \f5project\fR attribute:
.RS 2m
.IP "\(em" 2m
provide the argument \f5\-\-region\fR on the command line with a fully specified
name;
.IP "\(em" 2m
set the property \f5ai/region\fR with a fully specified name;
.IP "\(em" 2m
choose one from the prompted list of available regions with a fully specified
name;
.IP "\(em" 2m
provide the argument \f5\-\-project\fR on the command line;
.IP "\(em" 2m
set the property \f5core/project\fR.
.RE
.sp
.RS 2m
.TP 2m
\fB\-\-region\fR=\fIREGION\fR
ID of the region or fully qualified identifier for the region.
To set the \f5region\fR attribute:
.RS 2m
.IP "\(bu" 2m
provide the argument \f5\-\-region\fR on the command line;
.IP "\(bu" 2m
set the property \f5ai/region\fR;
.IP "\(bu" 2m
choose one from the prompted list of available regions.
.RE
.sp
.RE
.sp
.TP 2m
\fB\-\-reservation\-affinity\fR=[\fIkey\fR=\fIKEY\fR],[\fIreservation\-affinity\-type\fR=\fIRESERVATION\-AFFINITY\-TYPE\fR],[\fIvalues\fR=\fIVALUES\fR]
A ReservationAffinity can be used to configure a Vertex AI resource (e.g., a
DeployedModel) to draw its Compute Engine resources from a Shared Reservation,
or exclusively from on\-demand capacity.
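For example, to draw capacity from a specific shared reservation (the
reservation name is illustrative):
.RS 2m
$ gcloud ai model\-garden models deploy \e
    \-\-model=google/gemma2@gemma\-2\-9b \e
    \-\-reservation\-affinity=reservation\-affinity\-type=SPECIFIC_RESERVATION,\e
key=compute.googleapis.com/reservation\-name,values=my\-reservation
.RE
.sp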
.TP 2m
\fB\-\-spot\fR
If set, the deployment workload is scheduled on Spot VMs.
.TP 2m
\fB\-\-use\-dedicated\-endpoint\fR
If set, the endpoint is exposed through a dedicated DNS name. Requests sent to
the dedicated DNS are isolated from other users' traffic and get better
performance and reliability.
.RE
.sp
.SH "GCLOUD WIDE FLAGS"
These flags are available to all commands: \-\-access\-token\-file, \-\-account,
\-\-billing\-project, \-\-configuration, \-\-flags\-file, \-\-flatten,
\-\-format, \-\-help, \-\-impersonate\-service\-account, \-\-log\-http,
\-\-project, \-\-quiet, \-\-trace\-token, \-\-user\-output\-enabled,
\-\-verbosity.
Run \fB$ gcloud help\fR for details.
.SH "NOTES"
These variants are also available:
.RS 2m
$ gcloud alpha ai model\-garden models deploy
.RE
.RS 2m
$ gcloud beta ai model\-garden models deploy
.RE