.TH "GCLOUD_CONTAINER_AI_PROFILES_MANIFESTS_CREATE" 1



.SH "NAME"
.HP
gcloud container ai profiles manifests create \- generate ready\-to\-deploy Kubernetes manifests with compute, load balancing, and autoscaling capabilities



.SH "SYNOPSIS"
.HP
\f5gcloud container ai profiles manifests create\fR \fB\-\-accelerator\-type\fR=\fIACCELERATOR_TYPE\fR \fB\-\-model\fR=\fIMODEL\fR \fB\-\-model\-server\fR=\fIMODEL_SERVER\fR [\fB\-\-model\-bucket\-uri\fR=\fIMODEL_BUCKET_URI\fR] [\fB\-\-model\-server\-version\fR=\fIMODEL_SERVER_VERSION\fR] [\fB\-\-namespace\fR=\fINAMESPACE\fR] [\fB\-\-output\fR=\fIOUTPUT\fR;\ default="all"] [\fB\-\-output\-path\fR=\fIOUTPUT_PATH\fR] [\fB\-\-target\-ntpot\-milliseconds\fR=\fITARGET_NTPOT_MILLISECONDS\fR] [\fB\-\-target\-ttft\-milliseconds\fR=\fITARGET_TTFT_MILLISECONDS\fR] [\fIGCLOUD_WIDE_FLAG\ ...\fR]



.SH "DESCRIPTION"

To get supported models, model servers, and model server versions, run \f5gcloud
alpha container ai profiles model\-and\-server\-combinations list\fR. To get
supported accelerators with their performance metrics, run \f5gcloud alpha
container ai profiles accelerators list\fR.
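


.SH "EXAMPLES"

The invocations below are an illustrative sketch, not verified output. MODEL,
MODEL_SERVER, and ACCELERATOR_TYPE stand for values reported by the list
commands named in the description; the file name manifests.yaml and the
threshold of 200 milliseconds are hypothetical.

To generate manifests for a supported combination and save only the manifest to
a file, one might run:

.RS 2m
$ gcloud container ai profiles manifests create \-\-model=MODEL \-\-model\-server=MODEL_SERVER \-\-accelerator\-type=ACCELERATOR_TYPE \-\-output=manifest \-\-output\-path=manifests.yaml
.RE

To print the manifests to the terminal and request Horizontal Pod Autoscaler
resources that aim to keep p50 NTPOT below the hypothetical threshold, one might
run:

.RS 2m
$ gcloud container ai profiles manifests create \-\-model=MODEL \-\-model\-server=MODEL_SERVER \-\-accelerator\-type=ACCELERATOR_TYPE \-\-target\-ntpot\-milliseconds=200
.RE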



.SH "REQUIRED FLAGS"

.RS 2m
.TP 2m
\fB\-\-accelerator\-type\fR=\fIACCELERATOR_TYPE\fR

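The accelerator type to generate manifests for. To list supported accelerators,
run \f5gcloud alpha container ai profiles accelerators list\fR.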

.TP 2m
\fB\-\-model\fR=\fIMODEL\fR

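The model to generate manifests for. To list supported models, run \f5gcloud
alpha container ai profiles model\-and\-server\-combinations list\fR.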

.TP 2m
\fB\-\-model\-server\fR=\fIMODEL_SERVER\fR

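The model server to generate manifests for. To list supported model servers,
run \f5gcloud alpha container ai profiles model\-and\-server\-combinations
list\fR.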


.RE
.sp

.SH "OPTIONAL FLAGS"

.RS 2m
.TP 2m
\fB\-\-model\-bucket\-uri\fR=\fIMODEL_BUCKET_URI\fR

The Google Cloud Storage bucket URI to load the model from. This URI must point
to the directory containing the model's config file (config.json) and model
weights. If unspecified, defaults to loading the model from Hugging Face.

.TP 2m
\fB\-\-model\-server\-version\fR=\fIMODEL_SERVER_VERSION\fR

The model server version. If not specified, this defaults to the latest version.

.TP 2m
\fB\-\-namespace\fR=\fINAMESPACE\fR

The namespace to deploy the manifests in. If not specified, this defaults to
\'default'.

.TP 2m
\fB\-\-output\fR=\fIOUTPUT\fR; default="all"

The output to display. Default is all. \fIOUTPUT\fR must be one of:
\fBmanifest\fR, \fBcomments\fR, \fBall\fR.

.TP 2m
\fB\-\-output\-path\fR=\fIOUTPUT_PATH\fR

The path to save the output to. If not specified, the output is printed to the
terminal.

.TP 2m
\fB\-\-target\-ntpot\-milliseconds\fR=\fITARGET_NTPOT_MILLISECONDS\fR

The maximum normalized time per output token (NTPOT) in milliseconds. NTPOT is
measured as request_latency / output_tokens. If this is set, the manifests
will include Horizontal Pod Autoscaler (HPA) resources which automatically
adjust the model server replica count in response to changes in model server
load to keep p50 NTPOT below the specified threshold. If the provided
target\-ntpot\-milliseconds is too low to achieve, the HPA manifest will not be
generated.

.TP 2m
\fB\-\-target\-ttft\-milliseconds\fR=\fITARGET_TTFT_MILLISECONDS\fR

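The target time to first token (TTFT) in milliseconds. If specified, results
will only show accelerators that can meet the latency target and will show their
throughput performance at the target TTFT. If the provided
target\-ttft\-milliseconds is too low to achieve, the HPA manifest will not be
generated.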


.RE
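
As an illustration of the NTPOT definition given for
\fB\-\-target\-ntpot\-milliseconds\fR, using hypothetical numbers: a request
with a request_latency of 2000 ms that returns 100 output tokens has an NTPOT of
2000 / 100 = 20 ms per output token. Because the autoscaler acts on p50 NTPOT, a
target of 25 would be met only while the median of that ratio across requests
stays below 25 ms.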
.sp

.SH "GCLOUD WIDE FLAGS"

These flags are available to all commands: \-\-access\-token\-file, \-\-account,
\-\-billing\-project, \-\-configuration, \-\-flags\-file, \-\-flatten,
\-\-format, \-\-help, \-\-impersonate\-service\-account, \-\-log\-http,
\-\-project, \-\-quiet, \-\-trace\-token, \-\-user\-output\-enabled,
\-\-verbosity.

Run \fB$ gcloud help\fR for details.



.SH "NOTES"

This variant is also available:

.RS 2m
$ gcloud alpha container ai profiles manifests create
.RE