Overview
Use this guide to change the set of models served by a running inference release: adding a new model, replacing a model’s checkpoint, or removing a model. You edit your inference_values.yaml file and run helm upgrade; the chart reconciles the model Deployments, Services, and Routes to match.
You can make these changes on their own against the current chart version, or apply them as part of a chart upgrade to a new Poolside bundle. To upgrade the chart, see Upgrade on OpenShift; make the model edits described here in the same inference_values.yaml file before you run helm upgrade.
Prerequisites
- A working deployment completed with the Install on OpenShift guide.
- The customized inference_values.yaml file you used to install.
- The new model checkpoint, provided by Poolside.
- Workstation tools:
  - helm 3.12 or later
  - oc or kubectl
  - aws CLI (to upload checkpoints to S3-compatible object storage)
  - jq (to parse JSON responses from the inference API)
Downtime
Adding a model does not affect models that are already serving. Updating a checkpoint rolls that model’s Deployment, and the model server re-downloads the checkpoint from S3 on restart, so expect a delay before it becomes ready again. Plan a maintenance window for single-replica models.
Add a model
Upload the new checkpoint to your S3 bucket. Use a distinct prefix per model. For NooBaa or another non-AWS endpoint, include --endpoint-url.
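The upload can be sketched as a small shell helper. This is a minimal sketch, assuming the checkpoint is a local directory; the function name, bucket, prefix, and endpoint URL are placeholders and not names defined by the chart. Only aws s3 sync and --endpoint-url come from the AWS CLI itself.

```shell
# upload_checkpoint SRC_DIR BUCKET PREFIX [ENDPOINT_URL]
# Hypothetical helper: syncs a local checkpoint directory to s3://BUCKET/PREFIX/.
upload_checkpoint() {
  src=$1 bucket=$2 prefix=$3 endpoint=${4:-}
  if [ -n "$endpoint" ]; then
    # Non-AWS backends such as NooBaa need an explicit endpoint.
    aws s3 sync "$src" "s3://$bucket/$prefix/" --endpoint-url "$endpoint"
  else
    aws s3 sync "$src" "s3://$bucket/$prefix/"
  fi
}

# Example call (all values are placeholders):
# upload_checkpoint ./my-new-model my-bucket models/my-new-model https://s3.example.internal
```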
Add the model under models in your inference_values.yaml file. Give the model its own routeHost, or leave it empty for a router-generated hostname.
Run helm upgrade. Use the same flags you used to install. If your install command used --set-file s3.caBundle=... because your S3 backend uses a private CA such as NooBaa, include that flag every time you run helm upgrade on this page.
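As a rough sketch of the upgrade command, the helper below wraps helm upgrade with the values file and the optional CA bundle. The function name, release name, chart reference, and namespace are assumptions for illustration; use the exact arguments from your original install command, including any flags not shown here.

```shell
# upgrade_release RELEASE CHART NAMESPACE [CA_BUNDLE_FILE]
# Hypothetical helper: re-applies inference_values.yaml to an existing release.
upgrade_release() {
  release=$1 chart=$2 namespace=$3 ca_bundle=${4:-}
  if [ -n "$ca_bundle" ]; then
    # Private-CA S3 backends: pass the CA bundle on every upgrade.
    helm upgrade "$release" "$chart" --namespace "$namespace" \
      -f inference_values.yaml --set-file "s3.caBundle=$ca_bundle"
  else
    helm upgrade "$release" "$chart" --namespace "$namespace" \
      -f inference_values.yaml
  fi
}

# Example call (all values are placeholders):
# upgrade_release my-release ./chart my-namespace ./s3-ca.crt
```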
The chart creates a Deployment, Service, and Route named inference-<model-key> for the model. Confirm the new pod starts and the Route is created.
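One way to run that check is sketched below. It assumes the inference-<model-key> naming described above; the function name, namespace, and 15-minute timeout are illustrative choices, not values from the chart. Checkpoint downloads can be slow, so adjust the timeout to your model size.

```shell
# wait_for_model MODEL_KEY NAMESPACE
# Hypothetical helper: waits for the rollout, then prints the Route host.
wait_for_model() {
  key=$1 namespace=$2
  # Blocks until the Deployment's pods are ready or the timeout expires.
  oc rollout status "deployment/inference-$key" --namespace "$namespace" --timeout=15m &&
    # Prints the hostname assigned to the model's Route.
    oc get route "inference-$key" --namespace "$namespace" -o jsonpath='{.spec.host}'
}

# Example call (values are placeholders):
# wait_for_model my-new-model my-namespace
```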
Update a model checkpoint
Upload the new checkpoint to a new, versioned prefix rather than overwriting the existing one. A new path lets helm upgrade detect the change and roll the Deployment automatically, and it lets you roll back by pointing at the previous path.
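A versioned upload might look like the sketch below, which also refuses to write into a prefix that already holds objects. The models/<model-key>/<version>/ layout, the function name, and all argument values are assumptions for illustration. The guard relies on aws s3 ls exiting non-zero when it lists nothing, so it also falls through on listing errors; treat it as a convenience, not a safety guarantee.

```shell
# upload_versioned SRC_DIR BUCKET MODEL_KEY VERSION [extra aws args...]
# Hypothetical helper: uploads to s3://BUCKET/models/MODEL_KEY/VERSION/.
upload_versioned() {
  src=$1 bucket=$2 key=$3 version=$4
  shift 4
  dest="s3://$bucket/models/$key/$version/"
  # If the listing succeeds, objects already exist under this prefix.
  if aws s3 ls "$dest" "$@" >/dev/null 2>&1; then
    echo "refusing to overwrite existing prefix $dest" >&2
    return 1
  fi
  # Extra arguments (for example --endpoint-url) are passed through unchanged.
  aws s3 sync "$src" "$dest" "$@"
}

# Example call (values are placeholders):
# upload_versioned ./ckpt-v2 my-bucket my-model v2 --endpoint-url https://s3.example.internal
```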
Point the model field at the new path in your inference_values.yaml file. Update modelName only if the served model name changes.
If you reuse the same S3 path instead of a versioned one, helm upgrade detects no change to the values and does not restart the model. Force a restart, for example with oc rollout restart deployment/inference-<model-key>, so the init container re-downloads the checkpoint.
Remove a model
Delete the model’s key from models in your inference_values.yaml file, then apply the change with helm upgrade.
The chart removes the model’s Deployment, Service, and Route. Confirm the resources are gone.
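The removal check can be sketched as a loop over the three resource kinds, relying on oc get exiting non-zero when a resource is not found. The function name and namespace are placeholders; the inference-<model-key> naming is taken from the steps above.

```shell
# confirm_removed MODEL_KEY NAMESPACE
# Hypothetical helper: fails if any of the model's resources still exist.
confirm_removed() {
  key=$1 namespace=$2
  for kind in deployment service route; do
    # oc get succeeds only while the resource still exists.
    if oc get "$kind" "inference-$key" --namespace "$namespace" >/dev/null 2>&1; then
      echo "$kind/inference-$key still exists" >&2
      return 1
    fi
  done
  echo "inference-$key resources removed"
}

# Example call (values are placeholders):
# confirm_removed my-old-model my-namespace
```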
Verification
Confirm a model serves traffic, where <route-host> is the host of that model’s Route.
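A request against the Route could be sketched as follows. The API path is deliberately left as a parameter because this page does not name the endpoint; /v1/models in the example is only an assumption that the server exposes an OpenAI-compatible API, so substitute the path your deployment documents. The helper also assumes the Route serves HTTPS with a certificate your workstation trusts.

```shell
# query_model ROUTE_HOST API_PATH
# Hypothetical helper: fetches a JSON response from the Route and pretty-prints it.
query_model() {
  # --fail makes curl exit non-zero on HTTP errors instead of printing the error body.
  curl --silent --show-error --fail "https://$1$2" | jq .
}

# Example call (host and path are placeholders and assumptions):
# query_model <route-host> /v1/models
```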