Overview
Use this guide to change the set of models served by a running inference release: adding a new model, replacing a model’s checkpoint, or removing a model. You edit your inference_values.yaml file and run helm upgrade; the chart reconciles the model Deployments, Services, and Ingress objects to match.
You can make these changes on their own against the current chart version, or apply them as part of a chart upgrade to a new Poolside bundle. To upgrade the chart, see Upgrade on Amazon EKS; make the model edits described here in the same inference_values.yaml file before you run helm upgrade.
Prerequisites
- A working deployment completed with the Install on Amazon EKS guide.
- The customized inference_values.yaml file you used to install.
- The new model checkpoint, provided by Poolside.
- Workstation tools:
  - helm 3.12 or later
  - kubectl, configured for your EKS cluster
  - aws CLI, to upload checkpoints to S3
  - jq, to parse JSON responses from the inference API
Downtime
Adding a model does not affect models that are already serving. Updating a checkpoint rolls that model’s Deployment, and the model server re-downloads the checkpoint from S3 on restart, so expect a delay before it becomes ready again. Plan a maintenance window for single-replica models.
Add a model
Extract the new checkpoint archive as described in Upload model checkpoints to S3 so its files sit at the prefix root, then upload it to your S3 bucket. Use a distinct prefix per model.
Add the model under models in your inference_values.yaml file. Give the model its own ingressHost, covered by the ACM certificate referenced in ingress.annotations:
inference_values.yaml
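As a rough illustration, a new entry might look like the fragment printed by the helper below. The nesting (a key under models with model, modelName, and ingressHost fields) follows the field names used in this guide. The model key, bucket, prefix, and hostname in the example call are hypothetical placeholders, and the exact value format that the model field expects is an assumption, so compare it with the entries already in your file.

```shell
# Sketch only: prints a models entry that you would merge into
# inference_values.yaml by hand. It does not modify any file.
print_model_entry() {
  # $1 = model key, $2 = S3 path of the checkpoint,
  # $3 = served model name, $4 = ingress host for this model
  cat <<EOF
models:
  $1:
    model: $2
    modelName: $3
    ingressHost: $4
EOF
}

# Example call (all values are hypothetical placeholders):
# print_model_entry my-model s3://my-checkpoint-bucket/models/my-model/v1 my-model my-model.example.com
```

Printing the fragment instead of editing the file keeps the change reviewable before you run helm upgrade.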
Apply the change with helm upgrade.
The chart creates a Deployment, Service, and Ingress named inference-<model-key> for the model. Confirm the new pod starts and the ingress is created.
Create a DNS record for the new ingressHost, pointing it at the load balancer address.
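The steps above can be sketched as a few shell helpers. This is a minimal sketch, not the commands from the Poolside bundle: the release name, namespace, chart reference, and bucket are hypothetical defaults that you would replace with the values from your own install.

```shell
# Sketch only. Every default below is a hypothetical placeholder.
RELEASE="${RELEASE:-inference}"
NAMESPACE="${NAMESPACE:-inference}"
CHART="${CHART:-./inference-chart}"
BUCKET="${BUCKET:-my-checkpoint-bucket}"

# Upload an extracted checkpoint directory to its own per-model prefix.
upload_checkpoint() {
  # $1 = local directory with the extracted files, $2 = prefix such as models/my-model/v1
  aws s3 sync "$1" "s3://$BUCKET/$2/"
}

# Apply the edited values file to the running release.
apply_values() {
  helm upgrade "$RELEASE" "$CHART" --namespace "$NAMESPACE" --values inference_values.yaml
}

# Wait for the model's Deployment to become ready, then show its Ingress.
confirm_model() {
  # $1 = model key; the chart names the resources inference-<model-key>
  kubectl rollout status "deployment/inference-$1" --namespace "$NAMESPACE" --timeout=30m &&
    kubectl get ingress "inference-$1" --namespace "$NAMESPACE"
}
```

For example, upload_checkpoint ./my-model-checkpoint models/my-model/v1, then apply_values, then confirm_model my-model. The ADDRESS column of the kubectl get ingress output should show the load balancer address to use for the DNS record.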
Update a model checkpoint
Upload the new checkpoint to a new, versioned prefix rather than overwriting the existing one. A new path lets helm upgrade detect the change and roll the Deployment automatically, and it lets you roll back by pointing at the previous path. Extract the archive first, as in Step 3, so the files sit at the prefix root.
Point the model field at the new path in your inference_values.yaml file. Update modelName only if the served model name changes:
inference_values.yaml
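A minimal sketch of the update flow, assuming a models/<model-key>/<version> prefix layout. That layout, the bucket name, and the namespace are hypothetical; only the inference-<model-key> Deployment name comes from this guide. The last helper uses kubectl rollout restart as one way to force a restart when the S3 path is reused and the values do not change.

```shell
# Sketch only. The defaults are hypothetical placeholders.
NAMESPACE="${NAMESPACE:-inference}"
BUCKET="${BUCKET:-my-checkpoint-bucket}"

# In inference_values.yaml only the model path changes, for example:
#   models:
#     my-model:
#       model: s3://my-checkpoint-bucket/models/my-model/v2   # previously .../v1

# Build the versioned S3 path for a model key and version label.
checkpoint_path() {
  # $1 = model key, $2 = version label such as v2
  echo "s3://$BUCKET/models/$1/$2"
}

# Upload the extracted checkpoint under the new prefix, leaving the old one intact.
upload_version() {
  # $1 = local directory with the extracted files, $2 = model key, $3 = version label
  aws s3 sync "$1" "$(checkpoint_path "$2" "$3")/"
}

# Same-path case: restart the Deployment explicitly and wait for it to become ready.
force_restart() {
  # $1 = model key; the Deployment is named inference-<model-key>
  kubectl rollout restart "deployment/inference-$1" --namespace "$NAMESPACE" &&
    kubectl rollout status "deployment/inference-$1" --namespace "$NAMESPACE" --timeout=30m
}
```

Keeping the previous prefix in place is what makes rollback a one-line values change: set model back to the old path and run helm upgrade again.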
If you reuse the same S3 path instead of a versioned one, helm upgrade detects no change to the values and does not restart the model. Force a restart so the init container re-downloads the checkpoint.
Remove a model
Delete the model’s key from models in your inference_values.yaml file, then apply the change with helm upgrade.
The chart deletes the model’s Deployment, Service, and Ingress. Confirm the resources are gone.
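One way to script the check is to list the three resource kinds and look for the inference-<model-key> name. This is a sketch under assumptions: the namespace default is a hypothetical placeholder, and the helper only inspects Deployments, Services, and Ingresses.

```shell
# Sketch only. NAMESPACE is a hypothetical placeholder.
NAMESPACE="${NAMESPACE:-inference}"

# Succeeds only when no Deployment, Service, or Ingress named
# inference-<model-key> remains in the namespace.
model_resources_gone() {
  # $1 = model key
  # Stop with status 2 if kubectl itself fails, so an error is not mistaken for "gone".
  all="$(kubectl get deployment,service,ingress --namespace "$NAMESPACE" -o name)" || return 2
  # -o name prints <kind>/<name>; keep only exact name matches.
  remaining="$(printf '%s\n' "$all" | awk -F/ -v name="inference-$1" '$2 == name')"
  if [ -n "$remaining" ]; then
    echo "still present:"
    echo "$remaining"
    return 1
  fi
  echo "no inference-$1 resources found"
}
```

The exact-name comparison avoids a false match when one model key is a prefix of another, for example my and my-model.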
Verification
Confirm that a model serves traffic by sending a request to <model-hostname>, the ingressHost of that model.
If the inference API requires authentication, include an Authorization header in the request.
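A minimal sketch of such a request with curl and jq follows. The request path and the Bearer token scheme are assumptions modeled on an OpenAI-style model listing; neither is confirmed by this guide, so replace them with the endpoint and authentication scheme that your inference API documents.

```shell
# Sketch only. API_PATH and the Bearer scheme are assumptions, not documented values.
API_PATH="${API_PATH:-/v1/models}"

check_model() {
  # $1 = model hostname (the model's ingressHost), $2 = optional API token
  if [ -n "${2:-}" ]; then
    body="$(curl --silent --show-error --fail \
      -H "Authorization: Bearer $2" "https://$1$API_PATH")" || return 1
  else
    body="$(curl --silent --show-error --fail "https://$1$API_PATH")" || return 1
  fi
  # Pretty-print the JSON; jq exits non-zero if the body is not valid JSON.
  printf '%s\n' "$body" | jq .
}
```

For example, check_model my-model.example.com "$API_TOKEN", where both the hostname and the API_TOKEN variable are hypothetical. Because of --fail, an HTTP error status makes the helper return non-zero instead of printing an error page.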