Overview
Use this guide to upgrade an existing on-premises Poolside model inference deployment to a new installation bundle. Upgrades are performed bundle-to-bundle. To keep the process predictable, preserve the configuration and installation state you need from the current bundle, remove the current Terraform-managed deployment, and apply all phases from the new bundle. Do not run only the model upload and inference phases, because the new bundle can include infrastructure changes that require full removal and reapplication.
The upgrade process has the following phases:
- Prepare the new installation bundle and preserve configuration.
- Remove the current deployment.
- Apply the new installation bundle.
- Configure local DNS.
- Verify model inference.
Prerequisites
Before you begin, ensure that you have:
- A working on-premises model inference deployment completed with Install on-premises
- Access to the deployment host
- Access to the current installation bundle or a copy of its Terraform configuration
- The new on-premises installation bundle provided by Poolside
- Poolside model checkpoint files available on the host
- The ingress hostnames you plan to expose for model inference
- Any bring-your-own (BYO) certificates that match the ingress hostnames, if you are not using installer-generated self-signed certificates
- The same host tools required by Install on-premises
Downtime
Plan a maintenance window before you start. Model inference endpoints are unavailable while you remove the current deployment and apply the new bundle. Model upload and model startup can take a long time because model checkpoint files are large. Plan extra time if the upgrade includes large checkpoints or multiple model workloads.
Preparation
Step A: Obtain the new bundle
Obtain the new on-premises installation bundle from Poolside and extract it on the deployment host. Set the shell variables OLD_BUNDLE and NEW_BUNDLE to the root directories of the current and new bundles.
Step B: Set up air-gapped Terraform configuration
This configuration is required for air-gapped installations. In internet-connected environments, you can skip this step.
terraform.d directory.
- Locate poolside-terraform.tfrc in the root of the new bundle.
- Replace the $POOLSIDE_INSTALL_DIR placeholder with the fully qualified path to the new bundle’s root directory.
- For Terraform commands that run against the new bundle, prefix the command with the Terraform CLI configuration file path.
For Terraform commands that run against the current bundle, use $OLD_BUNDLE/poolside-terraform.tfrc instead.
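The variable setup from Step A and the placeholder replacement from Step B might look like the following sketch. The bundle paths and the render_tfrc helper are hypothetical, and the sketch assumes that Terraform's standard TF_CLI_CONFIG_FILE environment variable is how the configuration file path is supplied; defer to the instructions shipped with your bundle if they differ.

```shell
# Hypothetical bundle locations; replace with the paths on your deployment host.
OLD_BUNDLE="${OLD_BUNDLE:-/opt/poolside/bundle-current}"
NEW_BUNDLE="${NEW_BUNDLE:-/opt/poolside/bundle-new}"

# Replace the literal $POOLSIDE_INSTALL_DIR placeholder in a .tfrc file
# with an absolute bundle path. Assumes the path contains no '|' or '&'.
render_tfrc() {
  local tfrc="$1" bundle_dir="$2" tmp
  tmp="$(mktemp)" || return 1
  # Write to a temporary file first, then copy back so the original
  # file keeps its ownership and permissions.
  sed 's|\$POOLSIDE_INSTALL_DIR|'"$bundle_dir"'|g' "$tfrc" > "$tmp" && cat "$tmp" > "$tfrc"
  local status=$?
  rm -f "$tmp"
  return "$status"
}

# Usage on the deployment host (assumed invocation):
#   render_tfrc "$NEW_BUNDLE/poolside-terraform.tfrc" "$NEW_BUNDLE"
#   cd "$NEW_BUNDLE/01-infra-rke2"
#   TF_CLI_CONFIG_FILE="$NEW_BUNDLE/poolside-terraform.tfrc" terraform plan
```

Setting the variable inline, rather than exporting it, keeps commands against the current bundle from accidentally picking up the new bundle's configuration file.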
Step C: Preserve installation state
Copy the current bundle’s poolside-install directory into the new bundle before you remove the current deployment.
The poolside-install directory contains persistent installation state shared across phases, including generated configuration files and BYO TLS certificate files if you stored them under poolside-install/byo-certs/.
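As a minimal sketch of this copy step, the hypothetical helper below copies poolside-install from one bundle root into another. It assumes that merging into an existing poolside-install directory in the new bundle is acceptable; files with the same name are overwritten.

```shell
# Copy the poolside-install directory from the current bundle root into the
# new bundle root, preserving permissions and timestamps (cp -a).
preserve_install_state() {
  local old_bundle="$1" new_bundle="$2"
  if [ ! -d "$old_bundle/poolside-install" ]; then
    echo "missing $old_bundle/poolside-install" >&2
    return 1
  fi
  mkdir -p "$new_bundle/poolside-install"
  # The trailing "/." copies the directory contents, including dotfiles.
  cp -a "$old_bundle/poolside-install/." "$new_bundle/poolside-install/"
}

# Usage:
#   preserve_install_state "$OLD_BUNDLE" "$NEW_BUNDLE"
```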
Step D: Record configuration values
Record the values from the current deployment that you need to use in the new deployment. At minimum, review:
- Host volume paths configured in 01-infra-rke2
- Model ingress hostnames
- Model names, S3 URIs, GPU counts, replica counts, and model types
- BYO certificate and CA file paths
- Local DNS entries or external DNS records
- Any environment-specific sizing or networking values
Do not copy the current bundle’s terraform.tfvars files directly into the new bundle. The new installation bundle can include different variables, defaults, and phase behavior.
For each phase, open the new bundle’s terraform.tfvars file and copy only the values that must carry forward from the current deployment. Leave new variables at their default values unless Poolside release notes specify otherwise.
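To see which variables are new in a phase, one option is to compare the variable names in the two terraform.tfvars files. The sketch below is an illustration, not a Poolside tool: both helper names are hypothetical, and it only recognizes simple top-level "name = value" assignments that start in the first column.

```shell
# Print the top-level variable names assigned in a terraform.tfvars file.
tfvars_names() {
  sed -n 's/^\([A-Za-z_][A-Za-z0-9_-]*\)[[:space:]]*=.*/\1/p' "$1" | sort -u
}

# Print names that appear in the new file but not in the old one.
new_tfvars_names() {
  local old_names new_names
  old_names="$(mktemp)" || return 1
  new_names="$(mktemp)" || return 1
  tfvars_names "$1" > "$old_names"
  tfvars_names "$2" > "$new_names"
  # comm -13 keeps only the lines unique to the second (new) file.
  comm -13 "$old_names" "$new_names"
  rm -f "$old_names" "$new_names"
}

# Usage:
#   new_tfvars_names "$OLD_BUNDLE/01-infra-rke2/terraform.tfvars" \
#                    "$NEW_BUNDLE/01-infra-rke2/terraform.tfvars"
```

Names printed by new_tfvars_names are candidates to leave at their defaults; everything else is a candidate to carry forward after review.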
If you used custom TLS certificates during the original install and stored the files under $OLD_BUNDLE/poolside-install/byo-certs/, update the certificate paths in $NEW_BUNDLE/02-infra-services/terraform.tfvars so they point to $NEW_BUNDLE/poolside-install/byo-certs/.
Upgrade
Step 1: Remove the current deployment
Remove the current deployment before you apply the new installation bundle.
This is an upgrade procedure, not a full removal. Do not clean up local model checkpoint files, DNS records, BYO certificate files, or configuration values that the new bundle still needs.
Destroy model inference
Run the following commands from the current bundle’s 04-poolside-inference directory.
Air-gapped environment:
Destroy model upload
Run the following commands from the current bundle’s 03-poolside-model-upload directory.
Air-gapped environment:
Destroy supporting infrastructure services
Using sudo, run the following commands from the current bundle’s 02-infra-services directory.
Air-gapped environment:
Destroy RKE2 infrastructure
Using sudo, run the following commands from the current bundle’s 01-infra-rke2 directory.
Air-gapped environment:
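The removal order above, from the last installed phase back to the first, can be summarized in a short sketch. The hypothetical helper only prints the commands so you can review them before running anything. It assumes each phase is removed with terraform destroy and that air-gapped runs pass the configuration file through Terraform's TF_CLI_CONFIG_FILE environment variable; follow the exact commands for your bundle if they differ.

```shell
# Print one destroy command per phase, in reverse install order.
# Pass "airgapped" as the second argument to add the TF_CLI_CONFIG_FILE prefix.
print_destroy_plan() {
  local bundle="$1" mode="${2:-connected}" phase prefix
  for phase in 04-poolside-inference 03-poolside-model-upload \
               02-infra-services 01-infra-rke2; do
    prefix=""
    # The two infrastructure phases are run with sudo.
    case "$phase" in
      01-infra-rke2|02-infra-services) prefix="sudo " ;;
    esac
    if [ "$mode" = "airgapped" ]; then
      prefix="${prefix}TF_CLI_CONFIG_FILE=$bundle/poolside-terraform.tfrc "
    fi
    echo "cd $bundle/$phase && ${prefix}terraform destroy"
  done
}

# Usage:
#   print_destroy_plan "$OLD_BUNDLE"            # internet-connected
#   print_destroy_plan "$OLD_BUNDLE" airgapped  # air-gapped
```

Printing rather than executing is deliberate: terraform destroy is irreversible for the deployment, so a reviewed command list is a safer starting point than an automated loop.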
Step 2: Apply the new bundle
Apply the new installation bundle by following Install in Install on-premises. Use $NEW_BUNDLE as the bundle path when you run commands from the install guide.
Run all installation phases in order:
- Set up air-gapped Terraform configuration, if your environment is air-gapped.
- Install RKE2 infrastructure from 01-infra-rke2.
- Install supporting infrastructure services from 02-infra-services.
- Upload Poolside models from 03-poolside-model-upload.
- Deploy Poolside model inference from 04-poolside-inference.
Copy the model checkpoint files into /opt/poolside/poolside-model-uploads, or into the custom host volume location you configured in 01-infra-rke2, before you run the model upload phase.
Model upload and infrastructure service installation can take some time because model checkpoint files and container images can be large.
Step 3: Configure local DNS
Confirm that local DNS entries still match the model ingress_host_name values you configured in the new bundle. If you changed any model ingress hostnames during the upgrade, update /etc/hosts or your external DNS records.
For local host resolution, include every model ingress hostname on the same line:
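As an illustration of the single-line format, the hypothetical helper below builds one /etc/hosts entry from an IP address followed by every hostname. The address and hostnames in the usage comment are placeholders, not values from a real deployment.

```shell
# Build one /etc/hosts line: an IP address followed by every hostname,
# separated by single spaces.
hosts_line() {
  local ip="$1"
  shift
  echo "$ip $*"
}

# Usage with placeholder values (192.0.2.10 is a documentation-only address):
#   hosts_line 192.0.2.10 model-a.example.internal model-b.example.internal
#   hosts_line 192.0.2.10 model-a.example.internal | sudo tee -a /etc/hosts
```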
Verification
Your upgrade is successful when the following checks pass:
- Confirm that all pods show a healthy status, such as Running or Completed.
- Confirm that the model inference endpoint resolves to the deployment host.
- Confirm that model workloads are running.
- Confirm that the model upload job completed successfully.
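The first check can be partly automated. The sketch below is an assumption about how you might filter pod listings, not a command from the install guide: the hypothetical unhealthy_pods helper reads the output of kubectl get pods -A --no-headers on standard input and prints every pod whose STATUS column is neither Running nor Completed.

```shell
# Read `kubectl get pods -A --no-headers` output on stdin. With -A the columns
# are NAMESPACE NAME READY STATUS RESTARTS AGE, so STATUS is field 4.
# Print namespace/name for each pod that is not Running or Completed.
unhealthy_pods() {
  awk '$4 != "Running" && $4 != "Completed" { print $1 "/" $2 }'
}

# Usage:
#   kubectl get pods -A --no-headers | unhealthy_pods
```

Empty output means every listed pod reports Running or Completed. A Running pod can still have containers that are not ready, so check the READY column as well.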
Troubleshooting
Pods stuck pulling images
- Confirm that the supporting infrastructure services phase finished loading container images into the local RKE2 registry.
- Check the affected pod events.
Model pods stuck in ContainerCreating
- Confirm that the host detects NVIDIA GPU devices.
- Confirm that Kubernetes reports GPUs as allocatable.
- Check model workload status.
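For the GPU checks above, a small sketch follows. It assumes that nvidia-smi is installed on the host and that the cluster exposes GPUs under the nvidia.com/gpu resource name used by the NVIDIA device plugin; the count_gpus helper is hypothetical.

```shell
# Count GPUs from `nvidia-smi -L` output on stdin, which prints one
# "GPU N: ..." line per detected device. Prints 0 when none are found.
count_gpus() {
  grep -c '^GPU [0-9]' || true
}

# Usage:
#   nvidia-smi -L | count_gpus
#   kubectl describe nodes | grep 'nvidia.com/gpu'
```

If the host count is nonzero but kubectl describe nodes shows no nvidia.com/gpu entries, the problem is more likely in the cluster's GPU plumbing than in the hardware.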
Model pods stuck in Init
Each model pod downloads its checkpoint during initialization. Check the initialization container logs and confirm that the checkpoint paths in 04-poolside-inference/terraform.tfvars are still valid.
Models not loading after upgrade
- Confirm that the model checkpoint files were copied into /opt/poolside/poolside-model-uploads, or into the custom host volume location you configured.
- Confirm that the model upload job completed successfully.
- Confirm that terraform.tfvars in 04-poolside-inference points to the expected model S3 URIs.
- Check model initialization logs.