Follow these steps to deploy Poolside model inference on your GPU-backed OpenShift cluster. For an overview of this deployment approach and architecture, see OpenShift deployment overview.

Prerequisites

Poolside distributes the Helm deployment bundle as a .tar.gz archive. Extract it before you start:
tar -xzf <bundle-name>.tar.gz
cd <bundle-name>
Confirm that you are working from the root of the extracted bundle. The bundle root contains the following directories:
./scripts/
./containers/
./charts/
./binaries/
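The directory check above can be scripted so that later steps fail early when run from the wrong location. This is a minimal sketch; check_bundle_root is a hypothetical helper name, not something shipped in the bundle.

```shell
# Hypothetical helper: returns 0 only if every expected bundle directory exists
# under the given root (defaults to the current directory).
check_bundle_root() {
  local root="${1:-.}" missing=0 dir
  for dir in scripts containers charts binaries; do
    if [ ! -d "$root/$dir" ]; then
      echo "missing directory: $root/$dir" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, run check_bundle_root . && echo "bundle root OK" from the extracted bundle directory.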
Cluster requirements
  • OpenShift 4.16 or later
  • GPU nodes with enough GPUs for the models you deploy
  • NVIDIA GPU Operator 26.3.0, with NVIDIA driver 580.126.20 and NVIDIA Container Toolkit 1.19.0
  • DNS records that resolve to the cluster router endpoint (not needed if you use router-generated hostnames)
  • An S3-compatible object storage service such as NooBaa (OpenShift Data Foundation), Amazon S3, or MinIO
  • A container registry that your cluster can access
Workstation tools
Install the following tools on the host you use to run the deployment:
  • helm 3.12 or later
  • oc or kubectl
  • skopeo
  • aws CLI (to upload checkpoints to S3-compatible object storage)
  • jq (to parse JSON responses from the inference API)
  • tar (to extract the deployment bundle)
  • curl (to call the inference API)
  • openssl (optional, to generate a TLS certificate for the inference endpoint)
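A quick preflight check can confirm that the tools listed above are on your PATH before you start. The sketch below uses hypothetical helper names (missing_tools, have_cluster_client) and checks only for presence, not versions; run helm version separately to confirm that helm is 3.12 or later.

```shell
# Hypothetical helper: prints each named tool that is not found on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Returns 0 if either oc or kubectl is available, since the list accepts either client.
have_cluster_client() {
  command -v oc >/dev/null 2>&1 || command -v kubectl >/dev/null 2>&1
}
```

For example, missing_tools helm skopeo aws jq tar curl prints nothing when all required tools are installed.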
Minimum resource requirements
Ensure that your cluster has enough GPUs for the models you deploy. If you have questions about the required specs, contact Poolside support.

Step 1: Create the namespace

The inference stack runs in a single namespace:
oc create namespace poolside-models

Step 2: Upload container images

Copy the bundled images into your registry. Before you run any upload commands, log in to the target registry with docker login or podman login, then authenticate skopeo against the same registry:
skopeo login <registry-host> --username <username> --password <password>
Upload the images with the provided script:
chmod +x ./scripts/upload_images.sh
./scripts/upload_images.sh <registry-host>
If your registry requires authentication, create an image pull secret in poolside-models:
oc create secret docker-registry poolside-registry-secret \
  --docker-server=<registry-host> \
  --docker-username=<registry-user> \
  --docker-password=<registry-password> \
  -n poolside-models
If you use the OpenShift internal registry, push the images into the poolside-models namespace. Pods in that namespace pull same-namespace imagestreams with the default service account, so no cross-namespace system:image-puller rolebinding is required.
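After the upload finishes, it can help to confirm that the atlas image is visible in the registry before you reference it in the values file. The sketch below is an assumption-laden example: image_ref and check_image are hypothetical helpers, and the reference layout <registry>/<name>:<tag> assumes the upload script adds no extra path prefix. Confirm the actual repository path against the script output.

```shell
# Builds an image reference from the image.registry, image.name, and image.tag values.
# Assumption: the image lives at <registry>/<name>:<tag> with no extra path prefix.
image_ref() {
  printf 'docker://%s/%s:%s\n' "$1" "$2" "$3"
}

# Asks the registry for the image metadata. A non-zero exit means skopeo could not
# find or read the image with the credentials it currently holds.
check_image() {
  skopeo inspect "$(image_ref "$1" "$2" "$3")" >/dev/null
}
```

For example, check_image <registry-host> atlas <atlas-tag> && echo "image found".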

Step 3: Upload model checkpoints

The inference stack downloads model weights from your S3 bucket on pod startup, so the checkpoints must be in place before you deploy the chart. Poolside provides the checkpoint files separately from the deployment bundle. Confirm the local path and the destination prefix with your Poolside contact.
Uploading checkpoints is time-consuming. Start the upload now and continue with the remaining steps in parallel.
Create the bucket if it does not already exist. The example targets a non-AWS endpoint such as NooBaa; for AWS S3, omit the --endpoint-url flag:
aws s3 mb s3://<bucket-name> --endpoint-url https://<s3-endpoint> --region <aws-region>
Note the bucket name; you reference it in the models.<key>.model paths in Step 5. Then upload the checkpoints to the bucket:
aws s3 cp ./checkpoints s3://<bucket-name>/checkpoints --recursive --region <aws-region>
For a non-AWS S3 endpoint such as NooBaa or MinIO, add --endpoint-url:
aws s3 cp ./checkpoints s3://<bucket-name>/checkpoints \
  --recursive \
  --endpoint-url https://<s3-endpoint> \
  --region <aws-region>
Checkpoints are typically tens of GiB per model. For faster throughput, or for backends sensitive to upload concurrency such as NooBaa, run the upload from a host inside the cluster and tune aws configure set default.s3.max_concurrent_requests and default.s3.multipart_chunksize.
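The tuning mentioned above can be sketched as two small helpers. The function names are hypothetical, and the default values (2 concurrent requests, 64MB chunks) simply mirror the awsCliConfig throttle example in Step 5 as a conservative starting point for NooBaa. For AWS S3 you would more likely raise max_concurrent_requests above the AWS CLI default of 10 instead.

```shell
# Applies transfer settings to the default AWS CLI profile.
# Arguments: max concurrent requests (default 2), multipart chunk size (default 64MB).
tune_s3_transfers() {
  aws configure set default.s3.max_concurrent_requests "${1:-2}"
  aws configure set default.s3.multipart_chunksize "${2:-64MB}"
}

# Lists what landed under the checkpoints prefix, with object count and total size.
# Pass the endpoint URL as the second argument for NooBaa or MinIO; omit it for AWS S3.
summarize_checkpoints() {
  local bucket="$1" endpoint="${2:-}"
  if [ -n "$endpoint" ]; then
    aws s3 ls "s3://$bucket/checkpoints/" --recursive --summarize --endpoint-url "$endpoint"
  else
    aws s3 ls "s3://$bucket/checkpoints/" --recursive --summarize
  fi
}
```

Run tune_s3_transfers before starting the upload, and summarize_checkpoints <bucket-name> https://<s3-endpoint> afterward to compare the object count and total size with your local ./checkpoints directory.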

Step 4: Create the S3 credentials secret

The model servers read checkpoints from S3 using credentials in a Kubernetes secret. Create it in poolside-models:
oc create secret generic aws-credentials \
  --from-literal=AWS_ACCESS_KEY_ID=<access-key-id> \
  --from-literal=AWS_SECRET_ACCESS_KEY=<secret-access-key> \
  -n poolside-models
API key authentication (optional)
To require an API key on the vLLM inference servers, create a secret containing the key in poolside-models:
oc create secret generic vllm-auth \
  --from-literal=VLLM_API_KEY=<vllm-api-key> \
  -n poolside-models
Creating the secret does not enable API key authentication by itself. In Step 5, set authentication.secretName to vllm-auth.
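If you do not already have a key to use for <vllm-api-key>, one option is to generate a random value locally. This is a sketch under the assumption that any sufficiently long random string is acceptable as the key; generate_api_key is a hypothetical helper, and the 32-byte length is an arbitrary choice.

```shell
# Prints a 64-character hexadecimal key (32 random bytes).
# Uses openssl when it is installed; otherwise reads from /dev/urandom.
generate_api_key() {
  if command -v openssl >/dev/null 2>&1; then
    openssl rand -hex 32
  else
    head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
    echo
  fi
}

# Example use with the secret command above (commented out so that loading
# this file does not touch the cluster):
#   VLLM_API_KEY="$(generate_api_key)"
#   oc create secret generic vllm-auth \
#     --from-literal=VLLM_API_KEY="$VLLM_API_KEY" \
#     -n poolside-models
```

Store the generated key somewhere safe, because clients need it as the bearer token in Step 8.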

Step 5: Configure the inference values file

Create an inference_values.yaml file in the bundle root:
cp ./charts/inference/values.yaml ./inference_values.yaml
Set the fields that apply to your environment. The example below deploys two models and exposes each model through its own OpenShift Route:
inference_values.yaml
image:
  # -- Registry you uploaded the atlas image to (required)
  registry: "<registry-host>"
  # -- Image name and tag come pre-set in the bundle to match the shipped image
  name: "atlas"
  tag: "<atlas-tag>"
# -- Name of the image pull secret for private registries (omit if your registry is public)
imagePullSecret: "poolside-registry-secret"
podSecurityContext:
  # -- Require non-root user. Do not set runAsUser on OpenShift; the SCC injects a UID from the namespace range.
  runAsNonRoot: true
  seccompProfile:
    # -- Seccomp profile type
    type: RuntimeDefault
s3:
  # -- Name of secret containing AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
  secretName: "aws-credentials"
  # -- Custom CA certificate bundle for S3 (required for NooBaa with the OpenShift service CA)
  caBundle: ""
authentication:
  # -- Name of secret containing VLLM_API_KEY for vLLM server authentication (set to "vllm-auth" if you created the optional secret in Step 4; leave empty to disable)
  secretName: ""
route:
  # -- Create a Route for every model
  enabled: true
  tls:
    # -- Terminate TLS at the OpenShift router
    enabled: true
    termination: edge
    insecureEdgeTerminationPolicy: Redirect
models:
  laguna:
    model: s3://<bucket-name>/checkpoints/laguna
    modelName: Laguna
    modelType: agent
    gpus: 4
    # -- Route host for this model (leave empty for a router-generated hostname)
    routeHost: ""
  point:
    model: s3://<bucket-name>/checkpoints/point
    modelName: Point
    modelType: completion
    gpus: 1
    # -- Route host for this model (leave empty for a router-generated hostname)
    routeHost: ""
The checkpoint paths in models.<key>.model must exactly match the S3 locations you uploaded to in Step 3, and image.registry must match the registry you pushed to in Step 2. The image name and tag come pre-set to match the shipped atlas image. Set each model’s gpus to a value that meets its minimum GPU memory for your GPU type. For the per-model minimums, see Supported configurations.
Each model is exposed through a separate Route named inference-<model-key>. Leave routeHost empty to let the OpenShift router generate a hostname per model, or set an explicit host. The Route sends the host’s root path directly to that model’s vLLM service, so clients reach the OpenAI-compatible API at https://<route-host>/v1.
NooBaa and non-AWS S3 endpoints
If your object storage is NooBaa or another non-AWS S3 service, point the model servers at the endpoint and region:
extraEnv:
  AWS_REGION: "<aws-region>"
  AWS_ENDPOINT_URL_S3: "https://s3.openshift-storage.svc:443"
For NooBaa with the OpenShift service CA, the model servers also need the service CA to trust the S3 endpoint. The inference chart takes the CA as inline text in s3.caBundle, which you supply at deployment time in Step 6.
NooBaa and other S3 backends with limited concurrency need throttled downloads. Without throttling, the init container can fail after downloading 1 to 2 GiB and then restart in an endless loop, because the emptyDir volume is wiped on each restart:
awsCliConfig:
  default.s3.max_concurrent_requests: "2"
  default.s3.max_queue_size: "1000"
  default.s3.multipart_chunksize: "64MB"
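Before installing, it is worth checking that no angle-bracket placeholders such as <registry-host> or <bucket-name> remain in the values file. The sketch below uses a hypothetical helper, find_placeholders, and a deliberately simple pattern (lowercase letters, digits, and hyphens between angle brackets), so it may also flag placeholders that appear only in comments.

```shell
# Prints every line of the given file that still contains an angle-bracket
# placeholder, prefixed with its line number.
# Returns 0 if any placeholder was found and 1 if none were found.
find_placeholders() {
  grep -nE '<[a-z][a-z0-9-]*>' "$1"
}
```

For example, if find_placeholders ./inference_values.yaml; then echo "replace the placeholders listed above" >&2; fi.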

Step 6: Install the inference chart

Install the inference chart into poolside-models. If your S3 backend uses a publicly trusted certificate, install the chart directly:
helm install inference ./charts/inference \
  --namespace poolside-models \
  -f ./inference_values.yaml
If you use NooBaa, the model servers must trust its S3 serving certificate. The OpenShift service CA signs this certificate, which is not publicly trusted, and rotates it automatically, so extract it fresh from the noobaa-s3-serving-cert secret rather than committing it to your values file:
oc get secret noobaa-s3-serving-cert -n openshift-storage \
  -o jsonpath='{.data.tls\.crt}' | base64 -d > service-ca.crt
Then install with the certificate passed inline through --set-file:
helm install inference ./charts/inference \
  --namespace poolside-models \
  -f ./inference_values.yaml \
  --set-file s3.caBundle=./service-ca.crt
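If the secret lookup returns nothing (for example, because the secret name or namespace differs in your cluster), the extraction command can leave service-ca.crt empty, and --set-file would then pass an empty CA bundle. A small sanity check before the NooBaa install command can catch this. The helper names below are hypothetical.

```shell
# Returns 0 if the file is non-empty and contains at least one PEM certificate header.
looks_like_pem_cert() {
  [ -s "$1" ] && grep -q -- '-----BEGIN CERTIFICATE-----' "$1"
}

# Optional deeper check (requires openssl): prints the subject and expiry date
# of the first certificate in the file.
show_cert_summary() {
  openssl x509 -in "$1" -noout -subject -enddate
}
```

For example, looks_like_pem_cert ./service-ca.crt || echo "service-ca.crt is empty or not PEM" >&2.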

Step 7: Verify the deployment

Check that the model pods are running. The only pods in the namespace are the per-model servers:
oc get pods -n poolside-models
Each model server takes time to become ready on first start because it downloads its checkpoint from S3. Watch a model’s logs to track progress, where <model-key> is the key you set under models in the values file (the Step 5 example uses laguna and point):
oc logs -f -n poolside-models deploy/inference-<model-key>
Confirm a Route was created for each model and note its host:
oc get route -n poolside-models
List the served models on a model’s endpoint to confirm routing works, where <route-host> is the host of that model’s Route:
curl -s https://<route-host>/v1/models
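Instead of watching logs by hand, you can block until each model deployment reports ready. The sketch below assumes the deployments are named inference-<model-key>, as the log command above implies. wait_for_models and the ROLLOUT_TIMEOUT variable are hypothetical names, and the 30-minute default is a guess that you should size to your checkpoint download time.

```shell
# Waits for each named model deployment to finish rolling out.
# Stops at the first deployment that fails or times out.
wait_for_models() {
  local timeout="${ROLLOUT_TIMEOUT:-30m}" key
  for key in "$@"; do
    oc rollout status "deploy/inference-$key" -n poolside-models --timeout="$timeout" || return 1
  done
}
```

For the Step 5 example, run wait_for_models laguna point before you query the /v1/models endpoint.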

Step 8: Call the inference API

Each model serves the OpenAI-compatible API directly at its own Route host. The base URL has the form:
https://<route-host>/v1
Append the OpenAI-compatible route to the base URL, such as /chat/completions or /completions. The commands below use three placeholders. Fill the model values from the inference_values.yaml you wrote in Step 5; OpenShift assigns each model’s Route host unless you set routeHost:
Placeholder | Source | Example
<route-host> | assigned by OpenShift per model (or models.<model-key>.routeHost if you set a custom host) | inference-laguna-poolside-models.apps.cluster.example.com
<model-key> | a key under models | laguna
<served-model-name> | models.<model-key>.modelName | Laguna
You can also retrieve each value from the running cluster. To find the <model-key> values, list the model deployments; each one is named inference-<model-key>:
oc get deploy -n poolside-models -l app.kubernetes.io/component=inference
Retrieve <route-host> from the model’s Route:
oc get route inference-<model-key> -n poolside-models -o jsonpath='{.spec.host}'
Retrieve <served-model-name> from the id field of that model’s models endpoint:
curl -s https://<route-host>/v1/models | jq -r '.data[].id'
Send a chat completion request:
curl https://<route-host>/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<served-model-name>",
    "messages": [{"role": "user", "content": "Write a function that reverses a string."}]
  }'
For example, to call the laguna model served as Laguna:
curl https://inference-laguna-poolside-models.apps.cluster.example.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Laguna",
    "messages": [{"role": "user", "content": "Write a function that reverses a string."}]
  }'
If you set authentication.secretName in Step 5, include the key as a bearer token:
curl https://<route-host>/v1/chat/completions \
  -H "Authorization: Bearer <vllm-api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<served-model-name>",
    "messages": [{"role": "user", "content": "Write a function that reverses a string."}]
  }'
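The lookup and request commands above can be combined into one helper. This is a sketch, not part of the bundle: chat_payload and ask_model are hypothetical names, and the VLLM_API_KEY environment variable is an assumed convention for passing the key from Step 4. Building the body with jq keeps quotes and newlines in the prompt correctly escaped.

```shell
# Builds a chat completion request body as compact JSON.
chat_payload() {
  jq -cn --arg model "$1" --arg prompt "$2" \
    '{model: $model, messages: [{role: "user", content: $prompt}]}'
}

# Looks up the Route host for a model key, then sends one chat completion request.
# Arguments: <model-key> <served-model-name> <prompt>
# If VLLM_API_KEY is set in the environment, it is sent as a bearer token.
ask_model() {
  local key="$1" served_name="$2" prompt="$3" host body
  host="$(oc get route "inference-$key" -n poolside-models -o jsonpath='{.spec.host}')" || return 1
  body="$(chat_payload "$served_name" "$prompt")" || return 1
  # Reuse the positional parameters as the curl argument list.
  set -- -sS "https://$host/v1/chat/completions" -H "Content-Type: application/json" -d "$body"
  if [ -n "${VLLM_API_KEY:-}" ]; then
    set -- "$@" -H "Authorization: Bearer $VLLM_API_KEY"
  fi
  curl "$@"
}
```

For example, ask_model laguna Laguna "Write a function that reverses a string." | jq -r '.choices[0].message.content' prints only the generated text.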

TLS

The Route example in Step 5 uses edge termination, where the OpenShift router terminates TLS with its default certificate. To serve a custom certificate, provide it inline under route.tls. This block applies to every model’s Route, so the certificate must be valid for all model Route hosts (for example, a wildcard certificate):
route:
  enabled: true
  tls:
    enabled: true
    termination: edge
    insecureEdgeTerminationPolicy: Redirect
    certificate: ""
    key: ""
    caCertificate: ""
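Rather than pasting PEM text into the values file, you can pass the certificate and key as files with --set-file, in the same way Step 6 passes s3.caBundle. The sketch below is for testing only and rests on several assumptions: all model Route hosts fall under one wildcard domain, your openssl supports -addext (OpenSSL 1.1.1 or later), and the helper names make_wildcard_cert and apply_route_cert are hypothetical. Use a CA-issued certificate in production.

```shell
# Generates a self-signed wildcard certificate (tls.crt) and key (tls.key) for testing.
# Arguments: <wildcard-domain> [output-directory]
make_wildcard_cert() {
  local domain="$1" out_dir="${2:-.}"
  openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
    -keyout "$out_dir/tls.key" -out "$out_dir/tls.crt" \
    -subj "/CN=*.$domain" \
    -addext "subjectAltName=DNS:*.$domain"
}

# Applies the certificate to the existing release through the route.tls keys shown above.
# Argument: directory that contains tls.crt and tls.key
apply_route_cert() {
  helm upgrade inference ./charts/inference \
    --namespace poolside-models \
    -f ./inference_values.yaml \
    --set-file route.tls.certificate="$1/tls.crt" \
    --set-file route.tls.key="$1/tls.key"
}
```

For example, make_wildcard_cert apps.cluster.example.com . followed by apply_route_cert . updates the Routes. If you installed with --set-file s3.caBundle in Step 6, add that flag to the upgrade as well, because values passed on the command line are not carried over when you supply new values.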

Offline documentation (optional)

The bundle also ships the Poolside documentation site, which the same inference chart can deploy in-cluster so operators have local access to the docs. It is off by default. To enable and expose it through a Route, see Set up offline documentation.

Troubleshooting

  • If pods stay in Init or restart in a loop, check the init container logs with oc logs -n poolside-models <pod-name> -c <init-container>. A stale or misspelled checkpoint path syncs nothing and the pod never starts.
  • If checkpoint downloads fail against NooBaa, confirm the S3 CA bundle is mounted and review the awsCliConfig throttle settings in Step 5.
  • If model servers fail to pull images, run oc describe pod <pod-name> -n poolside-models and verify the image pull secret or internal-registry pull access.
  • If a model pod is Pending, confirm the cluster has enough GPUs for the gpus value you requested and that the NVIDIA GPU Operator is healthy.
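The checks above can be sketched as a few helpers. The function names are hypothetical, and gpu_capacity assumes the GPU Operator advertises GPUs under the standard nvidia.com/gpu resource name.

```shell
# Prints the init container names of a pod, for use with oc logs -c.
init_containers() {
  oc get pod "$1" -n poolside-models -o jsonpath='{.spec.initContainers[*].name}'
}

# Prints logs from every init container of a pod. Prefers the previous (crashed)
# run when one exists, and falls back to the current run otherwise.
init_logs() {
  local pod="$1" name
  for name in $(init_containers "$pod"); do
    echo "== $name =="
    oc logs "$pod" -n poolside-models -c "$name" --previous 2>/dev/null \
      || oc logs "$pod" -n poolside-models -c "$name"
  done
}

# Prints each node name with its allocatable GPU count (blank for nodes without GPUs).
gpu_capacity() {
  oc get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.allocatable.nvidia\.com/gpu}{"\n"}{end}'
}
```

For example, init_logs <pod-name> shows why a pod is stuck in Init, and gpu_capacity helps compare available GPUs against the gpus values in your values file.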
For questions about hardware requirements, infrastructure configuration, or deployment issues, contact Poolside support.