Skip to main content
Use cloud deployment to serve Poolside models from a GPU-backed Kubernetes environment. You deploy the inference chart, expose each model through its own ingress or OpenShift Route, and call the OpenAI-compatible API.

Supported environments

Amazon EKS

Deploy model inference with Helm on Amazon EKS, using IRSA for object storage and an Application Load Balancer for ingress.

Red Hat OpenShift

Deploy model inference with Helm on your OpenShift cluster.

Upstream Kubernetes

Deploy model inference with Helm on your self-managed Kubernetes cluster, such as RKE2 or Charmed Kubernetes.

Architecture

Cloud deployment includes:
  • One Deployment and Service per model. Each model server downloads its checkpoint from object storage on startup and serves an OpenAI-compatible API.
  • Each model is exposed at its own hostname through an ingress or OpenShift Route that routes directly to its vLLM service.
  • Optionally, the Poolside documentation site, deployed in-cluster from the bundle. See Set up offline documentation.
You are responsible for sending requests to the inference endpoints and for any authentication or routing in front of them.

Operational considerations

  • Service availability: All external services your deployment depends on, including object storage and the container registry, must be reachable from within the cluster. The cluster must have access to compatible GPU hardware.
  • Backup and recovery: You are responsible for backup and recovery for the infrastructure and external services in your environment, such as object storage, container registry contents, and Kubernetes configuration.