> ## Documentation Index > Fetch the complete documentation index at: https://docs-staging.poolside.ai/llms.txt > Use this file to discover all available pages before exploring further. # Cloud deployment > Deploy Poolside model inference in a supported GPU-backed cloud Kubernetes environment. Use cloud deployment to serve Poolside models from a GPU-backed Kubernetes environment. You deploy the `inference` chart, expose each model through its own ingress or OpenShift Route, and call the OpenAI-compatible API. ## Supported environments Deploy model inference with Helm on Amazon EKS, using IRSA for object storage and an Application Load Balancer for ingress. Deploy model inference with Helm on your OpenShift cluster. Deploy model inference with Helm on your self-managed Kubernetes cluster, such as RKE2 or Charmed Kubernetes. ## Architecture Cloud deployment includes: * One `Deployment` and `Service` per model. Each model server downloads its checkpoint from object storage on startup and serves an OpenAI-compatible API. * Each model is exposed at its own hostname through an ingress or OpenShift Route that routes directly to its vLLM service. * Optionally, the Poolside documentation site, deployed in-cluster from the bundle. See [Set up offline documentation](/deployment/cloud/set-up-offline-documentation). You are responsible for sending requests to the inference endpoints and for any authentication or routing in front of them. ## Operational considerations * **Service availability**: All external services your deployment depends on, including object storage and the container registry, must be reachable from within the cluster. The cluster must have access to compatible GPU hardware. * **Backup and recovery**: You are responsible for backup and recovery for the infrastructure and external services in your environment, such as object storage, container registry contents, and Kubernetes configuration.