inference Helm chart. The model checkpoints are provided separately. You deploy the chart, expose each model through its own Application Load Balancer ingress, and call the OpenAI-compatible API.
This deployment uses the standalone inference chart from the current Poolside inference bundle. It serves the model servers directly.
Architecture
This deployment includes:- One
DeploymentandServiceper model. Each model server downloads its checkpoint from Amazon S3 on startup and serves an OpenAI-compatible API. - One
Ingressper model, reconciled by the AWS Load Balancer Controller into a shared internal or internet-facing Application Load Balancer. Each model is reachable at its own hostname. - A single shared service account,
inference, annotated for IAM Roles for Service Accounts (IRSA). The model servers read checkpoints from S3 through this role, so the cluster needs no static AWS credentials. - Optionally, the Poolside documentation site, deployed in-cluster from the bundle. See Set up offline documentation.
How Amazon EKS differs from upstream Kubernetes
The deployment shape matches the upstream Kubernetes deployment, with these AWS-native substitutions:- Ingress: an Application Load Balancer provisioned by the AWS Load Balancer Controller, instead of an in-cluster ingress controller.
- Object storage access: IRSA on the
inferenceservice account, instead of a mounted AWS credentials secret. - Container registry: Amazon ECR, with image pulls authorized by the GPU node group’s instance role, instead of an image pull secret.
- TLS: terminated at the load balancer with an AWS Certificate Manager certificate, instead of a TLS secret in the cluster.
Required AWS foundation
You provision the AWS infrastructure that the chart runs on. The Install on Amazon EKS page lists the required services and the reason for each. For a turnkey foundation, Poolside publishes a Terraform reference architecture in thepoolsideai/reference_architectures repository. You can apply it as published, fork it, or reproduce the same architecture in your own infrastructure-as-code. For the architecture diagram and the key design decisions, see Reference architecture.