Overview

Poolside supports model inference deployments in GPU-backed Kubernetes environments. Use this page to review supported environments and size the infrastructure that serves Poolside models.
Poolside support covers the environments and minimum requirements documented on this page. For other Kubernetes environments, GPU configurations, storage backends, or security requirements, contact your Poolside account team to discuss support options.

Supported environments

Deployment | Description
Amazon EKS 1.29+ | Amazon Elastic Kubernetes Service, with IAM Roles for Service Accounts (IRSA) and an Application Load Balancer
OpenShift 4.16+ | Red Hat OpenShift environments
Upstream Kubernetes 1.29+ | Self-managed Kubernetes environments such as RKE2 or Charmed Kubernetes
On-premises | Single-node Kubernetes (RKE2) on customer-provided or Poolside-provided hardware
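
The version floors in the table above can be checked before an installation starts. The following is a minimal sketch, not a Poolside tool: the platform labels (`eks`, `openshift`, `kubernetes`) and the function names are hypothetical, and it assumes the version string begins with `major.minor`, optionally prefixed with `v`. The on-premises row lists no version, so it is not covered.

```python
import re

# Minimum versions copied from the supported-environments table.
# The dictionary keys are hypothetical labels chosen for this sketch.
MINIMUM_VERSIONS = {
    "eks": (1, 29),
    "openshift": (4, 16),
    "kubernetes": (1, 29),
}


def parse_major_minor(version: str) -> tuple[int, int]:
    """Extract (major, minor) from strings such as 'v1.29.3' or '4.16.2'."""
    match = re.match(r"v?(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(platform: str, version: str) -> bool:
    """Return True when the version is at or above the documented minimum."""
    return parse_major_minor(version) >= MINIMUM_VERSIONS[platform]
```

Comparing `(major, minor)` as integer tuples avoids the string-comparison trap where `4.9` would sort above `4.16`. Vendor suffixes after the patch number are ignored by the parser.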

Inference requirements

Poolside models have different minimum requirements. Use this table to size your inference nodes. For concurrent-agent capacity and developer-seat estimates, contact your Poolside account team. Laguna is the recommended primary model family for new deployments.
Model | Quantization | Minimum GPU memory | Minimum CPU | Minimum host memory
Laguna M.1 | FP8 | 384 GB | 128 cores | 1 TB
Laguna XS.2 | FP8 | 96 GB | 44 cores | 512 GB
Malibu 2.2 | FP8 | 192 GB | 128 cores | 1 TB
Malibu 2.2 | INT4 | 96 GB | 44 cores | 512 GB
Point | FP8 | 96 GB | 44 cores | 512 GB
For cloud deployments, storage requirements depend on whether S3-compatible storage is colocated on the inference node. For on-premises deployments, see Storage requirements for guidance on sizing and configuring storage.
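
As an illustration of how the per-model minimums might be applied, the sketch below compares a candidate inference node against the table above. The names `Minimums`, `MODEL_MINIMUMS`, and `shortfalls` are hypothetical, and the code assumes 1 TB is counted as 1024 GB. Passing this check shows only that a node meets the documented floor. It does not confirm that the node serves a given workload, and it does not cover storage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Minimums:
    gpu_memory_gb: int
    cpu_cores: int
    host_memory_gb: int


# Values copied from the inference requirements table.
# Assumption: 1 TB of host memory is treated as 1024 GB.
MODEL_MINIMUMS = {
    ("Laguna M.1", "FP8"): Minimums(384, 128, 1024),
    ("Laguna XS.2", "FP8"): Minimums(96, 44, 512),
    ("Malibu 2.2", "FP8"): Minimums(192, 128, 1024),
    ("Malibu 2.2", "INT4"): Minimums(96, 44, 512),
    ("Point", "FP8"): Minimums(96, 44, 512),
}


def shortfalls(model: str, quantization: str, gpu_memory_gb: int,
               cpu_cores: int, host_memory_gb: int) -> list[str]:
    """List every documented minimum that the node fails to meet."""
    required = MODEL_MINIMUMS[(model, quantization)]
    problems = []
    if gpu_memory_gb < required.gpu_memory_gb:
        problems.append(
            f"GPU memory: {gpu_memory_gb} GB < {required.gpu_memory_gb} GB")
    if cpu_cores < required.cpu_cores:
        problems.append(
            f"CPU: {cpu_cores} cores < {required.cpu_cores} cores")
    if host_memory_gb < required.host_memory_gb:
        problems.append(
            f"Host memory: {host_memory_gb} GB < {required.host_memory_gb} GB")
    return problems
```

For example, `shortfalls("Laguna M.1", "FP8", 320, 128, 1024)` describes a node with four 80 GB GPUs and reports a single GPU memory shortfall, because 320 GB is below the 384 GB minimum.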

GPU memory reference

Poolside model inference targets the following NVIDIA GPU families: RTX 6000 Blackwell, H100, H200, B200, and B300. The memory per GPU is listed here for reference against the per-model minimum GPU memory table above:
GPU | Memory per GPU
H100 | 80 GB
RTX 6000 Blackwell | 96 GB
H200 | 141 GB
B200 | 192 GB
B300 | 288 GB
GPU suitability varies by model. The per-model minimum GPU memory table is a floor, not a GPU selector: context length, batch size, and the number of GPUs all affect what actually serves a model. Confirm the right combination of model, GPU type, and GPU count for your workload with your Poolside account team.
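
To relate the two tables, the arithmetic below divides a model's minimum GPU memory by the memory per GPU and rounds up. This is a rough sketch with a hypothetical helper name, `min_gpu_count`. It gives a memory-only lower bound on GPU count and is not a sizing recommendation, because it ignores context length, batch size, and any constraints on how a model is split across GPUs.

```python
import math

# Memory per GPU in GB, copied from the GPU memory reference table.
GPU_MEMORY_GB = {
    "H100": 80,
    "RTX 6000 Blackwell": 96,
    "H200": 141,
    "B200": 192,
    "B300": 288,
}


def min_gpu_count(required_gpu_memory_gb: int, gpu: str) -> int:
    """Smallest number of GPUs whose combined memory reaches the minimum."""
    return math.ceil(required_gpu_memory_gb / GPU_MEMORY_GB[gpu])


# Laguna M.1 at FP8 lists a 384 GB minimum.
for gpu in GPU_MEMORY_GB:
    print(gpu, min_gpu_count(384, gpu))
```

By this arithmetic alone, the 384 GB minimum corresponds to at least five H100, four RTX 6000 Blackwell, three H200, two B200, or two B300 GPUs. Treat these counts as a starting point for the conversation with your Poolside account team.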

Support and compatibility

For questions about integration with specific enterprise tooling or deployment workflows, contact Poolside.