inference chart, expose each model through its own OpenShift Route, and call the OpenAI-compatible API.
Architecture
This deployment includes:- One
DeploymentandServiceper model. Each model server downloads its checkpoint from S3 on startup and serves an OpenAI-compatible API. - Each model is exposed at its own hostname through an OpenShift Route that routes directly to its vLLM service.
- Optionally, the Poolside documentation site, deployed in-cluster from the bundle. See Set up offline documentation.