# Admin toolkit

> Common commands and checks for operating a Poolside on-premises model inference deployment.

## Introduction

Use this page when you need to inspect or troubleshoot an on-premises Poolside model inference deployment. The commands on this page assume you have shell access to the deployment host and `kubectl` access to the RKE2 cluster.

## Helpful aliases

Set these aliases for the current shell session, or add them to your shell profile.

```bash theme={null}
# Shorthand kubectl, for example, k get pods.
alias k=kubectl

# Switch the current namespace, for example, kcs poolside-models.
alias kcs='kubectl config set-context --current --namespace '

# Common get and logs commands.
alias kgp='kubectl get pods'
alias kgd='kubectl get deployments'
alias kl='kubectl logs'

# Describe Kubernetes resources.
alias kd='kubectl describe'
```

## Check namespaces

On-premises model inference deployments commonly use these namespaces:

* `poolside-models` for model inference workloads
* `poolside-services` for supporting infrastructure services, such as S3-compatible object storage and model checkpoint uploads to S3
* `poolside-registry` for the embedded OCI registry that serves containers
* `kube-system` for RKE2 system components
* `poolside-cert-manager` for certificate management

List namespaces:

```bash theme={null}
kubectl get namespaces
```

Check all pods:

```bash theme={null}
kubectl get pods -A
```

Healthy pods are usually `Running` or `Completed`. Pods that remain in `Pending`, `ContainerCreating`, an `Init` state, `CrashLoopBackOff`, or `ImagePullBackOff` require further investigation.
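To narrow the output of `kubectl get pods -A` to pods that need attention, a small filter on the STATUS column can help. The following is a minimal sketch: the function name `unhealthy_pods` is hypothetical, and it assumes the default column layout of `kubectl get pods -A --no-headers`, where STATUS is the fourth column.

```bash
# Hypothetical helper: print pods whose STATUS is neither Running nor Completed.
# Assumes the default columns: NAMESPACE NAME READY STATUS RESTARTS AGE.
unhealthy_pods() {
  kubectl get pods -A --no-headers |
    awk '$4 != "Running" && $4 != "Completed" { print $1 "/" $2 ": " $4 }'
}
```

After you define the function, run `unhealthy_pods`. No output means every pod reports `Running` or `Completed`. A `Running` pod can still have containers that are not ready, so also check the READY column when a workload misbehaves.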

## Check model workloads

The `poolside-models` namespace contains deployed model inference workloads and model upload jobs.

```bash theme={null}
kcs poolside-models
kubectl get deployments,svc,ingress,pods,jobs
```

Check model pod logs:

```bash theme={null}
kubectl logs <pod-name> -n poolside-models
```

If a model pod is still initializing, check the model downloader container logs:

```bash theme={null}
kubectl logs <pod-name> -c model-downloader -n poolside-models
```

Inspect recent events:

```bash theme={null}
kubectl get events -n poolside-models --sort-by=.lastTimestamp
```
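Event lists can be long. One way to focus on problems is to show only `Warning` events. The following is a minimal sketch: the function name `warning_events` is hypothetical, and it relies on the `type` field selector that `kubectl get events` supports.

```bash
# Hypothetical helper: show only Warning events in a namespace, oldest first.
# Usage: warning_events <namespace>
warning_events() {
  local namespace="${1:?usage: warning_events <namespace>}"
  kubectl get events -n "$namespace" \
    --field-selector type=Warning \
    --sort-by=.lastTimestamp
}
```

For example, `warning_events poolside-models` lists warnings for model workloads, and the same function works for any other namespace on this page.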

## Check supporting services

The `poolside-services` namespace contains infrastructure services required by model inference, such as S3-compatible object storage.

```bash theme={null}
kcs poolside-services
kubectl get deployments,statefulsets,svc,ingress,pods
```

Inspect recent events:

```bash theme={null}
kubectl get events -n poolside-services --sort-by=.lastTimestamp
```

## Check GPU availability

Confirm that the host detects the expected NVIDIA GPU devices:

```bash theme={null}
lspci | grep -i nvidia
```

Confirm that Kubernetes reports GPUs as allocatable:

```bash theme={null}
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.allocatable.nvidia\.com/gpu}{"\n"}{end}'
```
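On clusters with several nodes, it can be useful to list only the nodes that report no allocatable GPUs. The following is a minimal sketch that reuses the JSONPath query above. The function name `nodes_without_gpu` is hypothetical.

```bash
# Hypothetical helper: list nodes that report no allocatable nvidia.com/gpu.
# A node appears if the value is missing or zero.
nodes_without_gpu() {
  kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.allocatable.nvidia\.com/gpu}{"\n"}{end}' |
    awk 'NF > 0 && ($2 == "" || $2 == 0) { print $1 }'
}
```

A node in this list is not necessarily a problem. Nodes that are not intended to run GPU workloads also report no allocatable GPUs.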

Check GPU Operator pods:

```bash theme={null}
kubectl get pods -n gpu-operator
```

Check the NVIDIA Container Toolkit DaemonSet:

```bash theme={null}
kubectl get daemonset -n gpu-operator | grep nvidia-container-toolkit
kubectl get pods -n gpu-operator | grep nvidia-container-toolkit
```

Host-level `nvidia-smi` is available only when NVIDIA drivers are installed on the host. If `nvidia-smi` is not available on the host, use a GPU-enabled Kubernetes pod or a GPU Operator validation container to confirm runtime access to the GPUs.

If model pods are stuck in `Pending` or `ContainerCreating`, inspect the pod details and GPU Operator events:

```bash theme={null}
kubectl describe pod <pod-name> -n poolside-models
kubectl get events -n gpu-operator --sort-by=.lastTimestamp
```

## Check certificates

Poolside on-premises deployments use `cert-manager` to issue and renew certificates for internal and ingress endpoints.

```bash theme={null}
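kubectl get certificates,issuers,clusterissuers -A
# If cert-manager runs in a different namespace in your deployment
# (for example, poolside-cert-manager), substitute it in the commands below.
kubectl get pods -n cert-manager
kubectl get events -n cert-manager --sort-by=.lastTimestamp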
```

If a certificate is not ready, describe it:

```bash theme={null}
kubectl describe certificate <certificate-name> -n <namespace>
```
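To find which certificates to describe, you can filter the certificate list by its READY column. The following is a minimal sketch: the function name `not_ready_certificates` is hypothetical, and it assumes the default column layout of `kubectl get certificates -A --no-headers`.

```bash
# Hypothetical helper: list Certificate resources whose READY column is not True.
# Assumes the default columns: NAMESPACE NAME READY SECRET AGE.
not_ready_certificates() {
  kubectl get certificates -A --no-headers |
    awk '$3 != "True" { print $1 "/" $2 }'
}
```

Each output line has the form `<namespace>/<certificate-name>`, which maps directly to the arguments of the `kubectl describe certificate` command above.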

## Resolve SSL and x509 errors

The cluster is the source of truth for certificates. Use Kubernetes certificate and secret resources when you need to inspect the current certificate state. Exported certificate files under `poolside-install/certs` are local copies generated by the deployment process.
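One way to inspect the certificate that the cluster currently holds is to decode it from its secret and print a summary with `openssl`. The following is a minimal sketch: the function name `secret_cert_summary` is hypothetical, and it assumes the secret stores the certificate under the standard `tls.crt` key and that `openssl` is installed on the host.

```bash
# Hypothetical helper: print subject, issuer, and expiry of the certificate
# stored in a TLS secret. Assumes the secret uses the standard tls.crt key.
# Usage: secret_cert_summary <secret-name> <namespace>
secret_cert_summary() {
  local secret="${1:?secret name required}"
  local namespace="${2:?namespace required}"
  kubectl get secret "$secret" -n "$namespace" -o jsonpath='{.data.tls\.crt}' |
    base64 --decode |
    openssl x509 -noout -subject -issuer -enddate
}
```

If `tls.crt` contains a certificate chain, `openssl x509` reads only the first certificate in it.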

The deployment installs CA certificates into the deployment host's trust store. When the deployment uses self-signed certificates, clients that connect to Poolside services also need to trust the self-signed CA certificate. If a client returns an x509 or certificate authority error, import the self-signed CA certificate into that client's trusted root store.

After you import the certificate, restart the application, browser, shell session, or client process so it reloads the trust store.

### Import on Windows

```text theme={null}
Double-click the certificate file (.crt or .pem).
Click Install Certificate.
Choose Local Machine.
Select "Place all certificates in the following store".
Select Trusted Root Certification Authorities.
Finish the import.
```

### Import on macOS

```text theme={null}
Double-click the certificate file.
Open the certificate in Keychain Access.
Expand Trust.
Set "When using this certificate" to "Always Trust".
Close the window and enter your password when prompted.
```

Alternatively, import the certificate into the system keychain:

```bash theme={null}
sudo security add-trusted-cert -d -r trustRoot -k /Library/Keychains/System.keychain <self-signed-ca-file>
```

### Import on Ubuntu and Debian

```bash theme={null}
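# update-ca-certificates only picks up PEM files with a .crt extension,
# so give the copy a .crt name if the source file uses another extension.
sudo cp <self-signed-ca-file> /usr/local/share/ca-certificates/<self-signed-ca-name>.crt
sudo update-ca-certificates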
```

### Import on Red Hat Enterprise Linux and Fedora

```bash theme={null}
# Copy the CA certificate into the system trust anchors directory,
# then rebuild the consolidated trust store.
sudo cp <self-signed-ca-file> /etc/pki/ca-trust/source/anchors/
sudo update-ca-trust
```

## Distribute the self-signed CA certificate

Use your organization's normal endpoint management process to distribute the self-signed CA certificate to clients that access Poolside services. Common options include:

* Group Policy for domain-joined Windows hosts
* Configuration management tools, such as Ansible or Puppet
* Mobile device management tools, such as Microsoft Intune
* Browser enterprise policies
* Internal file shares or package repositories
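Whichever distribution method you use, recipients can confirm that they received the intended file by comparing certificate fingerprints. The following is a minimal sketch: the function name `ca_fingerprint` is hypothetical, and it assumes the CA certificate is PEM-encoded and that `openssl` is installed.

```bash
# Hypothetical helper: print the SHA-256 fingerprint of a PEM certificate file.
# Usage: ca_fingerprint <self-signed-ca-file>
ca_fingerprint() {
  local ca_file="${1:?certificate file required}"
  openssl x509 -in "$ca_file" -noout -fingerprint -sha256
}
```

Run the function on the source copy and on the distributed copy. The two fingerprints match only if the certificates are identical.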

## Related resources

* [Install on-premises](/deployment/on-prem/install)
* [Upgrade on-premises](/deployment/on-prem/upgrade)
* [Relocate an on-premises server](/deployment/on-prem/relocation)
* [Server and service maintenance](/deployment/on-prem/server-maintenance)
