Choose a start or stop method
Use this guide to stop, start, reboot, or fully shut down a Poolside on-premises RKE2 node.
| Goal | Use this method |
|---|---|
| Stop and start Poolside inference workloads without stopping RKE2 | Stop and start Poolside workloads without stopping RKE2 |
| Reboot or shut down the node for planned maintenance or hardware servicing | Reboot or shut down the node |
| Stop RKE2 and remaining RKE2-managed processes without rebooting | Stop RKE2 without rebooting |
| Preview script actions or use a custom timeout | Run the scripts directly |
Timing expectations
Stopping and starting workloads can take several minutes.
| Action | Expected time | What happens |
|---|---|---|
| Stop Poolside workloads | 1 to 2 minutes | Poolside workloads stop. `rke2-server` remains running. |
| Start Poolside workloads | 3 to 10 minutes | Poolside workloads start after RKE2 and the GPU Operator report healthy status. |
Stop and start Poolside workloads without stopping RKE2
Use this method to stop Poolside workloads while RKE2 keeps running, for example before a short maintenance window that does not require a full host reboot.
Stop Poolside workloads
```shell
sudo systemctl stop poolside-services
```
This command stops or scales down Poolside workloads. It does not stop `rke2-server`, so the RKE2 cluster keeps running.
Start Poolside workloads
```shell
sudo systemctl start poolside-services
```
This command starts `rke2-server` if it is not already active, then starts Poolside workloads.
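Because startup can take 3 to 10 minutes, a command that blocks until workloads are ready can be useful during maintenance. The following is a minimal sketch, not a Poolside-provided tool. It assumes that `kubectl` on the host is already configured for this cluster and that model pods run in the `poolside-models` namespace (the namespace used in the troubleshooting table). The function name `wait_for_poolside_ready` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a Poolside-provided tool): block until the RKE2 node
# and the Poolside model pods report Ready.
# Assumption: model pods run in the poolside-models namespace.

wait_for_poolside_ready() {
  # Timeout in seconds; 600 matches the upper end of the expected startup time.
  # Each kubectl wait call below gets the full timeout.
  local timeout="${1:-600}s"

  # Wait for every node to report the Ready condition.
  if ! kubectl wait --for=condition=Ready node --all --timeout="$timeout"; then
    echo "Node did not become Ready within $timeout" >&2
    return 1
  fi

  # Wait for every pod in the model namespace to report the Ready condition.
  if ! kubectl wait --for=condition=Ready pod --all \
      -n poolside-models --timeout="$timeout"; then
    echo "Model pods did not become Ready within $timeout" >&2
    return 1
  fi

  echo "Node and model pods are Ready"
}
```

For example, after `sudo systemctl start poolside-services`, source the file and call `wait_for_poolside_ready 600`. Note that `kubectl wait` fails immediately if no pods exist yet in the namespace, and a pod that finishes in the `Succeeded` phase never reports `Ready`, so such a pod would make the wait time out.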
Check the current status
```shell
sudo systemctl status poolside-services
```
View live logs
```shell
sudo journalctl -t poolside-shutdown -f
sudo journalctl -t poolside-startup -f
```
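The `-f` commands follow new log lines only while you watch. After a reboot, any lines the shutdown script logged belong to the previous boot. A minimal sketch for reviewing both identifiers together for one boot follows; the function name `show_poolside_logs` is hypothetical, and viewing an earlier boot assumes persistent journald storage (without it, `journalctl -b -1` has no previous boot to show).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print Poolside shutdown and startup log lines for one boot.
# Argument: boot offset for journalctl -b (0 = current boot, -1 = previous boot).
# Assumption: journald storage is persistent, so earlier boots are still available.

show_poolside_logs() {
  local boot="${1:-0}"
  # -t may be repeated; entries matching either identifier are shown.
  journalctl -b "$boot" -t poolside-shutdown -t poolside-startup --no-pager
}
```

For example, after a reboot, `show_poolside_logs -1` prints the shutdown sequence from the boot that just ended, and `show_poolside_logs` prints the startup sequence from the current boot. Run it as root or as a user in the `systemd-journal` group.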
Reboot or shut down the node
Use this method for planned maintenance or hardware servicing, such as operating system patching, kernel updates, or GPU replacement.
Reboot the node
```shell
sudo systemctl reboot
```
Shut down the node
```shell
sudo systemctl poweroff
```
After the host starts again, check RKE2 and Poolside workloads:
```shell
sudo systemctl status rke2-server
kubectl get nodes
kubectl get pods -A
```
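These commands print status but leave it to you to spot problems. As a minimal sketch, the hypothetical helper `check_node_after_boot` (not part of the Poolside bundle) turns them into a single pass/fail check. It assumes `kubectl` is configured on the host, and it checks only pod phase, so a pod that is `Running` but restarting repeatedly still passes. During the 3 to 10 minute startup window, `Pending` pods are expected, so rerun the check after startup completes.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the Poolside bundle): check RKE2 and pod
# health after the host starts again. Returns 0 only if rke2-server is active,
# the node list can be read, and no pod is outside the Running/Succeeded phases.

check_node_after_boot() {
  # --quiet suppresses output; only the exit status is used.
  if ! systemctl is-active --quiet rke2-server; then
    echo "rke2-server is not active" >&2
    return 1
  fi

  # Print node status for the operator; fail if the API server is unreachable.
  if ! kubectl get nodes; then
    echo "kubectl get nodes failed" >&2
    return 1
  fi

  # Pods whose phase is not Running or Succeeded, for example Pending
  # (which includes pods still in Init), Failed, or Unknown.
  local unhealthy
  if ! unhealthy=$(kubectl get pods -A --no-headers \
      --field-selector=status.phase!=Running,status.phase!=Succeeded); then
    echo "kubectl get pods failed" >&2
    return 1
  fi

  if [ -n "$unhealthy" ]; then
    echo "Pods not yet Running:" >&2
    printf '%s\n' "$unhealthy" >&2
    return 1
  fi
  echo "rke2-server is active and all pods are Running or Succeeded"
}
```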
Stop RKE2 without rebooting
Use this method when you need to fully stop RKE2-managed processes but cannot reboot the host.
Stopping `rke2-server` alone is not sufficient. RKE2 can leave DaemonSet pods and static control-plane pods running under orphaned `containerd-shim` processes until `rke2-killall.sh` stops them.
Stop RKE2
```shell
sudo systemctl stop rke2-server
```
Clean up remaining RKE2 processes
```shell
sudo /usr/local/bin/rke2-killall.sh
```
Verify that RKE2 stopped
```shell
sudo systemctl is-active rke2-server
```
Expected result: `inactive`.
Check for remaining containerd-shim processes:
```shell
ps -ef | grep containerd-shim | grep -v grep | wc -l
```
Expected result: `0`.
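The two verification checks can be combined into one pass/fail function. This is a minimal sketch; `verify_rke2_stopped` is a hypothetical name, not part of RKE2 or the Poolside bundle. It uses `pgrep -c -f containerd-shim`, which should count the same processes as the `ps` pipeline without needing `grep -v grep`, and it treats only the `inactive` state as stopped, matching the expected result above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of RKE2 or the Poolside bundle): returns 0 only
# if rke2-server reports "inactive" and no containerd-shim process remains.

verify_rke2_stopped() {
  local state shims
  # is-active prints the unit state and exits non-zero when the unit is not
  # active, so only its output is used here.
  state=$(systemctl is-active rke2-server)
  # pgrep -c prints the number of matching processes (0 when none match).
  shims=$(pgrep -c -f containerd-shim)

  echo "rke2-server: $state, containerd-shim processes: ${shims:-0}"
  [ "$state" = "inactive" ] && [ "${shims:-0}" -eq 0 ]
}
```

For example, run `verify_rke2_stopped || echo "RKE2 is not fully stopped"` after `rke2-killall.sh` finishes.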
Run the scripts directly
Use this method when you want to preview actions with `--dry-run` or set a custom timeout.
Preview shutdown or startup actions
```shell
sudo /usr/local/bin/poolside-shutdown.sh --dry-run
sudo /usr/local/bin/poolside-startup.sh --dry-run
```
Run with a custom timeout
Specify the timeout in seconds:
```shell
sudo /usr/local/bin/poolside-shutdown.sh --timeout 120
sudo /usr/local/bin/poolside-startup.sh --timeout 120
```
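To review the dry-run output before committing to a real run, you could wrap both flags in one function. This is a hypothetical sketch: `preview_then_run` and the `POOLSIDE_BIN_DIR` override are illustrative names, not part of the Poolside scripts, and the sketch uses only the `--dry-run` and `--timeout` flags documented here. Run it from a root shell, because the scripts are normally invoked with `sudo`.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: preview a Poolside script with --dry-run, ask for
# confirmation, then run it for real with --timeout.
# POOLSIDE_BIN_DIR is a hypothetical override; the documented location is /usr/local/bin.
POOLSIDE_BIN_DIR="${POOLSIDE_BIN_DIR:-/usr/local/bin}"

preview_then_run() {
  local action="$1" timeout="${2:-120}"
  case "$action" in
    shutdown|startup) ;;
    *) echo "usage: preview_then_run shutdown|startup [timeout-seconds]" >&2; return 2 ;;
  esac

  local script="$POOLSIDE_BIN_DIR/poolside-$action.sh"
  # Show what the script would do without changing anything.
  "$script" --dry-run || return 1

  # read -p shows the prompt only when input comes from a terminal.
  local answer
  read -r -p "Run poolside-$action.sh with --timeout $timeout? [y/N] " answer
  if [ "$answer" != "y" ]; then
    echo "Cancelled" >&2
    return 1
  fi
  "$script" --timeout "$timeout"
}
```

For example, source the file in a root shell and call `preview_then_run shutdown 120`. Any answer other than `y` cancels the real run.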
Show script help
```shell
/usr/local/bin/poolside-shutdown.sh --help
/usr/local/bin/poolside-startup.sh --help
```
When you run `poolside-shutdown.sh` directly, the script may stop `rke2-server` as its last step. If it does, run `rke2-killall.sh` to clean up remaining RKE2-managed processes.
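As a minimal sketch, the two steps can be combined so that `rke2-killall.sh` runs only when the shutdown script has left `rke2-server` stopped. `shutdown_and_clean_up` and the `BIN_DIR` override are hypothetical names, not part of the Poolside bundle. The sketch passes any arguments, such as `--timeout 120`, straight through to `poolside-shutdown.sh`. Run it from a root shell.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run poolside-shutdown.sh directly, then run
# rke2-killall.sh only if rke2-server is no longer active.
# BIN_DIR is a hypothetical override; the documented location of both scripts
# is /usr/local/bin.
BIN_DIR="${BIN_DIR:-/usr/local/bin}"

shutdown_and_clean_up() {
  # Forward any options (for example --timeout 120) to the shutdown script.
  "$BIN_DIR/poolside-shutdown.sh" "$@" || return 1

  if systemctl is-active --quiet rke2-server; then
    echo "rke2-server is still active; skipping rke2-killall.sh"
  else
    echo "rke2-server is stopped; cleaning up remaining RKE2 processes"
    "$BIN_DIR/rke2-killall.sh"
  fi
}
```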
Troubleshooting
| Symptom | Likely cause | Action |
|---|---|---|
| `systemctl stop poolside-services` returns immediately with no log output | The unit is not active, or the current bundle does not install the unit. | Check `systemctl is-active poolside-services` and confirm whether the current deployment includes the unit. |
| `systemctl stop poolside-services` hangs | One or more pods did not stop before the unit timeout. | Run `kubectl get pods -A`, then inspect stuck pods with `kubectl describe pod <pod-name> -n <namespace>`. |
| Startup finishes but pods remain `Pending` | GPUs are not allocatable, or required images are unavailable. | Check `kubectl get nodes -o yaml \| grep nvidia.com/gpu`, `kubectl get pods -n gpu-operator`, and pod events. |
| Model pods remain in `Init` | The model pod is downloading checkpoints or cannot access model storage. | Check `kubectl logs <pod-name> -c model-downloader -n poolside-models`. |
| `containerd-shim` processes remain after `rke2-killall.sh` | Some containers were not reaped on the first pass. | Re-run `sudo /usr/local/bin/rke2-killall.sh`. If shims persist, list them with `ps -ef \| grep containerd-shim \| grep -v grep`. |
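For the `Pending` row, `kubectl get nodes -o yaml | grep nvidia.com/gpu` prints matching lines without the node they belong to. A minimal sketch of a per-node view follows. The function name `nodes_without_gpus` is hypothetical, and it assumes GPUs are advertised under the `nvidia.com/gpu` resource name, as the table's `grep` command suggests.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the Poolside bundle): print the names of
# nodes that report no allocatable nvidia.com/gpu resources.

nodes_without_gpus() {
  local rows
  # One row per node: name and allocatable GPU count. custom-columns prints
  # <none> when the key is missing; the dot in nvidia.com is escaped.
  if ! rows=$(kubectl get nodes --no-headers \
      -o custom-columns='NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'); then
    return 1
  fi
  # Keep nodes whose GPU column is missing or zero.
  printf '%s\n' "$rows" | awk '$2 == "<none>" || $2 == "0" { print $1 }'
}
```

An empty result means every node reports at least one allocatable GPU, so the next place to look is `kubectl get pods -n gpu-operator` and the pod events.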