Choose a start or stop method

Use this guide to stop, start, reboot, or fully shut down a Poolside on-premises RKE2 node.
| Goal | Use this method |
| --- | --- |
| Stop and start Poolside inference workloads without stopping RKE2 | Stop and start Poolside workloads without stopping RKE2 |
| Reboot or shut down the node for planned maintenance or hardware servicing | Reboot or shut down the node |
| Stop RKE2 and remaining RKE2-managed processes without rebooting | Stop RKE2 without rebooting |
| Preview script actions or use a custom timeout | Run the scripts directly |

Timing expectations

Stopping and starting workloads can take several minutes.
| Action | Expected time | What happens |
| --- | --- | --- |
| Stop Poolside workloads | 1 to 2 minutes | Poolside workloads stop. rke2-server remains running. |
| Start Poolside workloads | 3 to 10 minutes | Poolside workloads start after RKE2 and the GPU Operator report healthy status. |

Stop and start Poolside workloads without stopping RKE2

Use this method when you want to stop Poolside workloads without stopping RKE2. For example, use this method before short maintenance windows that do not require a full host reboot.

Stop Poolside workloads

sudo systemctl stop poolside-services
This command does not stop rke2-server. It stops or scales down Poolside workloads while leaving the RKE2 cluster running.
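The stop step and the claim that rke2-server keeps running can be checked together. The following is a minimal sketch, not part of the Poolside bundle: the function name stop_workloads_keep_rke2 is hypothetical, and it assumes that rke2-server was active before the stop.

```shell
# Sketch: stop Poolside workloads, then confirm that rke2-server is still active.
# stop_workloads_keep_rke2 is a hypothetical helper name (assumption).
stop_workloads_keep_rke2() {
  sudo systemctl stop poolside-services || return 1
  # is-active exits non-zero for any state other than "active",
  # so "|| true" keeps the state text without aborting under "set -e".
  state="$(systemctl is-active rke2-server || true)"
  if [ "$state" = "active" ]; then
    echo "rke2-server is still active"
  else
    echo "unexpected rke2-server state: $state" >&2
    return 1
  fi
}
```

Call stop_workloads_keep_rke2 with no arguments. A non-zero return means either the stop command failed or rke2-server is no longer active.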

Start Poolside workloads

sudo systemctl start poolside-services
This command starts rke2-server if it is not already active, then starts Poolside workloads.

Check the current status

sudo systemctl status poolside-services

View live logs

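Follow the shutdown log:
sudo journalctl -t poolside-shutdown -f
Follow the startup log:
sudo journalctl -t poolside-startup -f
Each command keeps running until you press Ctrl+C, so run them in separate terminals if you need both.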

Reboot or shut down the node

Use this method for planned maintenance or hardware servicing, such as operating system patching, kernel updates, or GPU replacement.

Reboot the node

sudo reboot

Shut down the node

sudo shutdown -h now
After the host starts again, check RKE2 and Poolside workloads:
sudo systemctl status rke2-server
kubectl get nodes
kubectl get pods -A
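Right after a reboot, kubectl commands fail until the API server answers, so the checks above may need to be repeated. The sketch below polls until the API responds, waits for all nodes to report Ready, and then lists pods. The helper name wait_for_cluster, the 600-second default (loosely based on the 3-to-10-minute startup estimate), and the 10-second poll interval are assumptions. It also assumes kubectl is on the PATH and configured for this cluster.

```shell
# Sketch: wait for the cluster to come back after a reboot.
# wait_for_cluster, the 600 s default, and the 10 s interval are assumptions.
wait_for_cluster() {
  timeout_s="${1:-600}"
  interval_s="${2:-10}"
  elapsed=0
  # kubectl fails outright while the API server is still starting, so poll first.
  until kubectl get nodes >/dev/null 2>&1; do
    if [ "$elapsed" -ge "$timeout_s" ]; then
      echo "API server not reachable after ${timeout_s}s" >&2
      return 1
    fi
    sleep "$interval_s"
    elapsed=$((elapsed + interval_s))
  done
  # Block until every node reports the Ready condition, or time out.
  kubectl wait --for=condition=Ready nodes --all --timeout="${timeout_s}s" || return 1
  kubectl get pods -A
}
```

For example, wait_for_cluster 900 allows up to 15 minutes for the API server and again for node readiness. The function does not check that individual Poolside pods are healthy; review the pod list it prints.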

Stop RKE2 without rebooting

Use this method when you need to fully stop RKE2-managed processes but cannot reboot the host.
Stopping rke2-server alone is not sufficient. RKE2 can leave DaemonSet pods and static control-plane pods running under orphaned containerd-shim processes until rke2-killall.sh stops them.

Stop RKE2

sudo systemctl stop rke2-server

Clean up remaining RKE2 processes

sudo /usr/local/bin/rke2-killall.sh

Verify that RKE2 stopped

sudo systemctl is-active rke2-server
Expected result: inactive.
Check for remaining containerd-shim processes:
ps -ef | grep containerd-shim | grep -v grep | wc -l
Expected result: 0.
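The two verification checks can be combined into one pass/fail helper. This is a minimal sketch that reuses the same commands as above. The function name verify_rke2_stopped is hypothetical.

```shell
# Sketch: succeed only if rke2-server is inactive and no containerd-shim remains.
# verify_rke2_stopped is a hypothetical helper name (assumption).
verify_rke2_stopped() {
  # is-active exits non-zero when the unit is not active; keep only the state text.
  state="$(sudo systemctl is-active rke2-server || true)"
  # Same pipeline as the manual check; tr strips padding that some wc builds add.
  shims="$(ps -ef | grep containerd-shim | grep -v grep | wc -l | tr -d ' ' || true)"
  echo "rke2-server: $state, containerd-shim processes: $shims"
  [ "$state" = "inactive" ] && [ "$shims" -eq 0 ]
}
```

Run verify_rke2_stopped after rke2-killall.sh. It prints a one-line summary and returns non-zero if either check does not match the expected result.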

Run the scripts directly

Use this method when you want to preview actions with --dry-run or set a custom timeout.

Preview shutdown or startup actions

sudo /usr/local/bin/poolside-shutdown.sh --dry-run
sudo /usr/local/bin/poolside-startup.sh --dry-run

Run with a custom timeout

Specify the timeout in seconds:
sudo /usr/local/bin/poolside-shutdown.sh --timeout 120
sudo /usr/local/bin/poolside-startup.sh --timeout 120

Show script help

/usr/local/bin/poolside-shutdown.sh --help
/usr/local/bin/poolside-startup.sh --help
When you run poolside-shutdown.sh directly, the script may stop rke2-server as its last step. If it does, run rke2-killall.sh to clean up remaining RKE2-managed processes.
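The conditional cleanup described above can be sketched as a wrapper that compares the rke2-server state before and after the shutdown script. The function name shutdown_and_cleanup is hypothetical, and the wrapper relies only on systemctl is-active, not on any exit code or output of poolside-shutdown.sh, whose behavior is not documented here.

```shell
# Sketch: run poolside-shutdown.sh, then run rke2-killall.sh only if the
# script stopped rke2-server. shutdown_and_cleanup is a hypothetical name.
shutdown_and_cleanup() {
  before="$(systemctl is-active rke2-server || true)"
  # Forward any arguments, for example: --timeout 120
  sudo /usr/local/bin/poolside-shutdown.sh "$@" || return 1
  after="$(systemctl is-active rke2-server || true)"
  if [ "$before" = "active" ] && [ "$after" != "active" ]; then
    echo "rke2-server was stopped; running rke2-killall.sh"
    sudo /usr/local/bin/rke2-killall.sh
  else
    echo "rke2-server state unchanged ($after); skipping rke2-killall.sh"
  fi
}
```

For example, shutdown_and_cleanup --timeout 120 runs the shutdown script with a 120-second timeout. Comparing the state before and after means that a --dry-run invocation, or a run where rke2-server was already stopped, does not trigger the cleanup.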

Troubleshooting

Symptom: systemctl stop poolside-services returns immediately with no log output
Likely cause: The unit is not active, or the current bundle does not install the unit.
Action: Check systemctl is-active poolside-services and confirm whether the current deployment includes the unit.

Symptom: systemctl stop poolside-services hangs
Likely cause: One or more pods did not stop before the unit timeout.
Action: Run kubectl get pods -A, then inspect stuck pods with kubectl describe pod <pod-name> -n <namespace>.

Symptom: Startup finishes but pods remain Pending
Likely cause: GPUs are not allocatable or required images are unavailable.
Action: Check kubectl get nodes -o yaml | grep nvidia.com/gpu, kubectl get pods -n gpu-operator, and pod events.

Symptom: Model pods remain in Init
Likely cause: The model pod is downloading checkpoints or cannot access model storage.
Action: Check kubectl logs <pod-name> -c model-downloader -n poolside-models.

Symptom: containerd-shim processes remain after rke2-killall.sh
Likely cause: Some containers were not reaped on the first pass.
Action: Re-run sudo /usr/local/bin/rke2-killall.sh. If shims persist, list them with ps -ef | grep containerd-shim | grep -v grep.