Server and service maintenance

Choose a start or stop method

Use this guide to stop, start, reboot, or fully shut down a Poolside on-premises RKE2 node.

Goal	Use this method
Stop and start Poolside inference workloads without stopping RKE2	Stop and start Poolside workloads without stopping RKE2
Reboot or shut down the node for planned maintenance or hardware servicing	Reboot or shut down the node
Stop RKE2 and remaining RKE2-managed processes without rebooting	Stop RKE2 without rebooting
Preview script actions or use a custom timeout	Run the scripts directly

Timing expectations

Stopping and starting workloads can take several minutes.

Action	Expected time	What happens
Stop Poolside workloads	1 to 2 minutes	Poolside workloads stop. `rke2-server` remains running.
Start Poolside workloads	3 to 10 minutes	Poolside workloads start after RKE2 and the GPU Operator report healthy status.

Stop and start Poolside workloads without stopping RKE2

Use this method to stop Poolside workloads while leaving RKE2 running, for example before a short maintenance window that does not require a full host reboot.

Stop Poolside workloads

sudo systemctl stop poolside-services

This command does not stop `rke2-server`. It stops or scales down Poolside workloads and leaves the RKE2 cluster running.
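
You can compare the two unit states to confirm that a stop left RKE2 running. The following is a minimal sketch. `check_workloads_stopped` is a hypothetical helper name, and the sketch assumes that `poolside-services` is reported as not active once the stop completes.

```shell
# Hypothetical helper (not part of the Poolside tooling).
# Succeeds only if rke2-server is still active and poolside-services is not.
check_workloads_stopped() {
  # `systemctl is-active --quiet UNIT` prints nothing and exits 0 only if UNIT is active.
  if ! systemctl is-active --quiet rke2-server; then
    echo "rke2-server is not active" >&2
    return 1
  fi
  if systemctl is-active --quiet poolside-services; then
    echo "poolside-services is still active" >&2
    return 1
  fi
  echo "Poolside workloads are stopped and rke2-server is still running"
}
```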

Start Poolside workloads

sudo systemctl start poolside-services

This command starts `rke2-server` if it is not already active, then starts Poolside workloads.
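
Startup can take 3 to 10 minutes. A script may therefore need to poll for completion instead of relying on the command above. The following minimal sketch defines a hypothetical `wait_until` helper. The helper re-runs any command until that command succeeds or a timeout in seconds expires.

```shell
# Hypothetical polling helper (not part of the Poolside tooling).
# Usage: wait_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND every INTERVAL_SECONDS until it exits 0 or the timeout passes.
wait_until() {
  local timeout="$1" interval="$2" deadline
  shift 2
  # Use wall-clock seconds so the timeout also covers time spent in COMMAND.
  deadline=$(( $(date +%s) + timeout ))
  until "$@"; do
    if [ "$(date +%s)" -ge "$deadline" ]; then
      echo "timed out after ${timeout}s waiting for: $*" >&2
      return 1
    fi
    sleep "$interval"
  done
}
```

For example, you could start the unit without waiting for the start job to finish (`systemctl start --no-block`). You could then poll using the 10-minute upper bound from the timing table. This assumes the unit reports `active` once startup completes. If your unit behaves differently, poll a check that reflects workload readiness instead.

sudo systemctl start --no-block poolside-services
wait_until 600 15 systemctl is-active --quiet poolside-services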

Check the current status

sudo systemctl status poolside-services

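View live logs

Each command follows one log. Run the command that matches the action in progress, and press Ctrl+C to stop following the log.

sudo journalctl -t poolside-shutdown -f
sudo journalctl -t poolside-startup -f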
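
`journalctl` accepts `-t` more than once and shows entries that match any of the given identifiers, so you can view both logs in one stream. The following minimal sketch uses a hypothetical `poolside_logs` function. The function passes extra options, such as `-f` or `--since`, through to `journalctl`.

```shell
# Hypothetical helper (not part of the Poolside tooling).
# -t (--identifier) can be repeated; journalctl shows entries matching any of them.
# Any extra arguments, such as -f or --since, are passed through unchanged.
poolside_logs() {
  journalctl -t poolside-shutdown -t poolside-startup "$@"
}
```

Run it as root or as a user who is allowed to read the system journal. Examples are `poolside_logs -f` and `poolside_logs --since "1h ago" --no-pager`.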

Reboot or shut down the node

Use this method for planned maintenance or hardware servicing, such as operating system patching, kernel updates, or GPU replacement.

Reboot the node

sudo reboot

Shut down the node

sudo shutdown -h now

After the host starts again, check RKE2 and Poolside workloads:

sudo systemctl status rke2-server
kubectl get nodes
kubectl get pods -A
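
You can filter the pod list to spot problems after the reboot without scanning every row. The following is a minimal sketch with a hypothetical `not_ready_pods` function. It assumes the default column layout of `kubectl get pods -A --no-headers` (NAMESPACE, NAME, READY, STATUS, and so on) and treats `Completed` pods as healthy. The function prints every other pod that is not fully ready and returns non-zero if it finds one.

```shell
# Hypothetical filter (not part of the Poolside tooling).
# Reads `kubectl get pods -A --no-headers` output on stdin.
# A pod is treated as ready if STATUS is Completed, or STATUS is Running with READY n/n.
not_ready_pods() {
  awk '
    {
      # READY column looks like "1/2": ready containers / total containers.
      split($3, ready, "/")
      if ($4 == "Completed") next
      if ($4 == "Running" && ready[1] == ready[2]) next
      print $1 "/" $2 " " $3 " " $4
      bad = 1
    }
    END { exit (bad ? 1 : 0) }
  '
}
```

For example, wait for the node first, then filter the pods. On RKE2, `kubectl` is typically installed at `/var/lib/rancher/rke2/bin/kubectl`, with the kubeconfig at `/etc/rancher/rke2/rke2.yaml`. These commands assume that `kubectl` is on your `PATH` and already configured.

kubectl wait --for=condition=Ready nodes --all --timeout=300s
kubectl get pods -A --no-headers | not_ready_pods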

Stop RKE2 without rebooting

Use this method when you need to fully stop RKE2-managed processes but cannot reboot the host.

Stopping `rke2-server` alone is not sufficient. After the service stops, DaemonSet pods and static control-plane pods can keep running under orphaned `containerd-shim` processes until `rke2-killall.sh` stops them.

Stop RKE2

sudo systemctl stop rke2-server

Clean up remaining RKE2 processes

sudo /usr/local/bin/rke2-killall.sh

Verify that RKE2 stopped

sudo systemctl is-active rke2-server

Expected result: `inactive`. Then check for remaining `containerd-shim` processes:

ps -ef | grep containerd-shim | grep -v grep | wc -l

Expected result: 0.
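
The two checks above can be combined into one pass/fail result. The following is a minimal sketch with hypothetical `count_shims` and `rke2_stopped` functions. It counts processes by name (`ps -eo comm=`) rather than by full command line. It also treats the systemd state `failed` as stopped. That choice is an assumption about how you want to handle a unit that did not exit cleanly.

```shell
# Hypothetical helpers (not part of the Poolside or RKE2 tooling).

# Prints the number of containerd-shim processes.
count_shims() {
  # comm is truncated to 15 characters, so containerd-shim-runc-v2 shows as
  # "containerd-shim". grep -c prints 0 when nothing matches; || true keeps status 0.
  ps -eo comm= | grep -c '^containerd-shim' || true
}

# Succeeds only if rke2-server is inactive (or failed) and no shims remain.
rke2_stopped() {
  local state shims
  # is-active prints the unit state and exits non-zero unless the unit is active.
  state=$(systemctl is-active rke2-server) || true
  shims=$(count_shims)
  echo "rke2-server: ${state}; containerd-shim processes: ${shims}"
  case "$state" in
    inactive|failed) [ "$shims" -eq 0 ] ;;
    *) return 1 ;;
  esac
}
```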

Run the scripts directly

Use this method when you want to preview actions with `--dry-run` or set a custom timeout.

Preview shutdown or startup actions

sudo /usr/local/bin/poolside-shutdown.sh --dry-run
sudo /usr/local/bin/poolside-startup.sh --dry-run

Run with a custom timeout

Specify the timeout in seconds:

sudo /usr/local/bin/poolside-shutdown.sh --timeout 120
sudo /usr/local/bin/poolside-startup.sh --timeout 120

Show script help

/usr/local/bin/poolside-shutdown.sh --help
/usr/local/bin/poolside-startup.sh --help

When you run `poolside-shutdown.sh` directly, the script may stop `rke2-server` as its last step. To check, run `systemctl is-active rke2-server`. If the result is not `active`, run `sudo /usr/local/bin/rke2-killall.sh` to clean up remaining RKE2-managed processes.
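
You can wrap the sequence above in one function. The function runs the shutdown script, then cleans up only if `rke2-server` is no longer active. The following is a minimal sketch and is not part of the Poolside tooling. `shutdown_and_clean`, `POOLSIDE_SHUTDOWN_SCRIPT`, and `RKE2_KILLALL_SCRIPT` are hypothetical names. The function calls the scripts without `sudo`, so run it from a root shell. It skips cleanup when you pass `--dry-run`.

```shell
# Hypothetical wrapper (not part of the Poolside tooling). Run from a root shell.
# POOLSIDE_SHUTDOWN_SCRIPT and RKE2_KILLALL_SCRIPT are hypothetical overrides that
# default to the paths used in this guide.
shutdown_and_clean() {
  local shutdown_script="${POOLSIDE_SHUTDOWN_SCRIPT:-/usr/local/bin/poolside-shutdown.sh}"
  local killall_script="${RKE2_KILLALL_SCRIPT:-/usr/local/bin/rke2-killall.sh}"

  # Pass options such as --timeout 120 or --dry-run through unchanged.
  "$shutdown_script" "$@" || return

  # A dry run only previews actions, so there is nothing to clean up.
  case " $* " in
    *" --dry-run "*) return 0 ;;
  esac

  # Clean up only if the shutdown script stopped rke2-server.
  if ! systemctl is-active --quiet rke2-server; then
    "$killall_script"
  fi
}
```

For example, from a root shell, run `shutdown_and_clean --timeout 120`.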

Troubleshooting

Symptom	Likely cause	Action
`systemctl stop poolside-services` returns immediately with no log output	The unit is not active, or the current bundle does not install the unit.	Check `systemctl is-active poolside-services` and confirm whether the current deployment includes the unit.
`systemctl stop poolside-services` hangs	One or more pods are not terminating, so the stop does not finish until the unit timeout expires.	Run `kubectl get pods -A`, then inspect stuck pods with `kubectl describe pod <pod-name> -n <namespace>`.
Startup finishes but pods remain `Pending`	GPUs are not allocatable or required images are unavailable.	Check `kubectl get nodes -o yaml | grep nvidia.com/gpu`, `kubectl get pods -n gpu-operator`, and pod events.
Model pods remain in `Init`	The model pod is downloading checkpoints or cannot access model storage.	Check `kubectl logs <pod-name> -c model-downloader -n poolside-models`.
`containerd-shim` processes remain after `rke2-killall.sh`	Some containers were not reaped on the first pass.	Re-run `sudo /usr/local/bin/rke2-killall.sh`. If shims persist, list them with `ps -ef | grep containerd-shim | grep -v grep`.
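
For the `Pending` row, a per-node view of allocatable GPUs can be easier to read than the full node YAML. The following is a minimal sketch with a hypothetical `gpu_allocatable` function. It uses `kubectl` JSONPath output, where the dot in `nvidia.com/gpu` must be escaped. It returns non-zero if any node reports no allocatable GPUs, so it assumes that every node in the cluster is expected to have GPUs.

```shell
# Hypothetical helper (not part of the Poolside tooling).
# Prints "<node>\t<allocatable nvidia.com/gpu>" per node; fails if any count is empty or 0.
gpu_allocatable() {
  local out
  # Capture first so a kubectl failure is reported instead of being hidden by the pipe.
  out=$(kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.allocatable.nvidia\.com/gpu}{"\n"}{end}') || return
  printf '%s\n' "$out" | awk -F'\t' '
    { print }
    $2 == "" || $2 == 0 { bad = 1 }
    END { exit (bad ? 1 : 0) }
  '
}
```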
