Server Is Slow or Load Is High
The box feels sluggish, SSH lags, or your monitoring fired a "high load" alert. The job is to answer one question fast: is the system CPU-bound, memory-bound, or I/O-bound? Each points to a different culprit and a different fix.
Tested on
AlmaLinux 9 / RHEL 9. The same tools exist on Debian/Ubuntu; install the optional ones with sudo apt install sysstat htop (RHEL: sudo dnf install sysstat; htop is packaged in EPEL, so enable that repository before sudo dnf install htop).
Symptom
- uptime shows load average well above your CPU core count.
- Interactive commands stall; applications time out.
- A monitoring alert flags high load, low free memory, or high iowait.
Understanding load average
The three numbers are the 1-, 5-, and 15-minute averages of the number of tasks that are runnable or in uninterruptible sleep (D state, usually waiting on I/O). Read them relative to CPU core count (nproc):
- Load ≈ cores → fully utilized but keeping up.
- Load > cores → tasks are queueing; the trend across the three numbers tells you if it is rising or recovering.
- High load with low CPU usage → the queue is full of processes blocked on I/O, not CPU.
Load is not CPU%
A load of 8 on a 16-core box can be perfectly healthy. A load of 8 on a 2-core box is a problem. Always check nproc first.
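The "check nproc first" rule can be scripted. This is a minimal sketch, assuming a Linux /proc/loadavg; it divides the 1-minute average by the CPU count nproc reports and flags a per-core value above 1.0 as queueing.

```shell
#!/usr/bin/env bash
# Normalize the 1-minute load average by the CPU count reported by nproc.
# A per-core value above 1.0 means runnable or D-state tasks are queueing.
cores=$(nproc)
read -r load1 _ < /proc/loadavg   # first field of /proc/loadavg is the 1-minute average
per_core=$(awk -v l="$load1" -v c="$cores" 'BEGIN { printf "%.2f", l / c }')
echo "cores=$cores load1=$load1 per_core=$per_core"
if awk -v p="$per_core" 'BEGIN { exit !(p > 1.0) }'; then
  echo "load exceeds core count: tasks are queueing"
else
  echo "load is within core count"
fi
```

The same load of 8 prints 0.50 per core on a 16-core box and 4.00 on a 2-core box, which is the distinction the note above draws.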
Likely causes
| Class | Signature |
|---|---|
| CPU-bound | High %us/%sy in top, one or more processes pinned near 100% |
| Memory-bound | free -h shows near-zero available memory, swap filling, OOM kills in dmesg |
| I/O-bound | High %wa (iowait), processes in D state, busy disk in iostat |
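The Memory-bound row can be put into numbers without eyeballing free -h. A minimal sketch, assuming a kernel new enough (3.14+) to export MemAvailable in /proc/meminfo; what counts as "near zero" is left to you.

```shell
#!/usr/bin/env bash
# Percentage of RAM the kernel estimates is available for new work
# (MemAvailable, in kB), plus how much swap is currently in use.
read -r mem_total mem_avail swap_total swap_free < <(
  awk '/^MemTotal:/ {t=$2} /^MemAvailable:/ {a=$2}
       /^SwapTotal:/ {st=$2} /^SwapFree:/ {sf=$2}
       END {print t, a, st, sf}' /proc/meminfo)
avail_pct=$(( mem_avail * 100 / mem_total ))
swap_used_mib=$(( (swap_total - swap_free) / 1024 ))
echo "available: ${avail_pct}% of RAM; swap used: ${swap_used_mib} MiB"
```

A low available percentage together with growing swap use matches the memory-bound signature; low available memory with idle swap is often just a busy page cache.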
Diagnose
Work top-down: load → which resource → which process.
uptime # quick load snapshot
w # load plus who is logged in and doing what
nproc # core count to interpret the load against
Live view — watch %CPU, the load line, and especially %wa (iowait):
top # press '1' to see per-core, 'M' sort by memory, 'P' by CPU
htop # friendlier, color-coded (install separately)
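When you need to paste the live view into a ticket or chat, top's batch mode gives a plain-text snapshot. A sketch assuming procps-ng top; the 20-line cut is an arbitrary choice.

```shell
#!/usr/bin/env bash
# One non-interactive top snapshot: -b is batch mode, -n 1 is a single iteration.
# awk keeps the summary header plus the busiest processes (first 20 lines)
# and, unlike head, reads all input so top never gets SIGPIPE.
snapshot=$(top -b -n 1 | awk 'NR <= 20')
printf '%s\n' "$snapshot"
```

procps-ng's top also accepts -o %MEM to sort the snapshot by memory instead of CPU.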
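Memory and swap:
free -h # trust the "available" column, not "free"
System-wide activity, one sample per second:
vmstat 1 # r = run queue, b = blocked (D state), si/so = swap in/out, wa = iowait
Per-device I/O (from the sysstat package):
iostat -xz 1 # -x extended stats, -z hides idle devices; watch %util and r_await/w_await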
Top consumers by CPU and by memory:
ps aux --sort=-%cpu | head # biggest CPU users
ps aux --sort=-%mem | head # biggest memory users
pidstat 1 # per-process CPU over time (sysstat)
pidstat -d 1 # per-process disk I/O
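For the "processes in D state" signature of the I/O-bound class, a quick sketch that counts and lists them, assuming procps ps. A single snapshot can catch short, harmless waits, so run it a few times.

```shell
#!/usr/bin/env bash
# Count and list tasks in uninterruptible sleep: their ps STAT field starts
# with "D". A few that persist across repeated runs point at blocked I/O.
count=$(ps -eo stat= | awk '$1 ~ /^D/ { n++ } END { print n + 0 }')
echo "D-state processes: $count"
ps -eo stat=,pid=,comm= | awk '$1 ~ /^D/'
```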
The OOM killer
When the system runs out of memory (and swap), the kernel's Out-Of-Memory killer picks a process and kills it to stay alive. Symptoms: a service vanishes for no apparent reason, or load spikes during heavy swapping. Confirm it:
sudo dmesg -T | grep -i -E 'killed process|out of memory|oom'
sudo journalctl -k | grep -i -E 'oom|killed process'
A match like Out of memory: Killed process 4821 (mysqld) tells you the kernel reclaimed memory by killing that PID — your real problem is memory pressure, not the service "randomly crashing". See Logs & journald for retaining these kernel messages.
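If the kernel ring buffer or journal has already rotated, the kernel still keeps a running count. A sketch assuming kernel 4.13 or newer, which exposes an oom_kill counter in /proc/vmstat (AlmaLinux/RHEL 9 kernels do).

```shell
#!/usr/bin/env bash
# Read the kernel's count of OOM kills since boot from /proc/vmstat.
# The oom_kill line exists on kernels 4.13+; older kernels leave this empty.
oom_kills=$(awk '$1 == "oom_kill" { print $2 }' /proc/vmstat)
echo "OOM kills since boot: ${oom_kills:-counter not available}"
```

A value that grows between two checks means the OOM killer is firing now, even when dmesg no longer shows the message.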
Fix
Match the fix to the bound you identified.
CPU-bound: find the process, then renice it (lower its priority) or kill or fix the runaway. See Process Management.
sudo renice +10 -p <PID> # deprioritize a non-critical hog
sudo kill <PID> # stop it with SIGTERM; use -9 (SIGKILL) only as a last resort
If it is a legitimate workload consistently maxing cores, add CPUs or scale out.
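To confirm a renice actually took effect, check the NI column afterwards. This self-contained demo uses a throwaway sleep process as a stand-in for the hog; in real use you would substitute the PID you found with ps or top.

```shell
#!/usr/bin/env bash
# Demo: renice a throwaway background process, then read back its nice value.
# Without root you can only raise the nice value (lower the priority)
# of your own processes; the real hog usually needs sudo.
sleep 300 &
pid=$!
renice +10 -p "$pid" >/dev/null
nice_now=$(ps -o ni= -p "$pid" | tr -d ' ')
echo "PID $pid now runs at nice $nice_now"
kill "$pid"
wait "$pid" 2>/dev/null || true   # reap the demo process
```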
Memory-bound: restart a leaking service, then address the leak. If the host is simply short on RAM, add swap as a stopgap:
sudo systemctl restart <leaky-service>
sudo dd if=/dev/zero of=/swapfile bs=1M count=4096 && sudo chmod 600 /swapfile # 4 GiB; dd avoids preallocated extents that swapon can reject on XFS/ext4
sudo mkswap /swapfile && sudo swapon /swapfile
free -h
Make swap persistent by adding /swapfile none swap sw 0 0 to /etc/fstab. Swap buys time — it does not fix a leak. See Storage & Filesystems.
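Appending to /etc/fstab twice leaves a duplicate entry. A sketch of an idempotent append; add_swap_entry is a hypothetical helper name, and it takes the fstab path as an argument so you can try it on a copy first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append the swap line to an fstab-format file
# unless a /swapfile entry is already present. Writing /etc/fstab needs root.
add_swap_entry() {
  local fstab=$1
  local entry='/swapfile none swap sw 0 0'
  if grep -qE '^/swapfile[[:space:]]' "$fstab"; then
    echo "already present in $fstab"
  else
    printf '%s\n' "$entry" >> "$fstab"
    echo "added to $fstab"
  fi
}
```

From a root shell, run add_swap_entry /etc/fstab, then sudo findmnt --verify to catch fstab mistakes before the next reboot does.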
I/O-bound: find the disk culprit with iostat -xz 1 and pidstat -d 1, then throttle or reschedule it (e.g. move a backup or dd job off peak). Many processes stuck in D state and blocking everything else usually point at one greedy job or a failing disk; check dmesg for I/O errors.
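One way to "throttle" a bulk job is to start it at the lowest CPU priority and in the idle I/O scheduling class. A sketch, with run_low_priority as a hypothetical wrapper name; whether the idle class has a visible effect depends on the active I/O scheduler (cat /sys/block/<dev>/queue/scheduler shows it).

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run a command at nice 19 and in the idle I/O class
# (ionice -c 3), so it only gets the disk when nothing else wants it.
run_low_priority() {
  nice -n 19 ionice -c 3 "$@"
}
```

Usage, with illustrative paths: run_low_priority tar -czf /backup/home.tar.gz /home. For a job that is already running, ionice -c 3 -p <PID> moves it to the idle class.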
After mitigating, re-run uptime and iostat to confirm load is trending down.
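The "trend across the three numbers" check from the load-average section can be scripted for this confirmation step. A sketch; the 10% band used to call the trend "steady" is an arbitrary assumption, not a standard threshold.

```shell
#!/usr/bin/env bash
# Compare the 1-minute and 15-minute load averages to classify the trend.
# Assumption: within +/-10% of the 15-minute value counts as steady.
read -r load1 _ load15 _ < /proc/loadavg
trend=$(awk -v a="$load1" -v b="$load15" 'BEGIN {
  if (a > b * 1.1) print "rising"
  else if (a < b * 0.9) print "recovering"
  else print "steady"
}')
echo "1m=$load1 15m=$load15 trend=$trend"
```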
Prevent
- Monitor and alert on load relative to cores, memory available, swap usage, and disk %util, not just raw load. Tie OOM events to a page.
- Set systemd resource limits so one service can't starve the host:
  [Service]
  # Hard cap; exceeding it triggers the OOM killer inside this service's cgroup only
  MemoryMax=2G
  # Soft limit; above it the service is throttled and reclaimed harder
  MemoryHigh=1500M
  # At most two full cores of CPU time
  CPUQuota=200%
  Keep comments on their own lines: systemd does not support trailing comments after a value. Apply with sudo systemctl daemon-reload && sudo systemctl restart <svc>. See systemd Service Management.
- Capacity-plan from trends: if load and memory creep up week over week, size up before it becomes an incident.
- Keep sysstat enabled (sudo systemctl enable --now sysstat) so sar retains historical CPU/IO/memory data for post-mortems.