Nagios: Core, Agents & SNMP
Nagios is the elder statesman of open-source monitoring — running in datacenters since 1999 — and you will meet it in hosting and enterprise jobs. Its model is radically simple: everything is a small program (a plugin) whose exit code tells Nagios whether things are OK. This guide covers the server, the plugin model, and how to add hosts with NCPA, NRPE, and SNMP.
Applies to
Nagios Core 4.5.x (4.5.13 released May 2026) with NCPA 3.x (3.1.3 current) as the recommended agent. Commands target AlmaLinux 9 / RHEL 9. Nagios Core is the free engine; Nagios XI is the commercial product built on top with a config UI and wizards.
What is Nagios Core?
Nagios Core is a scheduling and notification engine: it runs checks on a timetable, tracks state changes, and notifies contacts when a host or service goes bad. It ships with a CGI web UI (status screens, maps, acknowledgements) served by Apache, and is configured entirely through text files: there is no config database and no API for changing configuration (the JSON CGIs bundled with Core 4 are read-only).
| Concept | Meaning |
|---|---|
| Host | A machine/device, checked with something like check_ping |
| Service | One thing checked on a host (disk, HTTP, load…) |
| Command | How to run a plugin, with arguments |
| Plugin | Any executable returning a status exit code |
| Contact / contact group | Who gets notified, and how |
| Time period | When checks run and notifications fire |
The plugin model: four exit codes
Nagios itself measures nothing. A plugin prints one line and exits with:
| Exit code | State |
|---|---|
| 0 | OK |
| 1 | WARNING |
| 2 | CRITICAL |
| 3 | UNKNOWN |
```shell
/usr/lib64/nagios/plugins/check_disk -w 20% -c 10% -p /
# DISK OK - free space: / 41927 MiB (82% inode=99%);
echo $?   # 0
```
Anything that follows this contract — bash, Python, a compiled binary — is a Nagios plugin. That's why "other types" of monitoring in Nagios is really just "other plugins."
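To make the contract concrete, here is a minimal sketch of a custom plugin written as a shell function. The name `check_value` and its argument order are made up for illustration; it only handles whole numbers and treats "higher is worse".

```shell
# check_value LABEL VALUE WARN CRIT  (hypothetical helper, not a shipped plugin)
# Prints one status line plus perfdata and returns the matching plugin exit code.
check_value() {
    label=$1; value=$2; warn=$3; crit=$4
    # Missing or non-integer input is a usage problem, so report UNKNOWN (3).
    for n in "$value" "$warn" "$crit"; do
        case $n in
            ''|*[!0-9]*) echo "$label UNKNOWN - expected integers, got '$n'"; return 3 ;;
        esac
    done
    perf="$label=$value;$warn;$crit"
    if [ "$value" -ge "$crit" ]; then
        echo "$label CRITICAL - value is $value | $perf"; return 2
    elif [ "$value" -ge "$warn" ]; then
        echo "$label WARNING - value is $value | $perf"; return 1
    fi
    echo "$label OK - value is $value | $perf"; return 0
}

# Possible use inside a standalone plugin script:
# check_value users "$(who | wc -l)" 5 10; exit $?
```

Checking the crit threshold before the warn threshold matters: a value above both must report the more severe state.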
Objects in text files
A host and a service look like this (in /usr/local/nagios/etc/ or
/etc/nagios/, depending on install):
```
define host {
    use                     linux-server        ; inherit a template
    host_name               web01
    address                 192.0.2.20
}

define service {
    use                     generic-service
    host_name               web01
    service_description     Root Partition
    check_command           check_ncpa!-t 'MyToken' -P 5693 -M 'disk/logical/|/used_percent' -w 80 -c 90
}
```
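The service above refers to a `check_ncpa` command, which has to exist as a command object of its own. The sketch below writes one to a scratch directory. The file name `ncpa_command.cfg` and the `CFG_DIR` variable are assumptions for illustration, and the command line assumes `check_ncpa.py` sits in the plugin directory that `$USER1$` points to.

```shell
# Scratch location by default; point CFG_DIR at a directory that nagios.cfg
# already includes (cfg_dir=...) once the result looks right.
CFG_DIR="${CFG_DIR:-$(mktemp -d)}"

# Quoted heredoc: $USER1$, $HOSTADDRESS$ and $ARG1$ must reach the file
# unexpanded, because Nagios (not the shell) substitutes them at check time.
cat > "$CFG_DIR/ncpa_command.cfg" <<'EOF'
define command {
    command_name    check_ncpa
    command_line    $USER1$/check_ncpa.py -H $HOSTADDRESS$ $ARG1$
}
EOF

echo "wrote $CFG_DIR/ncpa_command.cfg"
```

Everything after the `!` in a service's `check_command` becomes `$ARG1$`, which is why the token, port, metric, and thresholds can all be passed from the service definition.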
After any config change:
```shell
sudo /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg   # validate
sudo systemctl reload nagios                                            # apply
```
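Because a reload with a broken config can leave you without monitoring, it may help to chain the two steps so the reload only happens when validation passes. A minimal sketch follows. The function name and the three variables are hypothetical, the paths assume a source install, and it is meant to run as root.

```shell
NAGIOS_BIN="${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}"
NAGIOS_CFG="${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}"
RELOAD_CMD="${RELOAD_CMD:-systemctl reload nagios}"

# reload_if_valid: run the pre-flight check, reload only on success.
reload_if_valid() {
    if verify_out=$("$NAGIOS_BIN" -v "$NAGIOS_CFG" 2>&1); then
        # Unquoted on purpose so the command string splits into words.
        $RELOAD_CMD && echo "config valid, reload requested"
    else
        echo "config invalid, NOT reloading:" >&2
        # The error summary is at the end of the verification output.
        printf '%s\n' "$verify_out" | tail -n 20 >&2
        return 1
    fi
}
```

Calling `reload_if_valid` after each edit keeps the validate-then-apply habit from depending on memory.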
Add a Linux host with NCPA (the modern agent)
NCPA (Nagios Cross-Platform Agent) is the actively developed agent — one package for Linux/Windows/macOS, a REST API over SSL on port 5693, a built-in web GUI, and both active and passive modes. New deployments should prefer it over NRPE.
1. Install NCPA on the host
```shell
# add the official Nagios repo (EL9)
sudo rpm -Uvh https://repo.nagios.com/nagios/9/nagios-repo-9-2.el9.noarch.rpm
sudo dnf install -y ncpa
# set the API token the server will authenticate with
# (the package installs under /usr/local/ncpa, so the config lives there)
sudo sed -i "s/^community_string =.*/community_string = MyToken/" /usr/local/ncpa/etc/ncpa.cfg
sudo systemctl enable --now ncpa
sudo firewall-cmd --permanent --add-port=5693/tcp && sudo firewall-cmd --reload
```
Browse to https://192.0.2.20:5693/ — NCPA's own GUI shows live metrics, which
doubles as your "is the agent fine?" check.
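The same API that backs the GUI can be probed from a shell, which is handy on servers without a browser. This is a sketch under assumptions: the helper names are made up, the token is assumed to need no URL-encoding, and certificate verification is skipped because a fresh NCPA install presents a self-signed certificate.

```shell
# ncpa_url HOST TOKEN METRIC [PORT]: build the API URL for one metric path.
ncpa_url() {
    printf 'https://%s:%s/api/%s?token=%s\n' "$1" "${4:-5693}" "$3" "$2"
}

# ncpa_probe HOST TOKEN METRIC: fetch the metric as JSON.
# --insecure skips certificate verification; drop it once the agent has a
# certificate your monitoring server trusts.
ncpa_probe() {
    curl --silent --show-error --fail --insecure --max-time 10 \
        "$(ncpa_url "$1" "$2" "$3")"
}

# Example (needs a reachable agent):
# ncpa_probe 192.0.2.20 MyToken cpu/percent
```

A connection timeout here usually points at the firewall or the service, while a JSON error body points at the token.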
2. Check it from the Nagios server
Install the check_ncpa.py plugin on the server, then:
```shell
./check_ncpa.py -H 192.0.2.20 -t 'MyToken' -P 5693 -M cpu/percent -w 80 -c 90
# OK: CPU Percent was 4.25 % | 'cpu_percent'=4.25%;80;90;
```
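The output line has two halves: the human-readable status before the `|`, and performance data after it in the form `'label'=value[unit];warn;crit`. A small sketch of splitting the two, using a hypothetical helper name, shows which part graphing add-ons consume:

```shell
# split_plugin_output LINE: print the status text and the perfdata on
# separate lines (the second line is empty when there is no "|").
split_plugin_output() {
    line=$1
    case $line in
        *'|'*) text=${line%%|*}; perf=${line#*|} ;;
        *)     text=$line;       perf= ;;
    esac
    # Trim the single spaces that conventionally surround the pipe.
    text=${text% }; perf=${perf# }
    printf '%s\n%s\n' "$text" "$perf"
}

split_plugin_output "OK: CPU Percent was 4.25 % | 'cpu_percent'=4.25%;80;90;"
```

The state itself still comes from the exit code; the word "OK" in the text is only for people.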
3. Define the host + services
Add define host / define service blocks (as above) using a check_ncpa
command, validate, reload. Useful metric paths: cpu/percent, memory/virtual,
disk/logical/|/used_percent, processes, services.
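When several services share the same shape, generating the blocks avoids copy-paste slips. The sketch below prints three service definitions to stdout. The function name, the `NCPA_TOKEN` variable, the thresholds, and the `memory/virtual/percent` leaf are assumptions, and it presumes a `check_ncpa` command object is already defined.

```shell
# emit_ncpa_service HOST DESCRIPTION METRIC WARN CRIT
emit_ncpa_service() {
    cat <<EOF
define service {
    use                     generic-service
    host_name               $1
    service_description     $2
    check_command           check_ncpa!-t '${NCPA_TOKEN:-MyToken}' -P 5693 -M '$3' -w $4 -c $5
}
EOF
}

emit_ncpa_service web01 "CPU Usage"      'cpu/percent'                 80 90
emit_ncpa_service web01 "Memory Usage"   'memory/virtual/percent'      80 90
emit_ncpa_service web01 "Root Partition" 'disk/logical/|/used_percent' 80 90
```

Redirect the output into a file under your object directory, then validate and reload as usual.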
NRPE: the legacy agent you'll still meet
NRPE (Nagios Remote Plugin Executor) was the standard for two decades:
a small daemon on port 5666 that runs local plugins when the server's
check_nrpe asks. It still works and is everywhere in older estates, but it's
in maintenance mode — remote command definitions live on every client
(/etc/nagios/nrpe.cfg), there's no API, and historically it has had security
sharp edges (keep dont_blame_nrpe=0).
```shell
# on the client (EPEL packages)
sudo dnf install -y epel-release && sudo dnf install -y nrpe nagios-plugins-all
sudo vi /etc/nagios/nrpe.cfg        # allowed_hosts=127.0.0.1,192.0.2.10
sudo systemctl enable --now nrpe

# on the server
/usr/lib64/nagios/plugins/check_nrpe -H 192.0.2.20 -c check_load
```
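The name passed with `-c` must match a `command[...]` line in the client's `nrpe.cfg`, so when you inherit an NRPE estate it is worth listing what each client will actually answer to. A minimal sketch with a hypothetical helper name is below. It reads a single file and does not follow `include` or `include_dir` directives.

```shell
# nrpe_commands FILE: list the command names defined in an nrpe.cfg.
# Matches lines such as:
#   command[check_load]=/usr/lib64/nagios/plugins/check_load -w 5,4,3 -c 10,8,6
# and skips commented-out definitions.
nrpe_commands() {
    sed -n 's/^[[:space:]]*command\[\([^]]*\)][[:space:]]*=.*/\1/p' "$1"
}

# Example, on the client:
# nrpe_commands /etc/nagios/nrpe.cfg
```

Asking for a name that is not in this list makes `check_nrpe` report that the command is not defined.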
Windows hosts: use NCPA too (same package model); NSClient++ is the older Windows agent you'll find in existing setups.
SNMP and agentless checks
For switches, routers, PDUs, and appliances, use check_snmp (or the
purpose-built check_ifstatus/check_ifoperstatus) against the device — same
SNMP setup as for any monitoring system (enable SNMP, set a v2c community or a
v3 user, verify with snmpwalk first):
```shell
/usr/lib64/nagios/plugins/check_snmp -H 192.0.2.50 -P 2c -C MySecureString \
  -o sysUpTime.0
# SNMP OK - Timeticks: (123456789) 14 days, 6:56:07.89
```
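The number in parentheses is the raw value: sysUpTime is reported in TimeTicks, which are hundredths of a second. As a worked example of how the raw value maps to the readable part, here is a small conversion sketch. The function name is made up, and it always prints the plural "days".

```shell
# timeticks_to_uptime TICKS: format hundredths of a second as
# "D days, H:MM:SS.cc", matching the layout of the sample output.
timeticks_to_uptime() {
    ticks=$1
    secs=$((ticks / 100)); frac=$((ticks % 100))
    days=$((secs / 86400)); rem=$((secs % 86400))
    printf '%d days, %d:%02d:%02d.%02d\n' \
        "$days" "$((rem / 3600))" "$((rem % 3600 / 60))" "$((rem % 60))" "$frac"
}

timeticks_to_uptime 123456789   # 14 days, 6:56:07.89
```

Keep in mind that the counter is 32 bits wide, so it wraps back to zero after roughly 497 days, and that it counts from the start of the SNMP agent rather than necessarily from device boot.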
And because plugins run from the server, plenty of monitoring needs no agent at all:
| Plugin | Checks |
|---|---|
| check_ping / check_icmp | Reachability and latency |
| check_http | Web pages: status code, string match, cert expiry (-C 30) |
| check_tcp / check_smtp / check_imap | Any TCP service banner |
| check_dns | Name resolution against a specific resolver |
| check_ssh | SSH availability |
| check_by_ssh | Run remote plugins over SSH instead of an agent |
| NSCA / passive | Remote systems push results in (cron jobs, batch results) |
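For the passive row, the result that eventually reaches Nagios is a single external-command line of the form `[timestamp] PROCESS_SERVICE_CHECK_RESULT;host;service;code;output`. The sketch below formats such a line. The helper name and the example service are hypothetical, and the command-pipe path shown in the comment is the source-install default, which may differ on your system.

```shell
# passive_result HOST SERVICE CODE OUTPUT [EPOCH]
# CODE uses the usual plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).
passive_result() {
    printf '[%s] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%s;%s\n' \
        "${5:-$(date +%s)}" "$1" "$2" "$3" "$4"
}

# On the Nagios server itself the line can go straight into the command pipe:
# passive_result web01 "Nightly Backup" 0 "backup finished" \
#     > /usr/local/nagios/var/rw/nagios.cmd
```

The target service has to exist in the config and accept passive checks. From a remote machine the same fields travel through a transport such as NSCA rather than a direct write.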
Nagios or Zabbix?
You'll be asked this in interviews:
| | Nagios Core | Zabbix |
|---|---|---|
| Model | Check status (OK/WARN/CRIT) | Collect metrics, evaluate triggers |
| Config | Text files, reload | Web UI + API, templates |
| History/graphs | Minimal built-in | First-class (history, trends, dashboards) |
| Agent | NCPA (modern) / NRPE (legacy) | Agent 2 |
| Strength | Simplicity, plugin ecosystem, ubiquity | Scale, autodiscovery, visualisation |
Both monitor a fleet well; Zabbix gives you more out of the box, Nagios gives you a model you can hold in your head and extend with twenty lines of bash. See the Zabbix guide for the other side.
Verify your work
- [ ] You can recite the four plugin exit codes and states (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).
- [ ] nagios -v nagios.cfg validates cleanly after your host/service additions.
- [ ] https://<host>:5693/ shows the NCPA GUI and check_ncpa.py … -M cpu/percent returns OK from the server.
- [ ] check_snmp -H <device> … -o sysUpTime.0 returns OK against your SNMP device.
- [ ] You can explain to an interviewer when you'd reach for NCPA vs NRPE vs agentless plugins.
Summary
- Nagios Core schedules plugins and notifies on state changes; config is plain text, the web UI is CGI, and 4.5.x is current. Nagios XI is the commercial layer on top.
- The whole system rests on exit codes: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN — any executable that follows the contract is a plugin.
- NCPA (port 5693, REST API, SSL, cross-platform) is the agent for new installs; NRPE (port 5666) is legacy but everywhere; Windows estates may still run NSClient++.
- SNMP devices are checked with check_snmp from the server; lots of monitoring (check_http, check_ping, check_dns, check_by_ssh) needs no agent at all.
- Versus Zabbix: Nagios = simple status-check engine with a huge plugin ecosystem; Zabbix = richer metrics, templates, and dashboards out of the box.