Nagios: Core, Agents & SNMP

Nagios is the elder statesman of open-source monitoring — running in datacenters since 1999 — and you will meet it in hosting and enterprise jobs. Its model is radically simple: everything is a small program (a plugin) whose exit code tells Nagios whether things are OK. This guide covers the server, the plugin model, and how to add hosts with NCPA, NRPE, and SNMP.

Applies to

Nagios Core 4.5.x (4.5.13 released May 2026) with NCPA 3.x (3.1.3 current) as the recommended agent. Commands target AlmaLinux 9 / RHEL 9. Nagios Core is the free engine; Nagios XI is the commercial product built on top with a config UI and wizards.

What is Nagios Core?

Nagios Core is a scheduling and notification engine: it runs checks on a timetable, tracks state changes, and notifies contacts when a host or service goes bad. It ships with a CGI web UI (status screens, maps, acknowledgements) served by Apache, and is configured entirely through text files. There is no config database and no built-in API for managing configuration (Core 4 does include read-only JSON CGIs for querying status).

Concept	Meaning
Host	A machine/device, checked with something like `check_ping`
Service	One thing checked on a host (disk, HTTP, load…)
Command	How to run a plugin, with arguments
Plugin	Any executable returning a status exit code
Contact / contact group	Who gets notified, and how
Time period	When checks run and notifications fire

The plugin model: four exit codes

Nagios itself measures nothing. A plugin prints one line and exits with:

Exit code	State
`0`	OK
`1`	WARNING
`2`	CRITICAL
`3`	UNKNOWN

/usr/lib64/nagios/plugins/check_disk -w 20% -c 10% -p /
# DISK OK - free space: / 41927 MiB (82% inode=99%);
echo $?   # 0

Anything that follows this contract — bash, Python, a compiled binary — is a Nagios plugin. That's why "other types" of monitoring in Nagios is really just "other plugins."
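As a rough illustration of that contract, here is a minimal sketch in portable shell. The function name `check_value` and its output wording are hypothetical, not part of any shipped plugin set. It compares an integer against warning and critical thresholds, prints one status line with performance data after the pipe, and returns the matching exit code.

```shell
#!/usr/bin/env bash
# Hypothetical example plugin logic, not part of nagios-plugins.
# check_value VALUE WARN CRIT: print one status line, return 0/1/2/3.
check_value() {
    local value="$1" warn="$2" crit="$3" arg perf
    # Missing or non-integer input is UNKNOWN (3), never a silent OK.
    for arg in "$value" "$warn" "$crit"; do
        case "$arg" in
            ''|*[!0-9]*)
                echo "VALUE UNKNOWN - usage: check_value VALUE WARN CRIT"
                return 3 ;;
        esac
    done
    # Text after the pipe is performance data: label=value;warn;crit
    perf="value=$value;$warn;$crit"
    if [ "$value" -ge "$crit" ]; then
        echo "VALUE CRITICAL - $value | $perf"; return 2
    elif [ "$value" -ge "$warn" ]; then
        echo "VALUE WARNING - $value | $perf"; return 1
    fi
    echo "VALUE OK - $value | $perf"; return 0
}
```

Saved as a script that ends with `check_value "$@"; exit $?`, this could be wired in through a command definition like any other plugin.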

Objects in text files

A host and a service look like this (in /usr/local/nagios/etc/ or /etc/nagios/, depending on install):

define host {
    use        linux-server          ; inherit a template
    host_name  web01
    address    192.0.2.20
}

define service {
    use                 generic-service
    host_name           web01
    service_description Root Partition
    check_command       check_ncpa!-t 'MyToken' -P 5693 -M 'disk/logical/|/used_percent' -w 80 -c 90
}
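The service above hands everything after the `!` to the command as `$ARG1$`, so it only works if a `check_ncpa` command object exists. A minimal sketch of one, assuming check_ncpa.py sits in the plugin directory that `$USER1$` points to. The file name `ncpa_commands.cfg` and the `OBJ_DIR` variable are hypothetical; point `OBJ_DIR` at a directory your nagios.cfg actually loads.

```shell
# Assumption: OBJ_DIR stands in for a directory nagios.cfg loads via cfg_dir.
# It defaults to a scratch directory so nothing live is touched.
OBJ_DIR="${OBJ_DIR:-$(mktemp -d)}"

# The quoted 'EOF' stops the shell from expanding the $...$ Nagios macros.
cat > "$OBJ_DIR/ncpa_commands.cfg" <<'EOF'
define command {
    command_name    check_ncpa
    command_line    $USER1$/check_ncpa.py -H $HOSTADDRESS$ $ARG1$
}
EOF
```

Because `-H $HOSTADDRESS$` comes from the command object, the service definition only needs to supply the token, port, metric, and thresholds.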

After any config change:

sudo /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg   # validate
sudo systemctl reload nagios                                            # apply
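The two steps can be chained so that a failed validation never reaches the reload. A minimal sketch, assuming the source-install paths used above and a root shell; `nagios_safe_reload` is a hypothetical helper name, and both variables can be overridden for package installs.

```shell
# Paths follow the source-install layout; override them for package installs.
NAGIOS_BIN="${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}"
NAGIOS_CFG="${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}"

# Hypothetical helper: reload only when the pre-flight check passes.
nagios_safe_reload() {
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG" > /dev/null; then
        systemctl reload nagios
    else
        echo "nagios -v reported errors; not reloading" >&2
        return 1
    fi
}
```

The validation output is discarded here to keep the sketch short; when it fails, rerun the `-v` command by hand to see which object is at fault.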

Add a Linux host with NCPA (the modern agent)

NCPA (Nagios Cross-Platform Agent) is the actively developed agent — one package for Linux/Windows/macOS, a REST API over SSL on port 5693, a built-in web GUI, and both active and passive modes. New deployments should prefer it over NRPE.

1. Install NCPA on the host

# add the official Nagios repo (EL9)
sudo rpm -Uvh https://repo.nagios.com/nagios/9/nagios-repo-9-2.el9.noarch.rpm
sudo dnf install -y ncpa

# set the API token the server will authenticate with
sudo sed -i "s/^community_string =.*/community_string = MyToken/" /usr/local/ncpa/etc/ncpa.cfg
sudo systemctl enable --now ncpa
sudo firewall-cmd --permanent --add-port=5693/tcp && sudo firewall-cmd --reload

Browse to https://192.0.2.20:5693/ — NCPA's own GUI shows live metrics, which doubles as your "is the agent fine?" check.

2. Check it from the Nagios server

Install the check_ncpa.py plugin on the server, then:

./check_ncpa.py -H 192.0.2.20 -t 'MyToken' -P 5693 -M cpu/percent -w 80 -c 90
# OK: CPU Percent was 4.25 % | 'cpu_percent'=4.25%;80;90;
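check_ncpa.py is a client for NCPA's REST API, so the same metric can also be fetched directly when debugging. A minimal sketch, assuming the token is passed as a `token` query parameter on the `/api/` path; `ncpa_api_url` is a hypothetical helper, and a token containing special characters would need URL-encoding first.

```shell
# Hypothetical helper: build the NCPA API URL for one metric path.
ncpa_api_url() {
    local host="$1" token="$2" metric="$3" port="${4:-5693}"
    printf 'https://%s:%s/api/%s?token=%s\n' "$host" "$port" "$metric" "$token"
}

# Against a live agent (-k skips certificate verification, because a fresh
# NCPA install presents a self-signed certificate):
#   curl -sk "$(ncpa_api_url 192.0.2.20 MyToken cpu/percent)"
```

If the URL returns JSON in a browser or curl but check_ncpa.py fails, the problem is on the server side (plugin path, Python, arguments) rather than in the agent.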

3. Define the host + services

Add define host / define service blocks (as above) using a check_ncpa command, validate, reload. Useful metric paths: cpu/percent, memory/virtual, disk/logical/|/used_percent, processes, services.

NRPE: the legacy agent you'll still meet

NRPE (Nagios Remote Plugin Executor) was the standard for two decades: a small daemon on port 5666 that runs local plugins when the server's check_nrpe asks. It still works and is everywhere in older estates, but it's in maintenance mode — remote command definitions live on every client (/etc/nagios/nrpe.cfg), there's no API, and historically it has had security sharp edges (keep dont_blame_nrpe=0).

# on the client (EPEL packages)
sudo dnf install -y epel-release && sudo dnf install -y nrpe nagios-plugins-all
sudo vi /etc/nagios/nrpe.cfg     # allowed_hosts=127.0.0.1,192.0.2.10
sudo systemctl enable --now nrpe
sudo firewall-cmd --permanent --add-port=5666/tcp && sudo firewall-cmd --reload

# on the server (check_nrpe ships in EPEL's nagios-plugins-nrpe package)
/usr/lib64/nagios/plugins/check_nrpe -H 192.0.2.20 -c check_load
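Since every remote command has to be defined on the client, adding a check means editing the client's NRPE configuration. A minimal sketch of one such definition: `check_root_disk` is a hypothetical command name, and `NRPE_DROPIN` is a stand-in for whatever file your nrpe.cfg includes.

```shell
# Assumption: NRPE_DROPIN stands in for a file nrpe.cfg includes on the client.
# It defaults to a scratch file so the sketch is safe to run anywhere.
NRPE_DROPIN="${NRPE_DROPIN:-$(mktemp)}"

# check_root_disk is a hypothetical command name. Thresholds are hard-coded
# on the client, so dont_blame_nrpe can stay at 0.
cat > "$NRPE_DROPIN" <<'EOF'
command[check_root_disk]=/usr/lib64/nagios/plugins/check_disk -w 20% -c 10% -p /
EOF

# After restarting nrpe on the client, the server would ask for it by name:
#   /usr/lib64/nagios/plugins/check_nrpe -H 192.0.2.20 -c check_root_disk
```

The server only ever sends the command name; the plugin path and arguments stay under the client's control, which is the trade-off that makes NRPE tedious to manage at scale.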

Windows hosts: use NCPA too (same package model); NSClient++ is the older Windows agent you'll find in existing setups.

SNMP and agentless checks

For switches, routers, PDUs, and appliances, use check_snmp (or the purpose-built check_ifstatus/check_ifoperstatus) against the device — same SNMP setup as for any monitoring system (enable SNMP, set a v2c community or a v3 user, verify with snmpwalk first):

/usr/lib64/nagios/plugins/check_snmp -H 192.0.2.50 -P 2c -C MySecureString \
  -o sysUpTime.0
# SNMP OK - Timeticks: (123456789) 14 days, 6:56:07.89

And because plugins run from the server, plenty of monitoring needs no agent at all:

Plugin	Checks
`check_ping` / `check_icmp`	Reachability and latency
`check_http`	Web pages — status code, string match, cert expiry (`-C 30`)
`check_tcp` / `check_smtp` / `check_imap`	Any TCP service banner
`check_dns`	Name resolution against a specific resolver
`check_ssh`	SSH availability
`check_by_ssh`	Run remote plugins over SSH instead of an agent
NSCA / passive	Remote systems push results in (cron jobs, batch results)
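For the passive row, a result pushed through NSCA is a single tab-separated line fed to `send_nsca` on stdin: host name, service description, return code, plugin output. A minimal sketch of a formatter; `nsca_line` is a hypothetical helper, and the server address and config path in the usage comment are assumptions.

```shell
# Hypothetical helper: format one passive service result the way send_nsca
# reads it on stdin: host, service, return code, output, tab-separated.
nsca_line() {
    printf '%s\t%s\t%s\t%s\n' "$1" "$2" "$3" "$4"
}

# From a cron job (server address and config path are assumptions):
#   nsca_line web01 "Nightly Backup" 0 "BACKUP OK - finished" |
#       send_nsca -H 192.0.2.10 -c /etc/nagios/send_nsca.cfg
```

The host name and service description have to match a service already defined on the Nagios server, and that service must accept passive results.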

Nagios or Zabbix?

You'll be asked this in interviews:

	Nagios Core	Zabbix
Model	Check status (OK/WARN/CRIT)	Collect metrics, evaluate triggers
Config	Text files, reload	Web UI + API, templates
History/graphs	Minimal built-in	First-class (history, trends, dashboards)
Agent	NCPA (modern) / NRPE (legacy)	Agent 2
Strength	Simplicity, plugin ecosystem, ubiquity	Scale, autodiscovery, visualisation

Both monitor a fleet well; Zabbix gives you more out of the box, Nagios gives you a model you can hold in your head and extend with twenty lines of bash. See the Zabbix guide for the other side.

Verify your work

[ ] You can recite the four plugin exit codes and states (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).
[ ] nagios -v nagios.cfg validates cleanly after your host/service additions.
[ ] https://<host>:5693/ shows the NCPA GUI and check_ncpa.py … -M cpu/percent returns OK from the server.
[ ] check_snmp -H <device> … -o sysUpTime.0 returns OK against your SNMP device.
[ ] You can explain to an interviewer when you'd reach for NCPA vs NRPE vs agentless plugins.

Summary

Nagios Core schedules plugins and notifies on state changes; config is plain text, the web UI is CGI, and 4.5.x is current. Nagios XI is the commercial layer on top.
The whole system rests on exit codes: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN — any executable that follows the contract is a plugin.
NCPA (port 5693, REST API, SSL, cross-platform) is the agent for new installs; NRPE (port 5666) is legacy but everywhere; Windows estates may still run NSClient++.
SNMP devices are checked with check_snmp from the server; lots of monitoring (check_http, check_ping, check_dns, check_by_ssh) needs no agent at all.
Versus Zabbix: Nagios = simple status-check engine with a huge plugin ecosystem; Zabbix = richer metrics, templates, and dashboards out of the box.