Nagios runs on your monitoring server. That's great for network-level checks — ping, HTTP, port availability. But a lot of what you actually care about is invisible from the outside: disk usage, memory consumption, process counts, whether a specific log file is growing errors. For that you need something running on the monitored host itself.
NRPE — Nagios Remote Plugin Executor — is the standard solution. It's a small daemon that runs on each monitored host, listens for commands from the Nagios server, executes local checks, and sends the results back. Simple concept, a few gotchas in practice.
The architecture
Nagios server
└── check_nrpe plugin
    └── (TCP port 5666)
        └── nrpe daemon on monitored host
            └── local check script
                └── result (exit code + output) back to Nagios
The Nagios server runs check_nrpe just like any other check plugin. check_nrpe opens a TCP connection to the NRPE daemon on the target host, tells it which check to run, and waits for the response. The NRPE daemon runs the local check script and ships the exit code and output back. From Nagios's perspective it looks like any other plugin — it gets an exit code and a line of output.
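To make the "exit code and a line of output" contract concrete, here's a minimal sketch of what a plugin hands back. The exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) are the standard Nagios plugin convention. The function names and the queue_depth metric are made up for illustration and assume a "higher is worse" threshold.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(value, warn, crit):
    """Map a measurement to a Nagios state, assuming higher values are worse."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def format_result(label, value, warn, crit):
    """Return (exit_code, output_line) the way a plugin would report them."""
    state = evaluate(value, warn, crit)
    line = (
        f"{label.upper()} {STATE_NAMES[state]} - {label} is {value}"
        f" | {label}={value};{warn};{crit}"
    )
    return state, line


# Hypothetical measurement; a real check would read it from the system,
# print the line, and exit with the code.
code, line = format_result("queue_depth", 150, 100, 200)
print(code, line)
```

NRPE doesn't interpret any of this. It just relays the exit code and the text back to check_nrpe, which in turn exits with the same code.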
Installing NRPE on the monitored host
On Amazon Linux or CentOS:
sudo yum install nrpe nagios-plugins-all
sudo systemctl enable nrpe
sudo systemctl start nrpe
nagios-plugins-all gives you the standard set of check scripts (check_disk, check_load, check_procs, etc.) that NRPE will be calling. NRPE runs them locally, so they have to be installed on the monitored host; having them on the Nagios server alone is not enough.
Configuring nrpe.cfg
The main config file is /etc/nagios/nrpe.cfg. The key settings:
# Which hosts are allowed to connect to this NRPE daemon.
# Only your Nagios server should be here.
allowed_hosts=127.0.0.1,10.0.1.50
# This controls whether the Nagios server can pass arguments to checks.
# Keep this at 0 unless you specifically need it (see the section below).
dont_blame_nrpe=0
# Define the checks this host exposes.
command[check_disk]=/usr/lib64/nagios/plugins/check_disk -w 20% -c 10% -p /
command[check_load]=/usr/lib64/nagios/plugins/check_load -w 5,4,3 -c 10,8,6
command[check_mem]=/usr/lib/nagios/plugins/check_mem.pl -u -w 80 -c 90
command[check_zombie_procs]=/usr/lib64/nagios/plugins/check_procs -w 5 -c 10 -s Z
command[check_myapp]=/usr/local/lib/nagios/plugins/check_myapp.py -w 100 -c 200
The command[name]=... lines define what checks are available. The name in brackets is what the Nagios server uses to request the check. Notice that the thresholds are hardcoded here on the monitored host — that's the safe default when dont_blame_nrpe=0.
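If you manage more than a handful of hosts, it helps to be able to list which check names a given nrpe.cfg actually exposes. This is a rough sketch that pulls the command[name]=... lines out of a config file's text. The function name is my own, and it deliberately ignores include and include_dir directives, so treat it as a starting point.

```python
import re

# Matches lines of the form: command[check_name]=/path/to/plugin args...
COMMAND_RE = re.compile(r"^\s*command\[([^\]]+)\]\s*=\s*(.+?)\s*$")


def parse_nrpe_commands(config_text):
    """Return {check_name: command_line} for every command[...] definition."""
    commands = {}
    for line in config_text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # skip comment lines
        match = COMMAND_RE.match(line)
        if match:
            commands[match.group(1)] = match.group(2)
    return commands


sample = """
# Define the checks this host exposes.
command[check_disk]=/usr/lib64/nagios/plugins/check_disk -w 20% -c 10% -p /
command[check_load]=/usr/lib64/nagios/plugins/check_load -w 5,4,3 -c 10,8,6
"""
print(sorted(parse_nrpe_commands(sample)))
```

The keys of the returned dict are exactly the names the Nagios server is allowed to ask for.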
After editing the config, restart NRPE:
sudo systemctl restart nrpe
Firewall
NRPE listens on TCP port 5666. The monitored host needs to allow inbound connections on that port from the Nagios server, and only the Nagios server. If you're on AWS, that means a security group rule. If you're using iptables directly:
sudo iptables -A INPUT -p tcp --dport 5666 -s 10.0.1.50 -j ACCEPT
sudo iptables -A INPUT -p tcp --dport 5666 -j DROP
Don't leave port 5666 open to the world. Out of the box, NRPE's only access control is the allowed_hosts IP whitelist, and you don't want random people querying your monitoring checks.
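A quick way to sanity-check the firewall rule from the Nagios server (or from a host that should be blocked) is a plain TCP connect. Here's a small sketch using only the standard library; probe_port is a name I made up. Note that "open" only means the TCP handshake succeeded. NRPE applies allowed_hosts after the connection is accepted, so this doesn't prove the daemon will answer a check.

```python
import socket


def probe_port(host, port=5666, timeout=3.0):
    """Classify a TCP connection attempt as open, refused, or timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Nothing listening, or a firewall actively rejecting the connection.
        return "refused"
    except socket.timeout:
        # Typical of a DROP rule: packets vanish and the attempt times out.
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"
```

With the iptables rules above, I'd expect "open" from 10.0.1.50 and "timeout" from anywhere else, since DROP silently discards the packets.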
On the Nagios server side
Install the check_nrpe plugin on the Nagios server:
sudo yum install nagios-plugins-nrpe
Define the NRPE command in your Nagios config:
define command {
command_name check_nrpe
command_line /usr/lib64/nagios/plugins/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}
Then use it in service definitions:
define service {
use generic-service
host_name myserver
service_description Disk Usage
check_command check_nrpe!check_disk
}
define service {
use generic-service
host_name myserver
service_description Load Average
check_command check_nrpe!check_load
}
The !check_disk part is the check name — it has to match exactly what you defined in nrpe.cfg on the monitored host.
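If the ! syntax looks opaque, this sketch shows roughly what happens to it: the check_command string is split on !, the first piece picks the command definition, and the rest fill $ARG1$, $ARG2$, and so on in command_line. This is a simplified model for illustration, not Nagios's actual macro processor, and expand_check_command is a hypothetical name.

```python
def expand_check_command(command_line, check_command, host_address):
    """Expand $HOSTADDRESS$ and $ARGn$ in a command_line (simplified model)."""
    name, *args = check_command.split("!")
    expanded = command_line.replace("$HOSTADDRESS$", host_address)
    for index, value in enumerate(args, start=1):
        expanded = expanded.replace(f"$ARG{index}$", value)
    return name, expanded


name, cmd = expand_check_command(
    "/usr/lib64/nagios/plugins/check_nrpe -H $HOSTADDRESS$ -c $ARG1$",
    "check_nrpe!check_disk",
    "10.0.1.100",
)
print(name, "->", cmd)
```

So check_nrpe!check_disk ends up as check_nrpe -H 10.0.1.100 -c check_disk, and the check_disk at the end is what NRPE looks up in nrpe.cfg.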
Verifying from the command line
Before wiring things into Nagios, test from the Nagios server directly:
/usr/lib64/nagios/plugins/check_nrpe -H 10.0.1.100 -c check_disk
You should see output like:
DISK OK - free space: / 42567 MB (68% inode=99%);| /=19661MB;52428;58982;0;65536
If you get "Connection refused", check that NRPE is running and the firewall rule is right. If you get "CHECK_NRPE: Error - Could not complete SSL handshake", first confirm that the Nagios server's IP is listed in allowed_hosts, because NRPE drops connections from unlisted hosts and the client reports that as a handshake failure. If the IP is listed, you may have an SSL or NRPE version mismatch between the client and the daemon, which is common when the Nagios server is on a different distro version than the monitored host. That is usually resolved by running the same NRPE version on both sides.
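The part of the output after the | is performance data, in the form label=value[unit];warn;crit;min;max. If you want to eyeball those numbers in a script while testing, here's a minimal parser sketch. It assumes simple labels without quotes or spaces, and parse_plugin_output is my own name, not part of any Nagios tooling.

```python
import re

# A number followed by an optional unit such as MB, s, or %.
VALUE_RE = re.compile(r"(-?\d+(?:\.\d+)?)([A-Za-z%]*)")


def parse_plugin_output(line):
    """Split one plugin output line into status text and perfdata metrics."""
    text, _, perfdata = line.partition("|")
    metrics = {}
    for item in perfdata.split():
        label, sep, fields = item.partition("=")
        if not sep:
            continue  # not a label=value pair
        # Pad so that missing trailing fields come back as empty strings.
        value, warn, crit, low, high = (fields.split(";") + [""] * 4)[:5]
        match = VALUE_RE.fullmatch(value)
        if not match:
            continue
        metrics[label] = {
            "value": float(match.group(1)),
            "uom": match.group(2),
            "warn": warn,
            "crit": crit,
            "min": low,
            "max": high,
        }
    return text.strip(), metrics
```

For the check_disk line above, the / metric comes back as 19661 MB used, with warning at 52428 and critical at 58982 out of 65536. Those are the 20% and 10% free thresholds expressed as used space.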
Passing arguments from Nagios to NRPE
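By default, checks in nrpe.cfg have their arguments hardcoded. If you want to pass thresholds from the Nagios server at check time, you need to set dont_blame_nrpe=1 in nrpe.cfg and use $ARG1$-style macros in the command definitions there. NRPE also has to be built with command-argument support (the --enable-command-args configure option); distro packages typically are, but check if arguments seem to be ignored.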
# nrpe.cfg on the monitored host
dont_blame_nrpe=1
command[check_disk_args]=/usr/lib64/nagios/plugins/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
Then on the Nagios server:
define command {
command_name check_nrpe_args
command_line /usr/lib64/nagios/plugins/check_nrpe -H $HOSTADDRESS$ -c $ARG1$ -a $ARG2$ $ARG3$ $ARG4$
}
define service {
...
check_command check_nrpe_args!check_disk_args!20%!10%!/
}
The security tradeoff with dont_blame_nrpe=1 is real: if an attacker can manipulate the arguments being passed, they can potentially inject shell commands into your check invocations. The allowed_hosts whitelist helps, but defense in depth means keeping dont_blame_nrpe=0 when you can and hardcoding the thresholds. I only enable argument passing when I have a legitimate operational reason — like wanting to control thresholds from the Nagios server without touching every nrpe.cfg file.
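If you do turn argument passing on, one extra layer is to validate arguments against a strict allow-list wherever you generate them, for example in a script that builds service definitions. The sketch below is a hypothetical helper and not part of NRPE. The allowed character set is my own conservative guess at what threshold and path arguments need, and it works independently of whatever filtering NRPE applies itself.

```python
import re

# Conservative allow-list: letters, digits, and a few characters that show up
# in thresholds and paths (20%, 5,4,3, /var/log, 10:20). No shell metacharacters.
SAFE_ARG_RE = re.compile(r"[A-Za-z0-9_./%:,=-]+")


def is_safe_arg(value):
    """True only if the whole argument consists of allow-listed characters."""
    return SAFE_ARG_RE.fullmatch(value) is not None


def build_nrpe_args(check_name, args):
    """Return the check_nrpe argument list, refusing anything suspicious."""
    unsafe = [arg for arg in args if not is_safe_arg(arg)]
    if unsafe:
        raise ValueError(f"refusing unsafe NRPE arguments: {unsafe!r}")
    return ["-c", check_name, "-a", *args]
```

fullmatch matters here: a plain match with ^...$ would let a trailing newline through.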
A real nrpe.cfg with useful checks
Here's what a reasonably complete nrpe.cfg command section looks like for a typical Linux server:
# Disk usage - warn at 20% free, critical at 10% free
command[check_disk]=/usr/lib64/nagios/plugins/check_disk -w 20% -c 10% -p /
# Memory usage (requires check_mem.pl from nagios-plugins-contrib)
command[check_mem]=/usr/lib/nagios/plugins/check_mem.pl -f -w 20 -c 10
# System load - warn/crit thresholds for 1m, 5m, 15m averages
command[check_load]=/usr/lib64/nagios/plugins/check_load -w 4,3,2 -c 8,6,4
# Zombie processes
command[check_zombie_procs]=/usr/lib64/nagios/plugins/check_procs -w 5 -c 10 -s Z
# Custom app check - our Python check from the previous post
command[check_myapp_memory]=/usr/local/lib/nagios/plugins/check_process_memory.py -p myapp -w 512 -c 1024
Debugging: the nagios user's environment problem
This is the gotcha that burns everyone at least once. You SSH to the monitored host, run the check manually as yourself, it works fine. Then Nagios reports UNKNOWN or the check fails. The reason is almost always the nagios user's environment.
The nagios user has a minimal shell environment. It may not have the same PATH as your user. It may not have environment variables your check depends on. It may not have permission to read files your check needs.
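You can reproduce the effect without touching NRPE at all by running the same command with a full environment and with a stripped-down one. This is a small sketch. The APP_CONFIG variable and the one-line "check" are invented for the demo, and the minimal PATH is just an assumption about what a daemon environment might look like.

```python
import os
import subprocess
import sys

# Hypothetical check: exits 3 (UNKNOWN) unless APP_CONFIG is set.
CHECK_SNIPPET = "import os, sys; sys.exit(0 if 'APP_CONFIG' in os.environ else 3)"


def run_with_env(argv, env):
    """Run a check with exactly the given environment; return its exit code."""
    return subprocess.run(argv, env=env, capture_output=True, text=True).returncode


check = [sys.executable, "-c", CHECK_SNIPPET]

# Your interactive shell: everything you normally have, plus the app variable.
rich_env = dict(os.environ, APP_CONFIG="/etc/myapp.conf")
# Something closer to what a daemon-launched check sees.
minimal_env = {"PATH": "/usr/bin:/bin"}

print("as you:", run_with_env(check, rich_env))
print("stripped:", run_with_env(check, minimal_env))
```

Same command, same host, different exit code. That's the pattern to look for when a check passes by hand and comes back UNKNOWN through NRPE.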
Always test checks as the user the daemon runs as (the nrpe_user setting in nrpe.cfg, nagios in this setup):
sudo -u nagios /usr/lib64/nagios/plugins/check_disk -w 20% -c 10% -p /
echo $?
If that fails and running as yourself works, you've found your problem. Common fixes:
- Use full absolute paths in your check scripts instead of relying on PATH
- Set required environment variables explicitly in the check script
- Use sudoers to grant the nagios user permission to run specific commands as root
For that last one, if your check needs elevated permissions — reading from /proc in certain ways, running netstat, checking hardware sensors — add a sudoers entry:
# /etc/sudoers.d/nagios
nagios ALL=(root) NOPASSWD: /usr/lib64/nagios/plugins/check_disk
nagios ALL=(root) NOPASSWD: /usr/local/lib/nagios/plugins/check_process_memory.py
And call the check in nrpe.cfg with sudo:
command[check_disk_root]=/usr/bin/sudo /usr/lib64/nagios/plugins/check_disk -w 20% -c 10% -p /
Don't give nagios a blanket NOPASSWD: ALL — that's a privilege escalation waiting to happen. Be specific about which commands it's allowed to run.
Summary
NRPE is straightforward once you understand what's happening:
- Install NRPE and the nagios-plugins on each monitored host
- Configure allowed_hosts and define your commands in nrpe.cfg
- Open port 5666 from the Nagios server only
- Wire up check_nrpe commands on the Nagios server side
- Test from the command line with check_nrpe -H host -c check_name before you trust the config
- Debug as the nagios user, not as yourself
The nagios user environment thing is where most NRPE debugging time goes. Check that first when something works manually but fails through NRPE.