Check smartmon - NRPE S.M.A.R.T harddisk check
Latest revision as of 11:04, 2 April 2010
S.M.A.R.T is a self-monitoring technology built into harddisks that lets you ask them how they are doing. You can have Nagios monitor the health status and temperature of the harddisks in your servers using this port:
$ cat /usr/ports/net-mgmt/nagios-check_smartmon/pkg-descr
check_smartmon is a Nagios plug-in written in python that uses smartmontools to check disk health status and temperature.
Configuring Nagios
First I define a few new services on the Nagios server, in /usr/local/etc/nagios/objects/services.cfg. I define one service per disk name I want to check. If I have three servers with an ad0 drive, and one server (host2 in the example below) with both ad0 and ad1 drives, I add service definitions for checking both ad0 and ad1:
# SMART ad0
define service {
    use                 generic-service
    host_name           host1,host2,host3
    service_description nrpe_check_smart_ad0
    check_command       check_nrpe2!check_smart_ad0
}
# SMART ad1
define service {
    use                 generic-service
    host_name           host2
    service_description nrpe_check_smart_ad1
    check_command       check_nrpe2!check_smart_ad1
}
Instead of adding the server hostnames to the service definitions directly, I could also have added the servers I want to check to groups called something like smart-ad0-servers, smart-ad1-servers etc., and then added the groups to the services, but for now I did it like this.
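If the list of hosts and disks grows, the service definitions can be generated instead of written by hand. The following is only a minimal sketch: the function name render_smart_services and the host-to-disk mapping are hypothetical, and the output simply mirrors the two definitions shown above.

```python
# Hypothetical helper: render one Nagios service definition per disk name.
# The naming scheme (nrpe_check_smart_<disk>, check_nrpe2!check_smart_<disk>)
# is copied from the hand-written definitions above.

def render_smart_services(disks_by_host):
    """Return Nagios service definitions, one per distinct disk name."""
    # Invert the mapping: disk name -> list of hosts that have that disk.
    hosts_by_disk = {}
    for host, disks in disks_by_host.items():
        for disk in disks:
            hosts_by_disk.setdefault(disk, []).append(host)

    blocks = []
    for disk in sorted(hosts_by_disk):
        hosts = ",".join(sorted(hosts_by_disk[disk]))
        blocks.append(
            f"# SMART {disk}\n"
            "define service {\n"
            "    use                 generic-service\n"
            f"    host_name           {hosts}\n"
            f"    service_description nrpe_check_smart_{disk}\n"
            f"    check_command       check_nrpe2!check_smart_{disk}\n"
            "}\n"
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    # Example mapping matching the text: only host2 has an ad1 drive.
    example = {"host1": ["ad0"], "host2": ["ad0", "ad1"], "host3": ["ad0"]}
    print(render_smart_services(example))
```

The output can be pasted into services.cfg after a manual review.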
Install the plugin
Install the port:
sudo portmaster /usr/ports/net-mgmt/nagios-check_smartmon/
Fix sudo permissions
The plugin uses smartctl, which needs root permissions, so the nagios user must be allowed to run the plugin as root. I recommend using sudo for this purpose. I add the following to /usr/local/etc/sudoers on the servers being monitored:
nagios ALL=(ALL) NOPASSWD: /usr/local/libexec/nagios/check_smartmon -d /dev/ad*
nagios ALL=(ALL) NOPASSWD: /usr/local/libexec/nagios/check_smartmon -d /dev/da*
The first line is needed if you are checking IDE adX devices, the second line is needed if you are checking SCSI or USB daX devices. I normally just leave both of them in.
To test the sudo setup, run the following command as a user who has sudo access, replacing ad10 with the name of the device you want to monitor:
$ sudo su -m nagios -c "sudo /usr/local/libexec/nagios/check_smartmon -d /dev/ad10"
OK: device is functional and stable (temperature: 36)
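To get a feel for what the wildcards in the two sudoers lines cover, the patterns can be tried out with Python's fnmatch module. This is only an approximation of how sudo matches command arguments, not sudo's own code. The helper name args_allowed and the --extra option are made up for illustration.

```python
from fnmatch import fnmatchcase

# Argument patterns copied from the two sudoers lines above.
SUDOERS_ARG_PATTERNS = ["-d /dev/ad*", "-d /dev/da*"]


def args_allowed(arg_string):
    """Approximate check: does the argument string match any pattern?"""
    return any(fnmatchcase(arg_string, pattern) for pattern in SUDOERS_ARG_PATTERNS)


if __name__ == "__main__":
    samples = [
        "-d /dev/ad0",          # IDE disk
        "-d /dev/da3",          # SCSI or USB disk
        "-d /dev/cd0",          # not covered by either pattern
        "-d /dev/ad0 --extra",  # --extra is a hypothetical option
    ]
    for args in samples:
        print(f"{args!r}: {args_allowed(args)}")
```

The last sample matches because * also matches spaces. The sudoers manual warns about the same behaviour, so a pattern ending in * permits trailing arguments as well as other device names.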
If you get a reply like the one above, everything works as intended.
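If you want to use the reply in your own scripts, the status and temperature can be pulled out of the output line. This is a minimal sketch based only on the sample line above: the function name parse_check_output is hypothetical, and other messages from the plugin may be formatted differently.

```python
import re

# Pattern derived from the single sample line shown above (assumption:
# other plugin messages follow the same "STATUS: message" layout).
OUTPUT_RE = re.compile(
    r"^(?P<status>[A-Z]+): (?P<message>.*?)"
    r"(?: \(temperature: (?P<temp>\d+)\))?$"
)


def parse_check_output(line):
    """Split a plugin output line into status, message and temperature."""
    match = OUTPUT_RE.match(line.strip())
    if match is None:
        return None  # line does not look like plugin output
    temp = match.group("temp")
    return {
        "status": match.group("status"),
        "message": match.group("message"),
        # The temperature part is optional, so it may be None.
        "temperature": int(temp) if temp is not None else None,
    }


if __name__ == "__main__":
    sample = "OK: device is functional and stable (temperature: 36)"
    print(parse_check_output(sample))
```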
Configuring NRPE
On the server being monitored, add the following lines to /usr/local/etc/nrpe.cfg (this example has both an ad0 and an ad1 drive):
command[check_smart_ad0]=/usr/local/bin/sudo /usr/local/libexec/nagios/check_smartmon -d /dev/ad0
command[check_smart_ad1]=/usr/local/bin/sudo /usr/local/libexec/nagios/check_smartmon -d /dev/ad1
Remember to restart NRPE after changing the config:
sudo /usr/local/etc/rc.d/nrpe2 restart
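The NRPE command lines above follow a fixed pattern, so on servers with many disks they can be generated. This is a minimal sketch: the function name nrpe_commands is hypothetical, and the paths are the ones used in the example above.

```python
# Paths as used in the nrpe.cfg example above.
SUDO = "/usr/local/bin/sudo"
PLUGIN = "/usr/local/libexec/nagios/check_smartmon"


def nrpe_commands(devices):
    """Return one nrpe.cfg command line per device name (e.g. "ad0")."""
    return [
        f"command[check_smart_{dev}]={SUDO} {PLUGIN} -d /dev/{dev}"
        for dev in devices
    ]


if __name__ == "__main__":
    for line in nrpe_commands(["ad0", "ad1"]):
        print(line)
```

The command names have to match the check_smart_adX names used in the service definitions on the Nagios server.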