Following on from today’s outage, I’ve put a few more safety measures in place. I’ve always had old faithful Nagios watching over my servers, but sometimes a little more is needed. Introducing Monit into the picture…
Monit is a free open source utility for managing and monitoring processes, files, directories and filesystems on a UNIX system. Monit conducts automatic maintenance and repair and can execute meaningful causal actions in error situations.
Monit basically watches your processes and takes action based on the criteria you specify. The configuration is practically English, so bashing something together quickly to monitor the major culprits isn’t hard. Installing it on Debian is even easier.
sudo apt-get install monit
Then you just need to edit the config. The base config has an awful lot of comments, so it’s easy to understand, but here is an example of monitoring Apache…
check process apache2 with pidfile /var/run/apache2.pid
    start program = "/etc/init.d/apache2 start"
    stop program = "/etc/init.d/apache2 stop"
    if cpu is greater than 60% for 2 cycles then alert
    if cpu > 80% for 5 cycles then restart
    if totalmem > 200.0 MB for 5 cycles then restart
    if children > 250 then restart
    if loadavg(5min) greater than 10 for 8 cycles then stop
    if 3 restarts within 5 cycles then timeout
    group server
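After editing, it’s worth checking the config before letting Monit loose on a live box. A rough sketch of what I’d run, assuming the stock Debian package (which keeps its control file at /etc/monit/monitrc) and the same init.d scripts used in the Apache example above:

```shell
# Syntax-check the control file; Monit reports any line it can't parse
sudo monit -t

# Restart the daemon so it picks up the new checks
# (assumption: the Debian package's init script at this path)
sudo /etc/init.d/monit restart
```

If the syntax check complains, fix the line it points at before restarting.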
Without any knowledge of the Monit configuration system, I bet 90% of all admins can figure out what all of this means.
Once started, Monit wakes up at a regular interval (set in the configuration), checks that the process is running and not exceeding the defined limits, and takes action if something is wrong.
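The wake-up interval is a global setting near the top of the config, and it’s given in seconds. A minimal sketch follows; the 120 second value, the local mail server and the alert address are my assumptions for illustration, not anything from the stock file:

```
# Poll all checks every 120 seconds (assumed value, tune to taste)
set daemon 120

# Where "alert" actions get sent; both lines use placeholder values
set mailserver localhost
set alert admin@example.com
```

A “cycle” in the rules is one of these polls, so with a 120 second interval “for 2 cycles” means the condition has to hold for roughly four minutes before Monit acts.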
Stay tuned for more server modifications to look after my customers better.