Register the controlled daemons (monitrc)

Which daemons are watched and how they are treated is governed by the main configuration file, /etc/monitrc. Most things are described so clearly in MONIT's default sample that I don't think every detail needs to be mentioned, so here I explain only the more complicated matters that troubled me before. Do note that monitrc must be readable and writable only by its owner, which usually means root:root with mode 600, because it is usually root that is in charge of this kind of work.
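As a minimal sketch of checking the permission requirement above, the function below reports whether a control file has mode 600. The function name is my own invention, and it assumes GNU stat (the -c option); it does not check ownership.

```shell
# Report whether a control file has the owner-only mode recommended above.
# Uses GNU stat (-c); on BSD the equivalent would be: stat -f '%Lp'.
check_monitrc_mode() {
  mode=$(stat -c '%a' "$1") || return 2
  if [ "$mode" = "600" ]; then
    echo "ok: $1 is mode 600"
  else
    echo "fix needed: $1 is mode $mode (run: chmod 600 $1)"
    return 1
  fi
}
```

It would be called as `check_monitrc_mode /etc/monitrc'. After fixing the mode, `monit -t' can be used to check the syntax of the control file.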

set logfile directive

set logfile /var/log/monit

In the case above, logs are written by monit itself to /var/log/monit. On the other hand,

set logfile syslog facility log_local1

is often more appropriate: monit just hands the messages over to the syslog daemon, and the latter is in charge of the actual output. The facility argument should not be something like 'LOCAL1' but should have the form `log_something'.
To get the messages actually logged, you must add a line to syslog.conf that matches monitrc. If, say, `facility log_local1' was declared in monitrc, you can have the monit logs written to /var/log/monit by adding the line below to syslog.conf;

local1.* /var/log/monit

Note that the log file will grow endlessly unless you set up logrotate or something similar.
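A rotation setup for this log could look roughly like the sketch below. The weekly schedule, the four kept generations, the /etc/logrotate.d location and the syslogd pid file path are all assumptions of mine and need adjusting to your system. The snippet is written to a scratch directory unless LOGROTATE_DIR is set.

```shell
# Write a logrotate snippet for /var/log/monit.
# LOGROTATE_DIR defaults to a scratch directory so the sketch is safe to try;
# on a real system it would typically be /etc/logrotate.d (an assumption).
LOGROTATE_DIR="${LOGROTATE_DIR:-$(mktemp -d)}"
cat > "$LOGROTATE_DIR/monit" <<'EOF'
/var/log/monit {
    weekly
    rotate 4
    missingok
    notifempty
    postrotate
        # syslogd owns the file here, so ask it to reopen its logs
        /bin/kill -HUP `cat /var/run/syslogd.pid 2>/dev/null` 2>/dev/null || true
    endscript
}
EOF
echo "wrote $LOGROTATE_DIR/monit"
```

To confirm that the syslog.conf line itself works, a test message can be sent to the same facility with `logger -p local1.info test' and should then appear in /var/log/monit.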

set mail-format directive

The global mail-format setting defines the default, which is applied to every service unless the service has its own. As with some other directives, a definition in an individual service block overrides the global default. We can use certain variables in mail-format. They are;

$EVENT Event which triggered the alert. `checksum failed', `connection failed', `timeout', `does not exist' and so on
$SERVICE Service name defined in monitrc
$DATE Date and time when the event happened
$HOST Name of host on which MONIT is running
$ACTION The action MONIT took on the event. One of alert, monitor, unmonitor, start, stop, restart, exec
$DESCRIPTION Description of the error condition

Example;

from: monit@host1.hoge.cxm
subject: $SERVICE $ACTION on $HOST
message: Action:$ACTION taken
because $DESCRIPTION
 
    Date:   $DATE
    Host:   $HOST
    Event:  $EVENT

A space just after `from:', `subject:' and `message:' is ignored, while the remainder of the `message:' part is sent as is, including spaces, tabs and linefeeds.
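To get a feeling for how a template will look once the variables are filled in, the sketch below imitates the substitution with sed for three of the variables. This is only my imitation with made-up sample values, not MONIT's own code, and the function name is hypothetical.

```shell
# Expand a few mail-format variables with sample values to preview a template.
# This only imitates monit's substitution; values containing "/" or "&"
# would need escaping before being handed to sed.
preview_mail_format() {
  sed -e 's/\$SERVICE/'"$1"'/g' \
      -e 's/\$ACTION/'"$2"'/g' \
      -e 's/\$HOST/'"$3"'/g'
}

# prints: subject: apache restart on host1.hoge.cxm
printf '%s\n' 'subject: $SERVICE $ACTION on $HOST' |
  preview_mail_format apache restart host1.hoge.cxm
```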

set alert directive

This directive is also about alert mail, but it defines on which occasions you want to receive it. Without it, you are informed of, among other things, every start and stop of monit itself. Those mails annoyed me, and I can now get rid of them by explicitly specifying the events I want a mail on.

The relationship between the various tests and events is not described very clearly in either MONIT's manual or its man page. My experiments have revealed some of it so far.

For readability, the events below are separated into two lines. But I found that they must be on one line; otherwise only the parameters up to the first newline (from nonexist to checksum in this example) are recognized.

set alert hoshu@host1.hoge.cxm only on {
  nonexist, invalid, timeout, resource, checksum,
  connection, permission, uid, gid
}
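If you prefer to keep the event list readable while editing, one way is to hold it in a shell variable and collapse it before it goes into monitrc. The sketch below does that with tr. It assumes the one-line restriction I observed also applies to your version.

```shell
# Keep the event list readable in a variable, then collapse it to one line.
events='nonexist, invalid, timeout, resource, checksum,
connection, permission, uid, gid'
one_line=$(printf '%s' "$events" | tr '\n' ' ')
alert_line="set alert hoshu@host1.hoge.cxm only on { $one_line }"
echo "$alert_line"
```

The printed line is the form that goes into monitrc, with all nine events on a single line.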

set httpd directive

This is used to control MONIT's built-in httpd server. It is not reasonable to disable it, since most monit commands such as "monit start all" or "monit stop service" need the built-in httpd functionality. It is possible to limit accessibility in several ways.

Example 1 (permit local to local only):

set httpd port 2812 and
  use address localhost
  allow 127.0.0.1

The localhost (or 127.0.0.1) entry in the allow statement is indispensable for using commands like `monit status' on the box itself. When there is a line `use address localhost', the built-in httpd listens on the loopback address only, so requests from other hosts can't reach it in the first place.

Example 2 (permit specific subnet - with clear text password):

set httpd port 2812 and
  allow 127.0.0.1
  allow 192.168.0.0/24
  allow hoshu:secret

If you also need to access MONIT from other hosts, specify their IP addresses (or resolvable names) as in Example 2. As you can see, you can specify a range of addresses with a subnet mask, which can be written either in dotted-decimal style (like 255.255.255.0) or in CIDR notation. In this case, never define `use address localhost'. Without a USE line, monit listens on every network interface of your box. Furthermore, monit can control access with user:password pairs, shown on the 4th line of the sample above. An important thing to notice is that user:password definitions are used only for the real WEB control page and not for the command line interface. The password in Example 2 is written in clear text, so if you dislike such a crude way, see the next example;

Example 3 (permit specific subnet - with separate password file):

set httpd port 2812 and
  allow 127.0.0.1
  allow 192.168.0.0/24
  allow md5 /etc/monitpasswd hoshu
  allow dummyuser:b77e86f629174381d57e8f1895732d

Though most settings, such as the subnet, are just the same as in Example 2, this one uses an external password file, /etc/monitpasswd. You can store encrypted (or more precisely, `hashed') passwords by this method (and only by this method). The algorithm part `md5' can also be `crypt', which hardly anyone would recommend. The external password file has to be in Apache's htpasswd style, which consists of user:password on each line, and the password portion must have the `$id$salt$digest' format.

The easiest way is to use Apache's own htpasswd command to manage the password file. I leave further details to the man page of htpasswd, but below is a simple command example that creates a new /etc/monitpasswd file and writes a line for user `hoshu' with his MD5-hashed password;

root# htpasswd -c -m /etc/monitpasswd  hoshu

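You can omit the -m option, which means MD5, if MD5 is the default algorithm of your htpasswd utility. That is the case in Apache 2.2.18 and later; older versions default to crypt, so keep -m there.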
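Before pointing monitrc at the password file, it may be worth checking that every line really has the user:$id$salt$digest shape described above. The function below is a rough sketch of such a check with grep. The function name is hypothetical, and the pattern only tests the shape of each line, not whether the hash is valid.

```shell
# Print lines of an htpasswd-style file that do NOT look like
# user:$id$salt$digest; exit status 0 means every line matched.
check_passwd_format() {
  [ -r "$1" ] || return 2
  if grep -v -E '^[^:]+:\$[^$]+\$[^$]+\$[^$]+$' "$1"; then
    return 1    # grep printed at least one malformed line
  fi
  return 0
}
```

Run as `check_passwd_format /etc/monitpasswd', it prints nothing and returns 0 when the file looks fine. Since the file holds password hashes, it is sensible to give it the same root-owned 0600 permission as monitrc.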

I can almost hear the question, "What is the 5th line of Example 3?". As a matter of fact, MONIT refuses every terminal command such as `monit status' if monitrc has any directive related to user/password but does not also have a clear-text user:password definition. This line is a dummy definition to work around that. The dummy password should be a baffling, hard-to-type one (the sample is an MD5 hash of the word `detarame').

I once tried `allow dummyuser:xxx... read-only' to rule out even the possibility of an accidental hit on the dummy password, but in vain. Such a definition ends up with 100% denial of commands like `monit restart apache'. This means that MONIT's read-only argument affects not only the WEB control page but also console commands.

When I experimented with MONIT 4.2.1, the password authentication code behaved inadequately. "allow host" and "allow user:password" were not evaluated as AND but as OR. This meant that if you had a password authentication line in monitrc, anyone who could type a correct user name and password was accepted even from outside the allowed IP addresses. I found that this weakness has disappeared in MONIT 4.7, which denies access unless both the address AND the user:password are satisfied.

Before MONIT 4.0, you could not use the service name `httpd' for your service definition name since it was reserved for MONIT's built-in httpd server. But this restriction doesn't exist any longer.

include directive

MONIT inserts the contents of the file(s) wherever it finds the statement;

include file

in monitrc. This lets you split the configuration into several files to make management easier. For example, you can define global settings in the main monitrc and keep daemon-specific configurations in separate files. The original sample monitrc shows how to include all the files in the /etc/monit.d/ directory with;

include /etc/monit.d/* 

However, that way MONIT really reads every file in the monit.d directory. To be more practical, you had better use something like;

include /etc/monit.d/*.rc

so that only the necessary files are picked up, by file extension. In this case you can keep, say, foo.rc from being parsed by renaming it to foo.back when you temporarily want to hide it from MONIT. Though MONIT doesn't seem to check the permissions of included files, you had better make them root-owned and 0600, just like the main configuration file.
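The effect of the *.rc pattern and of the rename trick can be tried out in a scratch directory first. The sketch below uses the shell's own globbing as a stand-in, on the assumption that MONIT expands the pattern the same way. The file names are made up.

```shell
# Scratch directory standing in for /etc/monit.d
confdir=$(mktemp -d)
touch "$confdir/apache.rc" "$confdir/foo.rc" "$confdir/notes.txt"

# What "include /etc/monit.d/*.rc" would pick up: apache.rc and foo.rc
ls "$confdir"/*.rc

# Temporarily hide foo.rc from the pattern by renaming it
mv "$confdir/foo.rc" "$confdir/foo.back"
visible=$(cd "$confdir" && ls *.rc)
echo "$visible"
```

After the rename only apache.rc still matches, while notes.txt and foo.back are ignored.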

PostgreSQL connection test

I may be responsible for explaining this since this is the test I wrote. If you use MONIT 4.7 or earlier, you need to apply pgsql-patch to make use of this protocol test.

A connection test in MONIT works by opening a connection to the socket on which the service is listening and sending some packets; MONIT then decides whether the service is alive based on the response it returns (or the lack of one). The `socket' can be a TCP/UDP port or a UNIX socket. Before dealing with the PGSQL test, we should take a look at MONIT's connection tests in general.

DNS service connection test example:

if failed host localhost port 53 type udp protocol dns with timeout 10
  then restart

The host argument can be omitted, in which case the host is assumed to be localhost. type defaults to TCP, so you can omit it for TCP tests. protocol can be one of (as of MONIT 4.8) APACHE-STATUS, DNS, DWP, FTP, HTTP, IMAP, LDAP2, LDAP3, MYSQL, PGSQL, NNTP, NTP3, POP, POSTFIX-POLICY, RDATE, RSYNC, SMTP, SSH, TNS, and if you omit it, a generic connection test is used. timeout is how long MONIT waits before it gives up; its default value is 5 (seconds).

The syntax for `pgsql' is nothing special. Now let us examine the case where we want to test PostgreSQL through its UNIX socket.

PostgreSQL connection test example (via UNIX socket):

if failed unixsocket /tmp/.s.PGSQL.5432 proto pgsql
  with timeout 15
  then restart

Prerequisites for PGSQL test

As PostgreSQL requires authentication even merely to connect to it, certain preparations should be made before practical use of this test. The procedure is not mandatory, because the PGSQL test treats the connection as a success even when PostgreSQL demands authentication or reports that there is no such user, since both responses mean the postmaster is functioning. However, you had better follow the procedure below to keep the Postgres log as clean as possible, which was the original aim for which I wrote this code. We are going to create a DB user `root' for convenience, because MONIT is usually run by root. The example below assumes PostgreSQL 8.x. If yours is older, some syntax, such as the subnet format, may differ;

  1. Create DB user `root'.
  2. Create a database 'root' owned by root. It doesn't need to contain any data.
  3. Add these descriptions to pg_hba.conf;
    host   root  root  127.0.0.1/32  trust          <= for test via TCP port
    local  root  root                ident sameuser <= for test via UNIX socket
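Steps 1 and 2 above could be carried out with PostgreSQL's client tools roughly as follows. This is only a sketch: it assumes PostgreSQL 8.1 or later (for the -S -D -R options of createuser), that the commands are run as the PostgreSQL superuser, and that the data directory is /var/lib/pgsql/data, which is a guess. The small run helper is my own and only prints the commands unless DRY_RUN=0 is set.

```shell
# Dry-run by default: print each command instead of executing it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi
}
PGDATA_DIR="${PGDATA_DIR:-/var/lib/pgsql/data}"   # assumed location of pg_hba.conf

run createuser -S -D -R root     # step 1: DB user root, no extra privileges
run createdb -O root root        # step 2: empty database root owned by root
# step 3 is the manual edit of pg_hba.conf; afterwards make postmaster re-read it
run pg_ctl reload -D "$PGDATA_DIR"
```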

depends on directive

The "depends on" keyword can be used to make MONIT take action on another service when an $EVENT occurs to a certain service.

Caution ! :
You had better NOT reference one service from many service definitions at once. Note that when MONIT starts or restarts the parent service, it usually starts or restarts the services that depend on it as well. This may end up in a loop of controls, in which case MONIT will detect it and refuse to start.

Example based on the default configuration sample:

######## Parent process ########
check process apache with pidfile /var/run/httpd.pid
start program = "/etc/service/httpd start"
stop program = "/etc/service/httpd stop"
if failed host localhost port 80
  with timeout 10 seconds then restart
if cpu > 60% for 2 cycles then alert
if cpu > 80% for 5 cycles then restart
if loadavg(5min) > 10 for 8 cycles then restart
if 3 restarts within 5 cycles then timeout
depends on apache_bin
alert hoshu@host1.hoge.cxm on {
 restart, stop, timeout
}
######### Child process dependent on parent #########
check file apache_bin with path /usr/sbin/httpd
if failed checksum then unmonitor
if failed permission 755 then unmonitor
if failed uid root then unmonitor
if failed gid root then unmonitor
alert hoshu@host1.hoge.cxm on {
  checksum, permission, uid, gid, unmonitor
} with mail-format {
  subject: Alarm! $SERVICE hacked
}

Will MONIT stop apache when some intruder has hacked the apache binary? NO. In the end, the apache daemon is only unmonitored. When the child detects some trouble, MONIT applies to the parent the same action as specified for that event in the child definition (the `unmonitor' actions above). An unmonitored process won't be started ever again, but if an emergency halt of the parent is what you want, replace every `unmonitor' with `stop'. Though "stopping a file" sounds strange in a real-world context, it does not matter to MONIT.

Notice on checksum test:
You need to stop MONIT or unmonitor the service before you update a controlled application. Otherwise MONIT will detect a checksum error and act, for example by stopping or unmonitoring the service. If you forgot this procedure and MONIT unmonitored the service, restarting MONIT itself or
monit restart all
or
monit monitor <service>
will bring it back under monitoring.
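The unmonitor, update, monitor sequence can be wrapped in a small helper so that it is not forgotten. The sketch below is one possible shape. The function name and the MONIT variable are my own, and the update command itself is whatever you normally use.

```shell
MONIT="${MONIT:-monit}"   # override, e.g. MONIT="echo monit", for a dry run

# Usage: update_unmonitored <service> <update command...>
update_unmonitored() {
  svc=$1; shift
  $MONIT unmonitor "$svc" || return 1
  "$@"                      # the actual update command
  status=$?
  $MONIT monitor "$svc"     # resume monitoring whether or not the update worked
  return $status
}
```

For example, `update_unmonitored apache rpm -Uvh httpd.rpm' (the package file name is hypothetical) would take apache out of monitoring only for the duration of the upgrade.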

group directive

Another useful directive is group. For example, qmail, qsmtpd and qpop3d, discussed on the qmail page, can each have;

group qmaild

Then you can stop, restart or unmonitor them all with a single command. If you now want to stop them,

root# monit -g qmaild stop all

will stop qmail, qsmtpd and qpop3d at once. Don't forget the last argument `all'.

Workaround for non-chkconfig-compliant services

As an appendix, this section discusses how to deal with startup scripts that are not chkconfig compliant, or that you do not want to rework into compliant ones. To tell the truth, I used to use this method for every daemon, but I came to realize that it only makes a mess if the scripts are chkconfig compliant.

Place the startup scripts

Create a directory and gather the startup scripts of the MONIT-controlled services in it. Say the directory is /etc/service;

root# mkdir /etc/service
root# mv /script/of/service /etc/service
root# chmod 755 /etc/service/service

Adjust MONIT init script

In this strategy, monit is started from system init at system startup, and monit then brings up the services in /etc/service. When you have startup scripts in a directory outside that of system init, you may need to adjust the chkconfig parameters in the init script of MONIT itself. If you intend to handle, for example, a mail service in this way, MONIT's standard startup order of 98 is too late, since other daemons may want to send startup error messages. So edit the chkconfig line so that it reads something like;

# chkconfig: 345 85 02

monitrc needs no special treatment for this.

Additional `Job killer' rc script

There is still one problem to be resolved. MONIT does exit when "service monit stop" is called, but stopping the controlled services is another matter. Therefore you need some way to stop the services when the system goes down to halt, reboot or single user mode, because system init will no longer stop them. For this purpose I made a couple of scripts to kill them. You can use either of the two by copying it to the system init directory.

Idea 1: Kill services via MONIT

The monitjob script uses monit to stop the services, so it must run before monit is killed. To accomplish this, MONIT's kill order in its init script must be delayed: it has to be 03 or later, as monitjob's is 02 by default.

# chkconfig: 345 85 03

Idea 2: Kill services via their startup scripts

monitjob2 uses a slightly cleverer method. It uses find to search the service directory (/etc/service in the example above) for regular files whose permissions are 755 and calls each of them with the stop argument. It doesn't care whether MONIT is dead or alive. But as it seems reasonable for this script to be called after MONIT has been killed, I set its KILL order to 35. You don't need to delay MONIT's KILL parameter.
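The core of that idea fits in a few lines. The function below is a simplified sketch of the find-and-stop loop, not the monitjob2 script itself. It assumes a find that supports -maxdepth (GNU and BSD find do), and the function name is only for illustration.

```shell
# Call every regular file with mode exactly 755 in the given directory
# with the "stop" argument, in alphabetical order.
stop_services() {
  dir=$1
  find "$dir" -maxdepth 1 -type f -perm 755 | sort | while read -r script; do
    "$script" stop
  done
}
```

It would be called as `stop_services /etc/service'. A side effect of matching on the permission is that a script can be excluded from the shutdown sweep simply by changing its mode, for example to 644.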

Put the Job killer in place

Both of them are written in chkconfig-compliant format, so you can register the one you chose for the appropriate run levels with;

root# cp (monitjob or monitjob2) /etc/rc.d/init.d/monitjob
root# chmod 755 /etc/rc.d/init.d/monitjob
root# chkconfig --add monitjob

The Job killer touches a dummy lock file, /var/lock/subsys/monitjob, when called with the start argument. When the system goes to init level 0, 1 or 6, sysvinit won't kill any service that doesn't have a lock file with the same name as its init script. Therefore, if you want it to work on the next system halt without restarting your box now, run;

root# service monitjob start

once. From then on there is no need to do this manually, since the lock file is created for you every time the system starts.
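The lock file handling described above boils down to the skeleton below. It is a stripped-down sketch, not the real script: the function form and the LOCKFILE variable are my own, and the actual stopping of services is left as a comment.

```shell
LOCKFILE="${LOCKFILE:-/var/lock/subsys/monitjob}"

monitjob() {
  case "$1" in
    start)
      # nothing to start; just leave the lock so init calls "stop" later
      touch "$LOCKFILE"
      ;;
    stop)
      # stop the MONIT-controlled services here, then drop the lock
      rm -f "$LOCKFILE"
      ;;
    *)
      echo "Usage: monitjob {start|stop}" >&2
      return 1
      ;;
  esac
}
```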