This site began as my own memo. Refer to the specific OFFICIAL documentation or other resources, and use this at your own risk. ABSOLUTELY NO WARRANTY.
Which daemons are watched and how they are treated is governed by the main configuration file /etc/monitrc. Most things are described so clearly in MONIT's default sample that I don't think every detail needs to be mentioned, so here I explain only the rather complicated matters that troubled me before. Do note that monitrc must be readable and writable only by its owner, which usually means root:root with mode 600, because it is usually root that is in charge of this kind of work.
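The permission requirement above can be checked before starting monit. The helper below is only a sketch: the function name check_monitrc_perms is made up for illustration, and it assumes GNU stat (`stat -c`).

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the given control file has mode 600.
# Assumes GNU stat; on BSD systems the equivalent is: stat -f '%Lp' FILE
check_monitrc_perms() {
    file=$1
    mode=$(stat -c '%a' "$file") || return 2
    if [ "$mode" = "600" ]; then
        echo "OK: $file is mode $mode"
    else
        echo "BAD: $file is mode $mode, expected 600" >&2
        return 1
    fi
}
```

A typical sequence as root would be `chown root:root /etc/monitrc`, `chmod 600 /etc/monitrc`, `check_monitrc_perms /etc/monitrc` and finally `monit -t`, which asks monit to check the syntax of the control file.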
set logfile /var/log/monit
In the case above, logs are written by monit itself to /var/log/monit. On the other hand,
set logfile syslog facility log_local1
will be more appropriate: monit just hands the messages over to the syslog daemon, and the latter is in charge of the actual output. The facility argument should not be something like `LOCAL1' but should be `log_something'.
To get the messages actually logged, you should add a line to syslog.conf that matches monitrc. Say `facility log_local1' was declared in monitrc; you can then have monit's logs written to /var/log/monit by adding the line below to syslog.conf;
local1.* /var/log/monit
Note that the log file will grow endlessly unless you set up logrotate or something of the sort.
The global mail-format setting defines the default, which is applied to every service unless the service has its own. As with some other directives, definitions in an individual service block override the global defaults. We can use certain variables in mail-format. They are;
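As one way to keep /var/log/monit from growing forever, a logrotate stanza can be installed. The sketch below is an assumption on my part, not part of MONIT's own sample: the function name write_logrotate_conf and the rotation policy (weekly, four generations) are arbitrary choices. copytruncate is used so that neither monit nor syslogd has to reopen the file after rotation.

```shell
#!/bin/sh
# Hypothetical helper: write a logrotate stanza for /var/log/monit to DEST.
write_logrotate_conf() {
    dest=$1
    cat > "$dest" <<'EOF'
/var/log/monit {
    weekly
    rotate 4
    missingok
    notifempty
    compress
    copytruncate
}
EOF
}
```

On many distributions the destination would be something like /etc/logrotate.d/monit, i.e. `write_logrotate_conf /etc/logrotate.d/monit` run as root.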
$EVENT | Event which triggered the alert. `checksum failed', `connection failed', `timeout', `does not exist' and so on |
$SERVICE | Service name defined in monitrc |
$DATE | Date and time when the event happened |
$HOST | Name of host on which MONIT is running |
$ACTION | The action MONIT took on the event. One of alert, monitor, unmonitor, start, stop, restart, exec |
$DESCRIPTION | Description of the error condition |
Example;
from: monit@host1.hoge.cxm
subject: $SERVICE $ACTION on $HOST
message: Action:$ACTION taken because $DESCRIPTION
Date: $DATE
Host: $HOST
Event: $EVENT
A space just after `from:', `subject:' and `message:' is ignored, while the rest of the `message:' part will be sent as is, including spaces, tabs and linefeeds.
This directive, too, is about alert mail, but it defines on which occasions you want to receive it. Without it, you will be informed of, among other things, every start and stop of monit itself. Those mails annoy me, so I get rid of them by explicitly specifying the events on which I want a mail.
The relationship between the various tests and events is not very clearly described in either MONIT's manual or its man page. My experiments have revealed some of it for now;
For readability, the events below are separated into two lines. But I found that they must be on one line; otherwise only the parameters up to the first newline (from nonexist to checksum in this example) are recognized.
set alert hoshu@host1.hoge.cxm only on { nonexist, invalid, timeout, resource, checksum,
connection, permission, uid, gid }
This is used to control MONIT's built-in httpd server. It is not reasonable to disable it, since most of monit's commands, such as "monit start all" or "monit stop service", need the built-in httpd functionality. It is possible to limit accessibility in several ways.
Example 1 (permit local to local only):
set httpd port 2812 and
use address localhost
allow 127.0.0.1
The localhost (or 127.0.0.1) entry in the allow statement is indispensable for using commands like `monit status' on the box itself. When there is a line `use address localhost', the built-in httpd listens on the loopback address only, so requests from other hosts can't reach it in the first place.
Example 2 (permit specific subnet - with clear text password):
set httpd port 2812 and
allow 127.0.0.1
allow 192.168.0.0/24
allow hoshu:secret
If you also need to access MONIT from other hosts, specify the IP address (or a resolvable name) as in Example 2. As you can see, you can specify a range of addresses with a subnet mask, which can be written either in dotted-decimal style (like 255.255.255.0) or in CIDR notation. In this case, never define `use address localhost'. Without the USE line, monit listens on every network interface attached to your box. Furthermore, monit can control access with user:password pairs, shown as the 4th line in the sample above. An important thing to notice is that user:password definitions are used only for the actual web control page and not for the command line interface. The password in Example 2 is written in clear text, so if you dislike such a crude way, see the next example;
Example 3 (permit specific subnet - with separate password file):
set httpd port 2812 and
allow 127.0.0.1
allow 192.168.0.0/24
allow md5 /etc/monitpasswd hoshu
allow dummyuser:b77e86f629174381d57e8f1895732d
Though most settings, such as the subnet, are just the same as in Example 2, this one uses an external password file, /etc/monitpasswd. You can store encrypted (or more precisely `hashed') passwords by this method (and only by this method). The algorithm part `md5' can also be `crypt', which no one would recommend. The external password file has to be in Apache's htpasswd style, which consists of user:password on each line, and the password portion must have the `$id$salt$digest' format.
The easiest way is to use Apache's own htpasswd command to manage the password file. Though I leave further details to the man page of htpasswd, below is a simple command example that creates a new /etc/monitpasswd file and writes a line for the user `hoshu' with his MD5-hashed password;
root# htpasswd -c -m /etc/monitpasswd hoshu
You can omit the -m option, which selects MD5, when MD5 is already the default algorithm of your htpasswd utility (it is in recent Apache releases; older ones defaulted to crypt).
I can almost hear the question "What is the 5th line of Example 3?". As a matter of fact, MONIT never accepts terminal commands such as `monit status' if monitrc has any directive lines related to user/password and does not also have a clear text user:password definition beside them. This line is a dummy definition to work around that. The dummy password should be a baffling, hard-to-type one (the sample is an MD5 hash of the word `detarame').
I once tried `allow dummyuser:xxx... read-only' to avoid even the mere possibility of an accidental hit on the dummy password, but in vain. Such a definition ends up with 100% denial of commands like `monit restart apache'. This means that MONIT's read-only argument affects not only the web control page but also console commands.
When I experimented with MONIT 4.2.1, the password authentication code behaved inadequately. "allow host" and "allow user:password" were not evaluated as AND but as OR. It meant that if you had a password authentication line in monitrc, anyone who could type a right user name and password would be accepted even if he was outside the allowed IP addresses. I found that this weakness has disappeared in MONIT 4.7, which actually denies access unless both the address AND the user:password are satisfied together.
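A hard-to-type dummy password of this kind can be produced by hashing any word. The snippet below is a sketch: md5_hex is a made-up helper name, and it assumes the md5sum utility from GNU coreutils is installed.

```shell
#!/bin/sh
# Hypothetical helper: print the MD5 hex digest of a word (no trailing newline is hashed).
# Assumes md5sum from GNU coreutils.
md5_hex() {
    printf '%s' "$1" | md5sum | cut -d' ' -f1
}
```

For example, `echo "allow dummyuser:$(md5_hex detarame)"` prints a line that could be pasted into monitrc. The hash is used there as an ordinary clear text password, so any sufficiently random string would do just as well.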
Before MONIT 4.0, you could not use the service name `httpd' for your service definition name since it was reserved for MONIT's built-in httpd server. But this restriction doesn't exist any longer.
MONIT inserts the contents of the file(s) wherever it finds the statement;
include file
in monitrc. This gives you the ability to split the configuration into several files to make management easier. For example, you can define global settings in the main monitrc and keep daemon-specific configurations in separate files. The original sample monitrc shows how to include all the files in the /etc/monit.d/ directory by;
include /etc/monit.d/*
However, that way MONIT will read literally all the files in the monit.d directory. To be more practical, you had better use something like;
include /etc/monit.d/*.rc
so that you can select only the necessary files by their file extensions. In this case, you can keep MONIT from parsing, say, foo.rc by renaming it to foo.back when you temporarily want to hide it. Though MONIT doesn't seem to check the permissions of included files, you had better make them root-owned and 0600 just like the main configuration file.
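The rename trick can be wrapped in two small shell functions. This is only a sketch: the names monit_hide and monit_unhide and the RC_DIR variable are made up here, and the directory is assumed to be /etc/monit.d as in the include line above.

```shell
#!/bin/sh
# Directory holding the included *.rc files (assumed; override via RC_DIR).
RC_DIR=${RC_DIR:-/etc/monit.d}

# Hide NAME.rc from "include /etc/monit.d/*.rc" by renaming it to NAME.back.
monit_hide() {
    mv "$RC_DIR/$1.rc" "$RC_DIR/$1.back"
}

# Bring NAME.back back as NAME.rc so that it is included again.
monit_unhide() {
    mv "$RC_DIR/$1.back" "$RC_DIR/$1.rc"
}
```

After `monit_hide foo`, run `monit reload` (or restart monit) so that the running daemon rereads its configuration.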
I may be responsible for explaining this since this is the test I wrote. If you use MONIT 4.7 or earlier, you need to apply pgsql-patch to make use of this protocol test.
A connection test in MONIT is done by opening a connection to the socket on which the service is listening and sending some packets; MONIT then decides whether the service is alive based on the response the service returns (or the lack of one). The `socket' can be a TCP/UDP port or a UNIX socket. Before dealing with the PGSQL test, we might need to take a look at MONIT's connection test in general.
DNS service connection test example:
if failed host localhost port 53 type udp protocol dns with timeout 10 then restart
The host argument can be omitted. In that case, host is assumed to be localhost. type defaults to TCP, so you can omit it for TCP. protocol can be one of (as of MONIT 4.8) APACHE-STATUS, DNS, DWP, FTP, HTTP, IMAP, LDAP2, LDAP3, MYSQL, PGSQL, NNTP, NTP3, POP, POSTFIX-POLICY, RDATE, RSYNC, SMTP, SSH, TNS, and if you omit it, a generic connection test is used. timeout means how long MONIT will wait before it gives up; its default value is 5 (seconds).
`pgsql' is nothing special in syntax. Now, let us examine the case where we want to test PostgreSQL's activity through its UNIX socket.
PostgreSQL connection test example (via UNIX socket):
if failed unixsocket /tmp/.s.PGSQL.5432 proto pgsql with timeout 15 then restart
As PostgreSQL requires authentication even merely to connect to it, certain preparations need to be done before practical use of this test. This procedure is not mandatory, because the PGSQL test treats it as a success when PostgreSQL demands authentication or says there is no such user, since both responses mean that postmaster is functioning. However, you had better follow the procedure below to keep Postgres' log as clean as possible, which was the very reason I wrote this code in the first place. We are going to create the DB user `root' for convenience, because MONIT is usually run by root. The example below assumes PostgreSQL 8.x. If yours is older, some syntax, such as the subnet format, may vary;
host root root 127.0.0.1/32 trust <= for test via TCP port
local root root ident sameuser <= for test via UNIX socket
The "depends on" keyword can be used to make MONIT take action on another service when an $EVENT occurs on a certain service.
Caution ! :
You'd better NOT reference one service from a lot of service definitions simultaneously. Note that when MONIT starts or restarts the parent service, it usually starts or restarts the services dependent on it as well. This may end up in a dependency loop, in which case MONIT will detect it and refuse to start.
Example based on the default configuration sample:
######## Parent process ########
check process apache with pidfile /var/run/httpd.pid
  start program = "/etc/service/httpd start"
  stop program = "/etc/service/httpd stop"
  if failed host localhost port 80 with timeout 10 seconds then restart
  if cpu > 60% for 2 cycles then alert
  if cpu > 80% for 5 cycles then restart
  if loadavg(5min) > 10 for 8 cycles then restart
  if 3 restarts within 5 cycles then timeout
  depends on apache_bin
  alert hoshu@host1.hoge.cxm on { restart, stop, timeout }
######### Child process dependent on parent #########
check file apache_bin with path /usr/sbin/httpd
  if failed checksum then unmonitor
  if failed permission 755 then unmonitor
  if failed uid root then unmonitor
  if failed gid root then unmonitor
  alert hoshu@host1.hoge.cxm on { checksum, permission, uid, gid, unmonitor }
    with mail-format { subject: Alarm! $SERVICE hacked }
Will MONIT stop apache when some intruder has hacked the apache binary? NO. In the end, the apache daemon is only unmonitored. When the child detects some trouble, MONIT applies to the parent the same action as is specified for the event in the child definition (unmonitor in the example above). An unmonitored process won't ever be started again, but if an emergency halt of the parent is what you want, replace every `unmonitor' with `stop'. Though "stopping a file" sounds strange in a real-world context, it does not matter to MONIT.
Notice on checksum test:
You need to stop MONIT or unmonitor the service before you update a controlled application. Otherwise, MONIT will detect a checksum error and act on it, for example by stopping or unmonitoring the service. If you forgot this procedure and MONIT has unmonitored the service, restarting MONIT itself or
monit restart all
or
monit monitor <service>
will bring it back under monitoring.
Another useful directive is group. For example, qmail, qsmtpd and qpop3d, discussed on the qmail page, can each have;
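The update procedure above can be scripted so that the unmonitor and monitor steps are never forgotten. The wrapper below is only a sketch: with_unmonitored and the MONIT variable are names made up for this example, while `monit unmonitor' and `monit monitor' are the ordinary monit commands.

```shell
#!/bin/sh
# Path to the monit binary (assumed to be on PATH; override via MONIT).
MONIT=${MONIT:-monit}

# Hypothetical wrapper: unmonitor SERVICE, run the given command,
# then put SERVICE back under monitoring whatever the command returned.
with_unmonitored() {
    service=$1
    shift
    "$MONIT" unmonitor "$service" || return 1
    "$@"
    status=$?
    "$MONIT" monitor "$service"
    return $status
}
```

A call might look like `with_unmonitored apache_bin rpm -Uvh <package>`, where the package name is whatever you are updating.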
group qmaild
Then you can stop, restart or unmonitor them all with a single command. If you now want to stop them;
root# monit -g qmaild stop all
will stop qmail, qsmtpd and qpop3d at once. Don't forget the last argument `all'.
As an appendix, this section discusses how to handle startup scripts that are not chkconfig compliant, or that you don't want to rework into such scripts. To tell the truth, I used to use this method for every daemon, but I came to realize that it only makes a mess when the scripts are chkconfig compliant.
Create a directory and gather the startup scripts of the MONIT-controlled services in it. Say the directory is /etc/service;
root# mkdir /etc/service
root# mv /script/of/service /etc/service
root# chmod 755 /etc/service/service
In this strategy, monit is started from system init at system startup, and monit then brings up the services in /etc/service. When you have startup scripts in a directory outside that of system init, you may need to adjust the chkconfig parameters in the init script of MONIT itself. If you intend to handle, for example, a mail service in this way, MONIT's standard startup order of 98 is too late, since other daemons may want to send startup error messages. So, edit the line that looks like;
# chkconfig: 345 85 02
monitrc needs no special treatment for this.
There is still one problem to be resolved. MONIT does exit when "service monit stop" is called, but stopping the controlled services is another matter. Therefore, you need some way to stop the services on a shutdown to halt, reboot or single user mode, because system init will no longer stop them. For this purpose I made a couple of scripts to kill them. You can use either of the two by copying it to the system init directory.
The monitjob script uses monit to stop the services, so it must be run before monit is killed. To accomplish this, MONIT's kill order in its init script must be delayed. It has to be 03 or later, as monitjob's is 02 by default.
# chkconfig: 345 85 03
monitjob2 features a slightly cleverer method. Using find, it searches the service directory (/etc/service in the example above) for regular files whose permissions are 755 and calls each of them with the stop argument. It doesn't care whether MONIT is dead or alive. But as it is reasonable for this script to be called after MONIT has been killed, I set its KILL order to 35. You don't need to delay MONIT's KILL parameter.
Both of them are written in chkconfig compliant format, so you can register either one to the appropriate run levels by;
root# cp (monitjob or monitjob2) /etc/rc.d/init.d/monitjob
root# chmod 755 /etc/rc.d/init.d/monitjob
root# chkconfig --add monitjob
The job killer touches a dummy lock file /var/lock/subsys/monitjob when called with the start argument. Sysvinit won't kill any service that doesn't have a lock file with the same name as its init script when it changes to a runlevel such as 0, 1 or 6. Therefore, if you want it to work on the next system halt without restarting your box now, do;
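The find-based approach can be sketched roughly as below. This is not the actual monitjob2 script, only an illustration of the idea: the function name stop_all_services is made up, and -maxdepth assumes GNU or BSD find.

```shell
#!/bin/sh
# Hypothetical sketch: call every regular file with mode 755 directly
# under the service directory with the "stop" argument.
stop_all_services() {
    dir=${1:-/etc/service}
    find "$dir" -maxdepth 1 -type f -perm 755 | while IFS= read -r script; do
        "$script" stop
    done
}
```

Because only files with exactly mode 755 match, changing a script to, say, 644 is enough to exclude it from the shutdown pass.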
root# service monitjob start
once. From then on, there is no need to do this manually, since the script creates the lock file for you every time the system starts.