24.1 pg 508
Warning: excessive monitoring can also impact CPU usage. I've seen servers with multiple monitoring systems installed (legacy, current, new) all running at the same time, and between them they were chewing up 10% of the CPU. I've also seen servers where monitoring had gone "mad": because the server it was trying to report to was down, it was running hundreds of processes and blocking cron from starting any more.
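One common guard against the pile-up described above is to let a check run only when the previous run has finished. The following is a minimal sketch, assuming a Unix host and a check started from cron; the names try_lock and run_check and the lock file path are hypothetical, not taken from any particular monitoring product.

```python
import fcntl
import os
import tempfile


def try_lock(lock_path):
    """Return an open file holding an exclusive lock, or None if busy."""
    fh = open(lock_path, "w")
    try:
        # LOCK_NB makes flock fail immediately instead of waiting.
        fcntl.flock(fh, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        fh.close()
        return None
    return fh


def run_check(lock_path, report):
    """Run report() unless an earlier run still holds the lock."""
    lock = try_lock(lock_path)
    if lock is None:
        return "skipped"  # exit quietly rather than stack up behind a hung run
    try:
        report()
        return "ran"
    finally:
        fcntl.flock(lock, fcntl.LOCK_UN)
        lock.close()


if __name__ == "__main__":
    # Hypothetical lock location, used for demonstration only.
    demo_lock = os.path.join(tempfile.mkdtemp(), "check.lock")
    print(run_check(demo_lock, lambda: None))
```

This only stops runs from stacking up. A network timeout on the reporting call itself would still be needed so that a hung run eventually exits.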
24.1.1 pg 510
Ensure you have STRONG access authentication to the monitoring daemons. Sometimes these monitoring tools can leak data (e.g. process lists) which may be useful to attackers. This means don't use a simple password passed in plain text over the network. SNMPv1 community strings, I'm looking at you! SNMPv2 can easily be misconfigured this way as well.
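As a rough illustration of auditing for the weakest case, the sketch below scans the text of a Net-SNMP style snmpd.conf for community directives that still use the well-known strings "public" or "private". The function name weak_community_lines and the sample configuration are assumptions for illustration. Passing this check does not make a community string safe, because any v1/v2c community string still crosses the network in plain text.

```python
# Well-known default community strings that attackers try first.
WEAK = {"public", "private"}
# Net-SNMP snmpd.conf directives that define community-based access.
DIRECTIVES = {"rocommunity", "rwcommunity", "rocommunity6", "rwcommunity6"}


def weak_community_lines(conf_text):
    """Return (line_number, directive, community) for each weak entry."""
    findings = []
    for number, line in enumerate(conf_text.splitlines(), start=1):
        # Drop trailing comments, then split into whitespace-separated fields.
        fields = line.split("#", 1)[0].split()
        if (len(fields) >= 2
                and fields[0] in DIRECTIVES
                and fields[1].lower() in WEAK):
            findings.append((number, fields[0], fields[1]))
    return findings


if __name__ == "__main__":
    # Made-up sample configuration, not from a real host.
    sample = "rocommunity public\nrouser monitor\n"
    for finding in weak_community_lines(sample):
        print(finding)
```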
24.1.2 pg 511
Ensure you have STRONG access authentication to the alerting process. If it's a simple SNMP trap using standard community strings then it's easy for an attacker to mount a DoS attack on your monitoring gateway by flooding it with spoofed trap messages. I'm sure you'd love being woken up at 3am by your pager beeping because someone decided this would be funny!
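To make "strong authentication" of alerts concrete, here is a generic sketch in which the alerting gateway accepts a message only if it carries a valid keyed hash (HMAC) computed with a shared secret. This is not SNMP itself, and the function names sign_alert and accept_alert are hypothetical. It shows only the principle that a spoofed message without the secret is rejected before anyone gets paged. It does not address replay of captured messages.

```python
import hashlib
import hmac


def sign_alert(secret: bytes, message: bytes) -> str:
    """Sender side: compute a hex HMAC-SHA256 tag for the alert."""
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def accept_alert(secret: bytes, message: bytes, tag: str) -> bool:
    """Gateway side: accept the alert only if the tag matches."""
    expected = sign_alert(secret, message)
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected, tag)


if __name__ == "__main__":
    # Demonstration values only; a real secret would not live in the source.
    secret = b"demo-shared-secret"
    alert = b"disk full on host1"
    tag = sign_alert(secret, alert)
    print(accept_alert(secret, alert, tag))
    print(accept_alert(secret, b"spoofed alert", tag))
```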
--
StephenHarris - 24 Aug 2006
Active monitoring pg 514
Example of limits to what active monitoring can do, and the problems that may arise: A system that monitored /var would rotate log files if the disk filled up too much. In effect this deleted the oldest log file, and so reclaimed space. One day the log files were being rotated each time the monitoring system checked the disk, and the /var partition was still 100% full. Why? Rotating the log files was clearly no use; they were all zero length! It turns out that a developer had turned on debugging for one of his processes that was run from cron once a minute. And so, once a minute, the cron job sent out an email. The mailbox (in /var/mail) filled up the disk!
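One way to limit this failure mode is to have the automated fix check whether it actually helped, and hand the problem to a human if it did not. The sketch below assumes hypothetical callables rotate_logs and alert supplied by the caller. The usage figure from shutil.disk_usage is only an approximation of what "df" reports.

```python
import shutil


def percent_used(path):
    """Approximate percentage of the filesystem at `path` in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def remediate(path, threshold, rotate_logs, alert, usage_of=percent_used):
    """Rotate logs when over threshold; escalate if that did not help."""
    before = usage_of(path)
    if before < threshold:
        return "ok"
    rotate_logs()
    after = usage_of(path)
    if after >= threshold:
        # The automatic fix did not reclaim enough space, so the cause
        # is probably something else. Tell a human instead of looping.
        alert(f"{path}: {after:.0f}% used after rotation (was {before:.0f}%)")
        return "escalated"
    return "fixed"


if __name__ == "__main__":
    # Threshold of 101 can never be reached, so nothing is rotated here.
    print(remediate("/", 101, lambda: None, print))
```

With a check like this, the zero-length log files in the story would have triggered an alert on the first useless rotation.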
Another case: here we didn't have active monitoring, but it would have had similar problems. The monitoring system alerted that the disk was filling up and "df" agreed; however "du" said the disk was mostly empty. I was called in and managed to track it down; a developer had kicked off a long running job in debug mode so he could track down an issue. He logged the output to a file in /var/tmp. Once he found the problem he simply removed the file rather than restarting the job. Oops. The developer was unaware that deleting an open file on Unix doesn't reclaim the disk space, and so the "invisible" file was still growing in size and slowly filling up /var/tmp. If we had still used the active monitoring solution then log files would have been rotated unnecessarily!
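The behaviour behind this story is easy to reproduce. The sketch below (the function name unlinked_but_open_size is hypothetical) creates a temporary file, deletes it while it is still open, and then asks the kernel about the open descriptor. The name is gone, which is why "du" cannot see the file, but the data is still there until the descriptor is closed.

```python
import os
import tempfile


def unlinked_but_open_size(payload: bytes):
    """Write payload to a temp file, unlink it, and inspect the open fd.

    Returns (path_still_exists, link_count, size_in_bytes).
    """
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)
        os.unlink(path)        # remove the name; the process keeps the fd open
        info = os.fstat(fd)    # the file itself is still reachable via the fd
        return os.path.exists(path), info.st_nlink, info.st_size
    finally:
        os.close(fd)           # only now can the space be reclaimed


if __name__ == "__main__":
    print(unlinked_but_open_size(b"x" * 4096))
```

On systems where lsof is installed, "lsof +L1" lists open files whose link count is below one, which is one way to find the process holding such a file.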
24.2.5 pg 518
This isn't necessarily the responsibility of the SA team; it's more an AD issue. However, it's always nice to integrate the AD monitoring and alerting system into the SA team's pervasive monitoring solution so the AD team can gain all the benefits (escalation, problem tickets, paging gateways etc) without having to duplicate infrastructure. In effect the AD team becomes a customer of the SA team's monitoring service.
--
StephenHarris - 25 Aug 2006