Eliminate Spurious Errors
Software must not log warnings or errors for conditions that are not actual problems. Needless errors clog log files, increase data processing and storage costs, and greatly complicate log analysis. At worst, a new hire debugging a script will waste time asking “is this message normal?” Better sites might note “ignore this log” on a wiki and hope the new hire can find it. The best sites kill off the message (or lower its severity to notice or below), so no time is ever wasted wondering, documenting, and retraining. Without clear mappings of log levels to actions, one enters expression hell, where long action lists evolve: warnings X, Y, and Z require action but not M, Q, or Y. Except on Tuesday. Maintaining such lists is both time-consuming and error-prone.
Instead, make logs actionable: specific priorities must map to specific actions. For example, an emerg or alert syslog(3) message always results in a severity 1 (highest priority) ticket and a page, crit or err messages in a severity 2 ticket and a page, warning messages in a severity 3 ticket but no page, and any lower priority in no action. This is simple to code for, and makes it easy to decide what sort of response (and therefore priority) a new log message requires. Actionable log levels also enable automation. Under Tomcat, developers could mandate that any FATAL log means the instance requires an immediate restart. That is easy to check for in a log file, and a script can automatically thread dump, kill, and restart java should a FATAL turn up.
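The priority-to-action mapping described above can be written as a small lookup table. The following is only a sketch: the Action type, the ACTIONS table, and the action_for function are hypothetical names chosen for illustration, and the ticket severities simply restate the example policy rather than any standard.

```python
from typing import NamedTuple, Optional


class Action(NamedTuple):
    """Hypothetical response record: what a log priority triggers."""
    ticket_severity: int  # 1 is the highest priority
    page: bool            # whether someone gets paged


# Example policy from the text. Priorities not listed here
# (notice, info, debug) deliberately require no action.
ACTIONS = {
    "emerg": Action(ticket_severity=1, page=True),
    "alert": Action(ticket_severity=1, page=True),
    "crit": Action(ticket_severity=2, page=True),
    "err": Action(ticket_severity=2, page=True),
    "warning": Action(ticket_severity=3, page=False),
}


def action_for(priority: str) -> Optional[Action]:
    """Return the response for a syslog priority name, or None for no action."""
    return ACTIONS.get(priority.strip().lower())
```

Because the table is the whole policy, deciding how to log a new message reduces to picking the row whose response is wanted; there is no per-message exception list to maintain.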
In-house code benefits most from actionable logs. Vendor software may emit no logs, or use bizarre priority levels for trivial data: automount on Mac OS X used to log the automount version under daemon.err! Worse, stock syslogd(8) omits the facility and priority information by default. Use monitoring software such as Nagios to trigger actionable events where vendor logs lack good information, and reserve log-based actions generated by tools like sec.pl for well-known errors, such as disk full or kernel panics.
No news is good news: also eliminate spam from cron(8) jobs. Larger sites with high turnover may end up with hundreds of daily notifications, mostly junk, mixed with a few critical messages. Identify the required messages, direct their output to role-based mailing lists (never to root or directly to a user), then kill off everything else. If a notification merely confirms that something ran, write a low priority log message instead (or touch a last-ran status file), then have another utility warn if the message (or the last-ran file) was last updated too long ago.
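The status-file approach can be checked with a few lines of code. This is a minimal sketch, assuming the cron job touches a file on every successful run; the function name is_stale and its parameters are illustrative, not part of any existing tool.

```python
import os
import time
from typing import Optional


def is_stale(status_file: str, max_age_seconds: float,
             now: Optional[float] = None) -> bool:
    """Return True if the status file is missing or was last
    modified more than max_age_seconds ago."""
    if now is None:
        now = time.time()
    try:
        mtime = os.path.getmtime(status_file)
    except OSError:
        # Missing or unreadable file: the job never ran, or the
        # file was removed. Either way, someone should look.
        return True
    return (now - mtime) > max_age_seconds
```

A monitoring check could call is_stale with a limit somewhat longer than the job's schedule (for example, 26 hours for a daily job) and raise a warning only when it returns True, so a healthy job stays silent.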