Disk Space

$ df; df -i; sudo lsof | grep deleted

On Unix systems, when disk space runs low, the first three commands to run are df, which shows how much space is used and available on each mounted filesystem; df -i, which shows whether the filesystem's inodes have been used up; and sudo lsof | grep deleted, which lists files that have been deleted but are still held open by a running process, so their space has not yet been freed. With this information, either investigate the affected partition for the source of the problem, or restart the process holding a large deleted file open so that its space is released.
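Once df has pointed at a full partition, one way to find where the space went is to rank directories by size. The sketch below assumes a du and sort that accept -x, -k, -r and -n (GNU and BSD both do); the function names biggest_dirs and deleted_but_open are hypothetical helpers, not standard tools. It also shows lsof's +L1 option, which selects open files with a link count below one and is a narrower form of the grep for deleted files.

```shell
#!/bin/sh
# biggest_dirs DIR [N]  (hypothetical helper)
# Lists the N largest directories under DIR (default 10), sizes in KiB,
# largest first. -x keeps du on DIR's filesystem so other mounts are skipped;
# permission errors are discarded.
biggest_dirs() {
    du -xk "$1" 2>/dev/null | sort -rn | head -n "${2:-10}"
}

# deleted_but_open  (hypothetical helper)
# Lists open files whose link count is below 1: deleted files whose space
# is still held by a running process. -n and -P skip host and port lookups.
deleted_but_open() {
    sudo lsof -nP +L1
}
```

For example, biggest_dirs /var 20 prints the twenty largest directories under /var; the first line is always /var itself, since du reports the total for the starting directory too.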

Note that some filesystems never run out of inodes on their own: instead of a fixed inode count set when the filesystem was created, they allocate inodes dynamically using B+ trees or similar structures.
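On filesystems that do fix the inode count, df -i running out while df still shows free space often means some directory has accumulated a very large number of small files. A sketch for finding such directories follows, assuming a POSIX find, sed, sort and uniq; files_per_dir is a hypothetical name. It counts only regular files, so it approximates inode use rather than measuring it exactly (directories and symlinks consume inodes too).

```shell
#!/bin/sh
# files_per_dir DIR [N]  (hypothetical helper)
# Prints the N directories under DIR (default 10) holding the most regular
# files, as "COUNT PATH", staying on DIR's filesystem (-xdev).
files_per_dir() {
    # sed strips each file name, leaving its parent directory; uniq -c then
    # counts how many files each directory contributed.
    find "$1" -xdev -type f 2>/dev/null |
        sed 's|/[^/]*$||' |
        sort | uniq -c | sort -rn |
        head -n "${2:-10}"
}
```

For example, files_per_dir /var 5 lists the five directories under /var with the most files, which is usually enough to spot a runaway spool or cache directory.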

Monitoring

Disk space usage should be monitored with software such as Nagios. Alerting is best done at two levels: a non-paging ticket when usage crosses a warning threshold, and a page when free space is critically low. Under this model, runaway disk consumption quickly triggers a page, while slow growth can be dealt with well before a page becomes necessary.
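The two-level model maps directly onto the Nagios plugin convention of exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The stock check_disk plugin already covers this in practice; the sketch below only illustrates the threshold logic. classify_usage and check_mount are hypothetical names, and the used percentage is read from the Capacity column of POSIX-format df -P output.

```shell
#!/bin/sh
# classify_usage USED_PCT WARN_PCT CRIT_PCT  (hypothetical helper)
# Prints a status word and returns the matching Nagios plugin exit code.
classify_usage() {
    used=$1 warn=$2 crit=$3
    if [ "$used" -ge "$crit" ]; then
        echo "CRITICAL"; return 2   # page someone
    elif [ "$used" -ge "$warn" ]; then
        echo "WARNING"; return 1    # open a non-paging ticket
    else
        echo "OK"; return 0
    fi
}

# check_mount MOUNTPOINT WARN_PCT CRIT_PCT  (hypothetical helper)
# Takes the used percentage from column 5 of `df -P` (e.g. "87%").
check_mount() {
    used=$(df -P "$1" 2>/dev/null | awk 'NR == 2 { sub(/%/, "", $5); print $5 }')
    case $used in
        ''|*[!0-9]*) echo "UNKNOWN: cannot read usage for $1"; return 3 ;;
    esac
    # The assignment's exit status is that of the command substitution.
    status=$(classify_usage "$used" "$2" "$3")
    rc=$?
    echo "$status: $1 is ${used}% full"
    return $rc
}
```

For example, check_mount /var 90 95 prints a line such as "WARNING: /var is 91% full" and exits 1, which Nagios would turn into a ticket rather than a page.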

Different alarm levels must be set for different classes of systems: database servers might well run certain partitions up to 99% full under normal use, while other systems may require a warning ticket well before 90% usage.
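Per-class thresholds can be kept in one lookup so that each host's check picks up the levels for its role. The class names and numbers below are hypothetical placeholders chosen to echo the examples above, not recommendations; thresholds_for is likewise a made-up helper that prints "WARN CRIT" as percentages used.

```shell
#!/bin/sh
# thresholds_for CLASS  (hypothetical helper; values are placeholders)
# Prints "WARN_PCT CRIT_PCT" for a class of system.
thresholds_for() {
    case $1 in
        database) echo "99 100" ;;  # data partitions routinely run nearly full
        web|app)  echo "80 90"  ;;  # ticket well before 90% usage
        *)        echo "85 95"  ;;  # fallback for unclassified hosts
    esac
}
```

A check script could then split the pair with read warn crit <<EOF followed by $(thresholds_for database) and EOF, and pass $warn and $crit to whatever plugin does the actual measurement.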

Remediation

In my experience, logging causes the majority of disk space problems. Eliminate verbose and debug logging in production, as these messages rarely justify the disk space and I/O operations they consume. Where possible, eliminate stack-trace spam: a remote connection failure warning never needs 100 lines saying where in the code the failure took place; a single line suffices, since the problem has nothing to do with the code.
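To see whether debug output or stack traces really dominate a given log, a rough breakdown of bytes per level helps. The sketch below assumes each entry's first line carries its level (TRACE, DEBUG, INFO, WARN, WARNING, ERROR or FATAL) as a whitespace-separated word; log formats that bracket the level, such as [ERROR], would need the pattern adjusted. Continuation lines without a level, such as stack-trace frames, are charged to the entry they follow, so trace spam shows up under the level of the line that triggered it. bytes_by_level is a hypothetical name.

```shell
#!/bin/sh
# bytes_by_level FILE  (hypothetical helper)
# Prints "BYTES LEVEL" lines, largest first.
bytes_by_level() {
    LC_ALL=C awk '
        {
            # Look for a level word; if none, keep the previous entry level.
            for (i = 1; i <= NF; i++)
                if ($i ~ /^(TRACE|DEBUG|INFO|WARN|WARNING|ERROR|FATAL)$/) {
                    level = $i
                    break
                }
            if (level == "") level = "UNKNOWN"
            # length() counts bytes under LC_ALL=C; +1 for the newline
            bytes[level] += length($0) + 1
        }
        END { for (l in bytes) printf "%d %s\n", bytes[l], l }
    ' "$1" | sort -rn
}
```

Running bytes_by_level on a day's application log shows at a glance whether, say, DEBUG lines or ERROR entries with long traces account for most of its size.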

Estimate disk space consumption: the amount of log data written per transaction or request in production should be known. Multiply it by the anticipated traffic level and by the length of time logs must be kept, then add padding for growth and filesystem overhead. This should be done before the service launches, not afterwards!
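The estimate is a single product with padding on top. The sketch below uses integer shell arithmetic and assumes the shell uses 64-bit integers, as bash and dash do on common platforms; estimate_log_bytes and the example figures are hypothetical. Multiplying before dividing avoids losing the padding to integer truncation.

```shell
#!/bin/sh
# estimate_log_bytes BYTES_PER_REQUEST REQUESTS_PER_DAY RETENTION_DAYS PADDING_PCT
# (hypothetical helper)
# bytes needed = bytes/request x requests/day x days kept x (1 + padding)
estimate_log_bytes() {
    per_req=$1 per_day=$2 days=$3 pad=$4
    echo $(( per_req * per_day * days * (100 + pad) / 100 ))
}
```

With hypothetical figures of 500 bytes per request, 10 million requests a day, 30 days of retention and 50% padding, estimate_log_bytes 500 10000000 30 50 prints 225000000000, roughly 225 GB.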

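Use proper log file handling, not problematic logrotate implementations. Numbered rotation, where messages.1 is renamed to messages.2 and so on, thwarts rsync-style archival, because every file's contents move to a new name at each rotation; it also does not divide the logs into time-based buckets as better solutions do.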
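Time-based buckets avoid the renaming problem: each day's log gets its own file name, written once and never moved, so rsync's default quick check (size and modification time) skips files it has already archived. The sketch below is a hypothetical stand-in for a real log writer, assuming a date command that supports -u and +%Y-%m-%d; log_to_daily_files is a made-up name. It runs date once per line, which is fine for illustration but too slow for heavy traffic.

```shell
#!/bin/sh
# log_to_daily_files DIR NAME  (hypothetical helper)
# Appends each line from stdin to DIR/NAME.YYYY-MM-DD, using the UTC date,
# so files are bucketed by day and never renamed after the day ends.
log_to_daily_files() {
    dir=$1 name=$2
    mkdir -p "$dir" || return 1
    while IFS= read -r line; do
        printf '%s\n' "$line" >> "$dir/$name.$(date -u +%Y-%m-%d)"
    done
}
```

For example, piping an application's output through log_to_daily_files /var/log/myapp app produces files such as app.2024-01-31, and old days can be compressed or deleted by age without disturbing today's file.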
