April 21, 2008

Locking from Cron

Periodic jobs often must not run more than one instance at a time. Unfortunately, simple solutions often fail to account for common edge cases. For example, assume a need to synchronize files each hour with rsync. On Unix, a cron job is perhaps the quickest solution:

    # run just past the top of the hour
    # as many other things run then
    7 * * * * rsync -e ssh -az --delete /source desthost:/dest

However, this solution has a major edge case that can bring down the system. Worse, simplistic attempts to fix this fault can result in rsync not running.

Resource Usage Spiral

rsync, run directly from cron, will run until the file transfer has completed, or a bug causes rsync to hang, or some fault causes rsync to terminate unexpectedly: bad memory, a system reboot, a cowboy admin getting frisky with kill -9, and so forth. The most worrying case is when rsync operates normally, but takes too long to complete. In this case, cron will launch subsequent rsync instances, and if the earlier ones never exit, they accumulate until the system eventually fails.

The rsync --timeout=999 option is useful, but not complete. This option makes rsync exit if no data is transferred for the given number of seconds, so a stalled transfer will eventually exit. However, it will not help when crond launches the next instance while rsync is still transferring files. A wrapper script around rsync is necessary to prevent multiple instances from running.

Naïve Locking Schemes

< Lovecraft> thrig: its a matter of making a script that looks for a lock file. if exist <lockfile>, don't start. Ifnot exist <lockfile> make one and start.
< thrig> Lovecraft: and what else?
< Lovecraft> Thats it

File locking requires slightly more thought than looking for a lock and not running if it exists. Lovecraft’s incomplete solution will cause problems should the system crash, should the system restart normally without the script handling the shutdown signal properly, should the script be terminated by kill -9 (even if it trapped other signals properly), should a hardware fault cause the script to exit, or should a system configuration issue prevent the lock file from being created. Worst of all, if the script is poorly written and poorly monitored, nobody may know the rsync process is not running until some need reveals the lack of recent files on the destination server, which could be weeks or months after things went awry.

    #!/bin/sh
    # Skeleton locking with signal handling.
    # Much more sanity checking required!

    PID_FILE="/var/lock/foopid"

    cleanup () {
        rm -f -- "$PID_FILE"
    }

    # Race condition probably not a concern,
    # due to the infrequency of rsync runs.
    [ -e "$PID_FILE" ] && exit
    touch "$PID_FILE" || exit

    # Install the traps only after taking the lock; otherwise the
    # early exits above would remove a lock held by another instance.
    trap "cleanup" 0
    trap "exit 1" 1 2 13 15

    rsync --timeout=999 -e ssh -az \
      /sourcedir/ desthost:/destdir

Lock files introduce a new problem: the lock file suggests, but by no means proves, that the associated process is actually running. Depending on the implementation, rsync may be running with no lock file created (a permissions problem coupled with an “ignore errors creating lock file” implementation), or rsync may not be running while a lock file exists, for the various reasons outlined above. Consider locking against the process name, not a file on disk.
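
One way to narrow the gap between "a lock exists" and "the process is running" is to record the PID of the lock holder and test whether that PID is still alive before honoring the lock. The following is a minimal sketch of that idea, not a vetted implementation; the lock path and the function names acquire_lock and release_lock are made up for illustration.

```shell
#!/bin/sh
# Sketch only: LOCK_DIR, acquire_lock and release_lock are hypothetical names.
LOCK_DIR="${LOCK_DIR:-/tmp/rsync-dest.lock}"

# Try to take the lock. Returns 0 on success, 1 if a live holder exists.
acquire_lock () {
    # mkdir either creates the directory or fails, with no window between
    # the existence check and the creation.
    if mkdir "$LOCK_DIR" 2>/dev/null; then
        echo "$$" > "$LOCK_DIR/pid"
        return 0
    fi

    # The lock exists. Ask whether the recorded holder is still alive;
    # kill -0 sends no signal, it only tests that the PID can be signalled.
    old_pid=$(cat "$LOCK_DIR/pid" 2>/dev/null)
    if [ -n "$old_pid" ] && kill -0 "$old_pid" 2>/dev/null; then
        return 1
    fi

    # Stale lock: the holder is gone, so reclaim it.
    rm -rf -- "$LOCK_DIR"
    mkdir "$LOCK_DIR" 2>/dev/null || return 1
    echo "$$" > "$LOCK_DIR/pid"
}

release_lock () {
    rm -rf -- "$LOCK_DIR"
}

# Possible use in a cron wrapper:
#   acquire_lock || exit 0
#   trap release_lock 0
#   rsync --timeout=999 -e ssh -az /sourcedir/ desthost:/destdir
```

This still has gaps. A recycled PID can make a dead holder look alive, a holder owned by another user can look dead to kill -0, and two instances that both find a stale lock can race while reclaiming it. For an hourly job those windows are probably small, but they are not zero.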

Ideas for Improvements

Software besides crond, such as CFEngine, provides locking functionality. If possible, use these solutions, as they are likely better tested than an in-house shell script. CFEngine or similar configuration management software can also ensure the lock directory exists and has the correct permissions, if a lock file scheme is used.

With Perl, one solution is to lock the script itself via the special __DATA__ filehandle, which will avoid the various problems of an external lock file. I generally prefer Perl over the shell, as the shell lacks the equivalent of perl -c, makes writing unit tests difficult, and has a number of scary edge cases that can delete entire disks.

Another implementation wraps the rsync inside a loop. This prevents multiple rsync instances from running, but pushes the locking and monitoring to the wrapper script instead. It also runs each rsync an hour after the previous one completes, not once every hour.

    while sleep 3600; do
      rsync ...
    done

Monitoring whether rsync actually did anything is another can of worms. This monitoring should not be an e-mail, as frequent e-mail becomes cron spam that is filtered and ignored. Monitoring should also not report transitory errors, where the target is temporarily unavailable, as investigating false alarms wastes time.
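
One possible approach, sketched below under stated assumptions, is to record the time of the last successful run and to complain only when that record grows too old. A few failed runs in a row then pass silently, while a long outage is noticed. The file path, the threshold, and the function names run_and_stamp and check_fresh are invented for this example, and date +%s is assumed to be available (it is on GNU and BSD systems, though it is not required by POSIX).

```shell
#!/bin/sh
# Sketch only: STAMP_FILE, MAX_AGE and both function names are hypothetical.
STAMP_FILE="${STAMP_FILE:-/var/tmp/rsync-dest.last-ok}"
MAX_AGE="${MAX_AGE:-10800}"   # seconds; roughly three missed hourly runs

# Run the given command and record the current time only if it exits 0.
run_and_stamp () {
    "$@" && date +%s > "$STAMP_FILE"
}

# Return 0 if the last recorded success is at most MAX_AGE seconds old,
# and 1 if it is older or if no success has been recorded.
check_fresh () {
    last=$(cat "$STAMP_FILE" 2>/dev/null) || return 1
    [ -n "$last" ] || return 1
    now=$(date +%s)
    [ $((now - last)) -le "$MAX_AGE" ]
}

# Possible use:
#   run_and_stamp rsync --timeout=999 -e ssh -az /sourcedir/ desthost:/destdir
#   check_fresh || echo "no successful rsync within $MAX_AGE seconds"
```

The check is best run from somewhere other than the wrapper itself, so that a wrapper that never starts is still noticed. How the stale condition is reported is left open here.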

April 20, 2008

On Unicode and 𝌡

Folks often wander into #perl, asking about “Wide character …” messages. For unknown reasons, some people are reluctant to read the perldiag documentation that explains this message, and, having been cajoled into reading the documentation, reluctant to follow the advice it outlines, instead insisting on a regex solution to perhaps strip the offending characters. An unrelated discussion at work turned up two must-reads regarding Unicode:

  1. Thou shalt grok Unicode!
  2. Unicode-processing issues in Perl and how to cope with it

More Unicode.

Seattle Weather Links

Mostly for my own outdoor photography reference:

April 16, 2008

Seattle Japanese Garden

Teahouse

The Japanese Garden in Seattle offers several photography sessions each year. These let you use a tripod and other gear not normally permitted in the garden. Well worth the money, as there were only three other photographers, and the weather mostly cooperated. Most useful lenses were the 18-70 kit zoom and 105mm macro.

April 12, 2008

Chocolate!

This photo required six bars of chocolate, three vertical on each side. There is a Lego roof keeping the bars vertical, between two boxed sets of the Tale of Genji (Seidensticker and Tyler). The only lighting is from a desk lamp, the direct shine of which was masked off from the camera by shirts and chess boards. The Ferrero Rocher (picked up pretty much at random from the store; I had no idea how large the central chocolate should have been) rests atop a bookcase, and the background is a cork board. A polarizer was used, though I set it when the chocolate was wrapped, not unwrapped, so that might have been adjusted better. Had trouble lining the camera up square with the opening, which ball heads make difficult. May want to pick up a head that allows finer adjustment.

April 10, 2008

Islamabomb

Roughly 20 years ago, Ojhri went up in smoke following a very loud bang. Shells rained down on the International School of Islamabad, totaling, if memory serves, 13½ hits, including one to the auditorium we had gathered in. This was followed by several days off, while folks from the U.S. carrier docked in Karachi dug out ordnance from people's driveways. Others did not fare so well.