« July 2007 | Main | September 2007 »

August 28, 2007

Delete Element from Array

A common Perl programming question is “how can I remove one or more elements from an array?” Some approaches to this problem have unexpected side-effects best revealed by Data::Dumper and minimal test data.

#!/usr/bin/perl -w use strict; use Data::Dumper; my @a = qw(a b c);

The following methods may or may not work correctly:

Technorati Tags: ,

  • shift off the unwanted element. Buggy, and illustrative why not reading the documentation for shift can get you in trouble.

    for my $element ( @a ) { if ( $element eq 'b' ) { shift @a; } } print Dumper \@a;

    Results in the first element of the array being removed, as specified in the documentation for shift. Yes, someone on #perl asked why this method did not work.

    $VAR1 = [ 'b', 'c' ];

  • delete the unwanted element. This might be an obvious choice after pawing through perlfunc a bit.

    for my $i ( 0 .. $#a ) { if ( $a[$i] eq 'b' ) { delete $a[$i]; } } print Dumper \@a;

    This requires a slower C-style loop, and is also buggy! delete does not remove middle elements, instead replacing them with undef values. At best, this will result in Use of uninitialized value … errors. At worst, some horrible show stopping bug:

    $VAR1 = [ 'a', undef, 'c' ];

  • splice the unwanted element. Also requires a slow C-style loop, and reveals a problem with modifying an array while looping over it:

    for my $i ( 0 .. $#a ) { if ( $a[$i] eq 'b' ) { splice @a, $i, 1; } } print Dumper \@a;

    Use of uninitialized value in string eq at test line 8. $VAR1 = [ 'a', 'c' ];

    The element is correctly removed, though a warning is generated due to the array having shrunk so the former $#a value the for loop uses is now one beyond the current number in @a. And where there are warnings, be wary of bugs. For example, what happens when the array contains qw(a b b c)? The second b is not deleted, as it slides into the position of the first b after the splice, and is therefore missed by the next iteration!

  • Use grep and remove the unwanted element.

    @a = grep { $_ ne 'b' } @a; print Dumper \@a;

    Short, elegant, and supports the qw(a b b c) case that splice failed on. What I would use.

Other options would include building up a new array from things-that-are-not-b, or building up a list of element numbers to remove from the array, then splicing those (highest to lowest index!) from the original array. But why, when there’s grep?

August 25, 2007

Smothered Mate

White to move, mate in five:

FEN 1r4k1/6pp/4Np2/3Q4/8/8/1q3PPP/6K1 w - - 2 1

Technorati Tags:

Each White move must be a check, as otherwise the White King will be mated on the back rank. White has a reveal check, allowing the Knight to move to otherwise attacked squares with impunity. The threat of mate-in-one (Qf7#, supported by the Knight) will force the King to bounce between h8 and g8 as the Knight repositions itself for the smothered mate:

  1. Ng5+ Kh8
  2. Nf7+ Kg8
  3. Nh6+ Kh8
  4. Qg8+ Rxg8
  5. Nf7#

(The Knight could also begin with Nd8 for the same result.)

August 22, 2007

Unix Hard vs. Soft Links

An exploration of Unix symbolic (soft) and hard links. Commands used: ln(1), ls(1), and touch(1).

$ ls $ touch original $ ln original hardlink $ ln -s original softlink $ ls -li total 8 4891103 -rw-rw-r-- 2 jdoe jdoe 0 Aug 17 23:18 hardlink 4891103 -rw-rw-r-- 2 jdoe jdoe 0 Aug 17 23:18 original 4891112 lrwxrwxr-x 1 jdoe jdoe 8 Aug 17 23:18 softlink -> original $ touch four $ ln -s four longername $ ls -l four longername -rw-rw-r-- 1 jdoe jdoe 0 Aug 17 23:27 four lrwxrwxr-x 1 jdoe jdoe 4 Aug 18 11:55 longername -> four

Observations:

  • The hardlink file uses the same inode number as the original.

  • Both the hardlink and original are empty, and share the same file type (a hyphen).

  • The softlink file is not empty, and has a l instead of a hyphen in the file type column.

  • The softlink file has a different inode number than the hard linked file.

  • The size of the symbolic link file appears directly proportional to the length of the filename it points to.

  • ls -l somehow knows how to display the file the symbolic link points to.

Technorati Tags:

Hard links may only exist within a single partition, as each new hard link is simply a new filename-to-inode entry in a directory file. In contrast, a symbolic link is a special Unix file type (), and contains a file path that points elsewhere. ls and other programs use the readlink(2) system call to determine what file a symbolic link file points to, and then other system calls to reach the actual file. The stat(2) and other system calls automatically follow symbolic links, making them largely transparent. stat(2) also details the various allowed file types, including “plain text” (regular) and symbolic link files:

     #define S_IFMT   0170000  /* type of file */
     #define S_IFIFO  0010000  /* named pipe (fifo) */
     #define S_IFCHR  0020000  /* character special */
     #define S_IFDIR  0040000  /* directory */
     #define S_IFBLK  0060000  /* block special */
     #define S_IFREG  0100000  /* regular */
     #define S_IFLNK  0120000  /* symbolic link */
     #define S_IFSOCK 0140000  /* socket */
     #define S_IFWHT  0160000  /* whiteout */

In closing, the whiteout type has nothing to do with Snow Crash.

August 18, 2007

Displacement Attacks

In chess, a displacement attack drags a defender away from the piece it guards, allowing a capture of a previously guarded piece. Usually this involves a sacrifice, a check, or both. Black to move:

FEN 6k1/2b3p1/p7/2rp4/8/P1P2P2/1B1qQK2/5R2 b - - 6 1

Technorati Tags:

Bg3+ delivers a huge advantage for Black, as the White King may only capture the Bishop, or less ideally flee to g1. This removes the White King as the only defender of the Queen, allowing Black to capture the Queen for a sizable advantage.

This position is based on a Chess Ladder game at work (25 minute games played over lunch with a chess timer), and mistakes were made on both sides, as either Black had a mate-in-four involving the rook (or a longer mate-in-six with the Bishop check) or the White King could have migrated safely to f3 (hence there being a Pawn there now), but I cannot remember the exact position or what better moves were missed at the end.

Speaking of Chess Ladders, the largest problem is that the same people end up playing one another, and a joiner will fairly quickly rise to their skill level, and usually play the same two to three people week after week. Perhaps with a larger chess ladder, more players of equivalent rank would allow for more frequent changes in position?

August 17, 2007

Random Unix Shell Tips

Hyphens in Filenames

Fun with deleting a file named -rf (or some other name that the shell will blindly trip over). Easier than my previous method of “find the inode, and delete by that”, or using a language that does not blow up when leading hyphens or other $IFS characters appear in filenames:

$ ls -rf $ rm -rf $ ls -rf $ rm * $ ls -rf $ rm -- -rf $ ls $ touch ./-rf $ ls -rf $ rm ./-rf $ ls $

The leading path (./) hides the otherwise “hey! a command option” from the shell. The -- only works on systems with getopt(3) libraries that support that (good luck making scripts portable, and also not prone to scary bugs or unpredictable behavior when leading hyphens or spaces sneak into the filename).

Missing Trailing Newlines

Files on Unix without trailing newlines (/proc/$A_PID/cmdline on Linux or files edited on Windows editors that omit the trailing newline by default) can cause problems. Certain crond will not run the last command listed in a crontab file, if that line does not end with a newline. Programs such as vi will warn if the trailing newline is missing, and for good reason! Utilities to debug this problem include od(1) or hexdump(1):

$ echo -n nonewline > thefile $ tail -c 4 thefile | od -bc 0000000 154 151 156 145 l i n e 0000004 $ tail -c 4 thefile | hexdump -C 00000000 6c 69 6e 65 |line| 00000004

If only shell builtins are available, such as when a system cannot fork new processes, the read builtin has trouble with files that lack a trailing newline (such as the aforementioned /proc/$A_PID/cmdline on Linux):

$ while read line; do echo $line; done < thefile $

The read failed, and nothing was printed due to the lack of a trailing character. To workaround this, simply print the variable to see the “missing” data:

$ while read line; do echo $line; done < thefile $ read line < thefile $ echo $? 1 $ echo $line nonewline $

Technorati Tags:

August 14, 2007

Recent Readings

Light summer reading includes:

  • The Children of Húrin. If Greek, a chorus would sing the Hero’s doom. Doom, doom, doom. Doom. Spare, epic writing, enjoyable.
  • King Lear (old Crowell critical library edition from a random University district used book store). Annotations variously useful, useless, or missing to spice things up. 乱 [Ran] still excellent, having read King Lear only after watching the movie.

And on slightly less depressing notes, more excellent works by Le Guin:

And I now possess a new stack of “you really have to read this!” books from coworkers and friends…

August 07, 2007

Double Check

In chess, double checks force the King to move, as no one piece can block or capture both attackers.1 If the King cannot move, then the check is mate. In the following example, White is to move next.

FEN 6rk/8/p7/1p6/8/2R5/1B4PP/1b4K1 w - - 0 1

Technorati Tags:

Material is even, though Rh3# wins for White, checking the King with both the Rook and now revealed Bishop. Against a individual check by the White Rook or Bishop, the Black Bishop or Black Rook could interpose against a single attacker, or the King could escape to g7 or h7. If the Black King was not constrained by the Rook on g8, a reveal check of Rc1+ would be best, and win the Bishop.

Double check should be possible in Shogi, though probably far less likely. Most Shogi pieces lack the range (Generals only attack adjacent squares) or maneuverability (Knights only move far forward) required for a reveal check to work, and drops keep the board cluttered with pieces.

1 Options for a King in check: the King flees, a piece is interposed between the attacker and King, or the attacking piece is eaten2. Bobby Fischer Teaches Chess covers these rules with plenty of tests—a decent book to read through on the bus, as the tests require no board.

2 Russians use “eat”3 instead of “capture” to describe a piece being taken.

3 Because I cannot resist a double-nested footnote.

August 02, 2007

E-mail Administration Methods

Mail Transport Agents (MTA) such as Sendmail relay e-mail from senders to recipients, except when things break. This and subsequent articles cover methods to debug e-mail delivery problems. Focus will remain on understanding and debugging SMTP and Unix MTA, though hopefully the methods will abstract to other systems.

A mail administrator should be able to answer the following low-level questions regarding MTA and SMTP. Knowledge of networking protocols and debugging Unix systems will be very helpful.

  • How do DNS and hostname settings affect e-mail, and MX records in particular?
  • In what ways does the MTA accept e-mail?
  • Where does it send the e-mail to?
  • Does it route or reject depending on the envelope addresses?
  • What is the difference between a envelope address, and a body address?
  • Where is the configuration for the MTA located?
  • How is the configuration updated? When does the MTA need to be restarted following configuration changes?
  • Where do the MTA logs go?
  • What do the MTA logs show? Can a message be traced across a system, and then looked up on the next and subsequent SMTP servers?
  • What commands show the state of the MTA queue directories?
  • What is the difference between a synchronous and an asynchronous bounce of a message?
  • How are rejected messages handled?

A mail administrator must be able to generate command line or SMTP test messages, and know how to vary the envelope and body content of these test messages. For example, a report of a mail server that “does not work” should prompt “can I send e-mail to it?” and “what happens to that sent e-mail?” questions easily answered by firing off test messages. Test messages can also narrow the scope of a problem. If a remote server across a Wide Area Network (WAN) has problems receiving e-mail from a server of a different type, a quick way to rule out the WAN would be to test the same set of SMTP software over a Local Area Network (LAN). If the LAN message also fails, the two servers are likely incompatible. If not, what is wrong with the WAN? Is there a firewall that mangles the message? Is the link too slow, or corrupting traffic? Something else?

Also, a mail administrator should also understand the big picture at a site:

  • Does the site have any e-mail infrastructure? If not, what do they need?
  • What MTA does the site use?
  • If more than one MTA, how do they interact? Were any special settings made to support this interaction?
  • How does mail route? Is the system centralized, or decentralized?
  • How many different e-mail workflows are there? This would include both user e-mail, and any newsletter or mailing list type systems. Do different departments use different e-mail systems?
  • Where does bounce e-mail arrive? How is it handled? How does this e-mail feed into different departments that need to handle bounces?

Subsequent articles will cover debugging methods and big picture thoughts in more detail.

Technorati Tags: , ,