May 02, 2008

Big Blobs

Perl coding can evolve towards the use of a Big Blob (a large structure of deeply nested data) once perldsc and perllol are mastered. That is, input from various sources is assembled into the Big Blob, any required munging is performed, and the data structure is iterated over to emit some sort of output or change. This method does work, though it suffers from a number of avoidable flaws.

    if ( $line =~ m/^ \{ $/x ) {
        $rule_target[-1]->[-1]->{_subrules} = [];
        push @rule_target, $rule_target[-1]->[-1]->{_subrules};
    }

First, consider instead providing an Object Oriented interface, thus hiding the Big Blob. However, this may defeat a “well, I’m just trying to mangle X into Y, not waste time with class struggles” coding effort. Whether OO makes sense depends on the project. A standalone data conversion script probably does not justify OO. Code that other code will use, or a service interface, especially one used by other groups or users, will likely benefit from OO.

Secondly, Big Blobs could be a solution looking for a problem. The coder knows how to parse data into the blob, then iterate over the mess, but never considers whether a blob should have been used.

    my %big_blob = load_from_file($filename);
    upload_to_database(\%big_blob);

In many cases, the entirety of data need not be loaded into memory, and instead only the minimum necessary data retained in memory before acting on it:

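    while (<$fh>) {
        my %line_data;
        # ... parse line into line_data hash
        # upload line_data contents to database
        $db->...
    }
    continue {
        # commit once every 1000 lines read
        if ( $. % 1000 == 0 ) {
            $db->commit();
        }
    }
    # commit whatever remains from the final partial batch
    $db->commit();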

This method scales better, as it no longer is bound by memory, and will not require DB_File or refactoring should the data set exceed available memory. The question: “do I really need to parse all the data to memory, or is there a more efficient solution?” will help prevent inappropriate use of Big Blobs.

January 18, 2008

Many Small Errors

Even after being warned of the various drawbacks of glob, a #perl user decided to use a sloppy glob call to count the number of files in a directory. Example code:

    #!/usr/bin/perl -wl
    use strict;

    my $count = () = glob("/the/dir/* /the/dir/.*");
    # Maybe need -2 to remove the Unix standard . and .. (and
    # then only on Unix! How do you know that?)
    print $count;

The reasons we warned against this quick and dirty code include:

  • Perl glob performs lstat - even if you are only counting files, Perl is also running an lstat call against each and every file. Why even pick a solution that does this?

  • The glob may not return the correct count, should something in the file path conflict with the special rules of glob. Spaces in the directory name are one way to trigger this. There might be more. Would you code with glob if it might randomly fail, especially when a solution without the wacky edge conditions exists?

The proper solution—opendir, readdir loop, and a counter—perhaps was “too long” or “not one line”, so disfavored by the user. This is bad laziness: a count_files_in_dir subroutine can contain the actual code, and be called from elsewhere. A subroutine also allows unit tests, unlike a single random line of code mixed into a larger block. Finally, the glob solution the user picked has holes in it—“oh, they would never happen”—sure they wouldn’t. For a while, and then people forget about the limited, buggy code, copy it somewhere else, and then bam! Bug.

Many small edge conditions and inefficiencies needlessly count against performance, and worse, increase the odds that some condition will tickle a bug, possibly causing a massive outage. I’ve seen a single misplaced $ in a shell script cause a multi-million dollar outage. Granted, there were also no unit tests and other best practices, but the point remains: avoid the small errors that could blow up if a condition changes, especially when other solutions abound.

A proper solution using opendir and readdir follows.

    #!/usr/bin/perl -w
    use strict;

    my $directory = shift || die "Usage: $0 directory\n";
    print count_files_in_dir($directory), "\n";

    sub count_files_in_dir {
        my $directory = shift;
        opendir( my $dh, $directory )
          || die "cannot open $directory: $!\n";
        my $count = 0;
        # grep out the . and .. files on Unix here,
        # if necessary
        # defined() so that a file named "0" does not end the loop
        $count++ while defined( readdir $dh );
        closedir $dh;
        return $count;
    }

January 09, 2008

amazon-util - thin wrapper around Net::Amazon

amazon-util wraps the search() method of Net::Amazon, and templates the output. This allows book searches to quickly be turned into product links:

    $ amazon-util mode books power 'title: Amazon Hacks'
    <a href=http://www.amazon.com/o/ASIN/0596005423…

The results can then be quickly included in HTML, especially with the help of pbcopy(1) on Mac OS X. Example product link: Amazon Hacks.

December 18, 2007

++$Perl == 20

Onwards

As reported on use Perl; and Perl Buzz. For details on Perl releases, see perlhist. And, as a reminder, our $Perl = 19; $Perl++ == 20 would fail, as the increment would apply after—not before—the equality test.
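A quick sketch of the difference, starting both variants from 19. The variable names here are only for illustration:

```perl
use strict;
use warnings;

# Pre-increment: the value is bumped first, then compared.
my $pre    = 19;
my $pre_ok = ( ++$pre == 20 ) ? 1 : 0;

# Post-increment: the old value (19) is compared, then bumped.
my $post    = 19;
my $post_ok = ( $post++ == 20 ) ? 1 : 0;

print "pre-increment:  test=$pre_ok value=$pre\n";
print "post-increment: test=$post_ok value=$post\n";
```

Both variables end up at 20; only the pre-increment test passes.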

This space intentionally left blank.

November 09, 2007

Even Another Way

The & operator provides another way to check whether a number is even or odd, for example when interleaving new data following each even line number:

    $ (echo 1; echo 2; echo 3; echo 4) \
    | perl -nle 'print; print "stuff" unless $. & 1'
    1
    2
    stuff
    3
    4
    stuff
    $ (echo 1; echo 2; echo 3; echo 4) \
    | perl -nle 'print; print "stuff" unless $. % 2'
    1
    2
    stuff
    3
    4
    stuff

That is all.

October 20, 2007

Data Cleanup Patterns

When data is being made ready for some other purpose, and the source data is messy or requires significant cleanup, there are several ways to approach the problem. Example data (fairly clean, as these things go):

    | 10957 | Aardvark |
    | 3079 | Badger |
    | 10696 | Bobolink |
    | 1030 | Capybara |
    | 2659 | Cuckoo |
    | 10305 | Dodo |
    | 21473 | Emu |
    | 6603 | Octopus |
    | 14042 | Rook |

A requirement to produce comma separated data could be met with several different approaches. The first method trims away the dressing around the data: evict the spaces and vertical bars, remove the blank lines, add a comma between the number and word. The second method ignores these dressings, and extracts the data itself: match the ID number, and the word, then print these with a comma between them.
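The two approaches might be sketched as follows. This is only an illustration, not the code from the full article: the subroutine names are made up, and the sample rows include blank lines on the assumption that the real input has them.

```perl
use strict;
use warnings;

# A few sample rows; the blank lines are an assumption about the input.
my @lines = ( "| 10957 | Aardvark |", "", "| 3079 | Badger |", "" );

# Method 1: trim away the dressing around the data.
sub csv_by_trimming {
    my @out;
    for my $line (@_) {
        my $copy = $line;
        $copy =~ s/^[\s|]+//;      # leading bars and spaces
        $copy =~ s/[\s|]+$//;      # trailing bars and spaces
        next if $copy eq '';       # blank lines
        $copy =~ s/\s*\|\s*/,/;    # inner bar becomes the comma
        push @out, $copy;
    }
    return @out;
}

# Method 2: ignore the dressing and extract the data itself.
sub csv_by_extracting {
    my @out;
    for my $line (@_) {
        push @out, "$1,$2" if $line =~ m/ (\d+) \s* \| \s* (\w+) /x;
    }
    return @out;
}

print "$_\n" for csv_by_trimming(@lines);
print "$_\n" for csv_by_extracting(@lines);
```

The extracting method needs less code here, and never has to enumerate every kind of dressing that might surround the data.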

Continue reading "Data Cleanup Patterns" »

October 19, 2007

Premature Optimization is 97% Evil

There is no doubt that the grail of efficiency leads to abuse. Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.1

Read the article by Knuth for more context and elaboration. In Perl, a common mistake is to suffix regular expressions with the /o flag. I can personally attest to this being a bad idea: as Knuth predicted, a bug later emerged due to /o being set in one of my scripts. Instead, use qr// to create the expression, and later m/$re/.

    # Good to define these early on,
    # not buried deep in the code.
    # Use a better variable name.
    my $re = qr{e/ge}x;
    …
    if ($string =~ m/$re/) {
    …
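The kind of bug /o invites can be reproduced with a small sketch. This is not the script in which the original bug appeared; the subroutine names and test strings are invented for the demonstration.

```perl
use strict;
use warnings;

# With /o the pattern is compiled once, the first time the match
# runs, so later changes to $word are silently ignored.
sub matches_with_o {
    my @hits;
    for my $word (qw(cat dog)) {
        push @hits, $word if 'hot dog' =~ m/$word/o;
    }
    return @hits;
}

# With qr// each expression is compiled when it is created, and
# the match uses whichever compiled expression it is handed.
sub matches_with_qr {
    my @hits;
    for my $word (qw(cat dog)) {
        my $re = qr/\Q$word\E/;
        push @hits, $word if 'hot dog' =~ m/$re/;
    }
    return @hits;
}

print "with /o:   @{[ matches_with_o() ]}\n";
print "with qr//: @{[ matches_with_qr() ]}\n";
```

The /o version reports no matches at all, as the pattern stays frozen at "cat"; the qr// version finds "dog".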

Inexperienced coders should not fret long over optimizations. First, hammer out something that works. Then have someone review it, or revisit the code some time later, asking:

  • Where does a profiler show time being wasted?
  • Can anything make the expensive spots less so?
  • Would caching of some sort help?
  • Does any of the code seem strange or awkward?
  • Could a comment help explain the wacky bits?
  • Is there any other way to write the code?
  • Can the code be reused? Scale to new uses or more input or different output formats?
  • In the future, would you code it differently?
  • How long would it take to fix any problems? (Managers love time estimates.)

1 Knuth, Donald E. "Structured Programming with go to Statements." Computing Surveys Vol. 6 No. 4, December 1974: 268.

September 10, 2007

Random Perl Links

August 28, 2007

Delete Element from Array

A common Perl programming question is “how can I remove one or more elements from an array?” Some approaches to this problem have unexpected side-effects best revealed by Data::Dumper and minimal test data.

    #!/usr/bin/perl -w
    use strict;
    use Data::Dumper;

    my @a = qw(a b c);

The following methods may or may not work correctly:

Continue reading "Delete Element from Array" »

July 26, 2007

Shell Escapes in Perl Example

An example of why Shell Escapes in Perl are usually a mistake. Problems with the PATH and similar portability problems will not be discussed.

    `find $dir $prune -o -type f -print0 \
      | xargs -0 grep @Args \
      | grep -v '^[ ]*--'`

No idea what the user on #perl was trying to accomplish, though they claimed their use case required this horrid escape. Problems include the need to run quotemeta on the interpolated variables, and ensuring $prune is not empty (otherwise find(1) blows up). No idea why @Args is an array, as $" can influence how an array will interpolate, which would change the grep(1) results unexpectedly. And, more than one element would break grep(1), which takes only its first non-option argument as the pattern.

Error handling and debugging are also problematic: if nothing is returned, is that expected, or due to a bug somewhere in the pipe chain ($dir not a directory or permissions problem; $prune not set; grep(1) passed regular expression with a typo; other)? Where is standard error going, in the event an exit code is not zero?

Better code would use the File::Find or similar module, read through the files, skip unwanted lines, and return the matches. More lines than the shell code takes to write, but easier to debug, and far more portable.
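A minimal sketch of that approach, using the core File::Find module. The grep_tree name and its "file:line" return format are invented here, and the pruning the original $prune presumably did is left out, as its contents were never shown:

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: return "file:line" strings for lines under
# $dir that match $want, skipping lines that start with "--".
sub grep_tree {
    my ( $dir, $want ) = @_;
    my @matches;
    find(
        {
            no_chdir => 1,
            wanted   => sub {
                my $file = $File::Find::name;
                return unless -f $file;
                open my $fh, '<', $file or do {
                    warn "cannot open $file: $!\n";
                    return;
                };
                while ( my $line = <$fh> ) {
                    next if $line =~ m/^\s*--/;
                    push @matches, "$file:$line" if $line =~ $want;
                }
                close $fh;
            },
        },
        $dir
    );
    return @matches;
}

# Usage: script directory pattern
if ( @ARGV == 2 ) {
    my ( $dir, $pattern ) = @ARGV;
    print for grep_tree( $dir, qr/$pattern/ );
}
```

Unreadable files produce a warning naming the file, and an empty result means only that nothing matched, which answers most of the debugging questions raised above.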

July 08, 2007

Shell Escapes in Perl

Apparently, programmers still write “shell scripts inside Perl”, using backticks `` and system, often where a pure Perl solution could replace the shell calls. These programmers seem ignorant of the portability, security, and maintainability problems of shell code:

  • Portability

    Shell that uses find(1) with options appropriate to only one flavor of find(1) will fail if moved to a new system. Or, if the new system does have a compatible version of find(1) in a different path, errors may result if the PATH environment variable changes (a maintainability problem). Many other commands suffer from portability problems: consult Portable Shell Programming for tips on how to mitigate these issues.

  • Security

    Handling backticks in Perl covers the security pitfalls of shell escapes in more detail.

  • Maintainability

    Shell escapes mix a new language into Perl code, complicating syntax checking: the Perl may check out while the shell escapes still contain bugs. These would be time consuming and difficult to write unit tests for, and troublesome to debug if portability problems emerge.

    Quoting is another problem with shell escapes, especially if multiple levels of shell commands are executed. Maintaining the correct set of quotes, backslashes, and quotemeta calls is again time consuming and difficult.
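When an external command cannot be avoided, the quoting problem at least can be sidestepped by passing the command and its arguments as a list, which bypasses the shell entirely. A small sketch follows; run_without_shell is a made-up name, and $^X (the running perl) merely stands in for a real command:

```perl
use strict;
use warnings;

# Run a command with its arguments passed as a list, so that no
# shell is involved and no quoting or quotemeta is needed.
sub run_without_shell {
    my @command = @_;
    open my $fh, '-|', @command
      or die "could not run $command[0]: $!\n";
    my @output = <$fh>;
    close $fh or die "$command[0] failed: status=$?\n";
    return @output;
}

# An argument full of characters a shell would interpret.
my $tricky = q{it's "quoted"; echo $HOME `date`};
print run_without_shell( $^X, '-e', 'print $ARGV[0], "\n"', $tricky );
```

The tricky argument arrives at the child process untouched. The portability and PATH problems of the command itself remain.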

June 09, 2007

map for x

In Perl, the map function can simplify the code required to SQL quote a string:

    #!/usr/bin/perl -w
    use strict;
    #use Data::Dumper;

    my @list = ( qw{abe baker}, "can't" );

    my $result = sqlify_with_map( \@list );
    #print Dumper \@list;
    print $result, "\n";

    $result = sqlify_with_for( \@list );
    #print Dumper \@list;
    print $result, "\n";

    sub sqlify_with_for {
        my $list_ref = shift;
        my $result;
        for my $i ( 0 .. $#$list_ref ) {
            my $item = $list_ref->[$i];
            $item =~ s/'/\\'/g;
            $result .= "'$item'";
            if ( $i != $#$list_ref ) {
                $result .= ',';
            }
        }
        return $result;
    }

    sub sqlify_with_map {
        my $list_ref = shift;
        return join ',', map {
            my $item = $_;
            $item =~ s/'/\\'/g;
            "'$item'";
        } @$list_ref;
    }

The $item variable in both cases avoids modification of the original list elements. The map allows the escaped and quoted items to be passed as a list to join, while the for loop requires more code to decide when not to add a joining comma (leading to many false checks in a loop, which is usually not a Good Thing). Another option: always add a comma, then strip off the final comma after the for loop. But why bother, with map available?

Continue reading "map for x" »

May 26, 2007

Punish Typos!

    AUTOLOAD {
        my @targets = grep /^[\w:]+$/, keys %::;
        goto &{$targets[rand @targets]};
    }

Handy Perl to punish those who typo subroutine names with random, goto infested behavior. With a fuzzy string module, you could even make educated guesses at what the user wanted. Another option: profile the code, and insert interactive recommendations based on subroutine popularity!

To quiz yourself, try How's your Perl?

April 26, 2007

Debugging CPAN Build Problems

Debugging CPAN Build Problems expanded from the parent Life with CPAN page to better cover common problems encountered when building Perl modules.

On a related note, introducing captive user interfaces into what otherwise would be an automated build process hurts, especially when the prompt routine does not show the question, or indicate why the test process now mysteriously hangs. Instead of relying on fragile interactivity, require environment variables or a configuration file, and bail out with a clear error message if anything required is missing.
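As a rough sketch of the non-interactive alternative, a build or test script could check its environment up front. The require_env helper and the MYMODULE_* variable names are invented for this example:

```perl
use strict;
use warnings;

# Hypothetical helper: return the values of the named environment
# variables, or die listing every one that is missing.
sub require_env {
    my @names   = @_;
    my @missing = grep { !defined $ENV{$_} or $ENV{$_} eq '' } @names;
    die "error: missing required environment variables: @missing\n"
      if @missing;
    return map { $_ => $ENV{$_} } @names;
}

# MYMODULE_DB_HOST and MYMODULE_DB_USER are made-up names.
my %config = eval { require_env(qw(MYMODULE_DB_HOST MYMODULE_DB_USER)) };
print $@ ? "would bail out: $@" : "all settings present\n";
```

A real build would let the die propagate instead of wrapping it in eval, so that an automated run fails at once with a message naming what to set, rather than hanging on a prompt.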

April 22, 2007

strftime++

The strftime(3) call should replace convoluted code required after gmtime(3) or localtime(3) calls. For comparison, using Perl, two subroutines that both produce an ISO 8601 calendar date such as 2007-04-14, based on the current time:

    sub date_from_gmtime {
        my ( $mday, $mon, $year ) = (gmtime)[ 3 .. 5 ];
        $year += 1900;
        $mon  += 1;
        return sprintf( '%4d-%02d-%02d', $year, $mon, $mday );
    }

    sub date_from_strftime {
        use POSIX qw(strftime);
        return strftime( '%Y-%m-%d', gmtime );
    }

strftime requires fewer lines of code, requires no local array slicing and number fiddling, and easily supports different date templates.
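To illustrate the last point, changing the output format only means changing the template string. The templates below are arbitrary examples, and a fixed epoch time of zero is used so the output is repeatable:

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# A fixed epoch time keeps the output repeatable; time() would
# normally be used instead.
my @when = gmtime(0);

print strftime( '%Y-%m-%d',           @when ), "\n";    # calendar date
print strftime( '%Y-%m-%dT%H:%M:%SZ', @when ), "\n";    # date and time
print strftime( '%Y%m%d',             @when ), "\n";    # compact form
```

This prints 1970-01-01, 1970-01-01T00:00:00Z, and 19700101 in turn.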

April 16, 2007

Incident Handling

Software applications issue logs, allowing log scanning software to detect and act on these events. For example, sec.pl can detect a disk full log message, and cut a trouble ticket. This article considers a different approach, one that does not rely on log scanners. The method best suits applications with low incident counts, those where new incidents appear on a weekly or longer basis.

Method Overview

Application software, upon detecting a fault, writes an *.error file into an incident directory. These files contain logs and other data concerning the fault. Monitoring software periodically checks the incident directory, and cuts a ticket if at least one error file exists. An operator investigates the issue, and after resolving the problem, moves the incident file aside.

Advantages include simplified monitoring software: alert if an incident file exists, rather than continuously scanning a logfile for rare events. By including relevant data in the error file, the operator need not delve through gigabytes of logs during the investigation. Monitoring software could also submit the entire error file as part of a ticket auto-cut, instead of simply alerting on the presence of error files.

Disadvantages include an operator mistakenly moving aside unresolved incident files, or where a major problem creates a flood of files, well beyond the low numbers this method assumes. I have not yet used this method in practice, so other disadvantages may exist.
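The two halves of the method could be sketched roughly as below. This is an untested-in-production illustration: the subroutine names, the NAME.error file naming, and the file contents are all assumptions.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Application side: record a fault as NAME.error in the incident
# directory. The directory layout and file naming are assumptions.
sub write_incident {
    my ( $dir, $name, $details ) = @_;
    my $file = File::Spec->catfile( $dir, "$name.error" );
    open my $fh, '>', $file or die "cannot write $file: $!\n";
    print {$fh} scalar(gmtime), " $details\n";
    close $fh or die "cannot close $file: $!\n";
    return $file;
}

# Monitoring side: list any *.error files awaiting an operator.
sub pending_incidents {
    my ($dir) = @_;
    opendir my $dh, $dir or die "cannot open $dir: $!\n";
    my @errors = sort grep { m/\.error\z/ } readdir $dh;
    closedir $dh;
    return @errors;
}

# A temporary directory stands in for the real incident directory.
my $dir = tempdir( CLEANUP => 1 );
write_incident( $dir, 'disk-full', 'no space left on /var' );
my @pending = pending_incidents($dir);
print "cut a ticket for: @pending\n" if @pending;
```

The monitoring half stays trivial: any non-empty list means a ticket is due, and the operator resolves the incident by moving the file out of the directory.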

Continue reading "Incident Handling" »

April 02, 2007

Perl Crossword Puzzle

Dislike newspaper crossword puzzles, as they reference things I know very little to nothing about. Hence, a Perl crossword puzzle!

Continue reading "Perl Crossword Puzzle" »

March 15, 2007

Word Boundaries

Dangers of mixing word characters and boundaries with grammars such as English:

    $ perl -le '$_="Can\047t touch this"; print for /\b(\w+)\b/g'
    Can
    t
    touch
    this

\w can be expanded to [\w\047], or perhaps to match words where the single quote is somewhere in the middle of the word, plus handling of 'tis and other oddities, if necessary. Caveat Scriptor.
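One possible form of the "quote in the middle of the word" variant is sketched below. The subroutine name is made up, and the pattern makes no attempt to handle 'tis or other leading-apostrophe words:

```perl
use strict;
use warnings;

# Words may contain internal apostrophes (\047); a leading or
# trailing apostrophe is still treated as punctuation.
sub words_with_apostrophes {
    my ($text) = @_;
    return $text =~ m/\b(\w+(?:\047\w+)*)\b/g;
}

print "$_\n" for words_with_apostrophes("Can't touch this");
```

Unlike the plain \w+ version, this keeps "Can't" together as a single word.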

On a related note, Link Grammar is worth experimentation with phrases such as “time flies like an arrow”.

March 10, 2007

Constant Dangers

Perl Best Practices (p. 56-57) warns against use constant, and advises using the Readonly module if possible. use constant constants cannot be interpolated, nor can they be created at run time:

    use constant ( PI => 3 );
    print 'PI is ' . PI . "\n";

    use Readonly;
    Readonly my $PI => atan2( 0, -1 );
    print "PI is $PI\n";

Additional dangers of use constant involve open calls that mistakenly pick a file handle name that conflicts with a constant:

    #!/usr/bin/perl -w
    use strict;

    use constant ( PASSWD_FILE => '/etc/passwd' );

    my $tmp_passwd_filename = shift || die "Usage: $0 filename\n";

    # … and much later, a nasty bug results
    open PASSWD_FILE, '<', $tmp_passwd_filename
      or die "error: could not open: file="
      . $tmp_passwd_filename . "\n";

Instead, always use variables to hold file handles, and avoid the risk of a constant name conflicting with a file handle. This method also avoids conflicts between subroutines imported from other modules and constants:

    open my $passwd_fh, '<', $tmp_passwd_filename
      or die "error: could not open: file=$tmp_passwd_filename: $!\n";

February 23, 2007

Perl Module Use

Avoid plain use Module::Name in favor of either an empty import list, or a specific list of items to trample on the local namespace with:

    use CGI;                   # expensive!
    use CGI ();                # much better

    use POSIX;                 # slow and much trampling
    use POSIX qw(strftime);    # just this subroutine

Limiting module imports saves time, and lessens the risk that two functions with the same name will be used. A quick review of the use statements will generally reveal exactly what has been imported, such as strftime in the example above. Should a new module version export more by default, the explicit import list will avoid any namespace changes to scripts.

Continue reading "Perl Module Use" »

December 20, 2006

Optional Perl Module Loading

The following information is not spelled out between the use, require, and import documentation. This example only loads the CGI module if it is available, then pollutes the current namespace with header. Somewhat equivalent to use CGI qw/header/, but different: the module is loaded at run time rather than at compile time.

    #!/usr/bin/perl -w
    use strict;

    eval { require CGI; };
    if (! $@) {
        CGI->import( qw/header/ );
    }

    print header();

The above should always work, as CGI has shipped with Perl for quite some time. Use the same pattern for other modules:

    #!/usr/bin/perl

    eval { require Win32; };
    if (! $@) {
        Win32->import( qw/SW_SHOWNORMAL/ );
    }

If possible, skip the import, and use object oriented access methods to minimize namespace pollution. To see how use differs from require, try placing the use statements at the end of a file, or inside an END block.

    #!/usr/bin/perl

    print header();

    END { use CGI qw(header) }

December 17, 2006

Perl glob performs lstat

Avoid the quick glob function if no lstat related data will be used, as glob performs an lstat on each matching file. Instead, use opendir and readdir to efficiently work through many filenames, or a module such as File::Find to recurse through a directory tree.

Continue reading "Perl glob performs lstat" »

November 24, 2006

Unix Epoch Calculations

Coding bugs in Unix epoch time calculations may be avoided using the graphing method outlined here. Reasoning, however logical and detailed, does not work for me: I need to reference the visual diagram before trusting the resulting code. But first, a diversion down memory lane.

HISTORY
  A time() function appeared in Version 2 AT&T UNIX 
  and used to return time in sixtieths of a second in
  32 bits, which was to guarantee a crisis every 2.26
  years. Since the Version 6 AT&T UNIX time() scale
  was changed to seconds extending the pre-crisis
  stagnation period up to a total of 68 years.

Source: OpenBSD time(3) Documentation.

Continue reading "Unix Epoch Calculations" »

November 19, 2006

Perl Dæmons

Wrote Running Perl Daemon Processes since perl daemon searches all somehow miss the perlipc documentation. If writing a service for Unix, ensure it runs properly in the background. Notably, many Java programs fail to daemonize, perhaps due to the platform specific nature of daemonization. Other random links:

November 18, 2006

Programming Recommended Reading

Great programming Recommended Reading links put together by Limbic~Region.

November 16, 2006

Loop without Loop

Today’s installment of execrable Perl inspired by inexperienced coders asking koan-esque questions “how do I loop over something, but without looping over it?” Actual user question in this case turned out to be $ref->foo->foo->… iteration, which would require a slightly different recursive call. But this let me fiddle with &… prototypes to good effect.

    #!/usr/bin/perl -w
    # Abuse recursion and so forth to
    # "loop without looping".
    use strict;

    # So don't need () and can specify
    # a sub-less sub.
    sub loop (&$;$);

    my @list = qw(a b c d);

    # No way known to specify @list, then
    # use reference in subsequent calls
    loop { print } \@list;

    sub loop (&$;$) {
        # Use @_ directly instead of named
        # variables as prototype disallows
        # repassing code ref (insists
        # on a block...)
        local $_ = $_[1]->[ $_[2]++ ];
        eval { &{ $_[0] } };
        if ($@) {
            # ed(1) style error messages :)
            die "!\n";
        }
        $_[2] > $#{ $_[1] } || &loop;
    }

November 09, 2006

Chess960 position generator

Wrote chess960open to generate Chess960 opening positions for White. Here’s one possibility:

Bishop Rook King Rook Knight Queen Knight Bishop

November 06, 2006

Sending E-mail with Perl

Since so many coders invoke sendmail directly, never the wiser: sending e-mail with Perl.

November 01, 2006

convert-date - match dates and perform TZ conversions

Quick Perl script to match dates in input data by regex, and convert between the specified time zones.

    #!/usr/bin/perl -w
    #
    # Usage: convert-date < input > output

    use strict;
    use Date::Manip qw(UnixDate ParseDate Date_ConvTZ);

    # should be arguments
    my $from_tz       = 'UTC';
    my $to_tz         = 'PDT';
    my $output_format = '%Y-%m-%d %H:%M:%S';

    while (<>) {
        # How to match the date in the input data. May
        # need to munge to something Date::Manip can
        # grok, or add HH:MM:SS, depending.
        s/ (\d{4}-\d\d-\d\d [T\s] \d\d:\d\d:\d\d) /fix_date($1)/ex;
        print;
    }

    sub fix_date {
        my $input = shift;

        my $date_in = ParseDate($input);
        warn "$0: error: could not parse: date=$input\n"
          if length $date_in < 1;

        my $date_out = Date_ConvTZ( $date_in, $from_tz, $to_tz );
        warn "$0: error: could not convert: "
          . "date=$input, from=$from_tz, to=$to_tz\n"
          if length $date_out < 1;

        my $output = UnixDate( $date_out, $output_format );
        return $output;
    }

October 27, 2006

Minimal Perl Review

Reviewed Minimal Perl: For UNIX and Linux People on Amazon. Learned new commands, such as the often overlooked nl(1), and uses for sed(1) and awk(1) that I had never learned, since I started out with Perl.

On a related note, Tim Maher is presenting at Seattle Area System Administrators Guild November 9th.

October 26, 2006

Evil Perl Function Calling

Evil Perl, not for use anywhere near production code. Means to bypass prototype (and strict) restrictions in Perl.

    #!/usr/bin/perl -wl
    use strict;

    sub noproto {
        print "Got: @_";
    }

    sub proto($$) {
        print "Got: @_";
    }

    # fails, as expected
    #proto(42);

    {
        local @_ = 42;
        &proto;
    }

    # these work
    &proto(42);
    &proto();

    # not happy
    #&proto 42;

    # this bypasses 'strict' and prototypes, but adds
    # 'main' to @_
    my $be_evil = 'proto';
    main->$be_evil(42);

    # there we go...
    __PACKAGE__->can($be_evil)->(42);

October 15, 2006

Improve Log Messages

Many programs either contain poor or nonexistent logging, or log so much that any useful messages drown in the noise. This post concerns improving logs generated by scripts running on Unix, which usually suffer from the poor or nonexistent logging. Witness the succinct ! log message from the original ed(1) in contrast to Java stack trace barfs. Middle ground for usable logs can be found.

Continue reading "Improve Log Messages" »

October 08, 2006

Perl function calling conventions

Great summary of function calling conventions for Perl.

    # disfavored for some reason!
    push @_, 42 && &churn_away;

September 29, 2006

RSA data length limits

The length of an RSA signature varies in direct proportion to the RSA key size, not the amount of data signed. The Perl script below demonstrates the length of signatures for several RSA key sizes. Also, larger keys allow more data to be encrypted with RSA, minus overhead for various encoding and security measures. Large amounts of data should be encrypted using a symmetric cipher, and the secret key for this cipher encrypted via RSA.

    #!/usr/bin/perl -wl
    use strict;

    use Crypt::OpenSSL::Random;
    use Crypt::OpenSSL::RSA;

    Crypt::OpenSSL::Random::random_status()
      or die "single and thine image dies with thee\n";

    my $string = 'foo';

    KEYSIZE: for my $ks (qw{512 1024 2048}) {
        my $pk  = Crypt::OpenSSL::RSA->generate_key($ks);
        my $sig = $pk->sign($string);
        print $ks, ' -> ', length $sig;
    }

    __DATA__
    512 -> 64
    1024 -> 128
    2048 -> 256

Continue reading "RSA data length limits" »

September 24, 2006

Path Parser and Permissions Previewer Utility for Unix

Use parsepath to report Unix directory paths, or check whether a user or group has permissions to access named files. Great to quickly check CGI permissions where some parent directory sets the wrong permissions.

    $ parsepath % /Users/jmates
    d 1775 root:admin /
    d 1775 root:admin /Users
    d 0755 jmates:jmates /Users/jmates
    $ parsepath +w /var/tmp
    $ parsepath +w /etc/passwd
    ! unix-other +w fails: f 0644 root:wheel /etc/passwd

Script originally filed under my debugging Unix pages.

September 17, 2006

Tell and Seek

Test Perl code illustrating how to read a file from the position last read to. Handy for log processing agents run multiple times on a growing file, where repeated scans would otherwise duplicate previous matches. Re-reads entire file if last position past end of current file contents.

    #!/usr/bin/perl
    use Fatal qw(open);

    # filename => last read offset
    my %file_position_stash = ( test => 5 );

    my $file = shift || die "Usage: $0 filename\n";
    open my $fh, '<', $file;

    # Try to resume where left off
    if ( exists $file_position_stash{$file} ) {
        seek $fh, $file_position_stash{$file}, 0 or warn "whoa: $!\n";

        # If at end of file already, file truncated
        # since last read? Start from beginning, unless
        # file same size as last read position.
        if ( eof $fh and $file_position_stash{$file} != -s $fh ) {
            seek $fh, 0, 0;
        }
    }

    while (<$fh>) {
        print;
    }

    # Save where read to
    $file_position_stash{$file} = tell $fh or warn "whoa: $!\n";

    use Data::Dumper;
    warn Dumper \%file_position_stash;

If possible, avoid copying and truncating log files. Instead, use software such as httplog to direct logs into files by date-based patterns.

September 08, 2006

rename - mangle filenames using Perl expressions

rename - my enhancement of the original rename script by Larry Wall. Adds preview and copy support, plus documentation with examples. Assumes working knowledge of Perl. Did I mention the preview support?

Warning! Certain vendors install a useless but conflicting rename command under /usr/bin.

September 01, 2006

Key and Certificate Conversion

Use convert2der to convert TLS key and certificate files from PEM to DER format and back again.

    $ convert2der *.prv *.crt
    $ convert2der --inform=DER --outform=PEM *.prv *.crt

Great when a vendor tool only supports the DER format, but other tools and vendors generate PEM by default. Convenient wrapper around the openssl rsa(1) and x509(1) subcommands.

August 20, 2006

Customize @INC via PERL5LIB

Using CPAN with a non-root account details customizing ~/.cpan/CPAN/MyConfig.pm and the Unix shell to use a custom directory for perl modules. While the following customize the PERL5LIB and MANPATH environment variables, editing the shell code is difficult:

    if [ -d $HOME/lib/perl5 ]; then
      PERL5LIB=${PERL5LIB:+$PERL5LIB:}$HOME/lib/perl5
    fi
    MANPATH=${MANPATH:+$MANPATH:}$HOME/share/man
    export MANPATH PERL5LIB

I currently use the following, which allows multiple directories to be listed, and the ordering of the directories changed easily in a text editor:

    while read pf; do
      PERL5LIB=${PERL5LIB:+$PERL5LIB:}$pf
    done << EOPERL5LIB
    /sw/lib/perl5
    $HOME/lib/perl5
    EOPERL5LIB
    typeset -U PERL5LIB
    export PERL5LIB

    while read pf; do
      if [ -d $pf ]; then
        MANPATH=${MANPATH:+$MANPATH:}$pf
      fi
    done << EOMANPATH
    /sw/share/man
    /sw/man
    /usr/share/man
    /usr/X11R6/man
    /usr/local/share/man
    /usr/local/man
    /usr/local/pgsql/man
    $HOME/share/man
    EOMANPATH
    typeset -U MANPATH
    export MANPATH

This method works for all colon delimited environment variables, such as LD_LIBRARY_PATH, CLASSPATH, and others. Duplicate suppression could be added, though it would require additional code.

An even better option may be to template the shell configuration files, depending on the system. This way, complex source files would be rendered into minimal shell configuration files that need not perform any number of calculations for each new shell.

August 17, 2006

Handy perl Functions for ZSH

The following ZSH functions, once added to ~/.zshrc and loaded, allow convenient lookup of Perl module versions and locations.

    function pm-version {
      perl -M$1 -le "print \$$1::VERSION"
    }
    function pm-path {
      perl -l -M$1 \
        -e "(\$mp=q{$1})=~s{::}{/}g;\$mp.=q{.pm};" \
        -e "print \$INC{\$mp}"
    }

For example:

    $ pm-version Text::Template
    1.44
    $ pm-path Text::Template
    /home/jmates/lib/perl5/Text/Template.pm

This leads to quick perldoc(1) lookups or viewing of module source code:

    $ pm-path Text::Template
    /home/jmates/lib/perl5/Text/Template.pm
    $ perldoc `!!`
    …
    $ less !$

More perl tricks available in Perl One Liners.

July 28, 2006

Regular Expressions poorly match Internet Addresses

Internet addresses (IP) lead to highly complex regular expressions (regex) that attempt to match only valid addresses. Regex deal poorly with number ranges, and must account for optional portions of the IP address. For example, \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3} matches the invalid 999.999.999.999 string, and fails to match the valid IP address of 127.1:

    $ ping -c 1 127.1
    PING 127.1 (127.0.0.1): 56 data bytes
    64 bytes from 127.0.0.1: icmp_seq=0 ttl=255 time=0.062 ms

    --- 127.1 ping statistics ---
    1 packets transmitted, 1 packets received, 0.0% packet loss
    round-trip min/avg/max/std-dev = 0.062/0.062/0.062/0.000 ms

Instead, use a well tested and community supported regex from Regexp::Common or similar module. Another option: perform a loose match, then feed the results through the inet_aton and inet_ntoa functions:

    $ perl -MSocket -le 'print inet_ntoa inet_aton shift' \
      127.1
    127.0.0.1
    $ perl -MSocket -le 'print inet_ntoa inet_aton shift' \
      999.999.999.999
    Bad arg length for Socket::inet_ntoa, length is 0, should be 4 at -e line 1.
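Inside a script, checking the inet_aton result before calling inet_ntoa avoids the fatal "Bad arg length" error. A sketch of the loose-match-then-verify idea follows; extract_ipv4 is a hypothetical helper, and its loose pattern is only one of many possible:

```perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa);

# Hypothetical helper: loosely match candidate addresses, then keep
# only those inet_aton accepts, returned in dotted-quad form.
sub extract_ipv4 {
    my ($text) = @_;
    my @found;
    for my $candidate ( $text =~ m/\b(\d{1,3}(?:\.\d{1,3}){1,3})\b/g ) {
        my $packed = inet_aton($candidate);
        push @found, inet_ntoa($packed) if defined $packed;
    }
    return @found;
}

print "$_\n" for extract_ipv4('from 127.1 port 22');
```

Note that the loose pattern also accepts strings such as version numbers, and that inet_aton may attempt a hostname lookup for a candidate it cannot parse as an address, so the surrounding context of the match still matters.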

These functions also provide quick hostname resolution:

    $ perl -MSocket -le 'print inet_ntoa inet_aton shift' \
      sial.org
    69.90.43.86

Matches may also fail if IPv4 mapped addresses appear, for example when performing OpenSSH security checks.

July 25, 2006

One True Language

As seen in Mordor: the One True Coding Language! Spend a few years on an IRC channel, and experience the stream of “what’s the best language to code with?” questions and resulting hilarity. Ugh.

  • Looking for a job? Pick a language used in the field or company that interests you.
  • Unix systems administration? Learn shell scripting, and try out Perl, Python, or Ruby, and see what you still use five years from now (if anything).
  • Experimenting? Pick something at random. Or at least from a different family of programming: functional versus object oriented and so forth.
  • This language sucks! Patches welcome, or see above.

For the record, I use Perl, plus a dash of Unix shell scripting. Perl suits me perfectly: great for text processing, systems administration, and other random tasks. Huge amounts of code available in CPAN. No need to learn another language for my job (gainful computer entropy reduction), and other projects more important right now.

Next week: the One True Filesystem Layout!

July 09, 2006

Match the first unique character in a string

Possible job interview question:

“Write a program to match the first unique character in a string.”

First, clarify what character means, as a C program without libraries will act differently on Unicode data than other languages might. One solution, using Perl:

    #!/usr/bin/perl
    #
    # Returns first unique character in a string passed as argument.

    use warnings;
    use strict;

    die "Usage: $0 the string\n" if not @ARGV;

    my @characters = split //, "@ARGV";
    my %char_count;
    for my $char (@characters) {
        $char_count{$char}++;
    }
    for my $char (@characters) {
        if ( $char_count{$char} == 1 ) {
            print $char, "\n";
            last;
        }
    }

Note the use of warnings, strict, a script usage message, and the quick summary at the top of the script. These enable better code checks, and help people unfamiliar with the script or code learn without wasting time looking through the code. Scripts without these elements should be fixed before production use. Avoid bad code from the start by creating a template for new scripts. This template should include standard option handling, leading documentation and license information, and for longer scripts, a perldoc section.

Another challenge: only use a Perl regular expression instead of the data structures shown above.

Continue reading "Match the first unique character in a string" »