Possible job interview question:
“Write a program to match the first unique character in a string.”
First, clarify what “character” means: a C program without libraries will treat Unicode data differently than other languages might. One solution, using Perl:
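#!/usr/bin/perl
#
# Prints the first unique character in the string passed as arguments.
# Characters are bytes unless Perl is told to decode the input, for
# example by running: perl -CSA script-name 'the string'
use warnings;
use strict;
die "Usage: $0 the string\n" if not @ARGV;
# Join all arguments with single spaces, then split into characters.
my @characters = split //, "@ARGV";
# First pass: count how often each character occurs.
my %char_count;
for my $char (@characters) {
    $char_count{$char}++;
}
# Second pass: print the first character that occurs exactly once.
for my $char (@characters) {
    if ($char_count{$char} == 1) {
        print $char, "\n";
        last;
    }
}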
Note the use of warnings, strict, a script usage message, and the quick summary at the top of the script. The warnings and strict pragmas let Perl catch more mistakes, while the usage message and summary help people unfamiliar with the script learn what it does without reading through the code. Scripts lacking these elements should be fixed before production use. Better still, avoid such problems from the start by creating a template for new scripts. The template should include standard option handling, leading documentation and license information, and, for longer scripts, a perldoc section.
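A minimal sketch of such a template follows, assuming the core modules Getopt::Long and Pod::Usage for option handling. The script name, the option names (--help, --man, --verbose), and the license line are placeholders to adapt, not a standard:

```perl
#!/usr/bin/perl
#
# script-name - one-line summary of what the script does (placeholder).
#
# License: placeholder; state the project's copyright and license here.
use warnings;
use strict;
use Getopt::Long;
use Pod::Usage;

# Standard option handling; the option names here are illustrative.
my %opt = ( help => 0, man => 0, verbose => 0 );
GetOptions( \%opt, 'help|h', 'man', 'verbose|v' ) or pod2usage(2);
pod2usage(1) if $opt{help};                  # SYNOPSIS only, exit 1
pod2usage( -verbose => 2 ) if $opt{man};     # full documentation

# Script body goes here; @ARGV holds the non-option arguments.
print "arguments: @ARGV\n" if $opt{verbose};

=head1 NAME

script-name - one-line summary of what the script does

=head1 SYNOPSIS

script-name [--help] [--man] [--verbose] [argument ...]

=head1 DESCRIPTION

Longer description, examples, and license details, shown by perldoc.

=cut
```

With the POD kept in the script itself, running perldoc on the script file shows the documentation, and --help and --man reuse the same text through pod2usage, so the usage information lives in one place.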
Another challenge: use only a Perl regular expression instead of the data structures shown above.
Challenge: write a Perl regular expression that matches the first unique character in a string. An initial attempt matches a character that does not occur again later in the string:
$ echo abc | perl -nle 'print $1 if /(.)(?!.*\1)/'
a
However, this expression fails on the string aab. The negative lookahead only examines the rest of the string, so the expression treats the second a as unique, knowing nothing of the preceding a:
$ echo aab | perl -nle 'print $1 if /(.)(?!.*\1)/'
a
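Listing every match with the /g modifier shows what the expression really finds: the last occurrence of each distinct character, which is the first unique character only when that character has no earlier duplicate. A small sketch (the helper name last_occurrences is illustrative, not part of the original):

```perl
#!/usr/bin/perl
use warnings;
use strict;

# In list context with /g, the match returns the capture from every
# position where (.)(?!.*\1) succeeds: each character that does not
# appear again later in the string, i.e. its last occurrence.
sub last_occurrences {
    my ($string) = @_;
    return $string =~ /(.)(?!.*\1)/g;
}

for my $string (qw(abc aab abab)) {
    print "$string: ", join( q{,}, last_occurrences($string) ), "\n";
}
```

This prints a,b,c for abc and a,b for both aab and abab, which is why the first match is a in each of those cases.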
This leads to a testing problem: how can a correct expression be told apart from the (many) buggy ones? One solution is test-driven development with the Test::More module; Building Testing Libraries discusses testing in more detail. A sample test script:
#!/usr/bin/perl
# Test script: regular expression to match the first unique character
# in a string.
use warnings;
use strict;
use Test::More qw(no_plan);
my $regex = qr{ (.) (?! .* \1) }x;
while ( my $line = <DATA> ) {
    chomp $line;
    my ( $string, $expected ) = split ' ', $line, 2;
    $expected = q{} if not defined $expected;
    # Compare the first capture group, or '' when nothing matches.
    my ($got) = $string =~ m/$regex/;
    $got = q{} if not defined $got;
    cmp_ok( $got, 'eq', $expected, "$string:$expected" );
}
# test string, space, expected character (omit if no match)
__DATA__
a a
ab a
aab b
abab
Incorrect expressions now show up quickly as test failures:
$ ./regex-test
ok 1 - a:a
ok 2 - ab:a
not ok 3 - aab:b
# Failed test (./regex-test at line 15)
# got: 'a'
# expected: 'b'
not ok 4 - abab:
# Failed test (./regex-test at line 15)
# got: 'a'
# expected: ''
1..4
# Looks like you failed 2 tests of 4.
Other regular expressions can now be tried against the same tests. However, I have not yet found a regular expression (excepting, possibly, one using (?{code})) that returns the first unique character of a string.
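For the (?{code}) route mentioned above, a sketch might look like the following. It assumes Perl 5.10 or later (for the (*FAIL) verb) and relies on the perlre rule that $_ holds the target string inside a code block. It is not a pure regular expression: the uniqueness test is ordinary Perl (index and rindex) embedded in the pattern, and the names $regex and first_unique are illustrative.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# At each position, capture one character, then fail that position if
# the character occurs more than once anywhere in the target string.
# Inside the code block, $_ is the string being matched.
my $regex = qr{
    (.)
    (?(?{ index($_, $1) != rindex($_, $1) }) (*FAIL) )
}x;

# Returns the first unique character, or the empty string if none.
sub first_unique {
    my ($string) = @_;
    return $string =~ $regex ? $1 : q{};
}

for my $string (qw(a ab aab abab)) {
    printf "%s:%s\n", $string, first_unique($string);
}
```

index and rindex agree only when the character occurs exactly once, so the conditional fires (*FAIL) for any repeated character and the engine moves on to the next position.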