Wed 17 Jun 2020
Tags: perl, golang
I started using perl back in 1996, around version 5.003, while working at UBC
in Vancouver. We were working on a student management system at the time,
written in C, interfacing to an Oracle database. We started experimenting with
this Common Gateway Interface thing (CGI) that had recently appeared, and let
you write interactive applications on the web (!). perl was the tool of
choice for CGI, and we were able to talk to Oracle using a perl module that
spoke Oracle Call Interface (OCI).
That project turned out to be pretty successful, and I've been developing in
perl ever since. Perl has a reputation for being obscure and abstruse, but I
find it a lovely language - powerful and expressive. Yes, it's probably too
easy to write bad/unreadable perl, but it's also straightforward to write
elegant, readable perl. I routinely pick up code I wrote 5 years ago and have
no problems reading it again.
That said, perl is showing its age in places, and over the last few years
I've also been using other languages where different project requirements
called for them. I've written significant code in C, Java, python,
ruby, and javascript/nodejs, but for me none of them were sufficiently
attractive to threaten perl as my language of choice.
About a year ago I started playing with Go at $dayjob, partly interested
in the performance gains of a compiled language, and partly to try out the
concurrency features I'd heard about. Learning a new language is always
challenging, but Go's small footprint, consistency, and well-written
libraries really made picking it up pretty straightforward.
And for me the killer feature is this: Go is the only language besides
perl in which I regularly find myself writing a good chunk of code, getting
it syntactically correct, and then testing it and finding that it Just
Works, first time. The friction between thinking and coding seems low enough
(at least the way my brain works) that I can formulate what I'm thinking
with a pretty high chance of getting it right. I still get surprised when it
happens, but it's great when it does!
Other things help too - it's crazy fast, especially coming from mostly
interpreted languages recently; the concurrency stuff really is nice, and
lets you think about concurrent flows pretty intuitively; and lots of
the language decisions like formatting, tooling, and composition just seem
to sit pretty well with me.
So while I'm still very happy writing perl, especially for less
performance-intensive applications, I'm now a happy little Go developer
as well, and enjoying exploring some more advanced corners of a new
language home.
Yay Go!
If you're interested in learning Go, the online docs are pretty great: start
with the Tour of Go and
How to write Go code, and then read
Effective Go.
If you're after a book-length treatment, the standard text is
The Go Programming Language by Donovan and Kernighan.
It's excellent but pretty dense, more textbook than tutorial.
I've also read Go in Practice,
which is more accessible and cookbook-style. I thought it was okay, and learnt
a few things, but I wouldn't go out of your way for it.
Sat 20 Sep 2014
Tags: perl, parallel
Did a talk at the Sydney Perl Mongers group on Tuesday night,
called "Parallelising with Perl", covering AnyEvent, MCE, and
GNU Parallel.
Slides
Tue 02 Sep 2014
Tags: perl, csv, data wrangling
Well past time to get back on the blogging horse.
I'm now working on a big data web mining startup,
and spending an inordinate amount of time buried in large data files, often
some variant of CSV.
My favourite new tool over the last few months is Karlheinz Zoechling's
App::CCSV perl module, which lets you do some really powerful CSV processing
using perl one-liners, instead of having to write a trivial/throwaway script.
If you're familiar with perl's standard autosplit functionality (perl -a),
then App::CCSV will look pretty similar - it autosplits its input into an
array on your CSV delimiters for further processing. It handles
embedded delimiters and CSV quoting conventions correctly, though, which
perl's standard autosplitting doesn't.
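To see why the quote handling matters, here's a small illustration using the core Text::ParseWords module rather than App::CCSV itself. A naive split on commas breaks a quoted field that contains a comma, while a quote-aware split keeps it whole. Bear in mind Text::ParseWords treats backslashes as escapes and doesn't know about CSV's doubled-quote convention, so for real CSV work a dedicated parser like Text::CSV_XS is the better tool.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

my $line = q{42,"Smith, John",Y};

# Naive split on the delimiter: the quoted field gets broken in two
my @naive = split /,/, $line;

# Quote-aware split: the embedded comma stays inside its field,
# and the surrounding quotes are stripped (keep = 0)
my @fields = parse_line(',', 0, $line);

print scalar(@naive), " naive fields vs ", scalar(@fields), " quote-aware fields\n";
```

That prints "4 naive fields vs 3 quote-aware fields", with the middle quote-aware field being Smith, John.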
App::CCSV uses @f to hold the autosplit fields, and provides utility
functions csay and cprint for doing say and print on the CSV-joins
of your array. So for example:
# Print just the first 3 fields of your file
perl -MApp::CCSV -ne 'csay @f[0..2]' < file.csv
# Print only lines where the second field is 'Y' or 'T'
perl -MApp::CCSV -ne 'csay @f if $f[1] =~ /^[YT]$/' < file.csv
# Print the CSV header and all lines where field 3 is negative
perl -MApp::CCSV -ne 'csay @f if $. == 1 || ($f[2]||0) < 0' < file.csv
# Insert a new country code field after the first field (assumes a
# get_country_code() sub that you've defined or loaded yourself)
perl -MApp::CCSV -ne '$cc = get_country_code($f[0]); csay $f[0],$cc,@f[1..$#f]' < file.csv
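Going the other way, csay and cprint have to CSV-join your fields, quoting where necessary. Here's a minimal hand-rolled sketch of what that involves; csv_join is a hypothetical helper for illustration, not App::CCSV's actual implementation. It quotes any field containing the separator, a double quote, or a line break, and doubles any embedded quotes:

```perl
use strict;
use warnings;

# Hypothetical helper: join @fields into one CSV record using $sep.
# Fields containing the separator, a double quote, or a line break are
# wrapped in double quotes, with embedded quotes doubled ("" convention).
sub csv_join {
    my ($sep, @fields) = @_;
    return join $sep, map {
        my $f = defined $_ ? $_ : '';    # treat undef as an empty field
        if ($f =~ /[\Q$sep\E"\r\n]/) {
            $f =~ s/"/""/g;
            $f = qq{"$f"};
        }
        $f;
    } @fields;
}
```

So csv_join(',', 'a', 'Smith, John', 'say "hi"') gives a,"Smith, John","say ""hi""" while plain fields pass through untouched.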
App::CCSV can use a config file to handle different kinds of CSV input.
Here's what I'm using, which lives in my home directory in ~/.CCSVConf:
<CCSV>
sep_char ,
quote_char """
<names>
<comma>
sep_char ","
quote_char """
</comma>
<tabs>
sep_char " "
quote_char """
</tabs>
<pipe>
sep_char "|"
quote_char """
</pipe>
<commanq>
sep_char ","
quote_char ""
</commanq>
<tabsnq>
sep_char " "
quote_char ""
</tabsnq>
<pipenq>
sep_char "|"
quote_char ""
</pipenq>
</names>
</CCSV>
That just defines two sets of names for different kinds of input: comma,
tabs, and pipe for [,\t|] delimiters with standard CSV quote conventions;
and three nq ("no-quote") variants - commanq, tabsnq, and pipenq - to
handle inputs that aren't using standard CSV quoting. It also makes the comma
behaviour the default.
You use one of the names by specifying it when loading the module, after an = sign:
perl -MApp::CCSV=comma ...
perl -MApp::CCSV=tabs ...
perl -MApp::CCSV=pipe ...
You can also convert between formats by specifying two names, in
<input>,<output> format e.g.
perl -MApp::CCSV=comma,pipe ...
perl -MApp::CCSV=tabs,comma ...
perl -MApp::CCSV=pipe,tabs ...
And just to round things off, I have a few aliases defined in my bashrc
file to make these even easier to use:
alias perlcsv='perl -CSAD -MApp::CCSV'
alias perlpsv='perl -CSAD -MApp::CCSV=pipe'
alias perltsv='perl -CSAD -MApp::CCSV=tabs'
alias perlcsvnq='perl -CSAD -MApp::CCSV=commanq'
alias perlpsvnq='perl -CSAD -MApp::CCSV=pipenq'
alias perltsvnq='perl -CSAD -MApp::CCSV=tabsnq'
That simplifies my standard invocation to something like:
perlcsv -ne 'csay @f[0..2]' < file.csv
Happy data wrangling!
Wed 24 Nov 2010
Tags: perl, parallel, fork
Needed to parallelise some processing in perl the last few days, and
did a quick survey of some of the parallel processing modules on CPAN,
of which there is the normal
bewildering diversity.
As usual, it depends exactly what you're trying to do. In my case I
just needed to be able to fork a bunch of processes off, have them
process some data, and hand the results back to the parent.
So here are my notes on a random selection of the available modules.
The example each time is basically a parallel version of the following
map:
my %out = map { $_ => $_ ** 2 } 1 .. 50;
Parallel::ForkManager: object oriented wrapper around 'fork'. Supports parent callbacks.
Passing data back to parent uses files, and feels a little bit clunky.
Dependencies: none.
use Parallel::ForkManager 0.7.6;
my @num = 1 .. 50;
my $pm = Parallel::ForkManager->new(5);
my %out;
$pm->run_on_finish(sub { # must be declared before first 'start'
my ($pid, $exit_code, $ident, $exit_signal, $core_dump, $data) = @_;
$out{ $data->[0] } = $data->[1];
});
for my $num (@num) {
$pm->start and next; # Parent nexts
# Child
my $sq = $num ** 2;
$pm->finish(0, [ $num, $sq ]); # Child exits
}
$pm->wait_all_children;
[Version 0.7.9]
Parallel::Iterator: basically a parallel version of 'map'. Dependencies: none.
use Parallel::Iterator qw(iterate);
my @num = 1 .. 50;
my $it = iterate( sub {
# Worker sub: receives (id, value) and returns the result
my ($id, $num) = @_;
return $num ** 2;
}, \@num );
my %out = ();
# For an array input, the ids handed back are array indexes
while (my ($id, $square) = $it->()) {
$out{ $num[$id] } = $square;
}
[Version 1.00]
Parallel::Loops: provides parallel versions of 'foreach' and 'while'. It uses 'tie' to allow
shared data structures between the parent and children. Dependencies:
Parallel::ForkManager.
use Parallel::Loops;
my @num = 1 .. 50;
my $pl = Parallel::Loops->new(5);
my %out;
$pl->share(\%out);
$pl->foreach( \@num, sub {
my $num = $_; # note this uses $_, not @_
$out{$num} = $num ** 2;
});
You can also return values from the subroutine like Iterator, avoiding
the explicit 'share':
my %out = $pl->foreach( \@num, sub {
my $num = $_; # note this uses $_, not @_
return ( $num, $num ** 2 );
});
[Version 0.03]
Proc::Fork: provides an interesting perlish forking interface using blocks. No built-in
support for returning data from children, but provides examples using pipes.
Dependencies: Exporter::Tidy.
use Proc::Fork;
use IO::Pipe;
use Storable qw(freeze thaw);
my @num = 1 .. 50;
my @children;
for my $num (@num) {
my $pipe = IO::Pipe->new;
run_fork{ child {
# Child
$pipe->writer;
print $pipe freeze([ $num, $num ** 2 ]);
exit;
} };
# Parent
$pipe->reader;
push @children, $pipe;
}
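my %out;
for my $pipe (@children) {
# Slurp the whole frozen string in one go: Storable output can contain
# newline bytes, so reading it line by line would truncate it
my $frozen = do { local $/; <$pipe> };
my $entry = thaw($frozen);
$out{ $entry->[0] } = $entry->[1];
}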
[Version 0.71]
Like Parallel::ForkManager, but adds better signal handling. Doesn't
seem to provide built-in support for returning data from children.
Dependencies: Proc::Wait3.
[Version 0.08]
More complex module, loosely based on ForkManager (?). Includes better signal
handling, and supports scheduling and dependencies between different groups
of subprocesses. Doesn't appear to provide built-in support for passing data
back from children.
[Version 1.232]
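For comparison, here's the same squares example using only core perl (fork, pipe, and Storable), which gives a feel for the plumbing these modules take care of for you. It's a minimal sketch: it forks one child per item with no limit on concurrency, which is exactly the kind of bookkeeping Parallel::ForkManager and friends save you from.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

my @num = 1 .. 50;
my @children;    # [ pid, read end of that child's pipe ]

for my $num (@num) {
    pipe(my $reader, my $writer) or die "pipe failed: $!";
    my $pid = fork;
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: send back a frozen [ input, result ] pair, then exit
        close $reader;
        binmode $writer;
        print {$writer} freeze([ $num, $num ** 2 ]);
        close $writer;
        exit 0;
    }

    # Parent: keep only the read end
    close $writer;
    binmode $reader;
    push @children, [ $pid, $reader ];
}

my %out;
for my $child (@children) {
    my ($pid, $reader) = @$child;
    my $frozen = do { local $/; <$reader> };    # read until the child closes
    close $reader;
    waitpid($pid, 0);                           # reap the child
    my $entry = thaw($frozen);
    $out{ $entry->[0] } = $entry->[1];
}
```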
Wed 10 Jun 2009
Tags: perl, osd, desktop
Here's a quick hack demonstrating a nice juxtaposition between the
power of a CPAN module - in this case Christopher Laco's
Finance::Currency::Convert::WebserviceX
- and the elegance and utility of the little-known osd_cat, putting
together a desktop currency rates widget in a handful of lines:
#!/usr/bin/perl
use strict;
use IO::File;
use Finance::Currency::Convert::WebserviceX;
# Configuration
my @currencies = map { uc } (@ARGV ? @ARGV : qw(USD GBP));
my $base_currency = 'AUD';
my $refresh = 300; # seconds
my $font = '9x15bold';
# X colours: http://sedition.com/perl/rgb.html
my $colour = 'goldenrod3';
my $align = 'right';
my $pos = 'top';
my $offset = 25;
my $lines = scalar @currencies;
my $osd_refresh = $refresh + 1;
my $osd = IO::File->new(
"|osd_cat -l $lines -d $osd_refresh -c '$colour' -f $font -p $pos -A $align -o $offset"
) or die "can't open to osd_cat $!";
$osd->autoflush(1);
local $SIG{PIPE} = sub { die "pipe failed: $!" };
my $cc = Finance::Currency::Convert::WebserviceX->new;
while (1) {
my $output = '';
$output .= "$_ " . $cc->convert(1, $base_currency, $_) . "\n" for @currencies;
$osd->print($output);
sleep $refresh;
}
Most of this is just housekeeping around splitting out various osd_cat
options for tweaking, and allowing the set of currencies to display
to be passed in as arguments. I haven't bothered setting up any option
handling in order to keep the example short, but that would be
straightforward.
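For what it's worth, here's a sketch of what that option handling might look like using the core Getopt::Long module. The option names are my own invention, mirroring the configuration variables above, and anything left over after option parsing is treated as the list of currency codes, as before:

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Parse hypothetical options mirroring the configuration variables above;
# whatever is left over is taken as the list of currency codes.
sub parse_options {
    my @args = @_;
    my %opt = (
        base    => 'AUD',
        refresh => 300,           # seconds
        font    => '9x15bold',
        colour  => 'goldenrod3',
        align   => 'right',
        pos     => 'top',
        offset  => 25,
    );
    GetOptionsFromArray(\@args, \%opt,
        'base=s', 'refresh=i', 'font=s', 'colour=s',
        'align=s', 'pos=s', 'offset=i',
    ) or die "usage: currency_osd [--base CUR] [--refresh SECS] ... [CURRENCY ...]\n";

    my @currencies = map { uc } (@args ? @args : qw(USD GBP));
    return (\%opt, \@currencies);
}

my ($opt, $currencies) = parse_options(@ARGV);
```

You'd then run something like ./currency_osd --refresh 60 --colour red usd eur, and build the osd_cat command line from $opt->{...} instead of the individual variables.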
To use, you just run from the command line in the background:
./currency_osd &
and it shows up in the top right corner of your screen.
Tweak to taste, of course.
Sat 28 Mar 2009
Tags: scripting, perl
Was going home on the train with Hannah (8) this afternoon, and she says,
"Dad, what's the longest word you can make without using any letters with
tails or stalks?". "Do you really want to know?", I asked, and whipping
out the trusty laptop, we had an answer within a couple of train stops:
egrep -v '[A-Zbdfghjklpqty]' /usr/share/dict/words | \
perl -nle 'push @words, $_;
END { @words = sort { length($b) <=> length($a) } @words;
print join "\n", @words[0 .. 9] }'
noncarnivorousness
nonceremoniousness
overcensoriousness
carnivorousnesses
noncensoriousness
nonsuccessiveness
overconsciousness
semiconsciousness
unacrimoniousness
uncarnivorousness
Now I just need to teach her how to do that.
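A pure-perl equivalent, wrapped in a function so it's reusable (longest_tailless is just an illustrative name). It filters with the same character class as the egrep above, and sorts with the numeric <=> comparison, since cmp would compare the lengths as strings and put "9" ahead of "18":

```perl
use strict;
use warnings;

# Return the $n longest words containing no capitals and none of the
# letters with tails or stalks (same character class as the egrep).
sub longest_tailless {
    my ($n, @words) = @_;
    my @ok = grep { !/[A-Zbdfghjklpqty]/ } @words;
    # Numeric comparison of lengths, longest first
    my @sorted = sort { length($b) <=> length($a) } @ok;
    $n = @sorted if $n > @sorted;
    return @sorted[0 .. $n - 1];
}
```

To reproduce the list above, read the dictionary with open my $fh, '<', '/usr/share/dict/words', chomp(my @words = <$fh>), and print join "\n", longest_tailless(10, @words).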
Fri 07 Nov 2008
Tags: perl, css, javascript, minify, yslow, catalyst, mason
I've been playing with the very nice
YSlow firefox plugin recently,
while doing some front-end optimisation on a
Catalyst web project.
Most of YSlow's tuning tips were reasonably straightforward, but I
wasn't sure how to approach the concatenation and minification of
CSS and javascript files that they recommend.
Turns out - as is often the case - there's a very nice packaged
solution on CPAN.
The File::Assets module provides concatenation and minification for CSS and
Javascript 'assets' for a web page, using the CSS::Minifier (::XS) and
JavaScript::Minifier (::XS) modules for minification. To use, you add a series of .css and .js
files in building your page, and then 'export' them at the end,
which generates a concatenated and minified version of each type
in an export directory, and an appropriate link to the exported
version. You can do separate exports for CSS and Javascript if you
want to follow the Yahoo/YSlow recommendation of putting your
stylesheets at the top and your scripts at the bottom.
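To make the 'export' step a bit more concrete, here's a rough sketch of just the concatenation half using only core modules. Naming the bundle after a digest of its content is my assumption about a cache-friendly naming scheme, not a description of File::Assets' internals, and minification is left out entirely (File::Assets hands that off to the *::Minifier modules). build_bundle is a hypothetical name:

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use File::Spec;

# Concatenate @files (in order) into a single bundle in $out_dir, named
# after a digest of the combined content, e.g. assets-1a2b3c4d.js.
# No minification here: this only illustrates the concatenate-and-name step.
sub build_bundle {
    my ($out_dir, $ext, @files) = @_;

    my $content = '';
    for my $file (@files) {
        open my $in, '<', $file or die "can't read $file: $!";
        my $text = do { local $/; <$in> };
        close $in;
        $content .= "$text\n";    # guard against a missing final newline
    }

    # Same content gives the same name, so the bundle can be cached hard;
    # any change to the inputs produces a new URL.
    my $name = 'assets-' . substr(md5_hex($content), 0, 8) . ".$ext";
    my $path = File::Spec->catfile($out_dir, $name);

    open my $out, '>', $path or die "can't write $path: $!";
    print {$out} $content;
    close $out or die "can't close $path: $!";

    return $name;
}
```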
There's also a Catalyst::Plugin::Assets
module to facilitate using File::Assets from Catalyst.
I use Mason for my Catalyst views (I prefer
using perl in my views rather than having another mini-language to
learn) and so use this as follows.
First, you have to configure Catalyst::Plugin::Assets in your
project config file (e.g. $PROJECT_HOME/project.yml):
Plugin::Assets:
  path: /static
  output_path: build/
  minify: 1
Next, I set the per-page javascript and css files I want to
include as mason page attributes in my views (using an arrayref
if there's more than one item of the given type) e.g.
%# in my person view
<%attr>
js => [ 'jquery.color.js', 'person.js' ]
css => 'person.css'
</%attr>
Then in my top-level autohandler, I include both global and
per-page assets like this:
<%init>
# Asset collation, javascript (globals, then per-page)
$c->assets->include('js/jquery.min.js');
$c->assets->include('js/global.js');
if (my $js = $m->request_comp->attr_if_exists('js')) {
if (ref $js && ref $js eq 'ARRAY') {
$c->assets->include("js/$_") foreach @$js;
} else {
$c->assets->include("js/$js");
}
}
# The CSS version is left as an exercise for the reader ...
# ...
</%init>
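Since the js and css attributes are handled identically, one way to tackle the CSS exercise is to factor the scalar-or-arrayref normalisation out into a small helper. as_list is a hypothetical name; in Mason you'd define it somewhere visible to the component, e.g. in a <%once> block:

```perl
use strict;
use warnings;

# Normalise a Mason attribute that may be undef, a single filename,
# or an arrayref of filenames into a plain list.
sub as_list {
    my ($value) = @_;
    return () unless defined $value;
    return ref $value eq 'ARRAY' ? @$value : ($value);
}

# In the autohandler's <%init> block, both asset types then become:
#   $c->assets->include("js/$_")
#       for as_list($m->request_comp->attr_if_exists('js'));
#   $c->assets->include("css/$_")
#       for as_list($m->request_comp->attr_if_exists('css'));
```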
Then, elsewhere in the autohandler, you add an exported link at
the appropriate point in the page:
<% $c->assets->export('text/javascript') %>
This generates a link something like the following (wrapped here):
<script src="http://www.example.com/static/build/assets-ec556d1e.js"
type="text/javascript"></script>
Beautiful, easy, maintainable.
Fri 05 Sep 2008
Tags: perl, catalyst, screen
I'm an old-school developer, doing all my hacking using terms, the command
line, and vim, not a heavyweight IDE. Hacking perl
Catalyst projects (and I imagine other
MVC-type frameworks) can be slightly more challenging in this kind of
environment because of the widely-branching directory structure. A single
conceptual change can easily touch controller classes, model classes, view
templates, and static javascript or css files, for instance.
I've found GNU screen to work really
well in this environment. I use per-project screen sessions set up
specifically for Catalyst - for my 'usercss' project, for instance, I have
a ~/.screenrc-usercss config that looks like this:
source $HOME/.screenrc
setenv PROJDIR ~/work/usercss
setenv PROJ UserCSS
screen -t home
stuff "cd ~^Mclear^M"
screen -t top
stuff "cd $PROJDIR^Mclear^M"
screen -t lib
stuff "cd $PROJDIR/lib/$PROJ^Mclear^M"
screen -t controller
stuff "cd $PROJDIR/lib/$PROJ/Controller^Mclear^M"
screen -t schema
stuff "cd $PROJDIR/lib/$PROJ/Schema/Result^Mclear^M"
screen -t htdocs
stuff "cd $PROJDIR/root/htdocs^Mclear^M"
screen -t static
stuff "cd $PROJDIR/root/static^Mclear^M"
screen -t sql
stuff "cd $PROJDIR^Mclear^M"
select 0
(the ^M sequences there are actual Ctrl-M characters, i.e. carriage
returns, which act just like pressing Enter).
So a:
screen -c ~/.screenrc-usercss
will give me a set of eight labelled screen windows: home, top, lib,
controller, schema, htdocs, static, and sql. I usually run a couple of
these in separate terms.
To make this completely brainless, I also have the following bash function
defined in my ~/.bashrc file:
sc ()
{
SC_SESSION=$(screen -ls | egrep -e "\.$1.*Detached" | \
awk '{ print $1 }' | head -1);
if [ -n "$SC_SESSION" ]; then
xtitle $1;
screen -R $SC_SESSION;
elif [ -f ~/.screenrc-$1 ]; then
xtitle $1;
screen -S $1 -c ~/.screenrc-$1
else
echo "Unknown session type '$1'!"
fi
}
which lets me just do sc usercss, which reattaches to the first detached
'usercss' screen session, if one is available, or starts up a new one.
Fast, flexible, lightweight. Choose any 3.
Wed 09 Apr 2008
Tags: web, perl
I've been playing around with SixApart's
TheSchwartz for the last few days.
TheSchwartz is a lightweight reliable job queue, typically used for
handling relatively high latency jobs that you don't want to try to
handle from a web process e.g. for sending out emails, placing orders
into some external system, etc. Basically interacting with anything
which might be down or slow or which you don't really need right away.
Actually, TheSchwartz is a job queue library rather than a job queue
system, so some assembly is required. Like most Danga/SixApart
software, it's lightweight, performant, and well-designed, but also
pretty light on documentation. If you're not comfortable reading the
(perl) source, it might be a challenging environment to set up.
Notes from the last few days:
Don't use the version on CPAN, get the latest code from
subversion
instead. At the moment the CPAN version is 1.04, but current
svn is at 1.07, and has some significant additional
functionality.
Conceptually TheSchwartz is very simple - jobs with opaque
function names and arguments are inserted into a database
for workers with a particular 'ability'; workers periodically
check the database for jobs matching the abilities they have,
and grab and execute them. Jobs that succeed are marked
completed and removed from the queue; jobs that fail are
logged and left on the queue to be retried after some time
period up to a configurable number of retries.
TheSchwartz has two kinds of clients - those that submit
jobs, and workers that perform jobs. Both are considered
clients, which is confusing if you're thinking in terms of
client-server interaction. TheSchwartz considers both
sides to be clients.
There are three main classes to deal with: TheSchwartz,
which is the main client functionality class;
TheSchwartz::Job, which models the jobs that are submitted
to the job queue; and TheSchwartz::Worker, which is a
role-type class modelling a particular ability that a worker
is able to perform.
New worker abilities are defined by subclassing
TheSchwartz::Worker and defining your new functionality in
a work() class method. work() receives the job object from the
queue as its argument (after the class name) and does its stuff,
marking the job as completed or failed after processing. A useful
real example worker is TheSchwartz::Worker::SendEmail (also by
Brad Fitzpatrick, and available on CPAN) for sending emails from
TheSchwartz.
Depending on your application, it may make sense for workers
to just have a single ability, or for them to have multiple
abilities and service more than one type of job. In the latter
case, TheSchwartz tries to use unused abilities whenever it
can to avoid certain kinds of jobs getting starved.
You can also subclass TheSchwartz itself to modify the standard
functionality, and I've found that useful where I've wanted more
visibility of what workers are doing than you get out of the box.
You don't appear at this point to be able to subclass
TheSchwartz::Job, however - TheSchwartz always uses this as the
class when autovivifying jobs for workers.
There are a bunch of other features I haven't played with yet,
including job priorities, the ability to coalesce jobs into
groups to be processed together, and the ability to delay jobs
until a certain time.
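To make the lifecycle in those notes concrete, here's a toy, in-memory model of it in plain perl. To be clear, this is not TheSchwartz's API (there's no database, no retry delay, and only one process); ToyQueue and its methods are hypothetical names. It just shows submitting clients inserting jobs by function name, workers servicing only the abilities they have, and failed jobs being retried until they run out of retries:

```perl
use strict;
use warnings;

# A toy, in-memory model of the job lifecycle. NOT TheSchwartz's API.
package ToyQueue;

sub new {
    my ($class, %args) = @_;
    return bless {
        jobs        => [],    # still queued
        done        => [],    # completed, off the queue
        dead        => [],    # failed too many times
        max_retries => $args{max_retries} // 2,
    }, $class;
}

# Submitting client: queue a job for a given function name ("ability")
sub insert {
    my ($self, $funcname, $arg) = @_;
    push @{ $self->{jobs} }, { funcname => $funcname, arg => $arg, failures => 0 };
}

# Worker client: one pass over the queue using the abilities it has
sub work_once {
    my ($self, %abilities) = @_;    # funcname => coderef
    my @still_queued;
    for my $job (@{ $self->{jobs} }) {
        my $code = $abilities{ $job->{funcname} };
        if (!$code) {
            push @still_queued, $job;           # not our ability: leave it
        } elsif (eval { $code->($job->{arg}); 1 }) {
            push @{ $self->{done} }, $job;      # succeeded: off the queue
        } elsif (++$job->{failures} <= $self->{max_retries}) {
            push @still_queued, $job;           # failed: retry next pass
        } else {
            push @{ $self->{dead} }, $job;      # out of retries
        }
    }
    $self->{jobs} = \@still_queued;
}

package main;
```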
I've actually been using it to setup a job queue system for a cluster,
which is a slightly different application than it was intended for,
but so far it's been working really well.
I still feel like I'm getting to grips with the breadth
of things it could be used for, though - more experimentation required.
I'd be interested in hearing of examples of what people are using it
for as well.
Recommended.
Thu 27 Mar 2008
Tags: perl, tips
I wasted 15 minutes the other day trying to remember how to do this,
so here it is for the future: to find out if and when a perl module
got added to the core, you want Richard Clamp's excellent
Module::CoreList.
Recent versions have a 'corelist' frontend command, so I typically
use that e.g.
$ corelist File::Basename
File::Basename was first released with perl 5
$ corelist warnings
warnings was first released with perl 5.006
$ corelist /^File::Spec/
File::Spec was first released with perl 5.00405
File::Spec::Cygwin was first released with perl 5.006002
File::Spec::Epoc was first released with perl 5.006001
File::Spec::Functions was first released with perl 5.00504
File::Spec::Mac was first released with perl 5.00405
File::Spec::OS2 was first released with perl 5.00405
File::Spec::Unix was first released with perl 5.00405
File::Spec::VMS was first released with perl 5.00405
File::Spec::Win32 was first released with perl 5.00405
$ corelist URI::Escape
URI::Escape was not in CORE (or so I think)
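The same information is available programmatically via Module::CoreList's first_release class method, which returns the perl version a module first shipped with, or undef if it isn't in core. That's handy in a build or install script, for instance:

```perl
use strict;
use warnings;
use Module::CoreList;

# Report when each named module first shipped with perl, if ever.
my @modules = @ARGV ? @ARGV : qw(File::Basename warnings URI::Escape);
for my $module (@modules) {
    my $first = Module::CoreList->first_release($module);
    if (defined $first) {
        print "$module: first released with perl $first\n";
    } else {
        print "$module: not in core\n";
    }
}
```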
Mon 17 Mar 2008
Tags: perl
Saw this post fly past in the twitter stream today:
"http://linuxshellaccount.blogspot.com/2008/03/perl-directory-permissions-difference.html".
It's a script by Mike Golvach to do something like a `diff -r`, but also
showing differences in permissions and ownership, rather than just content.
I've written a CPAN module to do stuff like this -
File::DirCompare - so
thought I'd check how straightforward this would be using File::DirCompare:
#!/usr/bin/perl
use strict;
use File::Basename;
use File::DirCompare;
use File::Compare qw(compare);
use File::stat;
die "Usage: " . basename($0) . " dir1 dir2\n" unless @ARGV == 2;
my ($dir1, $dir2) = @ARGV;
File::DirCompare->compare($dir1, $dir2, sub {
my ($a, $b) = @_;
if (! $b) {
printf "Only in %s: %s\n", dirname($a), basename($a);
} elsif (! $a) {
printf "Only in %s: %s\n", dirname($b), basename($b);
} else {
my $stata = stat $a;
my $statb = stat $b;
# Return unless different
return unless compare($a, $b) != 0 ||
$stata->mode != $statb->mode ||
$stata->uid != $statb->uid ||
$stata->gid != $statb->gid;
# Report
printf "%04o %s %s %s\t\t%04o %s %s %s\n",
$stata->mode & 07777, basename($a),
(getpwuid($stata->uid))[0], (getgrgid($stata->gid))[0],
$statb->mode & 07777, basename($b),
(getpwuid($statb->uid))[0], (getgrgid($statb->gid))[0];
}
}, { ignore_cmp => 1 });
So this reports all entries that are different in content or permissions or
ownership e.g. given a tree like this (slightly modified from Mike's
example):
$ ls -lR scripts1 scripts2
scripts1:
total 28
-rw-r--r-- 1 gavin gavin 0 Mar 17 16:41 script1
-rw-r--r-- 1 gavin gavin 0 Mar 17 16:41 script1.bak
-rw-r--r-- 1 gavin gavin 0 Mar 17 16:41 script2
-rw-r--r-- 1 gavin gavin 0 Mar 17 16:41 script2.bak
-rw-r--r-- 1 gavin gavin 0 Mar 17 16:41 script3
-rw-r--r-- 1 gavin gavin 0 Mar 17 16:41 script3.bak
-rw-r--r-- 1 gavin gavin 0 Mar 17 16:49 script4
scripts2:
total 28
-rw-r--r-- 1 gavin users 0 Mar 17 16:41 script1
-rw-r--r-- 1 gavin users 0 Mar 17 16:41 script1.bak
-rw-r--r-- 1 gavin gavin 0 Mar 17 16:41 script2
-rw-r--r-- 1 gavin gavin 0 Mar 17 16:41 script2.bak
-rwxr-xr-x 1 gavin gavin 0 Mar 17 16:41 script3*
-rwxr-xr-x 1 gavin gavin 0 Mar 17 16:41 script3.bak*
-rw-r--r-- 1 gavin gavin 0 Mar 17 16:49 script5
it will give output like the following:
$ ./pdiff2 scripts1 scripts2
0644 script1 gavin gavin 0644 script1 gavin users
0644 script1.bak gavin gavin 0644 script1.bak gavin users
0644 script3 gavin gavin 0755 script3 gavin gavin
0644 script3.bak gavin gavin 0755 script3.bak gavin gavin
Only in scripts1: script4
Only in scripts2: script5
This obviously has dependencies that Mike's version doesn't have, but it
comes out much shorter and clearer, I think. It also doesn't fork and parse
an external ls, so it should be more portable and less fragile. I should
probably be caching the getpwuid lookups too, but that would have made it
5 lines longer. ;-)
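For the record, here's roughly what those extra lines might look like: a pair of memoising lookup subs (user_name and group_name are my own names). Each uid or gid is resolved once and then served from a hash, falling back to the numeric id when there's no passwd or group entry. In the script above you'd then use user_name($stata->uid) in place of (getpwuid($stata->uid))[0], and similarly for the group lookups.

```perl
use strict;
use warnings;

# Cache uid/gid -> name lookups so each id is resolved only once.
# Falls back to the numeric id if there's no passwd/group entry.
my (%user_cache, %group_cache);

sub user_name {
    my ($uid) = @_;
    $user_cache{$uid} //= (getpwuid $uid)[0] // $uid;
    return $user_cache{$uid};
}

sub group_name {
    my ($gid) = @_;
    $group_cache{$gid} //= (getgrgid $gid)[0] // $gid;
    return $group_cache{$gid};
}
```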