Parallel Processing Perl Modules
Needed to parallelise some processing in Perl the last few days, and did a quick survey of some of the parallel processing modules on CPAN, of which there is the normal bewildering diversity.
As usual, it depends on exactly what you're trying to do. In my case I just needed to be able to fork off a bunch of processes, have them process some data, and hand the results back to the parent.
So here are my notes on a random selection of the available modules. The example each time is basically a parallel version of the following map:
    my %out = map { $_ => $_ ** 2 } 1 .. 50;    # number => square
Parallel::ForkManager
Object-oriented wrapper around 'fork'. Supports parent callbacks. Data is passed back to the parent via temporary files (serialised with Storable), which feels a little bit clunky. Dependencies: none.
    use Parallel::ForkManager 0.7.6;

    my @num = 1 .. 50;
    my $pm  = Parallel::ForkManager->new(5);    # at most 5 children at once
    my %out;

    # Must be declared before the first 'start'
    $pm->run_on_finish(sub {
        my ($pid, $exit_code, $ident, $exit_signal, $core_dump, $data) = @_;
        $out{ $data->[0] } = $data->[1];
    });

    for my $num (@num) {
        $pm->start and next;                    # Parent nexts

        # Child
        my $sq = $num ** 2;
        $pm->finish(0, [ $num, $sq ]);          # Child exits
    }
    $pm->wait_all_children;
[Version 0.7.9]
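To make the file-based hand-off a bit more concrete, here is a minimal core-only sketch of the same idea: each child writes its result to its own Storable file, and the parent reads the file once that child has been reaped. This is an illustration of the pattern, not Parallel::ForkManager's actual code; the helper name fork_map_via_files is made up for the example, and unlike the module it forks every child at once rather than capping the number of workers.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use Storable qw(store retrieve);

# Hypothetical helper (not part of any module): run $work->($item) in one
# child per item and collect the results via one Storable file per child.
sub fork_map_via_files {
    my ($work, @items) = @_;
    my $dir = tempdir();               # scratch directory for result files
    my %file_for;                      # child pid => that child's result file

    for my $i (0 .. $#items) {
        my $file = "$dir/result.$i";
        my $pid  = fork();
        die "fork failed: $!" unless defined $pid;

        if ($pid == 0) {
            # Child: compute, serialise [ item, result ] to its own file, exit
            store([ $items[$i], $work->($items[$i]) ], $file);
            exit 0;
        }
        $file_for{$pid} = $file;       # Parent: remember where to look later
    }

    my %out;
    while ((my $pid = wait()) != -1) { # reap children in whatever order they end
        my $file = delete $file_for{$pid} or next;
        die "child $pid failed" if $?;
        my $pair = retrieve($file);
        $out{ $pair->[0] } = $pair->[1];
        unlink $file;
    }
    rmdir $dir;
    return %out;
}

my %out = fork_map_via_files(sub { $_[0] ** 2 }, 1 .. 50);
print "7 squared is $out{7}\n";
```

Files make the parent side simple (no reading from many handles at once), at the cost of some disk traffic and cleanup, which is probably where the clunky feeling comes from.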
Parallel::Iterator
Basically a parallel version of 'map'. Dependencies: none.
    use Parallel::Iterator qw(iterate);

    my @num = 1 .. 50;
    my $it  = iterate(
        sub {
            # sub is a closure; its return value is handed back to the parent
            my ($id, $num) = @_;
            return $num ** 2;
        },
        \@num
    );

    my %out = ();
    # For an array input the id is the element's index, not its value
    while (my ($id, $square) = $it->()) {
        $out{ $num[$id] } = $square;
    }
[Version 1.00]
Parallel::Loops
Provides parallel versions of 'foreach' and 'while'. It uses 'tie' to allow shared data structures between the parent and children. Dependencies: Parallel::ForkManager.
    use Parallel::Loops;

    my @num = 1 .. 50;
    my $pl  = Parallel::Loops->new(5);
    my %out;
    $pl->share(\%out);

    $pl->foreach( \@num, sub {
        my $num = $_;       # note this uses $_, not @_
        $out{$num} = $num ** 2;
    });
You can also return values from the subroutine, as with Parallel::Iterator, avoiding the explicit 'share':
    my %out = $pl->foreach( \@num, sub {
        my $num = $_;       # note this uses $_, not @_
        return ( $num, $num ** 2 );
    });
[Version 0.03]
Proc::Fork
Provides an interesting perlish forking interface using blocks. No built-in support for returning data from children, but provides examples using pipes. Dependencies: Exporter::Tidy.
    use Proc::Fork;
    use IO::Pipe;
    use Storable qw(freeze thaw);

    my @num = 1 .. 50;
    my @children;

    for my $num (@num) {
        my $pipe = IO::Pipe->new;

        run_fork {
            child {
                # Child: write the frozen pair to the pipe and exit
                $pipe->writer;
                print $pipe freeze([ $num, $num ** 2 ]);
                exit;
            }
        };

        # Parent: keep the read end for later
        $pipe->reader;
        push @children, $pipe;
    }

    my %out;
    for my $pipe (@children) {
        # Frozen data is binary and may contain newlines, so slurp the whole pipe
        my $entry = thaw( do { local $/; <$pipe> } );
        $out{ $entry->[0] } = $entry->[1];
    }
[Version 0.71]
Parallel::Prefork
Like Parallel::ForkManager, but adds better signal handling. Doesn't seem to provide built-in support for returning data from children. Dependencies: Proc::Wait3.
[Version 0.08]
Parallel::Forker
More complex module, loosely based on ForkManager (?). Includes better signal handling, and supports scheduling and dependencies between different groups of subprocesses. Doesn't appear to provide built-in support for passing data back from children.
[Version 1.232]
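Where a module has no built-in return channel, as seems to be the case for Parallel::Prefork and Parallel::Forker, one option is to give each child a pipe and send the result back frozen with Storable. The following is a rough core-only sketch of that pattern, assuming small results and a simple batch-at-a-time worker limit; it uses neither module, and the helper name fork_map_via_pipes is invented for the example.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

# Hypothetical helper (not part of any module): run $work->($item) in forked
# children, at most $max_workers per batch, returning item => result pairs.
sub fork_map_via_pipes {
    my ($max_workers, $work, @items) = @_;
    my %out;

    while (my @batch = splice @items, 0, $max_workers) {
        my @running;                                # [ pid, read handle ] per child

        for my $item (@batch) {
            pipe(my $reader, my $writer) or die "pipe failed: $!";
            my $pid = fork();
            die "fork failed: $!" unless defined $pid;

            if ($pid == 0) {
                # Child: send one frozen [ item, result ] pair, then exit
                close $reader;
                binmode $writer;
                print {$writer} freeze([ $item, $work->($item) ]);
                close $writer;
                exit 0;
            }
            close $writer;                          # Parent keeps only the read end
            push @running, [ $pid, $reader ];
        }

        for my $child (@running) {
            my ($pid, $reader) = @$child;
            binmode $reader;
            my $frozen = do { local $/; <$reader> };    # slurp until EOF
            close $reader;
            waitpid $pid, 0;
            die "child $pid failed" if $? or !defined $frozen;
            my $pair = thaw($frozen);
            $out{ $pair->[0] } = $pair->[1];
        }
    }
    return %out;
}

my %squares = fork_map_via_pipes(5, sub { $_[0] ** 2 }, 1 .. 50);
print "12 squared is $squares{12}\n";
```

Working in batches means a slow child holds up the rest of its batch, so this is less efficient than a true worker pool, but it keeps the parent free of select() loops and signal handling.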