Tuesday, July 29, 2008

Harnessing the power of multicore

Most of us work on multi-core processors, but the code we generally write does not use the extra CPUs available to us. I do a lot of "Align EST and genomic DNA sequences" runs using est2genome from EMBOSS. Each EST file is aligned independently of the others, so the task is embarrassingly amenable to parallelization. So how do we go about doing this in Perl? Well, following a posting at PerlMonks, I got hold of Parallel::ForkManager:
wget http://search.cpan.org/CPAN/authors/id/D/DL/DLUX/Parallel-ForkManager-0.7.5.tar.gz
and extracted it to:
/scratch/misc/parallel/Parallel
[If you have root access to your machine, I would recommend installing this module instead of relying on 'use lib'.]
Once this is done, all you need to do is follow the examples and get going :)
For example, I have ESTs in FASTA-format files ending with .fas. The genome I am aligning them to is NC_010336.fna. For this I can write the following code:

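use strict;
use warnings;
use lib '/scratch/misc/parallel/Parallel';
use Parallel::ForkManager;

my $command   = "est2genome";
my $genome    = "NC_010336.fna";
my $max_procs = 4;    # assumed value; set this to the number of cores you have

# Each EST FASTA file in the current directory is one independent task.
my @tasks    = glob("*.fas");
my $tasksize = @tasks;
print "There are $tasksize tasks\n";

# Allow at most $max_procs est2genome processes to run at the same time.
my $pm = Parallel::ForkManager->new($max_procs);
foreach my $task (@tasks) {
    $pm->start and next;    # fork: the parent gets the child's PID and moves on
    # Child: align one EST file against the genome.
    system($command, $task, $genome, "$task.$genome.out") == 0
        or warn "$command failed for $task\n";
    $pm->finish;            # the child exits here
}
$pm->wait_all_children;     # the parent waits until every alignment is done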

and store it as est2genome_parallel.pl and run it as:
perl est2genome_parallel.pl
The above code can be downloaded from http://sharma.animesh.googlepages.com/est2genome_parallel.pl, enjoy :)
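
If you are curious what the `$pm->start and next` line hides, here is a rough sketch of the same idea using only Perl's built-in fork, exec and wait. This is not how Parallel::ForkManager is actually implemented; run_limited is a made-up helper name, and keying the results by command string is just a convenience for the illustration (so duplicate commands would overwrite each other).

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Parallel::ForkManager): run each shell
# command in its own child process, keeping at most $max_procs alive at once.
# Returns a hash ref mapping each command string to its exit code.
sub run_limited {
    my ($max_procs, @commands) = @_;
    my (%running, %status);

    # Block until one child exits and record its exit code.
    my $reap_one = sub {
        my $pid = wait();
        die "no child process to wait for\n" if $pid == -1;
        my $exit = $? >> 8;
        my $cmd  = delete $running{$pid};
        $status{$cmd} = $exit if defined $cmd;
    };

    for my $cmd (@commands) {
        # The pool is full: wait for a free slot before forking again.
        $reap_one->() while keys(%running) >= $max_procs;

        my $pid = fork();
        die "fork failed: $!\n" unless defined $pid;
        if ($pid == 0) {
            # Child: replace this process with the command.
            exec('/bin/sh', '-c', $cmd) or exit 127;
        }
        $running{$pid} = $cmd;    # parent: remember the child and move on
    }

    # Reap whatever is still running.
    $reap_one->() while keys %running;
    return \%status;
}
```

Calling run_limited(4, @commands) with one est2genome command line per .fas file would then keep four alignments going at a time and hand back the exit code of each, which is handy for spotting failed runs. For real work I would still stick with Parallel::ForkManager.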

3 comments:

Adam said...

That's really useful! Thanks Animesh!

Rahmi Lale said...

got you...

Rahmi Lale said...

Yoy, it's getting drier here. You need to water once in a while, huh?