Tuesday, July 29, 2008

Harnessing the power of multicore

Most of us work on multi-core processors, but the code we generally write does not use the extra CPUs available to us. I do a lot of "Align EST and genomic DNA sequences" runs using est2genome from EMBOSS. This task is embarrassingly parallel, since each EST file can be aligned independently of the others. So how do we go about doing this in Perl? Following a posting at PerlMonks, I got hold of Parallel::ForkManager:
wget http://search.cpan.org/CPAN/authors/id/D/DL/DLUX/Parallel-ForkManager-0.7.5.tar.gz
and extracted to:
/scratch/misc/parallel/Parallel
[If you have root access to your machine, I would recommend installing this module instead of relying on 'use lib'.]
Once this is done, all you need to do is follow the examples and get going :)
For example, I have ESTs in FASTA-format files ending with .fas. The genome I am aligning to is NC_010336.fna. For this I can write the following code:

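use strict;
use warnings;
use lib '/scratch/misc/parallel/Parallel';
use Parallel::ForkManager;

# Collect the EST files to align, one file name per line.
system("ls -1 *.fas > list.tmp");
my $command = "est2genome";
my $genome  = "NC_010336.fna";
my @tasks;
open(my $list, '<', "list.tmp") or die "Cannot open list.tmp: $!";
while (my $file = <$list>) { chomp $file; push(@tasks, $file); }
close $list;
my $tasksize = @tasks;
print "There are $tasksize tasks\n";

# One child per task; pass a smaller number (e.g. the core count) to cap concurrency.
my $pm = Parallel::ForkManager->new($tasksize);
foreach my $task (@tasks) {
    $pm->start and next;    # fork; the parent moves on to the next task
    system("$command $task $genome $task.$genome.out");
    $pm->finish;            # the child exits here
}
$pm->wait_all_children;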

Save it as est2genome_parallel.pl and run it as:
perl est2genome_parallel.pl
The above code can be downloaded from http://sharma.animesh.googlepages.com/est2genome_parallel.pl, enjoy :)
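
If installing the module is not an option, the same idea can be sketched with core Perl's fork, exec and wait. This is only a minimal sketch and not the module's own code: run_throttled is a hypothetical helper name, and the limit of 4 concurrent processes is an assumed core count that you should adjust for your machine.

```perl
use strict;
use warnings;

# Run each command in its own child process, keeping at most $max_procs alive.
# run_throttled is a hypothetical helper, not part of any module.
sub run_throttled {
    my ($max_procs, @commands) = @_;
    my $running = 0;
    my $failed  = 0;
    foreach my $cmd (@commands) {
        # At the limit: block until one child exits before forking another.
        if ($running >= $max_procs) {
            wait();
            $failed++ if $? != 0;
            $running--;
        }
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            # Child: replace this process with the command (list form, no shell).
            exec(@$cmd) or die "exec failed: $!";
        }
        $running++;
    }
    # Reap the children that are still running.
    while ($running > 0) {
        wait();
        $failed++ if $? != 0;
        $running--;
    }
    return $failed;
}

my $genome   = "NC_010336.fna";
my @commands = map { ["est2genome", $_, $genome, "$_.$genome.out"] } glob("*.fas");
my $failed   = run_throttled(4, @commands);    # 4 is an assumed core count
print "$failed of ", scalar(@commands), " alignments failed\n";
```

Capping the number of children at roughly the number of cores keeps a directory with hundreds of .fas files from starting hundreds of est2genome processes at once.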

Monday, July 28, 2008

Blast /usr/share/dict/linux.words

After reading 'Waq’s Words and World', where the author tries to find anagrams by scanning through a dictionary, I got interested in checking which is the longest word in /usr/share/dict/linux.words that gives a significant hit with blastp against swissprot. So I vimmed dict.pl and wrote:

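#!/usr/bin/perl
# Group the dictionary words by length, stored as FASTA records.
while(<>){chomp;my($word)=split(/\s+/);$c++;$dictast{length($word)}.=">s.$c\n$word\n";}
# Blast the longest words first.
foreach $w (sort {$b<=>$a} keys %dictast){
open(FI,">temp.blast.in") or die "Cannot write temp.blast.in: $!";
print "Blasting $w length word(s) file\n";
print FI $dictast{$w};
close FI;
system("blastcl3 -p blastp -d swissprot -i temp.blast.in -o temp.blast.out");
open(FO,"temp.blast.out") or die "Cannot read temp.blast.out: $!";
while(<FO>){
print $_;
# Stop at the first report that lists significant hits.
if($_=~/^Sequences producing significant/){
close FO;
die;
}
}
}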


The program died aligning SPECTROCOLORIMETRY with SPDCAERCGIMRLMDTRY from ">sp|Q0U3Y6|DDI1_PHANO DNA damage-inducible protein 1":

Query: 1   SPECTROCOLORIMETRY 18
           SP+C   C + R+M+TRY
Sbjct: 229 SPDCAERCGIMRLMDTRY 246

To replicate or explore further, the code can be downloaded from http://sharma.animesh.googlepages.com/dict.pl and run as:
perl dict.pl /usr/share/dict/linux.words
[ /usr/share/dict/linux.words is the dictionary file that comes with Fedora 9; you may have it in a different location. To find it, do 'find / -name dict'. ]

Anyway, to avoid getting rusty with programming (as I have with football), I wrote a Java program to scan through list of English words and find anagram pairs. Below are the interesting ones.


  1. antagonist stagnation

  2. ascertain sectarian

  3. bacterial calibrate

  4. coordinate decoration

  5. courteous outsource

  6. eroticism isometric

  7. excitation intoxicate

  8. prettiness persistent

  9. satirical racialist

  10. shattering straighten

  11. supersonic percussion
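
The clipped author used Java; for readers here, the same search can be sketched in Perl. This is a minimal sketch under an assumed technique (grouping words by their sorted letters), not the clipped author's program, and find_anagrams is a hypothetical helper name.

```perl
use strict;
use warnings;

# Group words by their sorted-letter signature; words sharing a signature are anagrams.
# find_anagrams is a hypothetical helper name.
sub find_anagrams {
    my (@words) = @_;
    my %groups;
    foreach my $word (@words) {
        my $key = join '', sort { $a cmp $b } split //, lc $word;
        push @{ $groups{$key} }, lc $word;
    }
    # Keep only signatures shared by two or more distinct words.
    my @result;
    foreach my $key (sort keys %groups) {
        my %seen;
        my @uniq = grep { !$seen{$_}++ } @{ $groups{$key} };
        push @result, [@uniq] if @uniq > 1;
    }
    return @result;
}

# Usage: perl anagram.pl /usr/share/dict/linux.words
if (@ARGV) {
    my @words;
    while (my $line = <>) {
        chomp $line;
        # Skip entries with apostrophes, hyphens or digits.
        push @words, $line if $line =~ /^[A-Za-z]+$/;
    }
    print join(' ', @$_), "\n" for find_anagrams(@words);
}
```

Sorting the letters of each word gives every anagram of it the same key, so one pass over the dictionary is enough and no word has to be compared against every other word.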




Sunday, July 27, 2008

Kyoto Prize goes to Karp

Prof. Richard M. Karp [ http://www.cs.berkeley.edu/~karp/biography.html ] won the prestigious Kyoto Prize this year in the Information Science category [ http://en.wikipedia.org/wiki/List_of_Kyoto_Prize_winners ]. Good to see a bioinformatics hero getting something like a Nobel.
clipped from www.bizjournals.com

UC-Berkeley professor Richard Karp has won the 2008 Kyoto Prize, Japan's equivalent of the Nobel Prize, in recognition of his lifetime achievements in the field of computer theory.

Karp is one of three laureates named Friday, entitling him to a cash gift of about $460,000, a diploma and a gold medal recognizing his work in defining the field of theoretical computer science.


Switching via common structural framework

clipped from www.thinkgene.com

Virtually all proteins have to be folded, some in complex configurations, in order to function properly, and many are known to require a molecule called a chaperone to fold them. Frydman estimates that perhaps 10 percent of the proteins needing chaperones must have one that, like TRiC, is part of the subset called chaperonins. Other work done in Frydman’s lab has shown that proteins that have very complex folds seem to require chaperonins.


Biopython 1.47

I hope you are convinced to explore Python for your job after reading http://www.bioinformaticszen.com/2008/02/bioinformatics-zen-faq/ and, more recently, Mark Bieda's regret at http://markbieda.wordpress.com/2008/06/10/i-wish-i-had-started-with-python-earlier/ along with his reasons and advice on how to go about it at http://markbieda.wordpress.com/2008/06/18/python-for-perl-programmers-and-bioinformatics-people/. I have one more reason for you to follow the wave: the release of Biopython 1.47 [ http://biopython.org/wiki/Download ] ... I am also noticing a recent rise in interest in the Lisp-based language LittleB [story via Eureka].
clipped from biopython.org

The latest release is Biopython 1.47, released on 5 July 2008. Get it from our Download page.



BZ: Bioinformatics Career Survey 2008

Mike is running a survey about bioinformatics careers [ http://www.bioinformaticszen.com/2008/07/creating-a-picture-of-different-careers-in-bioinformatics/ ]. It has been running for quite some time, but I have only just come back to the blog world. I would appeal to everyone in the domain to fill it in. The more samples, the better the survey; moreover, this is the time of data-driven science [ http://www.wired.com/science/discoveries/magazine/16-07/pb_theory ], though Deepak has some issues with this development [ http://mndoci.com/blog/2008/06/25/chris-anderson-you-are-wrong/ ].
When I wrote a post about my opinion of doing a career in bioinformatics I got the impression from the comments that this was something many people wanted more information about. Pedro and I had some discussion, and thought it might be interesting to create an online survey, to get current researchers’ opinions of working in the field of bioinformatics. So, the below survey begins today (July 1) and is filled out by as many people as possible over the course the next month. The data is then released into the public domain at the start of next month (August 1). Anyone who is then interested can contribute back analysis of this data, so that on September 1 hopefully we can compile together lots of interesting statistics and graphs into a handy document discussing the highs and lows of career in bioinformatics.

A Book Review from Adaptive Complexity: Microcosm E. coli and The New Science of Life

After reading this review [ http://www.scientificblogging.com/adaptive_complexity/e_coli_as_biologys_decoder_ring? ] from Michael White [ http://www.scientificblogging.com/mwhite74/feed ] on a book from Carl Zimmer [ http://blogs.discovermagazine.com/loom/ ] titled 'Microcosm: E. coli and The New Science of Life' [ http://www.amazon.com/gp/search?ie=UTF8&keywords=Zimmer%20e.%20Coli&tag=funnierthanyo-20&index=books&linkCode=ur2&camp=1789&creative=9325 ], I am quite eager to read this book.
Presently I am reading a book recommended by Myers [ http://scienceblogs.com/pharyngula/ ] titled 'Developmental Plasticity and Evolution' [ http://www.amazon.com/gp/product/0195122356 ] by Mary Jane West-Eberhard [ http://www.stri.org/english/scientific_staff/staff_scientist/scientist.php?id=35 ]. Between this book and the article referred to in the above review, 'Evolutionary Dynamics of Prokaryotic Transcriptional Regulatory Networks' [ http://dx.doi.org/10.1016/j.jmb.2006.02.019 ] by Madan Babu and Eisen, which talks about one of the heroes of bioinformatics at http://phylogenomics.blogspot.com/2008/06/connection-between-video-games-and.html , I am pretty much convinced that evolutionary biology has a lot of opportunity to keep bioinformatics people busy for eternity, not just "500 years of exciting problems" [ http://tex.loria.fr/historique/interviews/knuth-clb1993.html ]!

For such a slender book, Microcosm covers a wide-ranging selection of science. Zimmer begins by recapping the key events in the history of molecular biology, events in which E. coli was frequently a central player. Once scientists realized that E. coli had genes just like animals and plants, this gut bacterium gradually became one of the favorite model research systems in the then hot, new science of molecular biology. Experiments in E. coli revealed how genes are structured, how DNA is replicated, and how genes are controlled. Marshall Nirenberg and his colleagues cracked the essentially universal genetic code using cellular components from E. coli. Joshua Lederberg used bacterial 'sex' to bring the formidable tools of genetics to bacterial studies. Today, E. coli is the most well-mapped organism on the planet.


Saturday, July 26, 2008

Last Lecture

Randy Pausch [ http://en.wikipedia.org/wiki/Randy_Pausch ] passed away yesterday morning. I admire him a lot and whenever I feel down, I watch his Last Lecture [ http://www.youtube.com/watch?v=ji5_MqicxSo ]. May his soul rest in peace ...