Tuesday, July 29, 2008

Harnessing the power of multicore

Most of us are working on multi-core processors, but the code we generally write does not utilize the extra CPUs given to us. I do a lot of "Align EST and genomic DNA sequences" using est2genome from EMBOSS. This is a task which is embarrassingly amenable to parallelization. So how do we go about doing this using Perl? Well, following a posting at PerlMonks, I got hold of Parallel::ForkManager:
wget http://search.cpan.org/CPAN/authors/id/D/DL/DLUX/Parallel-ForkManager-0.7.5.tar.gz
and extracted to:
/scratch/misc/parallel/Parallel
[if you have root access to your machine, I would recommend installing this module instead of going for 'use lib'].
Once this is done, all you need to do is follow the examples and get going :)
For example, I have ESTs in FASTA-format files ending with .fas. The genome I am aligning to is NC_010336.fna. For this I can write the following code:

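use strict;
use warnings;
use lib '/scratch/misc/parallel/Parallel';
use Parallel::ForkManager;

# Collect the EST FASTA files to align
system("ls -1 *.fas > list.tmp") == 0 or die "Cannot list .fas files: $?";
my $command = "est2genome";
my $genome  = "NC_010336.fna";
my @tasks;
open(my $fh, '<', "list.tmp") or die "Cannot open list.tmp: $!";
while (<$fh>) { chomp; push(@tasks, $_); }
close $fh;
my $tasksize = @tasks;
print "There are $tasksize tasks\n";
# The argument is the maximum number of simultaneous children (here: one per task)
my $pm = Parallel::ForkManager->new($tasksize);
foreach my $task (@tasks) {
    $pm->start and next;    # do the fork; the parent moves on to the next task
    system("$command $task $genome $task.$genome.out");
    $pm->finish;            # the child exits here
}
$pm->wait_all_children;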

Store it as est2genome_parallel.pl and run it as:
perl est2genome_parallel.pl
The above code can be downloaded from http://sharma.animesh.googlepages.com/est2genome_parallel.pl, enjoy :)
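
Forking one child per file can swamp the machine when there are many more files than cores. As a rough illustration of what capping the number of children looks like, here is a minimal sketch using only core Perl (fork, exec and wait). This is not Parallel::ForkManager's own code, and run_capped is a hypothetical helper name made up for this example; it assumes a Unix-like system with /bin/sh and that every command string is distinct.

```perl
use strict;
use warnings;

# Run each shell command with at most $max children alive at once.
# Returns a hash ref mapping command string => exit code.
sub run_capped {
    my ($max, $commands) = @_;
    my (%pid_to_cmd, %status);
    for my $cmd (@$commands) {
        # At the cap: block until one child finishes before forking again
        if (keys(%pid_to_cmd) >= $max) {
            my $pid  = wait();
            my $code = $? >> 8;
            $status{ delete $pid_to_cmd{$pid} } = $code;
        }
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            # Child: replace ourselves with the command
            exec('/bin/sh', '-c', $cmd) or exit 127;
        }
        $pid_to_cmd{$pid} = $cmd;
    }
    # Reap the children that are still running
    while (keys %pid_to_cmd) {
        my $pid  = wait();
        my $code = $? >> 8;
        $status{ delete $pid_to_cmd{$pid} } = $code;
    }
    return \%status;
}

# Toy commands stand in for the est2genome calls
my $status = run_capped(2, ['true', 'exit 3', 'exit 0']);
print "$_ => $status->{$_}\n" for sort keys %$status;
```

With Parallel::ForkManager itself, the same cap is set by passing the desired maximum (for example the number of cores) to the constructor instead of the number of tasks.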

Monday, July 28, 2008

Blast /usr/share/dict/linux.words

After reading 'Waq’s Words and World', where the author tries to find anagrams by scanning through a dictionary, I got interested in checking which is the biggest word in /usr/share/dict/linux.words that can give a significant hit with blastp against swissprot. So I vimmed dict.pl and wrote:

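#!/usr/bin/perl
use strict;
use warnings;
my (%dictast, $c);
# Group the words by length, storing each group as FASTA records
while (<>) {
    chomp;
    my ($word) = split(/\s+/);
    next unless defined $word and length $word;
    $c++;
    $dictast{length($word)} .= ">s.$c\n$word\n";
}
# BLAST the longest words first and stop at the first significant hit
foreach my $w (sort { $b <=> $a } keys %dictast) {
    open(my $fi, '>', "temp.blast.in") or die "Cannot write temp.blast.in: $!";
    print "Blasting $w length word(s) file\n";
    print $fi $dictast{$w};
    close $fi;
    system("blastcl3 -p blastp -d swissprot -i temp.blast.in -o temp.blast.out");
    open(my $fo, '<', "temp.blast.out") or die "Cannot read temp.blast.out: $!";
    while (<$fo>) {    # read the BLAST report, not the dictionary
        print $_;
        if (/^Sequences producing significant/) {
            close $fo;
            die "Significant hit found\n";
        }
    }
}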


The program died aligning SPECTROCOLORIMETRY with SPDCAERCGIMRLMDTRY from ">sp|Q0U3Y6|DDI1_PHANO DNA damage-inducible protein 1":

Query: 1   SPECTROCOLORIMETRY 18
           SP+C   C + R+M+TRY
Sbjct: 229 SPDCAERCGIMRLMDTRY 246

To replicate or explore further, the code can be downloaded from http://sharma.animesh.googlepages.com/dict.pl and run as:
perl dict.pl /usr/share/dict/linux.words
[ /usr/share/dict/linux.words is the dictionary file that comes with Fedora 9; you may have it in a different location. To find it, do 'find / -name dict' ].

Anyway, to avoid getting rusty with programming (as I have with football), I wrote a Java program to scan through list of English words and find anagram pairs. Below are the interesting ones.


  1. antagonist stagnation

  2. ascertain sectarian

  3. bacterial calibrate

  4. coordinate decoration

  5. courteous outsource

  6. eroticism isometric

  7. excitation intoxicate

  8. prettiness persistent

  9. satirical racialist

  10. shattering straighten

  11. supersonic percussion
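
The quoted author used Java; the same idea can be sketched in Perl. This is a minimal illustration, not the original program: it assumes the usual trick of giving every word a signature made of its letters in sorted order, so that two words are anagrams exactly when their signatures match. The name anagram_groups is made up for this example.

```perl
use strict;
use warnings;

# Group words by their sorted-letter signature; words sharing a signature
# are anagrams of each other. Returns a list of array refs, one per group.
sub anagram_groups {
    my @words = @_;
    my %by_signature;
    for my $word (@words) {
        my $lc = lc $word;
        next unless $lc =~ /^[a-z]+$/;    # skip entries with apostrophes, digits, ...
        my $signature = join '', sort split //, $lc;
        push @{ $by_signature{$signature} }, $lc;
    }
    my @groups;
    for my $signature (sort keys %by_signature) {
        my %seen;
        my @unique = grep { !$seen{$_}++ } @{ $by_signature{$signature} };
        push @groups, [sort @unique] if @unique > 1;    # keep real anagram sets only
    }
    return @groups;
}

my @groups = anagram_groups(qw(antagonist stagnation ascertain sectarian python));
print join(' ', @$_), "\n" for @groups;
```

To scan a whole dictionary, the word list could be read from a file such as /usr/share/dict/linux.words and passed to anagram_groups in the same way.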




Sunday, July 27, 2008

Kyoto Prize goes to Karp

Prof. Richard M. Karp [ http://www.cs.berkeley.edu/~karp/biography.html ] won the prestigious Kyoto Prize this year in the Information Science category [ http://en.wikipedia.org/wiki/List_of_Kyoto_Prize_winners ]. Good to see a Bioinformatics hero getting something like a Nobel.
clipped from www.bizjournals.com

UC-Berkeley professor Richard Karp has won the 2008 Kyoto Prize, Japan's equivalent of the Nobel Prize, in recognition of his lifetime achievements in the field of computer theory.

Karp is one of three laureates named Friday, entitling him to a cash gift of about $460,000, a diploma and a gold medal recognizing his work in defining the field of theoretical computer science.


Switching via common structural framework

clipped from www.thinkgene.com

Virtually all proteins have to be folded, some in complex configurations, in order to function properly, and many are known to require a molecule called a chaperone to fold them. Frydman estimates that perhaps 10 percent of the proteins needing chaperones must have one that, like TRiC, is part of the subset called chaperonins. Other work done in Frydman’s lab has shown that proteins that have very complex folds seem to require chaperonins.


Biopython 1.47

Hope you guys are convinced to explore Python for your job after reading http://www.bioinformaticszen.com/2008/02/bioinformatics-zen-faq/ and, more recently, Mark Bieda's regret at http://markbieda.wordpress.com/2008/06/10/i-wish-i-had-started-with-python-earlier/ and his reasons and suggestions on how to go about it at http://markbieda.wordpress.com/2008/06/18/python-for-perl-programmers-and-bioinformatics-people/ . I have one more reason for you to follow the wave: Biopython's release of version 1.47 [ http://biopython.org/wiki/Download ] ... I am also noticing a recent rise in interest in the language LittleB, based on Lisp [story via Eureka].
clipped from biopython.org

The latest release is Biopython 1.47, released on 5 July 2008. Get it from our Download page.



BZ: Bioinformatics Career Survey 2008

Mike is running a survey about Bioinformatics careers [ http://www.bioinformaticszen.com/2008/07/creating-a-picture-of-different-careers-in-bioinformatics/ ]. It has been running for quite some time, but I just came back to the blog world. I would appeal to all the people in the domain to fill it in. The more the samples, the better the survey, and moreover this is the time of data-driven science [ http://www.wired.com/science/discoveries/magazine/16-07/pb_theory ], though Deepak has some issues with this development [ http://mndoci.com/blog/2008/06/25/chris-anderson-you-are-wrong/ ].
When I wrote a post about my opinion of doing a career in bioinformatics I got the impression from the comments that this was something many people wanted more information about. Pedro and I had some discussion, and thought it might be interesting to create an online survey, to get current researchers’ opinions of working in the field of bioinformatics. So, the below survey begins today (July 1) and is filled out by as many people as possible over the course the next month. The data is then released into the public domain at the start of next month (August 1). Anyone who is then interested can contribute back analysis of this data, so that on September 1 hopefully we can compile together lots of interesting statistics and graphs into a handy document discussing the highs and lows of career in bioinformatics.

A Book Review from Adaptive Complexity: Microcosm E. coli and The New Science of Life

After reading this review [ http://www.scientificblogging.com/adaptive_complexity/e_coli_as_biologys_decoder_ring? ] by Michael White [ http://www.scientificblogging.com/mwhite74/feed ] of a book by Carl Zimmer [ http://blogs.discovermagazine.com/loom/ ] titled 'Microcosm: E. coli and The New Science of Life' [ http://www.amazon.com/gp/search?ie=UTF8&keywords=Zimmer%20e.%20Coli&tag=funnierthanyo-20&index=books&linkCode=ur2&camp=1789&creative=9325 ], I am quite eager to read this book.
Presently I am reading a book recommended by Myers [ http://scienceblogs.com/pharyngula/ ] titled 'Developmental Plasticity and Evolution' [ http://www.amazon.com/gp/product/0195122356 ] by Mary Jane West-Eberhard [ http://www.stri.org/english/scientific_staff/staff_scientist/scientist.php?id=35 ]. Between this, the article referred to in the above review, 'Evolutionary Dynamics of Prokaryotic Transcriptional Regulatory Networks' [ http://dx.doi.org/10.1016/j.jmb.2006.02.019 ] by Madan Babu, and Eisen talking about one of the heroes of bioinformatics at http://phylogenomics.blogspot.com/2008/06/connection-between-video-games-and.html , I am pretty much convinced that evolutionary biology has enough opportunity to keep Bioinformatics guys busy for eternity, not just "500 years of exciting problems" [ http://tex.loria.fr/historique/interviews/knuth-clb1993.html ]!

For such a slender book, Microcosm covers a wide-ranging selection of science. Zimmer begins by recapping the key events in the history of molecular biology, events in which E. coli was frequently a central player. Once scientists realized that E. coli had genes just like animals and plants, this gut bacterium gradually became one of the favorite model research systems in the then hot, new science of molecular biology. Experiments in E. coli revealed how genes are structured, how DNA is replicated, and how genes are controlled. Marshall Nirenberg and his colleagues cracked the essentially universal genetic code using cellular components from E. coli. Joshua Lederberg used bacterial 'sex' to bring the formidable tools of genetics to bacterial studies. Today, E. coli is the most well-mapped organism on the planet.


Saturday, July 26, 2008

Last Lecture

Randy Pausch [ http://en.wikipedia.org/wiki/Randy_Pausch ] passed away yesterday morning. I admire him a lot and whenever I feel down, I watch his Last Lecture [ http://www.youtube.com/watch?v=ji5_MqicxSo ]. May his soul rest in peace ...

Friday, June 20, 2008

454

I have been trying to understand the pyrosequencing technology, popularly known as 454. The very first question which came to my mind is WHY it is called 454. Is it because the .454 Casull handgun is sleeker than a shotgun ( Cf. Whole Genome Shotgun Sequence ) and slowly becoming as powerful ... or is it because of the fact that "In one instrument run sequence a minimum of 20 million base pairs in 4.5 hours" ... just a conjecture, if you guys have proof, please share!
Anyway, coming back to the technology, what I understand (below is a crude attempt to visualize it) is that there are 4 containers holding A, T, G and C, which sequentially pass over the bowl containing a mixture of Polymerase, Sulfurylase, Luciferase, Apyrase and the DNA fragment to be sequenced. Whenever a nucleotide is incorporated, light is emitted ( thanks to lucifer-ase ) and picked up by the CCD (a thing close to what we have in a digital camera, and the costliest part of the instrument). Since we know which nucleotide container was the culprit, and the height of the peak tells how many of that nucleotide were added, we can proceed sequentially and keep adding letters to the reads... This happens in a massively parallel fashion, and that brings out the power of 454...
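
To make the "which container, how high a peak" idea concrete, here is a toy Perl sketch. It is a simplification and not how the instrument software works: it models the strand being synthesized directly (ignoring that the incorporated bases are complementary to the template), treats the peak height as an exact count, and assumes the flow order A, T, G, C from the description above. The function names flowgram and call_bases are made up for this example.

```perl
use strict;
use warnings;

# Cycle through the nucleotide flows and record, for each flow, how many
# bases of the read are incorporated (0 when the next base does not match).
sub flowgram {
    my ($read, $flow_order, $max_flows) = @_;
    my @flows = split //, $flow_order;
    my @signal;
    my ($pos, $i) = (0, 0);
    while ($pos < length $read and $i < $max_flows) {
        my $nuc   = $flows[$i % @flows];
        my $count = 0;
        # A homopolymer run is incorporated in a single flow
        while ($pos < length $read and substr($read, $pos, 1) eq $nuc) {
            $count++;
            $pos++;
        }
        push @signal, [$nuc, $count];
        $i++;
    }
    return @signal;
}

# Turn the (nucleotide, peak height) pairs back into a sequence of letters
sub call_bases {
    return join '', map { $_->[0] x $_->[1] } @_;
}

my @signal = flowgram('TTACGGG', 'ATGC', 100);
print join(' ', map { "$_->[0]$_->[1]" } @signal), "\n";
print call_bases(@signal), "\n";
```

The $max_flows argument is only a guard so the loop stops if the read contains a letter that is not in the flow order.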



Saturday, June 14, 2008

Biocomputing

Haynes et al., in the article "Engineering bacteria to solve the Burnt Pancake Problem" [ http://www.jbioleng.org/content/2/1/8#IDAJQH0E ], demonstrate how one can take advantage of the sense of direction embedded in DNA, coding 5'-3' as 0 and 3'-5' as 1, to convert cells into computers.
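
For readers who have not met the Burnt Pancake Problem: a flip takes the top k pancakes, reverses their order and turns each one over. A minimal Perl sketch of that single operation, using the 0/1 orientation coding mentioned above, might look like this. It illustrates the classical puzzle only, not the paper's biological construct, and the [size, orientation] representation and the name flip are assumptions made for this example.

```perl
use strict;
use warnings;

# Each pancake is [size, orientation], with orientation 0 for 5'-3' and 1 for 3'-5'.
# Flipping the top $k pancakes reverses their order and toggles each orientation.
sub flip {
    my ($k, @stack) = @_;
    my @top = map { [ $_->[0], 1 - $_->[1] ] } reverse @stack[0 .. $k - 1];
    return (@top, @stack[$k .. $#stack]);
}

# Two upside-down pancakes in the wrong order on top of a correct one
my @stack   = ([2, 1], [1, 1], [3, 0]);
my @flipped = flip(2, @stack);
print join(' ', map { "$_->[0]:$_->[1]" } @flipped), "\n";
```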
clipped from www.msnbc.msn.com

Ron Weiss, an assistant professor of electrical engineering and molecular biology at Princeton University, said the new study provides a “nice demonstration that there’s some capability to instruct cells to carry out computational tasks. And I think this certainly brings a new aspect to what’s been demonstrated before.” Weiss, who wasn’t involved with the research, said the effort is another sign that the field is progressing, despite its relative infancy. “We really are at the beginning, at the vacuum tube stage or something like that, if you use the comparison with computer electronics,” he said.


Saturday, June 7, 2008

International Conference on Functional Programming 2008

The 11th ICFP ( http://icfpcontest.org/ ) Programming Contest runs from Friday, July 11, 2008 to Monday, July 14, 2008. For more details on what the contest is like, check out http://cpoucet.wordpress.com/2008/06/11/icfp-contest-2008/ .
Mail queries to Tim Sheard at sheard@cs.pdx.edu.

The ICFP Programming Contest is one of the most advanced and
prestigious programming contests, as well as being a chance to show
off your programming skills, your favorite languages and tools, and
your ability to work as a team. The contest is affiliated with the
International Conference on Functional Programming. Teams consisting
of one or more participants, from any part of the world, using any
programming language, may enter.


Tuesday, June 3, 2008

Unofficial Google Shell

I have been waiting for a Google shell. I thought they would name it gshell, but it turns out to be:
goosh.org - the unofficial google shell ( http://goosh.org/ ).
It is pretty neat and useful even in its neonatal state. For example, to read a feed you can type:

r http://www.google.com/reader/public/atom/user/11136726768885096694/label/cbn-roll

and get the 4 latest feed items by default. You can also check out an NCBI accession by typing, for example:

NP_001108376

and get to the Fugu fish refseq link.

The most useful thing is the fact that it is text based and quite easy to parse visually as well as via scripts.
Another cool feature for me right now is the shell translation. To get the English word for the Norwegian word 'hvor', all I have to do is type:

t no en hvor

which correctly says:
translating "hvor" from "no" to "en":

"where"

More discussion going on at http://tech.slashdot.org/article.pl?sid=08/06/02/222234&from=rss .
Enjoy!
clipped from goosh.org
Goosh goosh.org 0.4.3-beta #1 Mon, 02 Jun 08 22:28:01 UTC Google/Ajax
help

command    aliases          parameters               function
web        (search,s,w)     [keywords]               google web search
lucky      (l)              [keywords]               go directly to first result
images     (image,i)        [keywords]               google image search
wiki       (wikipedia)      [keywords]               wikipedia search
clear      (c)                                       clear the screen
help       (man,h,?)        [command]                displays help text
news       (n)              [keywords]               google news search
blogs      (blog,b)         [keywords]               google blog search
feeds      (feed,f)         [keywords]               google feed search
open       (o)              <url>                    open url in new window
go         (g)              <url>                    open url
more       (m)                                       get more results
in         (site)           <url> <keywords>         search in a specific website
load                        <extension_url>          load an extension
video      (videos,v)       [keywords]               google video search
read       (rss,r)          <url>                    read feed of url
place      (places,map,p)   [address]                google maps search
lang                        <language>               change language
addengine                                            add goosh to firefox search box
translate  (trans,t)        [lang1] [lang2] <words>  google translation
