Fig. 1. BLAST result alignment view for Spike glycoprotein of Coronavirus (sp|P0DTC2)

One of the sequences in the pre-release proteome of the 2019-nCoV Wuhan coronavirus [1] is the Spike glycoprotein (sp|P0DTC2). When this sequence is aligned with BLAST against the NCBI nucleotide database [2], it shows a peptide insert, PRRA (Figure 1), relative to bat coronavirus RaTG13 [2].
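Reading an insert like PRRA off a pairwise alignment comes down to finding runs of query residues that sit opposite gap characters in the subject row. The minimal sketch below assumes the two aligned rows are already available as equal-length strings that use '-' as the gap symbol, as in BLAST's pairwise text output once the blocks are joined. The function name `find_inserts` and the toy rows are made up for illustration. They are not the actual spike or RaTG13 sequences.

```python
def find_inserts(query_row: str, subject_row: str, gap: str = "-") -> list[tuple[int, str]]:
    """Return (start, residues) for each run of query residues aligned to subject gaps.

    `start` is the 1-based position of the first inserted residue in the
    ungapped query sequence.
    """
    if len(query_row) != len(subject_row):
        raise ValueError("alignment rows must have equal length")
    inserts = []
    query_pos = 0          # query residues consumed so far
    run_start = None       # 1-based query position where the current insert began
    run = []               # residues of the current insert
    for q, s in zip(query_row, subject_row):
        if q != gap:
            query_pos += 1
        if q != gap and s == gap:
            if run_start is None:
                run_start = query_pos
            run.append(q)
        elif run:
            # the insert ended at the previous column
            inserts.append((run_start, "".join(run)))
            run_start, run = None, []
    if run:
        inserts.append((run_start, "".join(run)))
    return inserts
```

For example, `find_inserts("ABCDPRRAEFG", "ABCD----EFG")` reports a four-residue insert starting at query position 5. Gaps in the query row are deletions relative to the subject, and the function deliberately ignores them.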

Finally got some quality time to spend with a birthday gift from Katharina. It is one of those devices with a dry EEG sensor that can capture the EEG signal just by placing the sensor on the forehead: no gel, no hell ;) To top it all, the NeuroSky mobile can connect to any Bluetooth device, and through a good collection of APIs, EEG signals can be collected and processed. This felt like the easiest way to enter the amazing world of brain-computer interfaces :) and so I did...
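The post relies on NeuroSky's APIs. To give a feel for what actually arrives over the Bluetooth serial link, here is a hedged Python sketch of a parser for the ThinkGear packet format, based on my understanding of NeuroSky's serial stream documentation. A packet consists of two 0xAA sync bytes, a payload length of at most 169, the payload, and a checksum equal to the inverted low byte of the payload sum. Only a few data codes are decoded: 0x02 poor signal, 0x04 attention, 0x05 meditation, and 0x80 raw 16-bit sample. Reading the bytes from the serial port (for example with pyserial) is left out, and all function names are hypothetical.

```python
SYNC = 0xAA
EXCODE = 0x55
MAX_PLENGTH = 169


def _data_rows(payload: bytes):
    """Split a payload into (excode_level, code, value_bytes) data rows."""
    j = 0
    while j < len(payload):
        level = 0
        while j < len(payload) and payload[j] == EXCODE:  # extended-code prefix bytes
            level += 1
            j += 1
        if j >= len(payload):
            return
        code = payload[j]
        j += 1
        if code >= 0x80:      # multi-byte value: the next byte is its length
            if j >= len(payload):
                return
            vlength = payload[j]
            j += 1
        else:                 # codes below 0x80 carry a single-byte value
            vlength = 1
        yield level, code, payload[j:j + vlength]
        j += vlength


def parse_thinkgear(stream: bytes):
    """Yield data rows from every complete packet whose checksum is valid."""
    i, n = 0, len(stream)
    while i + 2 < n:
        if stream[i] != SYNC or stream[i + 1] != SYNC:
            i += 1
            continue
        plength = stream[i + 2]
        if plength > MAX_PLENGTH:  # a third 0xAA or an invalid length: slide forward one byte
            i += 1
            continue
        end = i + 3 + plength      # index of the checksum byte
        if end >= n:
            return                 # packet not fully received yet
        payload = stream[i + 3:end]
        if ((~sum(payload)) & 0xFF) == stream[end]:
            yield from _data_rows(payload)
            i = end + 1
        else:
            i += 1                 # bad checksum: drop the candidate and resynchronise


def decode(code: int, value: bytes):
    """Turn a few well-known codes into (name, number); other codes pass through raw."""
    if code == 0x02:
        return ("poor_signal", value[0])
    if code == 0x04:
        return ("attention", value[0])
    if code == 0x05:
        return ("meditation", value[0])
    if code == 0x80 and len(value) == 2:
        return ("raw", int.from_bytes(value, "big", signed=True))
    return (f"code_0x{code:02x}", value)
```

In a real session, the bytes read from the port would be appended to a buffer and fed to `parse_thinkgear`. Any incomplete packet at the end of the buffer would be kept for the next read.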

I had to process multiple Orbitrap raw files with MaxQuant using the same parameters and could not find a simple tool for this, so I wrote a batch script, which can be downloaded from https://docs.google.com/file/d/0BxbjZeVL8S4EQW1zbVd4TzRYSDg/edit .

It needs a preconfigured parameter file to begin with. For that, I generally open MaxQuant with a dummy file called TestFile.raw and set the desired parameters and the FASTA file for the search through its GUI.
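Once such a template exists, the batch step amounts to cloning it once per raw file. The Python sketch below is not the linked script. It assumes the saved parameter file is XML (MaxQuant's `mqpar.xml`) in which the dummy file appears as a path ending in `TestFile.raw`. The helper names and the `mqpar_<name>.xml` output naming are hypothetical. Launching MaxQuant on each generated file depends on the MaxQuant version and is not shown.

```python
import re
from pathlib import Path

# Assumption: the dummy file shows up as a path (element text or attribute value)
# ending in TestFile.raw; the match stops at <, > or " so it stays inside one value.
DUMMY_PATTERN = re.compile(r'[^<>"]*TestFile\.raw')


def make_param_file(template_text: str, raw_path: str) -> str:
    """Point every dummy-file path in the template at raw_path."""
    # A function replacement keeps re.sub from treating backslashes
    # in Windows paths as escape sequences.
    return DUMMY_PATTERN.sub(lambda _m: raw_path, template_text)


def write_param_files(template: Path, raw_dir: Path, out_dir: Path) -> list[Path]:
    """Write one parameter file per *.raw file found in raw_dir."""
    text = template.read_text()
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for raw in sorted(raw_dir.glob("*.raw")):
        out = out_dir / f"mqpar_{raw.stem}.xml"
        out.write_text(make_param_file(text, str(raw.resolve())))
        written.append(out)
    return written
```

Generating separate files keeps every run's parameters on disk next to its results, which makes it easy to check afterwards that all files really used identical settings.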

“Insanity is repeating the same mistakes and expecting different results.” 

- Narcotics Anonymous

I have been wondering about this for a long, long time... I feel our daily lives are often a mixture of random events and the results of decisions we make as we go on living. The brain, as an advanced pattern-recognition device, is constantly trying to make sense of this, attributing events to nature and/or our decision(s).

Recently Mohan asked me an interesting question about the positions where the "GATC" motif occurs in E. coli genome(s). While figuring this out, I observed that the mean distance between consecutive GATC sites is ~250 for these genomes (242.6 for the linked one). This made us wonder: is this special, or just the expected value? With a lot of help from Pragun and loads of generalization, we found the expected value to be ~255.
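As a rough baseline, suppose the four bases occurred independently with equal frequency. GATC would then start at any given position with probability (1/4)^4 = 1/256, so the long-run mean start-to-start spacing would be 256 bp. The Python sketch below finds motif positions, computes the mean spacing, and checks that figure on a simulated random sequence. The uniform-base model is an assumption, since real genomes have their own base composition. The function names are hypothetical. To apply this to a genome, its FASTA sequence would first have to be read into a single uppercase string.

```python
import random


def motif_positions(seq: str, motif: str = "GATC") -> list[int]:
    """0-based start of every occurrence of motif in seq (overlaps allowed)."""
    positions = []
    i = seq.find(motif)
    while i != -1:
        positions.append(i)
        i = seq.find(motif, i + 1)
    return positions


def mean_spacing(positions: list[int]) -> float:
    """Mean start-to-start distance between consecutive occurrences."""
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    if not gaps:
        raise ValueError("need at least two occurrences")
    return sum(gaps) / len(gaps)


def simulated_mean_spacing(length: int, motif: str = "GATC", seed: int = 0) -> float:
    """Mean spacing in a random sequence with equal, independent base frequencies."""
    rng = random.Random(seed)
    seq = "".join(rng.choices("ACGT", k=length))
    return mean_spacing(motif_positions(seq, motif))
```

On a few million simulated bases, `simulated_mean_spacing` lands close to 4 ** 4 = 256. Measuring the gap from the end of one site to the start of the next would subtract the motif length, giving about 252.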

In the last post I talked about the overlap-layout-consensus (OLC) approach to genome assembly. The approach that is (really!) getting popular these days is the other one, de Bruijn graphs (DBG). It is based on the simple idea of converting the hard problem of finding a Hamiltonian path into the relatively simpler problem of finding an Eulerian path when assembling biological sequences. There is a beautiful tutorial-like introduction co-authored by the 'father' of this idea, Prof. Pevzner.
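To make the Hamiltonian-to-Eulerian switch concrete, here is a minimal Python sketch. Every distinct k-mer in the reads becomes an edge from its (k-1)-prefix to its (k-1)-suffix, and Hierholzer's algorithm walks each edge exactly once. It assumes error-free reads that together contain every k-mer of the genome, with each k-mer occurring once. If the genome has repeats longer than k-1, there are several Eulerian paths and the sketch simply returns one of them. It is a toy, not a description of how production DBG assemblers handle errors and coverage.

```python
from collections import defaultdict


def de_bruijn_graph(reads: list[str], k: int) -> dict[str, list[str]]:
    """Nodes are (k-1)-mers; each distinct k-mer adds one edge prefix -> suffix."""
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    graph: dict[str, list[str]] = defaultdict(list)
    for kmer in sorted(kmers):
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph: dict[str, list[str]]) -> list[str]:
    """Hierholzer's algorithm; assumes the graph has an Eulerian path."""
    in_deg: dict[str, int] = defaultdict(int)
    for targets in graph.values():
        for v in targets:
            in_deg[v] += 1
    # A path starts where out-degree exceeds in-degree by one; for a cycle any node works.
    start = next(iter(graph))
    for u, targets in graph.items():
        if len(targets) - in_deg[u] == 1:
            start = u
            break
    unused = {u: list(targets) for u, targets in graph.items()}
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if unused.get(u):
            stack.append(unused[u].pop())  # follow an edge not walked yet
        else:
            path.append(stack.pop())       # no edges left here: node is finished
    return path[::-1]


def assemble(reads: list[str], k: int) -> str:
    """Spell the sequence along the Eulerian path of (k-1)-mer nodes."""
    path = eulerian_path(de_bruijn_graph(reads, k))
    return path[0] + "".join(node[-1] for node in path[1:])
```

For instance, the overlapping reads "ACGTT", "GTTGC" and "TGCA" with k = 3 spell back "ACGTTGCA". The iterative stack avoids Python's recursion limit on long paths.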

Just came across Andrew's Hamiltonian Cycle finder (HCF), based purely on shell commands, and thought of testing it out as a genome assembler.

get:

wget https://raw.github.com/ahh/ahh-toys/master/ham.sh

simple check:

bash ham.sh

a b
b c
c a

[followed by Ctrl-D should produce]

a b c

The genome assembly problem is closely related to finding the shortest common superstring (S) of a given set of strings (s1, s2… sn).
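Finding the exact shortest common superstring is NP-hard, so a common stand-in is the greedy heuristic: repeatedly merge the pair of strings with the longest suffix-prefix overlap. The Python sketch below implements that heuristic. It is not what ham.sh does, and its result is a superstring that is not guaranteed to be the shortest one. `overlap` and `greedy_scs` are illustrative names.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for n in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_scs(reads: list[str]) -> str:
    """Greedy superstring: merge the best-overlapping pair until one string is left."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # reads contained in other reads add nothing to the superstring
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]
    while len(reads) > 1:
        best = (-1, 0, 1)  # (overlap length, left index, right index)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b)
                    if olen > best[0]:
                        best = (olen, i, j)
        olen, i, j = best
        merged = reads[i] + reads[j][olen:]
        reads = [r for idx, r in enumerate(reads) if idx not in (i, j)] + [merged]
    return reads[0] if reads else ""
```

With the reads "ATTAGAC", "AGACCTG" and "CCTGCCG", two merges on 4-base overlaps give "ATTAGACCTGCCG", which contains all three reads. The all-pairs scan makes each round quadratic in the number of reads, which is fine for a toy but not for real data.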

It seems the Ensembl API does not provide a way to get a list of results (?) for a web query, so I decided to write a simple Perl script.

Google's Prediction API is now available to everyone, with a nice tutorial on how to get going :) so I decided to check its performance using gene expression values for 5 genes selected with online feature selection from the classic AML/ALL dataset (training/test).

The dataset needed to be formatted (training/test) and copied to Google storage to get the default scripts working.
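Here is a hedged Python sketch of the formatting step. It assumes the header-less, label-first CSV layout that the Prediction API documentation described: one example per line, with the class label (here ALL or AML) first, followed by the feature values. The helper names are hypothetical, the numbers in the tests are made up, and the upload to Google Storage is not shown.

```python
import csv
import io


def genes_by_samples_to_rows(matrix: list[list[float]]) -> list[list[float]]:
    """Transpose a genes x samples table into one feature row per sample."""
    return [list(column) for column in zip(*matrix)]


def to_prediction_csv(labels: list[str], rows: list[list[float]]) -> str:
    """Header-less CSV, one sample per line: quoted class label first, then features."""
    if len(labels) != len(rows):
        raise ValueError("need exactly one label per sample")
    buf = io.StringIO()
    # QUOTE_NONNUMERIC quotes the string labels and leaves the numbers bare
    writer = csv.writer(buf, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
    for label, features in zip(labels, rows):
        writer.writerow([label, *features])
    return buf.getvalue()
```

The training and test splits would each go through `to_prediction_csv` separately and be saved as two files before copying them to storage.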