Converting older SCARF format to FASTQ

Recently, a colleague asked me whether I recognized the following raw data format. It came from a fairly old dataset, produced by one of the first next-generation sequencers and processed with relatively old base-calling software:


203K0:1:1:626:335:ATTCCATTCCATTCCATTCCATTCCATTCCAT:[[[[[[[[[[[[[[[[[[[[UUUUUUUUOUUU
203K0:1:1:119:614:TAAAAACTAGATAGAAGCAATGTCAGAACTTT:[[[[[[[[[[[[[[W[[[[[UUUUUUUUUUUU
203K0:1:1:114:772:TCCTAGCTAGTTCCCTGCAGCTTTTTATTAAC:[[[[[[[[[[[[[[[[[[WWUUUUUUUCIUUU
203K0:1:1:490:490:GTTGGTGCTTAAAAGTCTTGGATTTTGAAACA:[[[[[[[[[[[[[[W[[[[[UUUUUUOOIUUU
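
Each line is one read. Colon-separated identifier fields (presumably a run name, then lane, tile and x/y cluster coordinates) are followed by the sequence and then the quality string. Below is a minimal Python sketch of the conversion. It assumes exactly this layout, and it assumes that qualities are ASCII characters with an offset of 64, as in early Solexa/Illumina output. Whether the scores are on the Solexa or the Phred scale depends on the pipeline version, so that choice is an explicit flag. The function names are my own.

```python
import math


def solexa_to_phred(q_solexa: int) -> int:
    """Convert a Solexa-scaled quality score to the Phred scale."""
    return round(10 * math.log10(10 ** (q_solexa / 10) + 1))


def scarf_to_fastq(line: str, solexa_scale: bool = False) -> str:
    """Convert one SCARF line (name fields:sequence:quality) to a FASTQ record.

    Assumes qualities are ASCII-encoded with an offset of 64; the output uses
    the Sanger (Phred+33) encoding.
    """
    # The read name itself contains colons, so split off only the last two fields.
    name, seq, qual = line.rstrip().rsplit(":", 2)
    if len(seq) != len(qual):
        raise ValueError(f"sequence/quality length mismatch for read {name}")
    scores = [ord(c) - 64 for c in qual]
    if solexa_scale:
        # Solexa scores can be negative (down to -5); map them onto the Phred scale.
        scores = [solexa_to_phred(q) for q in scores]
    sanger_qual = "".join(chr(q + 33) for q in scores)
    return f"@{name}\n{seq}\n+\n{sanger_qual}\n"
```

To convert a whole file, write scarf_to_fastq(line) to the output for every non-empty line of the SCARF file. The result uses the Sanger encoding (Phred + 33), which is what most current tools expect.
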

Is Bioinformatics really hard?

I would like to share some thoughts that came to mind today after an event related to bioinformatics training. First of all, don't be misled by the title of this post. It might sound like a selfish, elitist or even racist comment! No, it has nothing to do with selfishness... Let me explain.

During my MSc in Bioinformatics, I met four kinds of people.

  1. Biologists and other people from the life sciences and bench work who wanted either to switch to bioinformatics or to get basic training, but failed because they were afraid to sit down and face the evil "black screen" of the Unix command line, let alone lesser demons such as basic statistics or "for" loops, or greater demons such as R, algorithmics, basic Perl, etc.
  2. Computer scientists and/or mathematicians (like myself) who wanted to apply their background knowledge at a more "practical", yet still scientific, level than predicting algorithmic complexity, solving partial differential equations or wandering around Banach spaces. However, the "application" turned out to be quite difficult: the types of RNA, the thousands of genes with strange names and those blurry gel images seemed far noisier than an elegant solution to a differential equation or yet another O(n log n) algorithm optimization.
  3. People of the first kind that made it.
  4. People of the second kind that made it.

Creating stranded signal tracks for the UCSC Genome Browser


Recently, one of my colleagues asked me whether there was a way to display stranded wiggle signal files. A couple of years ago I would have said that it was possible but quite messy. The only way to display stranded signal was to split the original genomic coordinate file (BED, SAM) by strand and then create two separate wiggle (or bigWig) tracks. This was necessary because the wiggle (and later bedGraph) specification does not allow overlapping intervals, so signal from both strands cannot share one file. Now it is much cleaner (though slightly more complicated). The UCSC Genome Browser can overlay tracks: you create a super (parent) track and assign child tracks to it. Unfortunately, this feature is only available with track hubs.
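
Both approaches start from one signal file per strand. Here is a minimal Python sketch of that per-strand step. It assumes BED6 input (strand in column 6) where each interval counts as one read, and the function names are hypothetical. It splits the intervals by strand and collapses each strand's overlapping intervals into non-overlapping bedGraph records, which is what the format requires. Negating the minus-strand values so that they are drawn below the axis is a common convention, not a requirement.

```python
from collections import defaultdict


def split_by_strand(bed_lines):
    """Group BED6 intervals into plus- and minus-strand lists (strand = column 6)."""
    by_strand = {"+": [], "-": []}
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            raise ValueError("strand splitting needs BED6 or wider input")
        chrom, start, end, strand = fields[0], int(fields[1]), int(fields[2]), fields[5]
        if strand in by_strand:  # intervals with strand "." are ignored
            by_strand[strand].append((chrom, start, end))
    return by_strand


def coverage_bedgraph(intervals, sign=1):
    """Collapse possibly overlapping intervals into non-overlapping bedGraph records.

    Each interval adds 1 to the depth over [start, end); sign=-1 negates the
    values, e.g. for a minus-strand track.
    """
    # Per chromosome, record +1 where an interval starts and -1 where it ends.
    events = defaultdict(lambda: defaultdict(int))
    for chrom, start, end in intervals:
        events[chrom][start] += 1
        events[chrom][end] -= 1
    records = []
    for chrom in sorted(events):
        positions = sorted(events[chrom])
        depth = 0
        # Sweep left to right; the depth is constant between consecutive positions.
        for pos, next_pos in zip(positions, positions[1:]):
            depth += events[chrom][pos]
            if depth > 0:
                records.append((chrom, pos, next_pos, sign * depth))
    return records


def format_bedgraph(records):
    """Render records as tab-separated bedGraph lines."""
    return "".join(f"{c}\t{s}\t{e}\t{v}\n" for c, s, e, v in records)
```

Each strand's records would then be written to its own bedGraph file and converted with UCSC's bedGraphToBigWig, which also needs a chrom.sizes file for the assembly.
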
In this post, I will show you how to display stranded wiggle (in this example, bigWig) signal by setting up a track hub that hosts your signal files and that you can then attach to the Genome Browser.
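
Here is a minimal sketch of such a hub, generated from Python. It assumes the classic three-file hub layout (hub.txt, genomes.txt and a per-assembly trackDb.txt). It uses a multiWig container with transparentOverlay aggregation so that the two strands share one panel. The hub and track names, the assembly, the email and the bigWig file names are placeholders. bigDataUrl paths are resolved relative to trackDb.txt, so the sketch assumes the two bigWig files sit next to it.

```python
from pathlib import Path

# Placeholder hub and track names; replace with your own.
HUB_TXT = """hub strandedHub
shortLabel Stranded signal
longLabel Stranded signal tracks (plus and minus strand)
genomesFile genomes.txt
email {email}
"""

GENOMES_TXT = """genome {genome}
trackDb {genome}/trackDb.txt
"""

# Parent multiWig container plus one child track per strand.
TRACKDB_TXT = """track strandedSignal
container multiWig
shortLabel Stranded signal
longLabel Plus and minus strand signal overlaid
type bigWig
visibility full
aggregate transparentOverlay
showSubtrackColorOnUi on
autoScale on
maxHeightPixels 100:50:8

    track strandedSignalPlus
    parent strandedSignal
    bigDataUrl {plus_bw}
    shortLabel Plus strand
    longLabel Plus strand signal
    type bigWig
    color 0,0,255

    track strandedSignalMinus
    parent strandedSignal
    bigDataUrl {minus_bw}
    shortLabel Minus strand
    longLabel Minus strand signal
    type bigWig
    color 255,0,0
"""


def render_stranded_hub(genome, plus_bw, minus_bw, email):
    """Return {relative path: file content} for a minimal stranded-signal hub."""
    return {
        "hub.txt": HUB_TXT.format(email=email),
        "genomes.txt": GENOMES_TXT.format(genome=genome),
        f"{genome}/trackDb.txt": TRACKDB_TXT.format(plus_bw=plus_bw, minus_bw=minus_bw),
    }


def write_hub(hub_dir, files):
    """Write the rendered files under hub_dir, creating subdirectories as needed."""
    for rel_path, content in files.items():
        path = Path(hub_dir) / rel_path
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(content)
```

For example, write_hub("myhub", render_stranded_hub("hg19", "signal_plus.bw", "signal_minus.bw", "you@example.org")) creates the directory layout. Once it is served over HTTP(S), the URL of hub.txt can be entered on the Genome Browser's track hub page. UCSC's hubCheck utility can validate the hub first.
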
Copyright © Bioinformatics dance