Google Summer of Code 2011 project proposal

Project Title

Implementation of a new quality control program for RNA sequencing data


Modern high-throughput sequencers can generate tens of millions of sequences in a single run. Before analysing these sequences to draw biological conclusions, one should always perform some simple Quality Control (QC) checks to ensure that the raw data looks good and that there are no problems or biases in the data that may limit how it can be used.

Most sequencers will generate a Quality Control report as part of their analysis pipeline, but this report usually focuses only on identifying problems generated by the sequencer itself.

For Google Summer of Code 2011 I would like to develop a stand-alone Python program that performs base-level and transcript-level Quality Control on RNA sequencing data from various sequencing platforms (e.g., Illumina paired-end, ABI SOLiD).

The program will build on existing Perl and R scripts and go far beyond current QC programs such as FastQC by providing analyses of known transcripts, exons and junctions.


Community Bonding Period (April 25 – May 22):
Get to know mentors, read documentation, read the FastQC source code, and determine with the mentors how this project can be integrated as closely as possible into the GenMAPP, Cytoscape, and WikiPathways workflows.

Start of Program (May 23 – July 14)
Start coding: port the Java, Perl and R codebases to Python and implement the different algorithms.

Midterm Evaluation (July 15 – August 15)
Write documentation, write tests, format output in nice tables and graphical plots.

Pencils down (August 15 – August 21)
Scrub code, improve tests and documentation.

Final evaluation deadline (August 26)

List of Deliverables

A stand-alone Python program with:

  • base-level Quality Control:
    • analysis of the base composition
    • information about the error rates (e.g., quality per base position over read length)
    • alignment statistics (mapped, unmapped, non-unique mappings)
  • transcript-level Quality Control:
    • transcript read density variation (5’ vs. 3’, exon vs. junction, exon vs. intron, normalization correction bias)
    • replicate comparison (quantile-quantile aligned read count plots)
    • known versus novel exons/junctions and expression of a panel of known housekeeping genes
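To make the base-level measures above more concrete, here is a minimal sketch of two of them, base composition and mean quality per read position, written in plain Python without NumPy. It assumes well-formed four-line FASTQ records and Phred+33 quality encoding (older Illumina pipelines used Phred+64, hence the `phred_offset` parameter). The function names `read_fastq` and `per_position_stats` are hypothetical and are not taken from FastQC or the existing Perl and R scripts.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (sequence, quality) pairs from four-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines, e.g. a trailing newline
        seq = next(it).strip()
        next(it)  # skip the '+' separator line
        qual = next(it).strip()
        yield seq, qual


def per_position_stats(
    records: Iterable[Tuple[str, str]], phred_offset: int = 33
) -> Tuple[List[Counter], List[float]]:
    """Per read position: base counts and mean Phred quality score."""
    base_counts: List[Counter] = []
    qual_sums: List[int] = []
    n_bases: List[int] = []
    for seq, qual in records:
        # zip() stops at the shorter string, so malformed records are truncated
        for i, (base, q) in enumerate(zip(seq, qual)):
            if i == len(base_counts):  # first read reaching this position
                base_counts.append(Counter())
                qual_sums.append(0)
                n_bases.append(0)
            base_counts[i][base] += 1
            qual_sums[i] += ord(q) - phred_offset  # decode ASCII-encoded quality
            n_bases[i] += 1
    mean_quals = [s / n for s, n in zip(qual_sums, n_bases)]
    return base_counts, mean_quals


if __name__ == "__main__":
    example = ["@read1", "ACGT", "+", "IIII", "@read2", "ACGA", "+", "II#5"]
    counts, means = per_position_stats(read_fastq(example))
    print(means)  # [40.0, 40.0, 21.0, 30.0]
```

Because reads are streamed one at a time and only per-position totals are kept, memory use depends on the read length, not on the tens of millions of reads in a run.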

Output will include tables and graphical plots (PMW). An implementation without dependencies on external Python libraries (e.g., NumPy) would be preferable.
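The replicate comparison listed above could be sketched, again in plain Python, by computing matching quantiles of two replicates' per-gene read counts. Plotting these pairs against each other gives the quantile-quantile plot, and points close to the diagonal indicate replicates with similar count distributions. This sketch assumes the aligned read counts per gene have already been computed elsewhere; the log2(count + 1) transform, the linear interpolation between order statistics, and the names `quantile` and `qq_points` are illustrative choices, not part of the original plan.

```python
import math
from typing import Iterable, List, Sequence, Tuple


def quantile(sorted_vals: Sequence[float], p: float) -> float:
    """Quantile of pre-sorted data with linear interpolation (0 <= p <= 1)."""
    pos = p * (len(sorted_vals) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def qq_points(
    rep_a: Iterable[int], rep_b: Iterable[int], n_points: int = 100, log: bool = True
) -> List[Tuple[float, float]]:
    """Pair up matching quantiles of two replicates' per-gene read counts."""
    if n_points < 2:
        raise ValueError("n_points must be at least 2")

    def transform(c: int) -> float:
        # log2(count + 1) keeps zero counts finite and compresses the range
        return math.log2(c + 1) if log else float(c)

    a = sorted(transform(c) for c in rep_a)
    b = sorted(transform(c) for c in rep_b)
    if not a or not b:
        raise ValueError("both replicates need at least one count")
    # evenly spaced probabilities from 0 (minimum) to 1 (maximum)
    probs = [i / (n_points - 1) for i in range(n_points)]
    return [(quantile(a, p), quantile(b, p)) for p in probs]
```

Plotting is left out on purpose: the returned pairs can be written to a table or handed to whichever plotting tool ends up being used.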


In 2004 I obtained a Master's degree in Biology from the University of Antwerp, Belgium, with a bioinformatics thesis whose title translates to “comparison of methods to prioritize candidate disease genes according to their presumed involvement in human hereditary diseases”. For it, I wrote a computer program in Perl that combined and compared four techniques to prioritize candidate disease genes.

I then worked for a few years in the computer industry at EDS (now part of Hewlett-Packard), obtained a Java EE software developer certificate, and did an internship at a start-up where I wrote an Eclipse BIRT extension for their real-time quantitative PCR analytics software.

Since September I have again been a full-time student, now at the Catholic University of Leuven, Belgium, studying towards a Master of Bioinformatics degree.


Besides R and Matlab, which I use for homework assignments, my main scripting language is Python. I love solving Project Euler problems with it. Occasionally, I also play with Haskell, J and Scheme.

Since 2002, I have used only open-source UNIX-like operating systems (first Red Hat Linux, then FreeBSD, now Ubuntu). This has given me a good understanding of UNIX system administration.

I have wide ranging interests, see for example my bookmarks. Tags that frequently appear include bioinformatics (479 times), data visualization (57) and machine learning (64).


Jeroen Van Goey

On the net, also known as BioGeek (for example at reddit)

Preferred method of communication

Google Talk, Skype

Kite Aerial Photography along the coast of Spain and Portugal.

At the beginning of April 2009 I posted on the Kite Aerial Photography forum asking for recommendations for a beginner rig and kite. I explained that I would be joining friends on a sailing trip from Breskens in the Netherlands to Lisbon in Portugal, and that the trip would be the perfect opportunity to finally get started in Kite Aerial Photography (KAP).

I ordered a Sutton Flowform 16, a BEAK-Servo and a GentLED CHDK from KAPshop. I had asked for the “Becot” variant of the kite, so Peter, the owner of KAPshop, had to make some modifications. And because I had posted my question only a week before departure, the kit didn't arrive at my home address in time. Luckily, a fourth crew member had also been delayed for personal reasons, and when he hopped on board in Cherbourg, France, he had the KAP gear with him. As soon as he was on board we departed on a 4-day non-stop sailing trip across the Bay of Biscay, so the first time I actually opened the box of KAP items and could take a decent look at its contents was in La Coruña, Spain.

There is a small castle on an island not far from the yacht harbour (Castillo de San Antón) that looked like an ideal target for my first try at KAPing. The wind was quite light, but I got the kite up without any problems. Since this was my first kite flight since childhood, I used it mainly to get acquainted with the Flowform. At the end of the session I even attached the BEAK to the line, but the wind was too light to lift the rig. I unhooked it and carefully put the rig back on the ground. For this I used both hands, one holding the kite line and the other unclipping the rig, while keeping the reel under my right foot. Just as I put the rig down, the reel slipped from under my foot and the kite flew away! The reel tumbled down the rocks of the pier and the kite fell into the water. After climbing down the rocks to the reel, I easily dragged the kite back in (together with some seaweed), but the first lesson was learned: never let go of the reel. Since then I have always made sure to wear my climbing harness while KAPing, with my kite line attached to my figure-eight.

At that point I still hadn't succeeded in putting CHDK on my G9 (because the 4GB card was formatted as FAT32 instead of FAT16). A few days later I also broke the LCD screen of my G9 in a non-KAP-related accident, so the menu options were no longer accessible. That meant I couldn't use the GentLED CHDK to trigger the camera. Luckily I had also ordered a servo mechanism for pushing the camera button (which I had intended to use with a watertight film camera), so in Vigo, Spain, I installed that and went on my first picture-taking KAP flight. The conditions were very good, with a strong, stable wind blowing in the direction of the pier I was standing on, and I'm very happy with the results I got:

KAP flight preparation in action
Picture taken from the boat with me getting the FlowForm airborne

I hung my camera from a kite, and you get pictures like this!
The result from the air

The four-masted barque in the background of the right picture is the 114.4 m (376 ft) Kruzenshtern, which was in Vigo to participate in the Tall Ships Atlantic Challenge 2009. (And yes, I did consider a KAP session closer to her, but since I was just getting started with KAP I didn't feel at ease flying from the middle of the massive crowd visiting the ship, nor did I want to get my kite line tangled up in her yards.)

A few days later we anchored off the Islas Cies, a group of islands near Baiona. Once a pirates' haunt, Cies is now an uninhabited and pristine national park, and its beach was ranked number one in a list of ‘Top 10 beaches of the world’ in a Guardian article. The vistas there are stunning, so I tried to launch the kite several times, but each time there was barely enough wind to lift the kite, let alone the rig. We even hiked to the top of the island in the hope of finding more wind there, but alas. (As an alternative, I took some shots from the top of the mast.) The best shot I got was this one, taken while anchored off the southern island, Illa de San Martiño, with the camera only one meter out before I had to reel it back in.

KAP from a boat while at anchor

Attempt at kite aerial photography while we lie at anchor
My friends watching intently while I try not to drop the camera in the water in the low wind

The second successful KAP flight was in Baiona, famous for its Parador (now a four-star hotel), built in the style of a Galician manor house within the walls of a medieval fortress. The fort was built to protect (not always successfully) the port of Baiona from enemies and pirates. Again, I am very satisfied with the results (except for the upper left corner of the fourth picture, which was overexposed by the sun).

Self-portrait from the kite
I was standing behind a wall with cannons, with a steady wind from the sea.

The beach of Baiona
The beach and yacht harbour, with parts of the medieval fortress wall visible to the right

Speedboat heading out
A speedboat enters the ‘Ria’ of Baiona

Parador de Baiona: luxurious four-star hotel within the old ring of fortifications that protects the harbour
The four-star Hotel Conde de Gondomar

The next opportunity I got to go KAPing was in Lisbon. I had chosen my spot carefully, positioning myself with the Torre de Belém on my right, the Padrão dos Descobrimentos on my left, a nice lighthouse tower behind me, and the wind pushing the kite steadily out over the river Tagus. Except this time I had pointed my camera far too low, so most of the 50+ pictures were of boring grey river water without any features. The few shots that did include some scenery also failed to impress: there was one with a small, unsharp Torre de Belém in the upper right corner, and the shot of the Padrão dos Descobrimentos was heavily overexposed by the sun. I tried to save them in post-processing, but didn't manage. So another lesson learned: besides looking for a good location and keeping an eye on the wind, also take into account the position of the sun and your camera angle.

The only picture I slightly tilted in post-processing; all the other pictures are straight out of the camera

Is there anybody here who can make a decent picture out of this?

So, in summary: a great hobby. I had lots of fun, got some mighty nice shots that I wouldn't have gotten otherwise, and will certainly come back for more.

(For those interested, the complete set of pictures from my sailing trip can be found on my Flickr page.)

Cycling to a wedding in Poland

Friends of Kristien got married. He is a software engineer from Belgium who now works for Google in Zurich; she is a pretty Polish girl. Together they decided to marry in her home country, in a little castle in the woods of Tuczno. The place is pretty hard to reach, so Kristien and I decided to make it even more adventurous by taking the train to Berlin, renting two bicycles there and cycling the 200 km from the German border to Tuczno.

Taking the lead

The landscape in Poland was beautiful: lots of small villages…

Setting sun

…desolate farms…

So old it becomes beautiful again

"Once upon a time there were three pigs who went out into the world to seek their fortune..."

…big lakes and huge forests…

Starting the day in early sunshine


…and immense, wide-open plains. They can best be seen in this short movie I made from video snippets taken with my Canon G9.

All in all, the trip was a real joy!


The long green road to the Mediterranean Sea

What do you do when you have two weeks of vacation and don’t know yet how to spend them? You start flipping through the atlas.

While doing this, it occurred to me that friends of my parents own a little castle (Château de Pierrefitte) in the middle of France, about 500 km due south of where I live in Antwerp. And an uncle and aunt of ours own a holiday home in the Cévennes, again about 500 km further south as the crow flies. And if I reached their place, I might as well continue to the Mediterranean Sea.

A naive estimate put the distance by road at about 1300 km. I had never undertaken a multi-day bike trip before, but since I don't have a driver's license I do everything by bike. I figured I should be able to cycle 100 km/day, which left me the last day to come back by train.

View The initial plan for my cycling holiday in a larger map

A few days later I had booked my Thalys ticket from Montpellier to Antwerp, bought a good trekking bike, bought the cycle guide “De Groene Weg naar de Middellandse Zee” (“The Green Road to the Mediterranean Sea”, which describes a route that mostly follows secondary roads and avoids the big mountain passes of the Vosges, Jura and Alps) and set off.

At night I slept in a small one-person tent, mostly at campsites but sometimes in a field, hidden from view of the main road.

I hadn't even left Belgium when disaster struck. I was testing the limits of my new bike, going downhill as fast as I could. Just freewheeling, I reached speeds of around 50 km/h. Crouching over my handlebars and making myself as aerodynamic as possible increased the speed to 55 km/h. And when I started pedalling while going downhill, I pushed my speed up to 60 km/h … 61 km/h … at 62 km/h the frame of my bike started to shake so violently that I could no longer control it, and I took a big slide on a hillside near Saint Hubert. Result: the iron frame of my front pannier was completely bent.

I also had scrapes on my left palm and on large parts of my right leg, and I had to go to the hospital to get seven stitches in my elbow. (Click the image to see a video of the damage. Warning: the video shows my naked butt and opens in a new window.)

After a day of rest I decided to continue anyway. Because I had lost a day from my schedule and the total cycling distance turned out to be closer to 1500 km, I had to cycle far more than the 100 km a day I had initially planned. This, however, didn't stop me from taking pictures of all the fauna and flora I came across along the way (one of the advantages of cycling solo).

When my friends saw the above photos, they started joking about how I managed to get my bike into almost every picture. That, then, is one of the disadvantages of cycling solo: I didn't have a girlfriend to put in the frame, so my bike became my muse. After cycling over several mountain passes and visiting charming villages, I finally reached my destination: the warm water of the Mediterranean Sea.

The route I followed can be seen on the map below:

View The green road to the Mediterranean Sea in a larger map

If you still haven’t had enough, you can see the rest of the pictures in this slide show (captions in Dutch):

Winner overall of the Stellendam Regatta

On the weekend of 23–24 April 2005 I participated in the VanUden-Reco Stellendam Regatta on board Taraké, a brand-new Hanse 371. She's a Judel and Vrolijk design, and if you know that Rolf Vrolijk is the designer responsible for the America's Cup-winning boat Alinghi, then you know that she was built to sail fast. And it showed. On the first day we finished second in our class (SW), some 5 minutes on corrected time behind Gorgeous, a Jeanneau Sun Fast 40. On Sunday, we made up a lot of distance on her under gennaker while Gorgeous was flying a spinnaker. On the last straight leg before the finish we put in an extra effort, crossing the finish line about a minute behind her. Because we have a slightly better rating (83.4 versus 79.5), we knew that on corrected time the difference would be a matter of seconds. So we were a bit disappointed to finish second on the second day as well. But our skipper already had a feeling that the results of the second day were not quite correct, and the news that the organisation's computer had crashed only added to that. And indeed, today I found out that we had finished first on corrected time on Sunday. Because Gorgeous dropped from first place to fifth, she also lost her first place overall to us!

And today I got a phone call asking whether I could stand in for someone in the North Sea Regatta on board a Grand Soleil 45.