15 Jan 2001 (updated 24 Sep 2006 at 03:46 UTC) »
Hey, I'm doing my news via blogger now, come check it out at blog.sness.net
I've moved again:
You can also check out my webpage at: www.sness.net
1 Nov 2000 (updated 2 Nov 2000 at 00:30 UTC) »
Holy, long time since I've written. Time to write some more about our current projects.
Working on FPC a bit still, but mainly working on our new project seed. More information to follow.
Cool! I just made a program way, way, way faster! This is FPC, specifically reading in a .fpc database with markers in it. It used to take an hour and a half to read in a database on a Sun, and 20 minutes on a 1GHz AMD-K6. Now it takes about 1 minute! I've been working on this for the past three days. The first day was just getting my bearings. Yesterday I tried another approach that I thought would be lots faster, but it still did the adding in O(N^2), so it wasn't, and I was sad. Last night I must have figured it out in my dream, 'cos when I woke up, I knew the answer. It only took the morning to write the code, and now it's O(N) and way faster. Cool!
Whew, code is going really well again today. Made a first test version of a simple band identification routine. Really simple, yet it works very well, almost as well as Bandleader does... I've put up a snapshot. The algorithm is from a small part of Bandleader. The description there is:
% Function to find all local peaks in a vector. A local peak is defined
% as a value x(n) satisfying x(n-2) < x(n-1) < x(n) > x(n+1) > x(n+2),
% and for which the peak amplitude is greater than twice the local mean,
% where the local mean is taken over +/- m samples.
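For anyone who wants to play with the idea, here is a rough C sketch of the rule in that comment. It is not the Bandleader code and not the routine in the snapshot. The function name `find_local_peaks`, the output array, and the choices to clip the averaging window at the ends of the vector and to include the peak sample in the local mean are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: writes the indices of local peaks of x[0..n-1]
 * into peaks[] (at most cap of them) and returns how many were stored.
 * A sample counts as a peak when
 *   x[i-2] < x[i-1] < x[i] > x[i+1] > x[i+2]
 * and x[i] is greater than twice the mean over x[i-m .. i+m].
 * Assumption: the window is clipped at the array ends and includes x[i]. */
size_t find_local_peaks(const double *x, size_t n, size_t m,
                        size_t *peaks, size_t cap)
{
    size_t count = 0;

    assert(x != NULL && peaks != NULL);
    if (n < 5)
        return 0;               /* not enough samples for the shape test */

    for (size_t i = 2; i + 2 < n; i++) {
        /* Shape test: strictly rising for two steps, then strictly falling. */
        if (!(x[i - 2] < x[i - 1] && x[i - 1] < x[i] &&
              x[i] > x[i + 1] && x[i + 1] > x[i + 2]))
            continue;

        /* Local mean over +/- m samples, clipped to the valid range. */
        size_t lo = (i > m) ? i - m : 0;
        size_t hi = (i + m < n) ? i + m : n - 1;
        double sum = 0.0;
        for (size_t j = lo; j <= hi; j++)
            sum += x[j];
        double mean = sum / (double)(hi - lo + 1);

        /* Amplitude test: peak must exceed twice the local mean. */
        if (x[i] > 2.0 * mean && count < cap)
            peaks[count++] = i;
    }
    return count;
}
```

Calling it on a lane profile with a window of, say, m = 4 gives back the sample indices of the candidate bands. A broad, gentle hump passes the shape test but fails the amplitude test, which is what keeps background from being called as a band.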
Got a cute little quickcam, updates every now and then at snesscam. Check it out.
Added buttons to zoom into a gel image and also a menu button to select the different kinds of filtering. Tomorrow I'll either improve the graphing code or integrate the stuff in olddisp.c that loads all the gelfiles into geldisp.c, which is my current development target. Starting to rethink doing this all in C. I think I might add classes and do some of it in C++, but I'm a little nervous about integrating this with the acedb graphics libraries. Maybe I'll just do things with very detailed structs. Email me if you have an opinion. Also, let me know how you like geldisp.c, comments are welcome. It's been a few months since I've done hardcore coding, it's mostly been bug fixes, and it always takes me a bit to get back into the swing of coding.
Whoo hoo! Got the pixel removal working perfectly! Also added the old averaging function back in, but cleaned up the code and made it easier to understand. Here's a picture of what it looks like now. The first column is what FPC used to do, averaging all 5 pixels. The second column is that data stretched out, which is already a big improvement for humans looking at the data, and the third lane is the data cleaned up by my neighbourhood cleanup routine. The fourth lane is the actual data in the gel file.
Having really good coding days lately. On Monday I figured out the gel format: each lane is actually 5 pixels wide, and FPC was just averaging all the pixels to display each lane. This was fine, except that dust particles in the gel come out as very high intensity peaks. When these are averaged with their low intensity neighbours, you get artifacts, bands that are a single pixel high, which make it hard to see what is going on.
First fix was just to spread these pixels out across the image, which is a big improvement. This took 15 lines of code down to 3 lines (two "for" statements and one line that did the work). :)
Second fix was more interesting and is almost done, to do post processing and remove the dust peaks. I'm looking at the neighbouring points and deciding if one peak is too intense, if it is, remove it.
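To give a feel for the neighbour comparison, here is a small C sketch. It is only a guess at the shape of such a routine, not the actual FPC cleanup code. The name `remove_dust`, the `LANE_WIDTH` constant, the `factor` threshold, and the choice to replace a dust pixel with the mean of the other pixels in its row are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define LANE_WIDTH 5   /* assumed: one lane is stored as rows of 5 pixels */

/* Hypothetical cleanup pass over one lane stored row by row.
 * A pixel is treated as dust when it is more than `factor` times the
 * mean of the other pixels in the same row; it is then replaced by that
 * mean. Returns the number of pixels replaced. */
size_t remove_dust(unsigned char *lane, size_t rows, double factor)
{
    size_t removed = 0;

    assert(lane != NULL && factor > 1.0);

    for (size_t r = 0; r < rows; r++) {
        unsigned char *row = lane + r * LANE_WIDTH;
        unsigned char orig[LANE_WIDTH];
        unsigned int sum = 0;

        /* Work from a copy so one replacement does not affect the
         * decision for the next pixel in the same row. */
        for (size_t i = 0; i < LANE_WIDTH; i++) {
            orig[i] = row[i];
            sum += row[i];
        }

        for (size_t i = 0; i < LANE_WIDTH; i++) {
            double others = (double)(sum - orig[i]) / (LANE_WIDTH - 1);
            if ((double)orig[i] > factor * others) {
                row[i] = (unsigned char)(others + 0.5);  /* round to nearest */
                removed++;
            }
        }
    }
    return removed;
}
```

A real band is bright across the whole row, so no single pixel stands far above the others and the row is left alone. A dust speck is one bright pixel among dim ones, so it gets flattened. On a nearly black background a purely relative test like this can over-trigger, so an absolute minimum difference would be a sensible extra guard.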
Also, yesterday my boss needed a little program to do the Fisher Exact test, so I learned what it was and coded it up. Lots of math, but mostly trying to find efficient ways of doing division of big factorials. I came up with one solution, but as you can guess, numbers can get pretty big when you're doing 100! (that's the factorial notation). If you know a good way to code division of large factorials, let me know at sness@sness.net. Anyways, it's a very interesting little test, and gives you exact probabilities of certain combinations of two categorical variables (usually expressed in a 2x2 table). Cooler than the Chi-squared test, since it gives you an exact probability. It does this by enumerating all possible matrices that add up to the same column and row totals as your observed table. Coool... Mmmm, feels great to write code...
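One common way around the giant factorials is to never compute them at all. The probabilities of two neighbouring tables with the same margins differ only by a small ratio, so you can walk through the tables multiplying by that ratio. Here is a minimal C sketch of that idea. It is not the program described above. The name `fisher_exact_two_sided` is made up, and the two-sided rule used (add up every table that is no more likely than the observed one) is an assumption, since the post does not say which tail was wanted. Another standard route is to work with logarithms of factorials, for example through C99 `lgamma` from `<math.h>`.

```c
#include <assert.h>

/* Hypothetical sketch: two-sided Fisher exact p-value for the table
 *     a b
 *     c d
 * With the margins fixed, each table is determined by its top-left
 * cell x, and its probability is proportional to
 *     1 / (x! (r1-x)! (c1-x)! (r2-c1+x)!)
 * so that
 *     P(x+1) / P(x) = (r1-x)(c1-x) / ((x+1)(r2-c1+x+1)).
 * Only this ratio is ever evaluated, so no factorial is formed. */
double fisher_exact_two_sided(int a, int b, int c, int d)
{
    assert(a >= 0 && b >= 0 && c >= 0 && d >= 0);

    int r1 = a + b;                     /* first row total    */
    int r2 = c + d;                     /* second row total   */
    int c1 = a + c;                     /* first column total */
    int lo = (c1 > r2) ? c1 - r2 : 0;   /* smallest feasible x */
    int hi = (r1 < c1) ? r1 : c1;       /* largest feasible x  */

    /* Pass 1: unnormalised weights, with the weight at x = lo set to 1. */
    double w = 1.0, total = 0.0, w_obs = 0.0;
    for (int x = lo; x <= hi; x++) {
        if (x == a)
            w_obs = w;
        total += w;
        w *= ((double)(r1 - x) * (double)(c1 - x)) /
             ((double)(x + 1) * (double)(r2 - c1 + x + 1));
    }

    /* Pass 2: add up every table no more likely than the observed one.
     * The small tolerance absorbs floating-point rounding on ties. */
    double p = 0.0;
    w = 1.0;
    for (int x = lo; x <= hi; x++) {
        if (w <= w_obs * (1.0 + 1e-9))
            p += w;
        w *= ((double)(r1 - x) * (double)(c1 - x)) /
             ((double)(x + 1) * (double)(r2 - c1 + x + 1));
    }
    return p / total;
}
```

For the table 3 1 / 1 3 the five possible tables have weights 1, 16, 36, 16, 1 (total 70), and the observed table has weight 16, so the sketch returns 34/70, about 0.486. For very large counts the weights can still overflow a double, at which point the logarithm approach is the safer choice.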
Props out to the dudes at Stormix, the new site looks totally kickin' totally May/00, very trippy with the new javascript changing colors and the new design! And they're just finishing the alpha of version 2, which kicks ass, I can't wait to see it! So so happy to hear that they're using my hardware database code in the new release, I think it should be much easier to maintain than the old code. Shouts out to my homies, now they just need to sell the Stormix underwear "I've got a Storm in my pants". :) Yeah, Stormix kicks ass, it was a really fun time and I learned a lot of low-level programming there. Now I'm back to the high-level object oriented scientific code, quite a bit of a shift in perspective!
<RANT MODE ON>Well, Hans has a good handle on it. I gave him some code I whipped up a few weeks ago to use optind, so things should be much happier in the house now :) Spent the morning examining and suggesting fixes for some spectacularly badly written C code. No one here wrote it, it's some old phylogenetics code that does some really crazy and bad things. Made my head hurt! It brought down our network yesterday. That was no fault of anyone at the Genome Centre, as we now see, it's just this old code that is causing the problems. It's interactive, prompt-driven code that really, really sucks. There are like 30 .c files in one directory that repeat the same file opening function over and over, instead of putting it in a library. Things like not checking gets() (bad anyways!) for return value, not checking strcpy, and using string[0]=="\0" for error checking! It gets called from cgi-bin, so the problems are magnified 100 fold...
<RANT MODE OFF>
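For the record, here is roughly what the safer versions of those three things look like in C. This is a generic sketch, not a patch for the phylogenetics code. The helper names `read_line`, `copy_bounded` and `is_empty` are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Read one line with fgets() instead of gets(): the buffer size is
 * enforced and the return value is checked. Strips the trailing
 * newline. Returns 0 on success, -1 on end of file or read error. */
int read_line(FILE *in, char *buf, size_t size)
{
    assert(in != NULL && buf != NULL);
    if (size == 0 || fgets(buf, (int)size, in) == NULL)
        return -1;
    buf[strcspn(buf, "\r\n")] = '\0';
    return 0;
}

/* Copy only if the source fits, instead of a blind strcpy().
 * Returns 0 on success, -1 if src (plus its terminator) is too long. */
int copy_bounded(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL);
    size_t len = strlen(src);
    if (len >= dst_size)
        return -1;
    memcpy(dst, src, len + 1);   /* +1 copies the terminating '\0' */
    return 0;
}

/* Test for an empty string by comparing a char with the char '\0'.
 * Writing string[0] == "\0" instead compares a char with a pointer to
 * a string literal, which is not an emptiness test at all. */
int is_empty(const char *s)
{
    assert(s != NULL);
    return s[0] == '\0';
}
```

Putting helpers like these in one shared file also takes care of the "same function copied into 30 .c files" problem.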
FPC
Weirdness with Image file formats, so I'm writing a gel viewer. This is going fairly slowly, as I'm finding it hard to switch back into "coding mode" from "theoretical mode", but that's one of the really exciting things about this project, the cool mix of theory and coding. I can't really make my own standalone program, since FPC has everything I need, including great visualization, and FPC is *almost* there. Pretty amazing what it can do actually, to look at the data, it's pretty intense. Coding is going better and better, so next week will be full on coding mode! Whoo hoo!
Making a gel image viewer app. This is helping me to learn the acedb graphics library more, and will be useful in visualizing the actual gel files that FPC uses. Going to put some cool visualizations into it as well, like the graph that Image makes. This will all be very useful for my June trip to St. Louis, where we'll be cleaning up the Image source code for open source release!
This is very exciting, because Image is a really great program, it's fast, it looks cool and it WORKS. However, right now, it's closed source, and this makes integrating code like BandLeader impossible. This is another big win for open source, IMHO. And I'm excited to be part of it.