Older blog entries for peat (starting at number 8)

    Still here, still alive, still kicking.

    Recap: After having dinner with a colleague, I realized why I couldn't finish my thesis. Having followed my supervisor's advice (read: orders) to take his approach to analysing my data, the results were indeterminate. After 3 years and $200K spent on equipment, salaries, etc., the only answer I could give was 'we can't find an effect using this approach', and I finally accepted that I couldn't submit a thesis that could be so easily co-opted for political and other ends. Back to the drawing board.

    After my NT box crashed (boom) and took my research db (and two years of my work) with it, I had little choice but to redo all my calculations from scratch using postgres. Relearning everything I thought I knew (from originally using Access, natch) is somewhat painful, the kind of 'character building' activity that our parents told us about years ago but we never truly believed. So far, I've successfully reimported the three years of data from our study catchments, and have ~80% of the data prep queries working properly. The remaining 20% are touchy in that they depend somewhat on the nature of the data returned by the earlier queries, so it's time-consuming. On the other hand, I have 1 kloc of beautifully commented SQL. :)

    My biggest criticisms of postgres are the lack of left outer joins (tho I know they're coming in the next couple of months) and the lack of native crosstabulation capabilities (the TRANSFORM / PIVOT predicates), crosstabs being queries with aggregation at both the row and column level. That said, some research and more error and trial later,

    crosstabulation queries are possible !!!

      -- one COUNT(CASE ...) column per year: COUNT ignores NULLs,
      -- so each column tallies only that year's depth readings
      SELECT lakename,
        COUNT(CASE WHEN year = 1996 THEN depth ELSE NULL END) AS SY1996,
        COUNT(CASE WHEN year = 1997 THEN depth ELSE NULL END) AS SY1997,
        COUNT(CASE WHEN year = 1998 THEN depth ELSE NULL END) AS SY1998
      FROM crosstab
      GROUP BY lakename
      ;

      lakename|sy1996|sy1997|sy1998
      --------+------+------+------
      C12a    |     0|     0|     1
      C23a    |     1|     0|     1
      C24a    |     7|     5|     7
      C29a    |     2|     0|     3
      C2a     |     4|     3|     5
      C40a    |    20|    18|    16
      C44a    |    11|     9|    10
      C48b    |     5|     1|     0
      C9b     |    11|    11|     7
      FBP10a  |     5|     8|     3
      FBP9b   |     0|     6|    11
      FP15b   |     7|     9|     5
      FP24a   |     4|     4|     6
      FP27a   |    14|    15|     9
      FP2a    |     5|     5|     3
      FP30a   |     6|     7|     6
      FP31a   |     8|     7|     5
      FP32a   |     5|     5|     4
      N106a   |     4|     4|     2
      ...
      
    Now, granted, this isn't nearly as nice as:

      TRANSFORM COUNT(depth)
      SELECT lakename
      FROM crosstab
      GROUP BY lakename
      PIVOT year
      ;

    ... But even SQL Server 7 doesn't do that either. :)

    Still, it works, at least on this limited scale. I'd hate to have used this approach on some species diversity work I did a while ago, on a few massive datasets (well, massive by ecology standards :) of 50-200k rows and about 50-80 different species of interest. The species names are usually used as the column heads, so that means a table with about 50-80 cols and x rows (one row / plot / date). Something's gonna have to give here, cuz I may have another one to do after I finish. Something in either Perl or Python to build the table (see the sketch below). Hmmm. <rubs hands>
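
    Roughly what I have in mind, as a Python sketch. The table and column names here (plots, plotname, species, abundance) are invented for illustration, not my real schema; the quoting is naive, which is fine for a trusted species list but not for arbitrary input:

      # Generate the COUNT(CASE ...) crosstab query instead of typing
      # 50-80 columns by hand. Names below are placeholders.
      def crosstab_sql(table, row_col, pivot_col, value_col, pivot_values):
          cols = ",\n       ".join(
              "COUNT(CASE WHEN %s = '%s' THEN %s ELSE NULL END) AS \"%s\""
              % (pivot_col, v, value_col, v)
              for v in pivot_values
          )
          return "SELECT %s,\n       %s\nFROM %s\nGROUP BY %s;" % (
              row_col, cols, table, row_col)

      # the species list would come from a SELECT DISTINCT species first
      print(crosstab_sql('plots', 'plotname', 'species', 'abundance',
                         ['Esox lucius', 'Acipenser fulvescens']))

    Eighty columns becomes a list comprehension instead of an afternoon of typing.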

    And in the meanwhile, back to our regularly scheduled data work up.

Hi. peat isn't here right now. His two fish, acip and esox are writing this for him as he snores in the other room.

peat spent much of the day with one of the developers from the TINY (Tiny's Independence 'N Yet) Linux effort, working on a good distro for low-power hardware (386 and better). Have a look at TINY - it's a REALLY good idea!

We were both quite happy to see that peat was able to get most of his presentation for the upcoming Linux Expo Ameriques conference ready. (At this rate, he might actually be ready for the presentation before he gives it. Now that would be a swell change!) If you're in Montreal this coming Wednesday, you might even be able to catch him presenting in the Linux in Education track.

In the meanwhile, we'll finish this diary entry here and let him sleep. We'd write more, but the keys are getting rather slippery. What do you expect? We're fish, ya know...

It's been a neat day.

Some time ago, I wrote some fairly simple routines to front-end a spreadsheet to a database, to simplify some fairly complex calculations. The goal was to develop a way to rapidly get at info that would otherwise take far too long to calculate manually, or would require a phenomenal handle on SQL. I'm not yet such a person, but that may yet happen.

I found myself rewriting one of the main functions merging disparate data, and dammit if the rewritten version wasn't a heckuva lot more logical than the first one. Trouble was, tho, that both versions were really, really expensive in disk calls. At two this morning I realized that I wouldn't get the numbers if I relied on my current machine alone. By tonight, it will have crunched 18k rows of the data set, and there are twice as many records in the set.

A couple of emails last night secured some time on a couple of kick-ass boxes locally, and I spent most of the day going from place to place setting up the db and program. End result: I have my 36k rows crunched, and can finish things up tonight. At this rate, I might even sleep :P I'm going to have to; I can't really work as it stands.

This solution ("More power! Ar! Ar! Ar!") worked fine this time, but it's clearly not workable for the future. The problem with the spreadsheet approach is that it makes repeated calls for similar (but not identical) subsets of data, in no particular order, so caching may not be effective in this case. I have to think about this more carefully. After sleep. :)
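
One half-formed idea to chew on after sleep: pull each lake's rows into memory once, and let the near-duplicate requests filter that instead of going back to disk. A very rough Python sketch; the psycopg2 driver and the samples table are assumptions, patterned on the lakename/year/depth columns above:

    # One disk hit per lake; later similar-but-not-identical requests
    # filter in memory. Driver, table and columns are placeholders.
    import psycopg2

    _cache = {}  # lakename -> list of (year, depth) rows

    def rows_for_lake(conn, lakename):
        if lakename not in _cache:
            cur = conn.cursor()
            cur.execute(
                "SELECT year, depth FROM samples WHERE lakename = %s",
                (lakename,))
            _cache[lakename] = cur.fetchall()
            cur.close()
        return _cache[lakename]

    def depths_for(conn, lakename, year):
        # answered from the cache; no second trip to disk
        return [d for (y, d) in rows_for_lake(conn, lakename) if y == year]

Whether that wins depends on whether the requests cluster by lake at all; if they don't, it just moves the problem around.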

Another late night. Happy April Fools morning. :)

I finally have an update for the lake hacking thing, and had written two or three paragraphs to describe it. Or, at least I thought I had written two or three paragraphs, until I re-read them. That's usually the sign that I rely on to tell me that it's time for bed.

I'll post something at some point, I promise. In the meanwhile, I'll get some sleep.

pete kernel Wed Mar 29 12:49:40 EST Caffeine level too low. Trying harder.
pete kernel Wed Mar 29 12:49:40 EST Last message repeated 12 times.

Yawn. It's after twelve noon and I'm still trying to wake up. Trying to figure out what to do with the rest of the day, now that I've gotten some of the email out and a few things actually done for a change.

pete kernel Wed Mar 29 12:53:48 EST Making coffee.

Started rereading 'Learning Perl' for the umpteenth time while taking a bath and after copious amounts of port wine last night. I'm not sure whether it was the wine, the bath or the book that kept me up till ... well ... at least 1 am, judging by the fact that CBC had already switched to international broadcasts. I guess this is what they mean by 'gaining wisdom' :)

pete kernel Wed Mar 29 13:01:17 EST Coffee is ready. Have a nice DOS.

Ooo. Bonus. Time to get back to work. =)

wizened creature of time
trying to form me to your image
oblivious to possibility
you trade the truth for clarity
your perspective for myopia
unwilling to be reborn
believing the image
others see

look inside that shell
projected and believed
see your youth and strength
well and alive
torch in hand skin against metal
shiny and new
into the breech
and see clear again

lifetimes of memories
why place them on a shelf
you are not finished
despite the reminders
time to learn and share
time to teach
pages are frozen in time
a true legacy lives

Sunday night. 3 am. Another evening that went far too late.

Another chapter coming together, mostly. I now see why TeX works the way it does. If only I'd known about this a few years ago. Oh well, no time like the present. Heh. Scary to think that I can finally do away with most of those despicable word processors. Sadly, most journals in my field wouldn't know a TeX document from a ... um ... you get the idea.

I was approached to give a talk at my alma mater on Linux and Virtual Communities for a course on Cyberspace. Last year, I did the same with a talk comparing Linux and NT for an assembly language course. The irony is particularly delicious: I graduated from the Environmental and Resource Studies program, not CS, and I didn't take any CS courses during my time there.

I figure I have five days to get the talk together. Fun. So many possibilities. Got to write them down tomorrow. After sleep.

Linux is making inroads into the environmental science field, finally: the ERS program chair mentioned that he'd like a few faculty and students to meet with us and 'chat' informally about what Linux could do for them. I wish I had more than 30 mins. Small school, small classes, great faculty-student ratio, little cash. Lots of potential. Some days I wish I either had a sponsor or was wealthy enough to go around to places like this and consult / assist on migrating the lab / teaching IT infrastructure to Linux.

Hey, I can dream, right? :)

Weather is starting to get warm again. Been in shorts the last three days, walked a fair bit. 8 km Fri, 9 on Sat, and 3 today. Not stiff either, surprisingly. Metrep looks good for tomorrow. I should walk downtown (~6 km), and around a bit. Maybe time to check Mont Royal's walking trails.

Sleep well. (TTAGGG)n for now.

Woof.

It's been a really busy few days here. I've mostly been hanging out with a colleague (Lyta) and watching her team play in the National Goalball championships. Neat game.

After my conversations with Lyta, it's becoming clearer to me that I'm going to have to learn more about docbook and sgml in general. There's so much info out there, and it seems to me that as more of it gets marked up that way, it should become easier to push that information out into formats that let the content be presented in different ways (seen / heard / felt).

While I'm at it, I guess I should add php to that list. After reading the discussion on the OSWG web site and mailing list, it's finally getting through my small brain that the relationship between content and presentation can be very tenuous indeed. I wonder how hard it would be to take a freshmeat-like site and set up two or three different themes, say 'eye-popping', 'nice' and 'text' (toy sketch below). Hm.
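
Just to convince myself the idea is sane, a toy sketch (in Python rather than php, and everything here is invented) of one piece of content pushed through interchangeable themes:

    # Toy content/presentation split: same item, three 'themes'.
    # Theme names from above; the item and markup are made up.
    THEMES = {
        'eye-popping': '<h1 style="color:red">{title}</h1><blink>{blurb}</blink>',
        'nice':        '<h2>{title}</h2><p>{blurb}</p>',
        'text':        '{title}\n{blurb}',
    }

    def render(item, theme):
        return THEMES[theme].format(**item)

    item = {'title': 'gnofoo 0.3 released', 'blurb': 'Now with 30% more foo.'}
    for t in ('eye-popping', 'nice', 'text'):
        print(render(item, t))

The content never changes; only the theme lookup does. Presumably the php version would do the same with one template per theme.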

Workwise, it's been slow (a change is good, tho) ... I lost mail for a couple of days (translation: I'm coming to hate NT more and more on a daily basis). The gateway is now up, however, thanks to yugami, vicman, ales and TomG; and though I still have to tighten some rules, I think things are in decent shape for the time being. dsl is nice. Better yet, I'm finally using esox for things like web browsing. I can hardly wait to convert acip (dual p200 box currently running en tee) to linux. I suspect it'll be a much happier fish that way.

Speaking of work, it's midnight already, and I have a report to write before I can go to bed. Fare thee well.

Thesis paper written up, given to supervisor. Need to assemble rest of doc (title page, main intro, etc). It gets handed in soon.

Yay!

I finally have time to learn C, perl, python and php! :)

First job is to port the SGPL code from some previous work to Gnumeric.
