Older blog entries for peat (starting at number 11)

    More dreams of databases and lakes.

    Not much to report lately, mostly been fixing some SQL that seems to keep breaking. I have the underlying data ready, so this should end today. I'm kinda sad about that, because I'm starting to get all kinds of ideas about how ecological databases can be used. I have several functions set out already that I want to port from VBA to Postgres, simple things like temperature conversions, oxygen saturation, etc.
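The temperature conversions are about as simple as these functions get. A minimal sketch in Python, just to pin down the arithmetic before porting; the function names are illustrative and not taken from the original VBA, and oxygen saturation is left out because its formula isn't given here:

```python
def celsius_to_fahrenheit(temp_c):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return temp_c * 9.0 / 5.0 + 32.0


def fahrenheit_to_celsius(temp_f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


def celsius_to_kelvin(temp_c):
    """Convert degrees Celsius to kelvin."""
    return temp_c + 273.15
```

The same one-liners should translate more or less directly into database functions.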

    Having these functions inside the db proper makes sense, mostly because they can have utility in large scale data workup (like I'm doing right now). From a design standpoint, however, other functions should not be inside the db proper at all. I'm thinking of some of the more complex functions that can be brought to bear on data subsets. Much of the data we deal with has spatial or temporal structure, usually both (even if one is only implied), so conventional SQL breaks down for complex calculations. Besides, for profile analysis, Octave or something similar is most appropriate.

    Interfaces to data are also important. I had a really warped idea of using infobot or another bot as the basis for an information retrieval interface, albeit a very simplistic one, for a db. SQL is often overkill, not to mention confusing, for simple queries. I can see something like this happening:

      <pete>fish, list tables to me
      <fish>sending /msg to pete
      (I get a list of tables pasted as a /msg)
      <pete>fish, list years for lake 'pete' in temperature table
      <pete>fish, list dates for lake 'pete' for year '1996' in temperature table
      and so on...

    Granted, this is a little contrived, because the user now wants to drill down further and further to get at the data, and not using SQL syntax for this is not wise. Also, at this granularity, adding a generic user to a given table and letting said person "play" in the tables (read only, of course) is probably more intelligent. This latter approach is lacking somewhat because it means that only one person can see the data, rather than everyone on channel, which was the intent behind the 'fish' infobot mods.

    The other nice thing about this approach is that the bot logs all of its communications, so finding out what people are trying to do (which, of course, almost NEVER matches the spec of the system, 'cuz <cynical>Users Don't Read </cynical> :) provides hints for altering the query model.
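The 'fish' exchange could be sketched as a tiny command parser. This is only a rough idea in Python, not infobot code: the table whitelist, the `sampledate` column and the `%s` placeholder style (as used by common Python Postgres drivers) are all assumptions, and the input is the text after "fish, ". Listing tables is left out.

```python
import re

# Hypothetical whitelist: table names cannot be bound as query parameters,
# so they are checked against this set before being placed in the SQL.
ALLOWED_TABLES = {"temperature", "oxygen"}

PATTERNS = [
    ("years", re.compile(
        r"list years for lake '(?P<lake>[^']+)' in (?P<table>\w+) table")),
    ("dates", re.compile(
        r"list dates for lake '(?P<lake>[^']+)' for year '(?P<year>\d{4})' "
        r"in (?P<table>\w+) table")),
]


def translate(command):
    """Turn a bot command into (sql, params), or None if not understood."""
    text = command.strip()
    for kind, pattern in PATTERNS:
        match = pattern.fullmatch(text)
        if match is None:
            continue
        table = match.group("table")
        if table not in ALLOWED_TABLES:
            return None
        if kind == "years":
            sql = (f"SELECT DISTINCT year FROM {table} "
                   "WHERE lakename = %s ORDER BY year")
            return sql, (match.group("lake"),)
        # Only the "dates" form is left; sampledate is an assumed column name.
        sql = (f"SELECT DISTINCT sampledate FROM {table} "
               "WHERE lakename = %s AND year = %s ORDER BY sampledate")
        return sql, (match.group("lake"), int(match.group("year")))
    return None
```

Anything that doesn't match comes back as `None`, which is exactly the sort of thing worth logging to see what people actually try to ask.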

    Other data does not lend itself well to being viewed textually, in that this spatial / temporal structure remains hidden until seen graphically. Oxygen and temperature profiles are good examples of this sort of data. I had some initial work done on an interface for profile data, but this was put aside due to lack of time. My recent departure from the Windows/VB world means the opportunity to do this 'properly' (read: reimplementing this using X / OpenGL), and better yet, there are open source examples of distributed data visualization apps I can draw on for this.

    Cool. I can hardly wait. ;)

    ADMiSSeS is mostly recovered now. Woo!

    After having made what, in retrospect, turned out to be some pretty silly decisions when reintroducing link data, I managed to miss two fields in my primary data tables. These are fixed now, and I've learned a fair bit about the way Postgres handles date types. In particular, I got thrown when the elephant[1] figured out that some of the data was taken during eastern DAYLIGHT time, and not eastern STANDARD time. heh. mumble mumble 3 am mumble. Oh well, at least the 900 lines of SQL ran without breaking. I really hope to finish the data work up tomorrow.
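The daylight/standard mix-up is easy to reproduce outside the database. A small Python sketch, assuming fixed offsets of UTC-5 for EST and UTC-4 for EDT and a made-up sample time, shows that the same wall-clock reading lands an hour apart depending on which zone it is taken to be in:

```python
from datetime import datetime, timedelta, timezone

EST = timezone(timedelta(hours=-5), "EST")  # eastern standard time
EDT = timezone(timedelta(hours=-4), "EDT")  # eastern daylight time


def to_utc(wall_clock, zone):
    """Attach a zone to a naive wall-clock time and convert it to UTC."""
    return wall_clock.replace(tzinfo=zone).astimezone(timezone.utc)


reading = datetime(1997, 7, 15, 14, 30)  # hypothetical sample time
as_est = to_utc(reading, EST)
as_edt = to_utc(reading, EDT)
print(as_est - as_edt)  # 1:00:00
```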

    Had a neat conversation tonight with Jody and miguel about future directions of gnumeric. There had apparently been discussion about separating the front end (interface) from the back end logic (core functionality) at some point in the future. This would be great for a few reasons...

  1. It would expose core logic behind gnumeric to other code, allowing it to be extended in all kinds of sundry ways. So long as the strict separation of data from code is maintained (unlike much VBA code), it should keep the avenues of exploitation reduced. I have a list of things I want to implement once this is available (most of the SGPL and PLT code, for starters), so I look forward to future developments.
  2. Permit the development of a text-based front end to gnumeric. I can think of a few reasons why a text mode interface would be useful for a spreadsheet - broader possible use (esp. on more mature hardware), for one. The longer I'm in this field, though, the more people I meet who are using specialized peripherals like eyetrackers, speech synthesizers and especially braille displays. Adding speech synth support to an app is a great idea, save that as an interface it is rather clunky when dealing with complex data (try running festival on math notation, f'rinstance :). In many cases, braille displays are more useful for intricate work.
  3. That said, some functionality exposed via the gui would likely be lost at least initially to text mode users. However, given (1) above, some functions otherwise primarily presented via the gui can be presented via "scripting" (probably the wrong term to use here, but. it's 3 am, and I'm tired. so there. :)

gnumeric already rocks. I can hardly wait to see how it develops from here.

What else... Not much. Too tired to write much more, and hopefully what's already typed is somewhat intelligible. 03:15 hours, waaaay past my bed time.

[1] The elephant being the mascot for postgres, of course

    ADMiSSeS is sick, and that is *not* good. :(

    Found a not-so-slight problem with my research database (which I'm using as a prototype for a project I intend to release under the GPL later). Fixing it will likely break the 1 kloc of SQL I've already written to do my thesis calcs, but...

    So far, the main data table's been fixed, but all of the tables linking to it have to be updated. Rather than tempting fate and trying to fix everything at once (while hungry and distracted), I decided to take a few hours off and go downtown. Walking through my usual haunts just north of downtown usually clears my head. Had dinner downtown at a little deli, bought a book of poetry (a first) written by a friend (his first book), then headed up to the grad pub for an hour and read. David O'Meara's Storm Still is a wonderful book, btw (insert shameless plug here).

    Alas, it looks like tonight is a wash for database work. I've had a couple of pints of beer this eve (Hey, do I turn down a friend offering a pint of Guinness?? I think not! :) and coming in the door tonight, my housemate hands me a glass of Jamaican rum.

    So much for productivity.

    Given that DROP TABLE queries don't forgive, I'm off to bed to sleep off this feeling, and perhaps I'll find tomorrow a better and productive day. My copy of Desai's Intro to DBMS awaits.

    Still here, still alive, still kicking.

    Recap: After having dinner with a colleague, I realized why I couldn't finish my thesis: Having followed my supervisor's advice (read: orders) to take his approach to analysing my data, the results were indeterminate. After 3 years and $200K spent on equipment, salaries, etc., the only answer I could give was 'we can't find an effect using this approach'. I finally accepted that I couldn't submit a thesis that could be easily co-opted for political and other ends. Back to the drawing board.

    After my NT box crashed (boom) and took my research db (and two years of my work) with it, I had little choice but to redo all my calculations from scratch using postgres. Relearning everything I thought I knew (from originally using Access, natch) is somewhat painful, the kind of 'character building' activity that our parents told us about years ago, but we never truly believed. So far, I've successfully reimported the three years of data from our study catchments, and have ~80% of the data prep queries working properly. The remaining 20% are touchy in that they depend somewhat on the nature of the data returned by the earlier queries, so it's time consuming. On the other hand, I have 1 kloc of beautifully commented SQL. :)

    My biggest criticisms of postgres are the lack of left outer join (tho I know it's coming in the next couple of months) and lack of native crosstabulation capabilities (TRANSFORM / PIVOT predicates), crosstabs being queries having aggregation at both the row and column level. That said, some research and more error and trial later,

    crosstabulation queries are possible !!!

      SELECT lakename,
        COUNT(CASE WHEN year = 1996 THEN depth ELSE NULL END) AS SY1996,
        COUNT(CASE WHEN year = 1997 THEN depth ELSE NULL END) AS SY1997,
        COUNT(CASE WHEN year = 1998 THEN depth ELSE NULL END) AS SY1998
      FROM crosstab
      GROUP BY lakename

      C12a    |     0|     0|     1
      C23a    |     1|     0|     1
      C24a    |     7|     5|     7
      C29a    |     2|     0|     3
      C2a     |     4|     3|     5
      C40a    |    20|    18|    16
      C44a    |    11|     9|    10
      C48b    |     5|     1|     0
      C9b     |    11|    11|     7
      FBP10a  |     5|     8|     3
      FBP9b   |     0|     6|    11
      FP15b   |     7|     9|     5
      FP24a   |     4|     4|     6
      FP27a   |    14|    15|     9
      FP2a    |     5|     5|     3
      FP30a   |     6|     7|     6
      FP31a   |     8|     7|     5
      FP32a   |     5|     5|     4
      N106a   |     4|     4|     2

    Now, granted, this isn't nearly as nice as:

      TRANSFORM Count(depth)
      SELECT lakename
      FROM crosstab
      GROUP BY lakename
      PIVOT year

    ... But SQL Server 7 doesn't do that either. :)

    Still, it works, at least on this limited scale. I'd hate to have used this approach on some species diversity work I did a while ago, on a few massive datasets (well, massive by ecology standards :) of 50-200k rows and about 50-80 different species of interest. Usually the species names are used as the column heads, so that means a table with about 50-80 cols and x rows (one row / plot / date). Something's gonna have to give here, cuz I may have another one to do after I finish. Something in either Perl or Python to build the table. Hmmm. <rubs hands>
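A first stab at the Python route might just generate the COUNT(CASE ...) columns instead of typing them out. This is a sketch under assumptions: it only handles integer pivot values like years (species names would need proper quoting), the helper name is made up, it runs against an in-memory SQLite table purely so it can be tried without a Postgres server, and the depth values in the toy rows are invented.

```python
import sqlite3


def build_crosstab_sql(table, row_col, pivot_col, value_col, pivot_values):
    """Build the COUNT(CASE ...) crosstab query for integer pivot values."""
    columns = [
        f"COUNT(CASE WHEN {pivot_col} = {int(v)} "
        f"THEN {value_col} ELSE NULL END) AS SY{int(v)}"
        for v in pivot_values
    ]
    select_list = ",\n  ".join([row_col] + columns)
    return (f"SELECT {select_list}\nFROM {table}\n"
            f"GROUP BY {row_col}\nORDER BY {row_col}")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE crosstab (lakename TEXT, year INTEGER, depth REAL)")
conn.executemany("INSERT INTO crosstab VALUES (?, ?, ?)", [
    ("C12a", 1998, 1.0),   # toy rows; depths are made up
    ("C23a", 1996, 1.0),
    ("C23a", 1998, 2.0),
])

sql = build_crosstab_sql("crosstab", "lakename", "year", "depth",
                         [1996, 1997, 1998])
rows = conn.execute(sql).fetchall()
print(rows)  # [('C12a', 0, 0, 1), ('C23a', 1, 0, 1)]
```

With 50-80 columns the generated statement gets long, but at least nobody has to maintain it by hand.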

    And in the meanwhile, back to our regularly scheduled data work up.

Hi. peat isn't here right now. His two fish, acip and esox are writing this for him as he snores in the other room.

peat spent much of the day with one of the developers from the TINY (Tiny's Independence 'N Yet) Linux effort, working on a good distro for low-power hardware (386 and better). Have a look at TINY - it's a REALLY good idea!

We were both quite happy to see that peat was able to get most of his presentation for the upcoming Linux Expo Ameriques conference ready. (At this rate, he might actually be ready for the presentation before he gives it. Now that would be a swell change!) If you're in Montreal this coming Wednesday, you might even be able to catch him presenting in the Linux in Education track.

In the meanwhile, we'll finish this diary entry here and let him sleep. We'd write more, but the keys are getting rather slippery. What do you expect? We're fish, ya know...

It's been a neat day.

Some time ago, I wrote some fairly simple routines to front end a spreadsheet to a database to simplify some fairly complex calculations. The goal here was to develop a way to rapidly get at info that would otherwise take far too long to calculate manually, or would require a phenomenal handle on SQL. I'm not yet such a person, but that may yet happen.

I found myself re-writing one of the main functions merging disparate data, and dammit, if the rewritten version wasn't a heckuva lot more logical than the first one. Trouble was, tho, that both versions were really, really expensive for disk calls. Two am this morning I realized that I wouldn't get the numbers if I relied on my current machine only. By tonight, it will have crunched 18 k rows of the data set, and there are twice as many records in the set.

A couple of emails last night secured some time on a couple of kick-ass boxes locally, and I spent most of the day going from place to place to set up the db and program. End result: I have my 36 k rows crunched, and can finish things up tonight. At this rate, I might even sleep :P I'm going to have to, I can't really work as it stands.

This solution ("More power! Ar! Ar! Ar!") worked fine this time, but this is clearly not workable for the future. The problem with the spreadsheet approach is that it will make repeated calls for a similar - but not identical - subset of data, in no particular order - caching may not be effective in this case. I have to think about this more carefully. After sleep. :)

Another late night. Happy April Fools morning. :)

I finally have an update for the lake hacking thing, and had written two or three paragraphs to describe it. Or, at least I thought I had written two or three paragraphs, until I re-read them. That's usually the sign that I rely on to tell me that it's time for bed.

I'll post something at some point, I promise. In the meanwhile, I'll get some sleep.

pete kernel Wed Mar 29 12:49:40 EST Caffeine level too low. Trying harder.
pete kernel Wed Mar 29 12:49:40 EST Last message repeated 12 times.

Yawn. It's after twelve noon and I'm still trying to wake up. Trying to figure out what to do with the rest of the day, now that I've gotten some of the email out and a few things actually done for a change.

pete kernel Wed Mar 29 12:53:48 EST Making coffee.

Started rereading 'Learning Perl' for the umpteenth time while taking a bath and after copious amounts of port wine last night. I'm not sure whether it was the wine, the bath or the book that kept me up till ... well .... at least 1 am judging by the fact that CBC had already switched to international broadcasts. I guess this is what they mean by 'gaining wisdom' :)

pete kernel Wed Mar 29 13:01:17 EST Coffee is ready. Have a nice DOS.

Ooo. Bonus. Time to get back to work. =)

wizened creature of time
trying to form me to your image
oblivious to possibility
you trade the truth for clarity
your perspective for myopia
unwilling to be reborn
believing the image
others see

look inside that shell
projected and believed
see your youth and strength
well and alive
torch in hand skin against metal
shiny and new
into the breech
and see clear again

lifetimes of memories
why place them on a shelf
you are not finished
despite the reminders
time to learn and share
time to teach
pages are frozen in time
a true legacy lives

Sunday night. 3 am. Another evening that went far too late.

Another chapter coming together, mostly. I now see why TeX works the way it does. If only I'd known about this a few years ago. Oh well, no time like the present. Heh. Scary to think that I can finally do away with most of those despicable word processors. Sadly, most journals in my field wouldn't know a TeX document from a ... um ... you get the idea.

I was approached to give a talk at my alma mater on Linux and Virtual Communities to a course on Cyberspace. Last year, I did the same with a talk comparing Linux and NT for an assembly language course. The irony is particularly delicious: I graduated from the Environmental and Resource Studies program, and not CS - I didn't take any CS courses during my time there.

I figure I have five days to get the talk together. Fun. So many possibilities. Got to write them down tomorrow. After sleep.

Linux is making inroads into the environmental science field, finally. The ERS program chair mentioned that he'd like a few faculty and students to meet with us and 'chat' informally about what Linux could do for them. I wish I had more than 30 mins. Small school, small classes, great faculty-student ratio, little cash. Lots of potential. Some days I wish either I had a sponsor or I was wealthy enough to go around to places like this and consult / assist on migrating the lab / teaching IT infrastructure to Linux.

Hey, I can dream, right? :)

Weather is starting to get warm again. Been in shorts the last three days, walked a fair bit. 8 km Fri, 9 on Sat, and 3 today. Not stiff either, surprisingly. Metrep looks good for tomorrow. I should walk downtown (~6 km), and around a bit. Maybe time to check Mont Royal's walking trails.

Sleep well. (TTAGGG)n for now.
