Older blog entries for peat (starting at number 19)

Quote of the day...
"Neil, it's not Project Thingy's policy to hire livestock ... but we felt like you could use a sidekick." -- Project Thingy

Don't ask me why I found that so funny.

Thesis, thesis, thesis

    Spent the last w'end (Canadian Thanksgiving) working on the sum of my existence for the last four years. My introduction to LaTeX and other documentation systems is going swimmingly (as in I'm still managing to keep my head above water. mostly). Am glad to report that I am about to work on the last table for the document tonight, perhaps finishing it. TODO: add graphics, rework citations. The more I use LaTeX, the more I like it. Granted, the learning curve is substantial and it helps to have good documentation (and colleagues online who can answer questions). That said, with time and a bit of forethought, I'm starting to be fairly well equipped to start on the phd after this is done (yes, even after all of this shit).

    That's pretty much all of the news for now. I'm catching up on my sleep, er, reading rfcs and other neat docs. :) I'm learning a lot about the more interesting aspects of system administration; I hope to get my hands really dirty in a coding project soon (or many, for that matter). I've been watching with great interest how some acquaintances have set up personal, db-based web sites. Pretty impressive, I must say. I can see some ways in which a lot more could be done tho, and I've started sketching out what I'd like to see in a more focused site. Stay tuned.

    Back to thesis.

Oh, but before I do...

    And in other news, said credit card company bozos have decided to send me a platinum card application. Maybe if I write them back something really imaginative ("addressee used as stunt double for polar bear scenes on Chilly Beach" or something) they'll send me one for a titanium card.

    yeah. right. :)

    There is this one credit card company here in Montreal (the name isn't important, and besides, why would I give these bozos free publicity) that keeps mailing me a credit card application. And every time I get the "invite," I reply with a creative note on the envelope saying something like...

      return to sender: addressee deceased after being attacked by large gerbils and used as ink blotter fodder


      return to sender: addressee had a sex change, moved to Northern California and opened a health food store


      return to sender: addressee deceased after tricycle accident and was subsequently trodden over by a band of wild elephants, and now resides in the National Gallery as a really cool wall hanging.

    And yet, I still get this crap in the mail. Trouble is that I still don't know whether they keep sending because they haven't gotten the hint, or they just want to see what I'll write next.

    Enough fun. Back to work.

    It's been some time since my last diary "entry" (read: the fast, amazing, awesome, "dependable" connection went down for a few hours and I lost what I was working on), so I should be a bit more detailed about the goings on here.

    Been spending most time hardware hacking. Not building hardware, of course, mostly fixing things for other people. My hardware collection has grown a lot in the process. Seriously, I have to do something, as my apartment is rapidly becoming an ungodly mess. I'm slowly finding homes for gear, so if there is anything you're looking for, let me know.

    The review of the blinux kernel mods for speech synth hasn't been going as quickly as I had hoped, but at least progress is being made. I'd like to see more people reviewing it, because although I really want to see this in the kernel, that should happen only once it has 'matured' enough to not break kernel design principles (which it currently does).

    Freebasin has gone into a holding pattern pending thesis completion. The crash of the NT box didn't help things any, but at least I was able to move the data from an access db into postgres. Next stage will be the post-process differential correction of the position data, and merging what is promising to be one of the largest ecological databases going. :) This should be fun.

    OODS is still going well, lots of planning is happening. Normally, I would find this really unsettling, save that I know the people involved, and I know they will make this happen. Besides, I've learned so much about design and abstraction ... two aspects I would have liked to have learned in school, save that I'm getting a masters in ecology and not computer science.

    OODS is primarily about open source directory services. I would have figured this to be really dull, boring and with few practical applications outside of system management. However, thanks to Frank, I can see just how broad the 'directory services' category is and just how relevant it is to, say, ecology. Scary.

    What else. Finally saw X-Men and Art of War. Nice surprise, neither particularly sucked. I was really glad to see someone doing Wolverine 'right.' The other characters seemed to fit as well. Nice to see. As for AoW, homegrown movie that it is, it looked decent as well, tho I expect to go see it again (missed some of the local 'sights' the first time around). Hopefully the trend will continue with Highlander?

    If it sucks, tho, at least you know you can blame someone.

    Well, that's about it for now. Not much to see here. These are not the droids you're looking for...

what is tv?
    Still thinking about some comments made by schoen (Software as science)

    I grew up reading very enthusiastic accounts of the work of very idealistic scientists, who mostly believed that they were working on a shared enterprise which by right belonged to all of humanity. The cool thing is that there have actually been a lot of scientists who believed that, and who lived that way.

    There have been great strides in the hard sciences and many of the natural sciences (geography, biostatistics, bioinformatics) thanks chiefly to the intuition of the researchers and the availability of the technology capable of carrying out the tasks required.

    From what I've seen, technology in the sciences is still by and large captive tech. Most large instrumentation requires software using undocumented interface specs. Sure, it may use a standard DB9 RS232 connector, but good luck finding the command set documented anywhere without an NDA or a substantial amount of cash (or both). Unfortunately, even among the scientists I know, one in particular insists on keeping his computer model private; this precludes peer review and slows down development. Others will typically release binaries for Win32 (most ecologists seem to be Win32-dependent) but none of the source.

    There have been some tremendous exceptions (examples here and elsewhere), in and out of ecology - which is great. As long as practicing scientists work within the spirit of open source, I think we may just see ecology explode just as the computing world did as more and more people embraced open source.

    The younger crowd of grad students seem more inclined to use Linux and free software, so there is hope. Unfortunately, the general technical literacy level is low enough to worry me. We go to schools, colleges, universities to learn to think critically, to analyse, to delve and write. Courses in methodologies, critical thought, and the like, are offered. As important as these are, they don't seem to go far enough; but in most biology departments I've been to, computer courses appear to be too far 'away' from the actual science at hand. And hell, I don't want to take a course on learning how to use Word, thanks.

    Perhaps the largest impediment to 'free science' is communication; most scientific journals are increasingly expensive. At up to $3000 per institutional subscription, it doesn't take long before many libraries carry fewer journals. As these costs increase, the next logical step is to start an "open source, peer review online journal"; indeed, this has already started. Unfortunately, they're not all that print friendly (granted, they're easier for a braille terminal to read, compared to a pdf). I think it will take numerous far-sighted individuals to pull off an SGML-based, open source journal; but I also think it will happen soon enough.

    Back to analysing those pesky data.

    Hm. Almost three weeks since the last entry. I've been checking in every so often before going to bed lately (and unfortunately usually too tired or too uncoordinated to write anything resembling a coherent thought), and noticed the discussion thread on the similarities between open source and science.

    Over the last few years, I've seen great examples of how some forward-looking people are working together, inside and outside of the university environment, to do better research and make a difference. Sadly, I've also found these people to be few and far between. Unfortunately, I've seen too many instances where a PI or possible collaborator would actively try to squelch fruitful discussions among grad students, post-docs, other profs, etc. The primary goal thus far seems to be getting as many good publications out there as primary author as quickly and as often as possible. Although it's a nice ego boost, the primary reason for keeping a high publication rate is financial (okay, prestige is there as well, let's be honest :) Maintaining this "competitive advantage" often appears to be the unwritten standing order, and this is especially seen in the infighting between people on large projects.

    There also tends to be a closed-mindedness particularly about technology and how it can impact on the way science is "done," for lack of a better term. Frankly, I don't see ecology / environmental science as having a data problem as much as it has a problem of data organization. The data need to be organized in such a way as to make it:

    • easy to submit data for inclusion and prior verification (this is a big one)
    • possible to ensure effective access to the data
    • possible to ensure effective USE of the data

    I've done some work on this already, and have had good results - won't go too far into details; it would be boring and I need to finish this thesis. Rest assured, tho, that these will be "published" later under the GPL - some is already available at the SGPL project site, more will follow (and the much needed porting will happen soon, any gnumeric hackers out there? :).

    Open source has opened up some tremendous potential for science. Perhaps the biggest contribution, though, is to start getting scientists thinking in the "Unix" frame of mind, or at least gaining an appreciation of the Unix philosophy - copious small, specialized and reusable tools rather than few large applications. I can't speak for anyone else, but thanks to some people, I've come to see a raft of new possibilities that only a few years ago I couldn't even dream of. The key to this epiphany was not to feel that I needed to create new software or programs but rather to look carefully at how existing software CAN be put to different or interesting uses...

    • using a spreadsheet as an effective interface to a data source for complex, focused calculations
    • using a web server as an efficient tool for data analysis and visualization
    • using a search engine as a personal cataloging system for online journal articles
    • using repository and good markup techniques to facilitate keeping local lab and study documentation up to date.

    The latter is usually an underappreciated and undervalued aspect of any endeavor, scientific or otherwise, and I've gained a lot of respect for those people or groups working on good docs.

    Even with all of this great open source software available, there is still a very considerable price to pay for gaining this perspective. Pretty much everyone I know working with Linux and Unix in general for their ecological research feels pretty isolated because picking up *nix means that they no longer have any peers in a research world dominated by Win- and Mac-users. The energy (well spent, admittedly) in climbing the learning curve means that many in this situation (myself included) are perceived as being more interested in the technology than in doing science.

    Interestingly, in our case, we can easily work around this lack-of-peer-support problem by using that venerable geek tool - IRC - to maintain and develop our virtual peer group. Not only does this bring together some pretty competent *nix folk, but we get the added benefit of working in a very diverse community of researchers, and a place to talk with others about research and possible collaborations.

    Moral of the story: Hug an ecological *nix geek today. :)

    From the ongoing-saga-of-a-quack-gone-to-the-dogs dept:

      Finished the data prep work, at least enough to get data to analyse. 1058 lines of sql, three weeks or so of learning "proper" use of the query language. Making a lot of mistakes in the process (but hey, they're *my* mistakes :) All in all, things are well.

      Tomorrow, esox will get its installation of R upgraded at last, and I'll finally install grace as well, and start playing.

      In the meanwhile, it's way past my bedtime.

      Gamble of the ages. Suit me up, I'm ready to go. - Tom Cochrane

    More dreams of databases and lakes.

    Not much to report lately, mostly been fixing some SQL that seems to keep breaking. I have the underlying data ready, so this should end today. I'm kinda sad about that, because I'm starting to get all kinds of ideas about how ecological databases can be used. I have several functions set out already that I want to port from VBA to Postgres, simple things like temperature conversions, oxygen saturation, etc.
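    For illustration only, here is a minimal Python sketch of the simplest kind of helper mentioned above, the temperature conversions. The original VBA functions are not shown in this entry, so the function names are hypothetical; only the standard conversion formulas are assumed. Oxygen saturation is left out because it depends on which empirical formula the original code used.

```python
# Hypothetical helper names; the original VBA functions are not shown in the entry.

def celsius_to_fahrenheit(temp_c):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return temp_c * 9.0 / 5.0 + 32.0


def fahrenheit_to_celsius(temp_f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


def celsius_to_kelvin(temp_c):
    """Convert degrees Celsius to kelvin."""
    return temp_c + 273.15


if __name__ == "__main__":
    # A made-up surface reading, shown in all three units.
    surface_c = 21.5
    print(surface_c, celsius_to_fahrenheit(surface_c), celsius_to_kelvin(surface_c))
```

    Functions this small are stateless and apply row by row, which is what makes them reasonable candidates for living inside the database.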

    Having these functions inside the db proper makes sense, mostly because they can have utility in large scale data workup (like I'm doing right now). From a design standpoint, however, other functions should not be inside the db proper at all. I'm thinking of some of the more complex functions that can be brought to bear on data subsets. Much of the data we deal with has spatial or temporal structure, usually both (even if one is only implied), so conventional SQL breaks down for complex calculations. Besides, for profile analysis, Octave or something similar is most appropriate.

    Interfaces to data are also important. I had a really warped idea using infobot or another bot as the basis for an information retrieval interface, albeit a very simplistic one, for a db. SQL is often overkill, not to mention confusing, for simple queries. I can see something like this happening:

      <pete>fish, list tables to me
      <fish>sending /msg to pete
      (I get a list of tables pasted as a /msg)
      <pete>fish, list years for lake 'pete' in temperature table
      <pete>fish, list dates for lake 'pete' for year '1996' in temperature table
      and so on...

    Granted, this is a little contrived, because the user now wants to drill down further and further to get at the data, and avoiding SQL syntax for that is not wise. Also, at this granularity, adding a generic user to a given table and letting said person "play" in the tables (read only, of course) is probably more intelligent. This latter approach is lacking somewhat because it means that only one person can see the data, rather than everyone on channel, which was the intent behind the 'fish' infobot mods.

    The other nice thing about this approach is that the bot logs all of its communications, so finding out what people are trying to do (which, of course, almost NEVER matches the spec of the system, 'cuz <cynical>Users Don't Read </cynical> :) provides hints for altering the query model.
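    The 'fish' exchange above could be sketched roughly as follows. This is not infobot code: it is a minimal Python illustration, using an in-memory SQLite database as a stand-in for postgres, and the command grammar, the table whitelist, the column names (lake, sampled_on, temp_c) and the sample rows are all assumptions made up for the example.

```python
import re
import sqlite3

# Hypothetical whitelist: only these tables may be queried through the bot.
ALLOWED_TABLES = {"temperature", "oxygen"}

# Hypothetical command grammar, modelled on the chat lines above.
PREFIX = re.compile(r"^fish,\s*")
LIST_TABLES = re.compile(r"^list tables to me$")
LIST_YEARS = re.compile(
    r"^list years for lake '(?P<lake>[^']+)' in (?P<table>\w+) table$"
)
LIST_DATES = re.compile(
    r"^list dates for lake '(?P<lake>[^']+)' for year '(?P<year>\d{4})' "
    r"in (?P<table>\w+) table$"
)


def translate(line):
    """Map one chat line to (sql, params); return None if it is not understood."""
    command = PREFIX.sub("", line.strip())
    if LIST_TABLES.match(command):
        # SQLite catalogue; postgres would use information_schema.tables instead.
        return ("SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name", ())
    match = LIST_YEARS.match(command)
    if match and match["table"] in ALLOWED_TABLES:
        table = match["table"]  # interpolated only after the whitelist check
        return (
            f"SELECT DISTINCT substr(sampled_on, 1, 4) AS yr FROM {table} "
            "WHERE lake = ? ORDER BY yr",
            (match["lake"],),
        )
    match = LIST_DATES.match(command)
    if match and match["table"] in ALLOWED_TABLES:
        table = match["table"]
        return (
            f"SELECT DISTINCT sampled_on FROM {table} "
            "WHERE lake = ? AND substr(sampled_on, 1, 4) = ? ORDER BY sampled_on",
            (match["lake"], match["year"]),
        )
    return None


def answer(conn, line):
    """Run a recognised command and return the first column of each row."""
    query = translate(line)
    if query is None:
        return None
    sql, params = query
    return [row[0] for row in conn.execute(sql, params)]


def build_demo_db():
    """Create a tiny in-memory database with made-up readings."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE temperature (lake TEXT, sampled_on TEXT, temp_c REAL)")
    conn.executemany(
        "INSERT INTO temperature VALUES (?, ?, ?)",
        [
            ("pete", "1996-07-01", 21.5),
            ("pete", "1996-08-15", 23.0),
            ("pete", "1997-07-03", 20.1),
            ("other", "1996-07-02", 19.4),
        ],
    )
    return conn


if __name__ == "__main__":
    db = build_demo_db()
    print(answer(db, "fish, list years for lake 'pete' in temperature table"))
    print(answer(db, "fish, list dates for lake 'pete' for year '1996' in temperature table"))
```

    Lake names and years travel as bound parameters, and table names are only interpolated after the whitelist check, so a chat line cannot smuggle in arbitrary SQL. Anything the patterns do not recognise comes back as None, which is exactly the sort of miss worth logging.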

    Other data does not lend itself well to being viewed textually, in that this spatial / temporal structure remains hidden until seen graphically. Oxygen and temperature profiles are good examples of this sort of data. I had some initial work done on an interface for profile data, but this was put aside due to lack of time. My recent departure from the Windows/VB world means the opportunity to do this 'properly' (read: reimplementing this using X / OpenGL), and better yet, there are open source examples of distributed data visualization apps I can draw on for this.

    Cool. I can hardly wait. ;)

    ADMiSSeS is mostly recovered now. Woo!

    After having made what, in retrospect, turned out to be some pretty silly decisions when reintroducing link data, I managed to miss two fields in my primary data tables. These are fixed now, and I've learned a fair bit about the way postgres handles date types. In particular, I got thrown when the elephant[1] figured out that some of the data was taken during eastern DAYLIGHT time, and not eastern STANDARD time. heh. mumble mumble 3 am mumble. Oh well, at least the 900 lines of SQL ran without breaking. I really hope to finish the data work up tomorrow.
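    To show why the daylight / standard distinction matters, here is a small Python sketch. It does not reproduce how postgres resolves time zones; it just assumes fixed offsets of UTC-5 for standard time and UTC-4 for daylight time, and the timestamp is made up.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets, assumed for illustration: eastern standard and daylight time.
EST = timezone(timedelta(hours=-5), "EST")
EDT = timezone(timedelta(hours=-4), "EDT")


def to_utc(wall_clock, zone):
    """Interpret a 'YYYY-MM-DD HH:MM' wall-clock reading in the given zone."""
    naive = datetime.strptime(wall_clock, "%Y-%m-%d %H:%M")
    return naive.replace(tzinfo=zone).astimezone(timezone.utc)


if __name__ == "__main__":
    reading = "1996-07-15 14:30"  # hypothetical mid-summer field reading
    as_est = to_utc(reading, EST)
    as_edt = to_utc(reading, EDT)
    print(as_est.isoformat(), as_edt.isoformat())
    # The same wall-clock time lands one hour apart in UTC.
    print(as_est - as_edt)
```

    A summer reading stored as if it were standard time ends up an hour off from its neighbours, which is easy to miss until the timestamps are compared or joined.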

    Had a neat conversation tonight with Jody and miguel about future directions of gnumeric. There had apparently been discussion about separating the front end (interface) from the back end logic (core functionality) at some point in the future. This would be great for a few reasons...

  1. it would expose core logic behind gnumeric to other code, allowing it to be extended in all kinds of sundry ways. So long as the strict separation of data from code is maintained (unlike much VBA code), it should keep the avenues of exploitation reduced. I have a list of things I want to implement once this is available (most of the SGPL and PLT code, for starters), so I look forward to future developments
  2. Permit the development of a text-based front end to gnumeric. I can think of a few reasons why a text mode interface would be useful for a spreadsheet - broader possible use (esp. on more mature hardware), for one. The longer I'm in this field, though, the more people I meet who are using specialized peripherals like eyetrackers, speech synthesizers and especially braille displays. Adding speech synth support to an app is a great idea, save that as an interface it is rather clunky when dealing with complex data (try running festival on math notation, f'rinstance :). In many cases, braille displays are more useful for intricate work
  3. That said, some functionality exposed via the gui would likely be lost at least initially to text mode users. However, given (1) above, some functions otherwise primarily presented via the gui can be presented via "scripting" (probably the wrong term to use here, but. it's 3 am, and I'm tired. so there. :)

gnumeric already rocks. I can hardly wait to see how it develops from here.

What else... Not much. Too tired to write much more, and hopefully what's already typed is somewhat intelligible. 03:15 hours, waaaay past my bed time.

[1] The elephant being the mascot for postgres, of course
