Older blog entries for peat (starting at number 20)

Eeeps.

Where *does* time go?

I'm finally taking the time to sit down and learn how to code in C. Funny what being laid off does. I'm reading the ORA make book and getting all kinds of ideas on how I could use it to make custom CV versions... Then again, maybe it's time to get off my arse and get my own thing going.

Beyond hawking the CV over yon and sundry, I'm working on two small projects, mostly to get up to speed with C and Python.

The C project is being done with a friend and aims to be the ultimate program for unit conversion. Goals for 1.0 are full implementation of MKS (meter - kilogram - second) conversions. Once this is done, I would like to see changes to 2.0 dealing with full localization so that i18n is indeed possible.

The Python project is more immediate, as it is really to export raw data from my thesis into a decent TeX layout so that I can append it to the thesis. I have the pgsql links working already, but am a bit stuck on the crosstab query generator. Next step afterwards is generating a CRC32 checksum across rows and columns so that errors can be detected during OCR if and when someone decides to do that some time down the line.
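A rough idea of what the row and column checksums could look like, as a minimal sketch only. It assumes the table is already in memory as a list of rows, that cells are joined with tabs before hashing, and that the function names (crc32_of, table_checksums, locate_errors) are placeholders of my own, not anything from the actual exporter. The sample values are made up.

```python
import zlib


def crc32_of(cells):
    """CRC32 of a sequence of cell values, joined with tabs (an assumed encoding)."""
    data = "\t".join(str(c) for c in cells).encode("utf-8")
    return zlib.crc32(data) & 0xFFFFFFFF


def table_checksums(rows):
    """Return (row_checksums, column_checksums) for a rectangular table."""
    row_sums = [crc32_of(row) for row in rows]
    col_sums = [crc32_of(col) for col in zip(*rows)]
    return row_sums, col_sums


def locate_errors(rows, row_sums, col_sums):
    """Compare a re-read table (say, after OCR) against the printed checksums.

    Returns the indices of rows and columns whose checksums no longer match;
    a single bad cell sits at the intersection of one bad row and one bad column.
    """
    new_rows, new_cols = table_checksums(rows)
    bad_rows = [i for i, (old, new) in enumerate(zip(row_sums, new_rows)) if old != new]
    bad_cols = [j for j, (old, new) in enumerate(zip(col_sums, new_cols)) if old != new]
    return bad_rows, bad_cols


# Made-up sample data: depth (m), temperature (C), oxygen (mg/L)
original = [
    ["1.0", "21.5", "8.9"],
    ["2.0", "20.1", "8.7"],
    ["3.0", "17.4", "7.2"],
]
row_sums, col_sums = table_checksums(original)

# Simulate an OCR misread of one cell (row 1, column 2).
scanned = [list(row) for row in original]
scanned[1][2] = "8.1"
print(locate_errors(scanned, row_sums, col_sums))
```

Printing the checksums in an extra column and an extra row of the TeX table would let whoever does the OCR narrow a misread down to a single cell rather than just knowing "something is wrong somewhere".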

This latter part seems like a lot of trouble for something that may not be useful, but after my experiences doing an OCR import of someone else's copious quantities of thesis data, I'm bound and determined to do better.

Speaking of the Never-Ending Thesis Saga, I did indeed finally get my revisions back (after two months). It gets better tho, according to He Who Must Sign Off, "One last set of revisions and it will be okay" (his words). Wow. That's the first time I've ever heard him say that term. Last. Could this actually be the end of the Never-Ending Thesis Saga? We'll see...

Quote of the day...
"Neil, it's not Project Thingy's policy to hire livestock ... but we felt like you could use a sidekick." -- Project Thingy

Don't ask me why I found that so funny.

Thesis, thesis, thesis

    Spent the last w'end (Canadian Thanksgiving) working on the sum of my existence for the last four years. My introduction to LaTeX and other documentation systems is going swimmingly (as in I'm still managing to keep my head above water. mostly). Am glad to report that I am about to work on the last table for the document tonight, perhaps finishing it. TODO: add graphics, rework citations. The more I use LaTeX, the more I like it. Granted, the learning curve is substantial and it helps to have good documentation (and colleagues online who can answer questions). That said, with time and a bit of forethought, I'm starting to be fairly well equipped to start on the PhD after this is done (yes, even after all of this shit).

    That's pretty much all of the news for now. I'm catching up on my sleep, er, reading RFCs and other neat docs. :) I'm learning a lot about the more interesting aspects of system administration; I hope to get my hands really dirty in a coding project soon (or many, for that matter). I've been watching with great interest how some acquaintances have set up personal, db-based web sites. Pretty impressive, I must say. I can see some ways in which a lot more could be done tho, and I've started sketching out what I'd like to see in a more focused site. Stay tuned.

    Back to thesis.

Oh, but before I do...

    And in other news, said credit card company bozos have decided to send me a platinum card application. Maybe if I write them back something really imaginative ("addressee used as stunt double for polar bear scenes on Chilly Beach" or something) they'll send me one for a titanium card.

    yeah. right. :)

    There is this one credit card company here in Montreal (the name isn't important, and besides, why would I give these bozos free publicity) that keeps mailing me a credit card application. And every time I get the "invite," I reply with a creative note on the envelope saying something like...

      return to sender: addressee deceased after being attacked by large gerbils and used as ink blotter fodder

    or...

      return to sender: addressee had a sex change, moved to Northern California and opened a health food store

    or...

      return to sender: addressee deceased after tricycle accident and was subsequently trodden over by a band of wild elephants, and now resides in the National Gallery as a really cool wall hanging.

    And yet, I still get this crap in the mail. Trouble is that I still don't know whether they keep sending because they haven't gotten the hint, or they just want to see what I'll write next.

    Enough fun. Back to work.

    It's been some time since my last diary "entry" (read: the fast, amazing, awesome, "dependable" connection went down for a few hours and I lost what I was working on), so I should be a bit more detailed about the goings on here.

    Been spending most time hardware hacking. Not building hardware, of course, mostly fixing things for other people. My hardware collection has grown a lot in the process. Seriously, I have to do something, as my apartment is rapidly becoming an ungodly mess. I'm slowly finding homes for gear, so if there is anything you're looking for, let me know.

    The review of the blinux kernel mods for speech synth hasn't been going as quickly as I had hoped, but at least progress is being made. I'd like to see more people reviewing it, because although I really want to see this in the kernel, that should happen only once it has 'matured' enough to not break kernel design principles (which it currently does).

    Freebasin has gone into a holding pattern pending thesis completion. The crash of the NT box didn't help things any, but at least I was able to move the data from an access db into postgres. Next stage will be the post-process differential correction of the position data, and merging what is promising to be one of the largest ecological databases going. :) This should be fun.

    OODS is still going well, lots of planning is happening. Normally, I would find this really unsettling, save that I know the people involved, and I know they will make this happen. Besides, I've learned so much about design and abstraction ... two aspects I would have liked to have learned in school, save that I'm getting a masters in ecology and not computer science.

    OODS is primarily about open source directory services. I would have figured this to be really dull, boring and with few practical applications outside of system management. However, thanks to Frank, I can see just how broad the 'directory services' category is and just how relevant it is to, say, ecology. Scary.

    What else. Finally saw X-Men and Art of War. Nice surprise: neither particularly sucked. I was really glad to see someone doing Wolverine 'right.' The other characters seemed to fit as well. Nice to see. As for AoW, homegrown movie that it is, it looked decent as well, tho I expect to go see it again (missed some of the local 'sights' the first time around). Hopefully the trend will continue with Highlander?

    If it sucks, tho, at least you know you can blame someone.

    Well, that's about it for now. Not much to see here. These are not the droids you're looking for...

what is tv?
    Still thinking about some comments made by schoen (Software as science)

    I grew up reading very enthusiastic accounts of the work of very idealistic scientists, who mostly believed that they were working on a shared enterprise which by right belonged to all of humanity. The cool thing is that there have actually been a lot of scientists who believed that, and who lived that way.

    There have been great strides in the hard sciences and many of the natural sciences (geography, biostatistics, bioinformatics) thanks chiefly to the intuition of the researchers and the availability of the technology capable of carrying out the tasks required.

    From what I've seen, technology in the sciences is still by and large captive tech. Most large instrumentation requires software using undocumented interface specs. Sure, it may use a standard DB9 RS232 connector, but good luck finding the command set documented anywhere without an NDA or a substantial amount of cash (or both). Unfortunately, even among the scientists I know, one in particular insists on keeping his computer model private; this precludes peer review and slows down development. Others will typically release binaries for Win32 (most ecologists seem to be Win32-dependent) but none of the source.

    There have been some tremendous exceptions (examples here and elsewhere), in and out of ecology - which is great. As long as practicing scientists work within the spirit of open source, I think we may just see ecology explode just as the computing world did as more and more people embraced open source.

    The younger crowd of grad students seem more inclined to use Linux and free software, so there is hope. Unfortunately, the general technical literacy level is low enough to worry me. We go to schools, colleges, universities to learn to think critically, to analyse, to delve and write. Courses in methodologies, critical thought, and the like, are offered. As important as these are, they don't seem to go far enough; but in most biology departments I've been to, computer courses appear to be too far 'away' from the actual science at hand. And hell, I don't want to take a course on learning how to use Word, thanks.

    Perhaps the largest impediment to 'free science' is communication; most scientific journals are increasingly expensive. At up to $3000 per institutional subscription, it doesn't take long before many libraries carry fewer journals. As these costs increase, the next logical step is to start an "open source, peer review online journal"; indeed, this has already started. Unfortunately, they're not all that print friendly (granted, they're easier for a braille terminal to read, compared to a pdf). I think it will take numerous far-sighted individuals to pull off an SGML-based, open source journal; but I also think it will happen soon enough.

    Back to analysing those pesky data.

    Hm. Almost three weeks since the last entry. I've been checking in every so often before going to bed lately (and unfortunately usually too tired or too uncoordinated to write anything resembling a coherent thought), and noticed the discussion thread on the similarities between open source and science.

    Over the last few years, I've seen great examples of how some forward-looking people are working together, inside and outside of the university environment, to do better research and make a difference. Sadly, I've also found these people to be few and far between. Unfortunately, I've seen too many instances where a PI or possible collaborator would actively try to squelch fruitful discussions among grad students, post-docs, other profs, etc. The primary goal thus far seems to be getting as many good publications out there as primary author, as quickly and as often as possible. Although it's a nice ego boost, the main reason for keeping a high publication rate is primarily financial (okay, prestige is there as well, let's be honest :). Maintaining this "competitive advantage" often appears to be the unwritten standing order, and this is especially seen in the infighting between people on large projects.

    There also tends to be a closed-mindedness particularly about technology and how it can impact on the way science is "done," for lack of a better term. Frankly, I don't see ecology / environmental science as having a data problem as much as a data organization problem. The data need to be organized in such a way as to make it easy to:

    • submit data for inclusion and prior verification (this is a big one)
    • ensure effective access to the data
    • ensure effective USE of the data

    I've done some work on this already, and have had good results - won't go too far into details; it would be boring and I need to finish this thesis. Rest assured, tho, that these will be "published" later under the GPL - some is already available at the SGPL project site, more will follow (and the much needed porting will happen soon, any gnumeric hackers out there? :).

    Open source has opened up some tremendous potential for science. Perhaps the biggest contribution though, is to start getting scientists to be thinking in the "Unix" frame of mind, or at least gaining an appreciation of the Unix philosophy - copious small, specialized and reusable tools rather than few large applications. I can't speak for anyone else, but thanks to some people, I've come to see a raft of new possibilities that only a few years ago I couldn't even dream of. The key to this epiphany was not to feel that I needed to create new software or programs but rather to look carefully at how existing software CAN be put to different or interesting uses...

    • using a spreadsheet as an effective interface to a data source for complex, focused calculations
    • using a web server as an efficient tool for data analysis and visualization
    • using a search engine as a personal cataloging system for online journal articles
    • using a repository and good markup techniques to facilitate keeping local lab and study documentation up to date.

    The latter is usually an underappreciated and undervalued aspect of any endeavor, scientific or otherwise, and I've gained a lot of respect for those people or groups working on good docs.

    Even with all of this great open source software available, there is still a very considerable price to pay for gaining this perspective. Pretty much everyone I know working with Linux and Unix in general for their ecological research feels pretty isolated because picking up *nix means that they no longer have any peers in a research world dominated by Win- and Mac-users. The energy (well spent, admittedly) in climbing the learning curve means that many in this situation (myself included) are perceived as being more interested in the technology than in doing science.

    Interestingly, in our case, we can easily work around this lack-of-peer-support problem by using that venerable geek tool - IRC - to maintain and develop our virtual peer group. Not only does this bring together some pretty competent *nix folk, but we get the added benefit of working in a very diverse community of researchers, and a place to talk with others about research and possible collaborations.

    Moral of the story: Hug an ecological *nix geek today. :)

    From the ongoing-saga-of-a-quack-gone-to-the-dogs dept:

      Finished the data prep work, at least enough to get data to analyse. 1058 lines of sql, three weeks or so of learning "proper" use of the query language. Making a lot of mistakes in the process (but hey, they're *my* mistakes :) All in all, things are well.

      Tomorrow, esox will get its installation of R upgraded at last, and I'll finally install grace as well, and start playing.

      In the meanwhile, it's way past my bedtime.

      Gamble of the ages. Suit me up, I'm ready to go. - Tom Cochrane

    More dreams of databases and lakes.

    Not much to report lately, mostly been fixing some SQL that seems to keep breaking. I have the underlying data ready, so this should end today. I'm kinda sad about that, because I'm starting to get all kinds of ideas about how ecological databases can be used. I have several functions set out already that I want to port from VBA to Postgres, simple things like temperature conversions, oxygen saturation, etc.
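    For the simplest of these, here is a minimal sketch of what "a conversion function living inside the db" looks like from the query side. It uses Python's built-in sqlite3 purely as a stand-in for Postgres, and the function names (c_to_f, c_to_k), the table layout and the sample rows are all made up for illustration. Oxygen saturation is left out on purpose.

```python
import sqlite3


def celsius_to_fahrenheit(c):
    """Convert degrees Celsius to Fahrenheit; NULL in, NULL out."""
    return None if c is None else c * 9.0 / 5.0 + 32.0


def celsius_to_kelvin(c):
    """Convert degrees Celsius to kelvin; NULL in, NULL out."""
    return None if c is None else c + 273.15


conn = sqlite3.connect(":memory:")

# Register the Python functions so they can be called from SQL.
conn.create_function("c_to_f", 1, celsius_to_fahrenheit)
conn.create_function("c_to_k", 1, celsius_to_kelvin)

# Hypothetical table with made-up readings.
conn.execute("CREATE TABLE temperature (lake TEXT, temp_c REAL)")
conn.executemany(
    "INSERT INTO temperature VALUES (?, ?)",
    [("pete", 0.0), ("pete", 20.0)],
)

rows = conn.execute(
    "SELECT temp_c, c_to_f(temp_c), c_to_k(temp_c) "
    "FROM temperature ORDER BY temp_c"
).fetchall()
for row in rows:
    print(row)
```

    The point is only that the conversion happens in the query itself, so a large data workup never has to round-trip through a spreadsheet or a VBA macro.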

    Having these functions inside the db proper makes sense, mostly because they can have utility in large scale data workup (like I'm doing right now). From a design standpoint, however, other functions should not be inside the db proper at all. I'm thinking of some of the more complex functions that can be brought to bear on data subsets. Much of the data we deal with has spatial or temporal structure, usually both (even if one is only implied), so conventional SQL breaks down for complex calculations. Besides, for profile analysis, Octave or something similar is most appropriate.

    Interfaces to data are also important. I had a really warped idea using infobot or another bot as the basis for an information retrieval interface, albeit a very simplistic one, for a db. SQL is often overkill, not to mention confusing, for simple queries. I can see something like this happening:


      <pete>fish, list tables to me
      <fish>sending /msg to pete
      .
      .
      (I get a list of tables pasted as a /msg)
      .
      .
      <pete>fish, list years for lake 'pete' in temperature table
      <fish>1995,1996,1997
      <pete>fish, list dates for lake 'pete' for year '1996' in temperature table
      .
      .
      and so on...

    Granted, this is a little contrived, because now the user wants to drill down further and further in order to get at data, and not using SQL syntax for this is not wise. Also, at this granularity, adding a generic user to a given table and letting said person "play" in the tables (read only, of course) is probably more intelligent. This latter approach is lacking somewhat because it means that only one person can see the data, rather than everyone on channel, which was the intent behind the 'fish' infobot mods.
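    To make the 'fish' idea a bit more concrete, here is a minimal sketch of how the requests in the exchange above might be turned into queries. Everything in it is an assumption for illustration: the tiny command grammar, the column names (lake, sample_date), the table whitelist, and the use of Python's built-in sqlite3 as a stand-in for Postgres. It is not infobot code, and the sample rows are invented.

```python
import re
import sqlite3

# Hypothetical whitelist: table names cannot be bound as query parameters,
# so only names listed here are ever interpolated into SQL.
ALLOWED_TABLES = {"temperature"}

# A deliberately tiny command grammar modelled on the chat example.
PATTERNS = [
    (re.compile(r"^list tables(?: to me)?$"), "tables"),
    (re.compile(r"^list years for lake '(?P<lake>[^']+)' in (?P<table>\w+) table$"), "years"),
    (re.compile(r"^list dates for lake '(?P<lake>[^']+)' for year '(?P<year>\d{4})'"
                r" in (?P<table>\w+) table$"), "dates"),
]


def answer(conn, request):
    """Translate one bot request into a read-only query and return a list of strings."""
    text = request.strip()
    for pattern, kind in PATTERNS:
        match = pattern.match(text)
        if not match:
            continue
        if kind == "tables":
            return sorted(ALLOWED_TABLES)
        table = match.group("table")
        if table not in ALLOWED_TABLES:
            return ["unknown table: " + table]
        if kind == "years":
            sql = ("SELECT DISTINCT substr(sample_date, 1, 4) FROM " + table +
                   " WHERE lake = ? ORDER BY 1")
            args = (match.group("lake"),)
        else:
            sql = ("SELECT DISTINCT sample_date FROM " + table +
                   " WHERE lake = ? AND substr(sample_date, 1, 4) = ? ORDER BY 1")
            args = (match.group("lake"), match.group("year"))
        return [row[0] for row in conn.execute(sql, args)]
    return ["sorry, I did not understand that"]


# Made-up data so the sketch runs on its own.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE temperature (lake TEXT, sample_date TEXT, depth_m REAL, temp_c REAL)")
conn.executemany("INSERT INTO temperature VALUES (?, ?, ?, ?)", [
    ("pete", "1995-07-01", 1.0, 21.5),
    ("pete", "1996-06-15", 1.0, 19.0),
    ("pete", "1996-08-02", 1.0, 23.1),
    ("pete", "1997-07-20", 1.0, 22.4),
    ("other", "1998-07-20", 1.0, 20.0),
])

print(",".join(answer(conn, "list years for lake 'pete' in temperature table")))
print(",".join(answer(conn, "list dates for lake 'pete' for year '1996' in temperature table")))
```

    Lake names and years go in as bound parameters and the table name is checked against a whitelist, which keeps the bot read-only in spirit even though anyone on channel can talk to it.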

    The other nice thing about this approach is that the bot logs all of its communications, so finding out what people are trying to do (which, of course, almost NEVER matches the spec of the system, 'cuz <cynical>Users Don't Read </cynical> :) provides hints for altering the query model.

    Other data does not lend itself well to being viewed textually, in that this spatial / temporal structure remains hidden until seen graphically. Oxygen and temperature profiles are good examples of this sort of data. I had some initial work done on an interface for profile data, but this was put aside due to lack of time. My recent departure from the Windows/VB world means the opportunity to do this 'properly' (read: reimplementing this using X / OpenGL), and better yet, there are open source examples of distributed data visualization apps I can draw on for this.

    Cool. I can hardly wait. ;)
