Older blog entries for Stevey (starting at number 115)

Webcams...

 After becoming extremely frustrated by the difficulty of setting up a stable, interesting webcam package, I've finally admitted defeat.

 I used to use vgrabbj, then I switched to 'camE'.

 Each of these would break randomly: they'd stop uploading, have font problems, or hit other random errors.

 After a brief rant upon my livejournal a friend suggested writing a tool that would just dump a single image from a camera - then using that in a bash script.

 So I hicked and I hacked and I produced a little program 'camgrab' that will just output a single .JPG file from a video device.

 Wrapping that up in a little shell script, with a call to ImageMagick to stick a date stamp upon the image, I have a new working webcam.

 Debian (x86) package available from my apt-get repository.

Life

 In other news I'm still on the lookout for a new job ..

 I'm (yet another) Edinburgh SysAdmin / Free Software Programmer ..

 I was recently told that Bytemark Hosting offer a 10% discount for free software developers. Nice to see a company doing things like that when their entire infrastructure is based on GNU/Linux and Free Software.

 I think I'm going to use them.

BogoMips

 I'm awaiting the arrival of my first >1GHz computer.

 As a source of amusement I'm wondering what bogomips rating it will have - meaningless as that is.
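
 The entry doesn't show where the number comes from; on Linux the rating the kernel computed at boot can be read straight out of /proc/cpuinfo. A minimal sketch (the case-insensitive match is because the field name varies across architectures):

```shell
# Print this machine's BogoMIPS rating as computed by the kernel at boot.
# Linux-specific: reads /proc/cpuinfo; some architectures label the field
# "BogoMIPS" rather than "bogomips", hence the case-insensitive match.
awk -F: 'tolower($1) ~ /bogomips/ { gsub(/^[ \t]+/, "", $2); print $2; exit }' /proc/cpuinfo
```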

 So .. I created a simple system where people could submit their computer details to a database, and it would show the average CPU speed + bogomips rating.

 Depressingly the slowest machine on record is the Internet Gateway machine at work... 331 bogomips.

 If you're proud of your machine submit it - or just look around. (There is at least one pretty graph!)

brondsem:

 Thanks for the tip, unfortunately it wasn't recursive includes - though that wasn't something I'd thought of checking so it was a good suggestion.

trs80

 When I tried to leverage the LiveJournal account system I had a terrible time, as the site doesn't export its authentication system.

 My solution was to require users to enter their LJ username, eg 'skx', and then screen-scrape their email address from their user information page.

 I'm in the middle of working on a new system which relies on there being a one-to-one mapping between livejournal users and system accounts. This time I'm doing the system in the reverse order.

 A user will sign up to my new site with a username + password; and to prove that they really are the given LJ user they must paste into their journal a token which my system will look for before enabling the account.

 Technically this is neater, I think; plus it serves as advertising, because anybody who reads their journal will see the token - which will contain the name of my site :)

17 Aug 2003 (updated 17 Aug 2003 at 16:51 UTC) »
Errors

 I sympathise with raph completely - I've managed to write some PHP code which kills apache:

[Sun Aug 17 17:20:03 2003] [notice] child pid 4989 exit signal Segmentation fault (11)
[Sun Aug 17 17:20:03 2003] [notice] child pid 4986 exit signal Segmentation fault (11)
[Sun Aug 17 17:20:04 2003] [notice] child pid 5018 exit signal Segmentation fault (11)
[Sun Aug 17 17:20:05 2003] [notice] child pid 5019 exit signal Segmentation fault (11)

 I have no idea why this happens - I have two bits of code, 'view.php' and 'common.inc'. If I include common.inc within view.php I get the segfaults - if I don't, I don't ... but then my code doesn't work!

 What does the common code do? It connects to a database - and defines some functions which are never called. Weird.

 I'm not sure how I should be debugging this. So I'm off for a curry instead!

Port Monitoring

 The new software I mentioned in my previous entry, Lestat, has just had an update.

 I was pleased with the previous release as it got some good feedback and useful suggestions - however, even in the course of three days I've become almost embarrassed at some of the old code.

 I've used PHP before, for things like my LiveJournal Valentine System (double-blind blind-dating .. almost), but never enough to code in a PHP-ish way.

 My PHP tends to look like Perl: clean, but non-idiomatic.

 Still, the new release is out now - with template-based presentation, because I know that other people are more capable of writing GUIs than I am. Writing GUIs is hard; writing them well is harder still.

 I think in my previous project I was touched by a fair amount of luck when it came to creating the GUI - as most people liked it, and those that didn't were capable of creating new layouts/themes.

 Even now I feel humbled when viewing some other people's themes for my software (graphics intensive link).

 Sleep now.

Connection / Portscan Monitoring

 Prompted more by the spread of the MSBlaster RPC worm than anything else I've packaged and released my connection monitoring application, Lestat.

 This produces pretty graphs of connection attempts.

 The system consists of a Perl-based agent which collects packets and logs them to a database, and a PHP-based viewing system which pulls the data out and massages it into prettiness.

Security

 I've been doing a fair bit of `real` hacking for the past few days... looking through Debian packages for security holes.

 Mostly this has been triggered by somebody mailing me and telling me that the Debian Auditing Project had really nasty webpages - so I've updated them.

 Once I did that I got all enthusiastic and built up a list of all the setuid/setgid binaries in Debian stable, before starting to work my way through some of them.
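
 The entry doesn't show how the list was built, but it's a one-liner with find; a sketch (the argument handling is an assumption for illustration):

```shell
#!/bin/sh
# List all setuid/setgid files under a tree (default: the whole root
# filesystem; run as root to see everything).
# -perm -4000 matches the setuid bit, -perm -2000 the setgid bit.
find "${1:-/}" -xdev -type f \( -perm -4000 -o -perm -2000 \) -print 2>/dev/null
```

 -xdev keeps the scan on one filesystem, which avoids descending into /proc and NFS mounts.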

 So far I've had several Debian Security Advisories published - and I've got a few more issues to report.

 Ideally I'd like to release one a day .. for the next few weeks!

 At the moment I have five in hand to report, so there is the chance that I can manage it.

 It's been a productive week or so - it looks like the proposal to audit all new setuid/setgid binaries before they enter the distribution is going to be accepted, so we should be ahead of the game :)

Love

 In other life news I have a new cat.

 Cat Six is the successor to Tigger - (bet you thought I was gonna say cat 5 then didn't you? ;)

 I'm in love; she's beautiful and lovely and nice, and stuff :)

6 Jul 2003 (updated 6 Jul 2003 at 21:13 UTC) »
Bayesian Spelling

 A lot of people have heard of Bayesian spam filtering recently, as a result of Paul Graham's "A Plan For Spam" article.

 I confess that my maths knowledge is lacking, but I can follow along with his idea. Counting tokens is trivial stuff, and applying weights to the different tokens appears to be reasonable - so I can follow along, and see how it all works.

 Reading through the code of several implementations has been rewarding as I can see it all in action.

 The whole process has piqued my interest in statistics, something I've never really been that interested in before. I guess the closest statistical thing I have coded before has been Genetic Algorithms, where this kind of thing doesn't really turn up to the same extent.

 My formal maths training isn't terribly high, much like my computer training. Most of the things I know I've picked up by accidental discovery rather than pure theory, although I have read a lot of the literature over the past few years to shore up my home-learning approach to programming.

The Idea

 Whilst I was typing up the latest entry for my online journal I enabled the online spell checker.

 This managed to correct my erroneous spelling of "muscles" to "mussels". This was quite a fun mis-correction, but it did make me pause for thought.

 So often I've seen this in spell checkers before - you type "that" which is a real word - but not the one you should have written.

 Perhaps what we need is a statistical approach to spell checking; much like Paul's work - look over a corpus of previous emails/blog entries/whatever and look at the word distribution.

 Examining pairs of words it should be possible to see, for example, that "hot this" doesn't ever occur - but that "sex", "curry", and "weather" are acceptable suffixes to follow "hot".

 I guess this does break down badly when you're using a globally unique word for the first time - as there wouldn't be an entry in the database to describe it. So the first time you wrote "hot Madigasgar" you'd be flagged as if you'd made an error.

 It's an interesting idea nonetheless. I wonder if it's been done before?
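
 The pair-counting part, at least, falls out of standard Unix tools; a rough sketch, with hypothetical file names, and no smoothing or frequency thresholds at all:

```shell
#!/bin/sh
# Sketch of the bigram idea: learn word pairs from a corpus, then flag
# pairs in a draft that the corpus has never seen. 'corpus.txt' and
# 'draft.txt' are hypothetical; real use would want a large corpus and
# some tolerance for rare-but-valid pairs.

# lowercase, strip punctuation, one word per line
words() { tr 'A-Z' 'a-z' < "$1" | tr -cs 'a-z' '\n' | grep . ; }

# turn a word-per-line stream into "word1 word2" pairs
pairs() { awk 'NR > 1 { print prev, $1 } { prev = $1 }' ; }

words corpus.txt | pairs | sort -u > known-pairs.txt

# pairs present in the draft but absent from the corpus
words draft.txt | pairs | sort -u | comm -13 known-pairs.txt -
```

 comm needs both inputs sorted, which the sort -u calls guarantee; each flagged line is a candidate "real word, wrong word" error.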

hank

 I like your idea of a good visualization tool for duplicate file finding.

 As you might have seen from my recent diary entry I spent a while working on a quick and dirty script for finding duplicate files.

 I'd love to see a screenshot if you could dig one up - as I have a hard time imagining a useful GUI for such a tool.

 Finding duplicate directories might be simple, but displaying partial duplications seems tricky to me - maybe I just don't have the eye for it.

XSS

 Spent a while investigating online presentation systems recently for managing a new website in a collaborative manner.

 I narrowed down the list of systems to a couple - then went looking through the code to see how secure/paranoid/flexible each one was.

 (Due to my mistrust of such systems - How many times have holes been pointed out in PHPNuke et al?)

 Depressingly in both cases I found exploitable weaknesses. To my shame I tried to demonstrate one in a non-malicious manner after the author(s) didn't seem to understand what I had discovered and reported ... it went wrong. The main site was borked for around 15 minutes.

 I guess there's a good side - the admins now know about the problem - but the down side is that I may have inspired evil people to take advantage.

 It was a genuine error for which I can only apologise profusely; in my investigation I hadn't realised quite what effect I'd have.

 C'est la vie ..

 Based on early responses the sites/packages will both be fixed shortly, so a "Name and Shame" is inappropriate - but I'll document the flaws, which might encourage other authors to take more care and be more paranoid in the future...

Hacking

 Nothing much to report - I wrote some quick and dirty scripts today to find duplicate files, as I'm bad at organizing MP3s.

 First we scan through a directory, recursively, writing out a temporary file containing MD5 hashes and filenames - then we use that to find duplicate files.

 Handy, but messy.
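
 The same hash-then-compare approach fits in a single pipeline, with no temporary file; a sketch, not the author's actual script:

```shell
#!/bin/sh
# Hash every file under a directory (argument, default '.'), sort by
# checksum, then print only the groups of files sharing an MD5 hash.
find "${1:-.}" -type f -print0 \
  | xargs -0 md5sum \
  | sort \
  | awk 'prev == $1 { if (!shown) { print prevline; shown = 1 } print }
         prev != $1 { shown = 0 }
         { prev = $1; prevline = $0 }'
```

 The awk at the end buffers one line so that the first member of each duplicate group is printed along with the rest; unique hashes produce no output at all.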

Security

 It looks like I was responsible for the following two Debian Security Advisories:

 (Details here, and here respectively).

 I'm such a naughty boy ... ;)

salmoni - No, I guess all cats are perfect, although some are more perfect than others ;)

Advogato

 I spent an hour or two working on the Advogato codebase last night - adding support for Article Editing.

 This isn't complete yet because I'm having issues with the way that articles are posted. What I have is an 'Edit' link displayed next to an article if you're the author.

 The edit link brings up the article, preamble, and title in a form which you may edit and submit.

 This is where it goes wrong - when the form is submitted a new article is posted with the changes applied. What should happen is the old article should be updated. I'll deal with this tonight if I get time.

 There are other issues to deal with - such as the forking of Advogato. There are many different versions of the code now. I think I'm correct in saying that Steven Rainwater's version is the most up to date - but there are different fixes and changes in each version.

 I have packaged a copy for Debian which is pretty standard, with only the addition of my password-emailing patch and no other changes. (I can make the .deb file available to the world if there's any interest - I didn't do this initially to avoid polluting the world with yet another codebase).

 The article editing I started just for fun - once it's complete I've no idea what to do with it. Keep it to myself? Add it to my .deb?

GNUMP3d

 Over the weekend I changed a lot of things in my MP3 streamer: rather than reading the tags from each audio file as needed, there's now an indexer script which builds up a database of all the files and tag information.

 This "database" is used throughout the code which provides a huge performance win - at the expense of potentially out of date information.

 So far I'm assuming that the indexer will be run from cron, but I'm experimenting with auto-rebuilding the index whenever the machine is "idle"... We'll see how that goes.

 A new release is going to arrive soon - I'm determined to get it out before I drop offline during my housemoving.

