Older blog entries for rillian (starting at number 88)

30 Dec 2006 (updated 30 Dec 2006 at 00:50 UTC)

parallel computing

We do nightly regression tests on the Ghostscript codebase to try and detect inadvertent changes. It's a combination of established test suites and our own collection of problem files from the wild.

The problem is that a complete run takes hours. Before we bought our current server, it was impossible to do a check on every commit, and even now we'd need a queuing system no one's been annoyed enough to write. So instead we run once a day, and then someone has to check the results and work out which change caused the differences.

So we've been looking at using a cluster to speed up the runs, hopefully to a few minutes, so we can easily test things, and get automatic feedback right away after a commit. My partner does scientific parallel programming and has been helping set something up.

For the moment, we're renting time. Our usage pattern is unusual. Most cluster users have algorithms that are limited by communication between the nodes, and so they tend to do smaller jobs, but run a simulation for hours, days, even weeks. We want a lot of nodes, but not for very long, so it's the sort of thing where renting part of a shared resource makes sense.

Of course, it works better to be sharing a resource much larger than the average job size, or with other people with similar usage patterns, to avoid being blocked in the queue. But we'll see how it goes. For the moment we're using Tsunamic Technologies' cluster-on-demand service. They've provided good support so far, and offer a familiar Linux environment using the PBS job queue system (the venerable qsub et al.) to schedule access to the nodes. So far it's going pretty well, with scaling down to a 5-minute run.
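Since each test file can be checked independently, the work divides up without any communication between nodes. As a rough sketch only (this is not our actual harness, and the file names and function name are made up for illustration), dealing the files out into one batch per node might look like this in Python:

```python
from typing import List, Sequence


def split_into_batches(test_files: Sequence[str], node_count: int) -> List[List[str]]:
    """Deal test files round-robin into one batch per node.

    Hypothetical helper: it only partitions the list. Submitting each
    batch to the job queue is left out on purpose.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    batches: List[List[str]] = [[] for _ in range(node_count)]
    for index, path in enumerate(test_files):
        # File i goes to node i modulo node_count, so batch sizes
        # differ by at most one file.
        batches[index % node_count].append(path)
    return batches


if __name__ == "__main__":
    # Placeholder file names, not real test suite entries.
    files = [f"test{n:03d}.ps" for n in range(10)]
    for node, batch in enumerate(split_into_batches(files, 4)):
        print(node, batch)
```

Round-robin ignores how long each file takes to render, so a real setup would probably want to balance by expected run time instead.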

wherefore the grid?

People have been talking about Grid computing for 17 years now, but not much has appeared to fulfill the promise. Right now, most parallel machine users are doing research simulations, and there the overhead of dealing with heterogeneous environments and dynamic node allocation isn't especially worthwhile. But once there's the infrastructure available to rent time easily, and especially to sell time, I think we'll see a lot more of our sort of use.

Ironically, it's the overhead of virtualization that's finally making that possible. The problem with a market in cpu time is that you have to be able to run untrusted code. An entirely automatic reputation system isn't really good enough. You need recourse if your provider is messing with your data, and providers need to be able to protect jobs from each other. And because you can move machine images around, it also fosters the sort of dynamic infrastructure we need to really have scalable computing available as a utility.

I was therefore excited to see that Amazon is doing exactly that with their Elastic Compute Cloud beta. To use the service you upload an OS image to their storage farm, and then launch as many instances as you want, for as long as you want. It's a really cool setup. Apparently the story is that they have this enormous server farm for dealing with their peak loads (like Christmas) but of course that means it's idle much of the time. The same issue we have, really. They already sell almost everything else online, so they decided to try renting out time on their infrastructure as a new business idea.

They have some other cool things too, like an RPC interface to human labor.

The best thing about it is that they have a web protocol for doing all this. So while someone has to provide a credit card and pay the bills, you can now write code that can allocate and occupy its own server resources. We're one step closer to AIs living free on the net. :)

everything rots eventually

wingo, there's also a bridge in Paris named Pont Neuf.

bad phone karma

So we moved recently. To a bigger, nicer place. Which is great. But along with our address, we changed our phone number.

You see, our previous phone number was previously (previously) the fax number of a modelling agency. When we first got the number three years ago after moving back from London, we got about 30 faxes a week. We figured it couldn't last, so we didn't immediately complain. However, as of two months ago we were still getting 5-10 a week, often in the middle of the night. Certain websites' disinterest in removing our number from their agency listings probably didn't help.

We therefore asked for a new number when moving. That was fine, and while not quite as memorable as the old number, it was still pretty good. We got a few odd wrong numbers the first week, but didn't think much of it.

Well, it's been a month now, and we're still getting a consistent few wrong numbers a week. S finally figured out what was going on from the tone of voice one of the callers used. Turns out our number is listed in the new, just-came-out-last-month Yellow Pages as...an escort agency!

Yup, someone just called. "Hi, I'd like to hire an escort."

Well, that explains a few things. You'd think they'd fallow these numbers for a few months! Sigh. OTOH, if this was their normal call volume, I can see why they went out of business. And at least we know how to answer the phone now. S has been practicing her derisive laugh.

robogato, thanks for enabling multiple posts. It makes it a lot easier to have conversations through the recentlog.

Zaitcev, sorting topicality for different syndication points is what tags are for. Most blog software will generate tag-specific feeds, but it's not clear livejournal is among them. I'm not aware of a standard for including the tags in RSS items themselves, but there's an atom:category element that looks like it's for this. So maybe some combination of using a tag-specific feed from a blog and filtering by atom:category on the advogato side would work?
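To make the filtering idea concrete, here's a small Python sketch that keeps only the Atom entries carrying a given category term. It's only an illustration: the sample feed, the tag names and the function name are invented, and it says nothing about how advogato actually handles syndicated feeds.

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespace of the Atom syndication format, in ElementTree's {uri}tag form.
ATOM_NS = "{http://www.w3.org/2005/Atom}"


def entries_with_category(feed_xml: str, term: str) -> List[str]:
    """Return the titles of entries tagged with the given category term."""
    root = ET.fromstring(feed_xml)
    titles = []
    for entry in root.iter(ATOM_NS + "entry"):
        # Each atom:category element names its tag in the "term" attribute.
        terms = {c.get("term") for c in entry.findall(ATOM_NS + "category")}
        if term in terms:
            titles.append(entry.findtext(ATOM_NS + "title", default=""))
    return titles


# Invented sample feed, for illustration only.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><title>Cluster runs</title><category term="ghostscript"/></entry>
  <entry><title>Phone karma</title><category term="life"/></entry>
</feed>"""

if __name__ == "__main__":
    print(entries_with_category(SAMPLE_FEED, "ghostscript"))
```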

It's a pity so many people seem to have left, but it's also nice to be able to read the complete recentlog again. :)

StevenRainwater, have you changed the "multiple posts in a day clobber each other" behaviour? I think the planets have demonstrated the value of letting people have multiple entries.

Hooray, Advogato has been saved. Thanks StevenRainwater.

nutella, certs are entirely one way, and they're always a positive assertion. The idea is that trust flows along certification links, so a spammer can create as many accounts as they want, but unless a significant number of people already in the trusted group can be persuaded to certify those accounts, they will still be cut off because they look like an island.

So fake accounts making some random certs of real accounts makes the fake accounts look a little less fake, but actually hurts their chances of being trusted. As long as most people cert based on their knowledge of another person's work, the trust metric will continue to work.
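The "island" effect is easy to see in a toy model. The Python sketch below is a deliberate simplification: it only follows certs outward from a seed account, whereas the real metric, as I understand it, is based on network flow with capacities that also limit the damage from a few bad certs. The account names are made up.

```python
from collections import deque
from typing import Dict, Iterable, Set


def reachable_from_seeds(certs: Dict[str, Iterable[str]], seeds: Iterable[str]) -> Set[str]:
    """Follow one-way certs outward from the seeds (plain breadth-first search).

    certs maps each account to the accounts it has certified.
    Simplified stand-in for the real trust metric, not a copy of it.
    """
    trusted = set(seeds)
    queue = deque(trusted)
    while queue:
        account = queue.popleft()
        for target in certs.get(account, ()):
            if target not in trusted:
                trusted.add(target)
                queue.append(target)
    return trusted


# Hypothetical accounts: the fakes certify each other and some real
# accounts, but nobody trusted certifies them back.
CERTS = {
    "seed": ["alice"],
    "alice": ["bob"],
    "fake1": ["fake2", "alice"],
    "fake2": ["fake1", "bob"],
}

if __name__ == "__main__":
    print(sorted(reachable_from_seeds(CERTS, ["seed"])))
```

The fake accounts' certs of alice and bob point the wrong way to help them: trust only travels from the certifier to the certified.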

All this spam cleanup is just about cleaning up the pool of untrusted user accounts, and preventing spreading google juice where it doesn't belong.

DOS

Whoever was hammering the advogato person index page this (Wednesday) morning, please don't do that again. It's an expensive page to generate and, well, you brought poor apache to its knees.

That's bad for your karma.

Was in San Francisco yesterday for an Artifex staff meeting. Had dinner with raph and Silvia Pfeiffer, who runs the annodex project. They've been strong supporters of free and open multimedia for some time and it was great to finally meet her in person.

recruiting

We're looking for someone to help out with Ghostscript integration in Free Software, on a part-time contract. Yes, that means paid work. Help sort, review and update patches from the distros into upstream, write a Firefox plugin, that sort of thing.

Please send interesting resumes to giles@ghostscript.com.

whacky medieval latin

raph and nymia, google suggests ojusdem might be eiusdem, the third person, singular, feminine, genitive pronoun.

So inter omnes curvas ojusdem longitudinus might be "between each of its curves lengthwise." But I know exactly enough Latin to be extremely dangerous with a dictionary. Caveat lector.
