Older blog entries for Bram (starting at number 50)

XML people don't seem to realize that when I say I'm on the de facto standards committee I'm not kidding.

ladypine: Neither of the examples you cite shows a failure of pareto efficiency; in fact, quite the opposite. I'll explain.

In the case of an auction, any price below the second price wouldn't be pareto efficient, because the seller could sell the item to someone else at a higher price and both the seller and the new buyer would be happier. The pareto efficient outcome isn't unique in this case: the high bidder purchasing at any price between the first and second bids is pareto efficient.

(Yes it's possible to make more money selling on ebay by setting your minimum bid increment to a humongous value. At some point I'll get around to writing about ebay at length.)

The reason to go with the second price instead of some amount more (which is how ebay does things, irritatingly enough) is to make it less gameable, though some gameability remains. Specifically, the seller could inflate the minimum price to anywhere between the first and second price and be better off. This is impractical under many circumstances, but is very important with only two (or one!) bidder. Note that gameability is only a problem when selecting between different pareto efficient solutions, so it isn't a weakness of pareto efficiency as a technique here.
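To make the price range concrete, here's a minimal second-price auction sketch in Python (my own illustration, not ebay's actual mechanism): the highest bidder wins and pays the second-highest bid, which is the low end of the pareto efficient range discussed above.

    def second_price_auction(bids):
        """bids: dict of bidder -> bid amount. The highest bidder wins and
        pays the second-highest bid."""
        ranked = sorted(bids, key=bids.get, reverse=True)
        winner, runner_up = ranked[0], ranked[1]
        return winner, bids[runner_up]

    print(second_price_auction({"alice": 120, "bob": 90, "carol": 75}))
    # ('alice', 90) - alice wins and pays bob's bid; a seller gaming the
    # minimum could push that price anywhere up to just under 120.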

In the case of stable marriage, there is a straightforward algorithm for finding all stable solutions based on pareto efficiency. For each person A, if A's first place choice is B, then for every C which appears below A on B's list of preferences, scratch B off C's list and C off B's list (this is justified because if B were paired with C then we could change the pairings to have B paired with A, and both A and B would be happier). Repeat until you can't simplify any more. At this point, participants will be in cycles, in which B is first on A's list, C is first on B's list, D is first on C's list, etc., until we get to A being first on somebody's list. Because of the gender difference, these cycles will always be of even length, so for each cycle we have to decide whether the males get their first choice or the females get their first choice. Note that this is a choice between different pareto efficient configurations; the appropriateness of pareto efficiency as a criterion is uncontroversial.
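Here's a rough Python sketch of that pruning step (the representation and names are mine, not anything canonical): preferences are a dict mapping each person to a list of choices, most preferred first, and the lists are assumed to be mutually consistent (A is on B's list exactly when B is on A's).

    def prune(prefs):
        """If B is A's first choice, nobody B likes less than A can ever be
        stably paired with B, so drop those pairs; repeat to a fixpoint."""
        prefs = {p: list(choices) for p, choices in prefs.items()}
        changed = True
        while changed:
            changed = False
            for a, choices in prefs.items():
                if not choices:
                    continue
                b = choices[0]                    # A's current first choice
                rank_of_a = prefs[b].index(a)
                for c in prefs[b][rank_of_a + 1:]:
                    prefs[c].remove(b)            # scratch B off C's list
                    changed = True
                del prefs[b][rank_of_a + 1:]      # and C off B's list
        return prefs

After the fixpoint is reached, the surviving first-choice relations link the participants into the cycles described above.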

In practice, even on random data, this technique does such a good job of pairing people up that there's hardly any arbitrariness in the final pairings. In the medical example a study was done and found that only a minuscule number of students would be assigned elsewhere under students' choice, so the algorithm was switched to that for good PR.

A bit off-topic, I'd like to point out that it's utterly stupid that 'the match' was so controversial for so long. I at one point solved stable marriage on my own because it's an interesting problem, coming up with the above algorithm, and after a little bit of testing realized how little difference male versus female choice makes. That was only a few days' worth of work. Let this be a lesson to everyone that if there's a big controversy which a simple study can shed a lot of light on, do the damn study.

The stable roommates problem doesn't always have a pareto efficient solution, for example there might be a loser who's put last on everyone else's list of priorities, but the above technique can be used to make an algorithm which works very well. Note that stable marriage is just a special case of stable roommates where all the males rank all females above all males and all females rank all males above all females.

8 Dec 2002 (updated 8 Dec 2002 at 06:23 UTC) »
ladypine: Viewing cost and benefit between individuals as somehow comparable is dubious. For one thing, it's completely gameable - individuals who exaggerate costs and benefits to themselves, or are just plain histrionic, are rewarded unduly. For another, social policies based on such theories tend to be downright racist, since they heavily slant towards whatever benefits the dominant gender/race/whatever cares about most.

A much simpler and more justifiable approach is to try to reach pareto efficiency, which is a situation in which no two individuals can make an exchange and both be happier. This cleverly avoids making subjective comparisons of different peoples's worth, and also yields a straightforward algorithm for maximization. The success of capitalism can be viewed as a testament to the robustness of greedy algorithms.

robocoder: I suggest setting up CVS, a mailing list, syncmail, and a todo list. Picking a bug tracking tool to begin with is like starting the construction of a bridge by digging a mass grave for everyone who will die in the building process.

There's a very difficult puzzle in the latest Scientific American; it goes as follows:

Three of the nine members of the president's cabinet are leaks. If the president gives a tip to some subset of his advisors and all three interlopers are in that subset, then that tip will get leaked to the press. The president has decided to determine who the leaks are by selectively giving tips to advisors and seeing which ones leak. How can this be done by giving each tip to three or four people, having no more than two leaked tips, and using no more than 25 tips total?

There's a solution on the web site, although the one I figured out is very different.

Here's my solution. First, note that if a tip to four people leaks, the leaks can be found using only three more tips, giving each of them to three of the four.

Arrange eight of the nine advisors on the vertices of a 2x2x2 cube. Test each of the 12 subsets which are coplanar, plus the 2 subsets which are all the same color if they're colored like a checkerboard. If any of those leak, we're done. If not, we've determined that the odd one out must be one of the leaks.
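A quick sanity check of that step (my own, not from the magazine): the 14 four-person subsets really do cover every possible trio among the eight cube vertices, so if none of them leaks, the ninth advisor has to be one of the three leaks.

    from itertools import combinations, product

    verts = list(product((0, 1), repeat=3))    # cube vertices as 0/1 coordinates

    subsets = []
    for axis in range(3):                      # 6 faces: one coordinate fixed
        for val in (0, 1):
            subsets.append({v for v in verts if v[axis] == val})
    for a, b in combinations(range(3), 2):     # 6 diagonal planes
        subsets.append({v for v in verts if v[a] == v[b]})
        subsets.append({v for v in verts if v[a] != v[b]})
    for parity in (0, 1):                      # 2 checkerboard color classes
        subsets.append({v for v in verts if sum(v) % 2 == parity})

    assert len(subsets) == 14 and all(len(s) == 4 for s in subsets)
    for trio in combinations(verts, 3):        # all 56 trios of the 8 vertices
        assert any(set(trio) <= s for s in subsets)
    print("every trio of cube vertices is inside one of the 14 tips")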

Next, arrange the eight we previously had in a cube formation into a line, and test all 8 configurations of the form the odd one out, x, x+1, and x+3 (modulo 8).

If none of those hit, then the two other leaks must be of the form x and x + 4, which there are four possibilities for. Three of them can be tried directly and if none leak then the last one can be inferred.

This leads to a total of 14 + 8 + 3 = 25 tips. It's a very hard problem, it took me about 45 minutes to figure out a solution.

An additional question given is whether increasing the number of advisors included in a tip can reduce the number of trials necessary. The answer is yes. For example, to modify the technique I gave, the first tip could go to six of the corners of the cube, leaving out two adjacent ones. This effectively does three tests at once, so the total number of tests needed drops to 23. If the first test turns up positive, all but one of the 20 subsets of 3 of those 6 can be tried individually, and if none of them leak then the last one can be inferred, for a total of one initial tip to 6 plus 19 others, or 20 total, so leaking on the first tip isn't the limiting case.

The CodeCon submissions deadline is December 15th. Get those submissions in.

The time and place have been set. It will be February 22-24 at Club NV in San Francisco.

I just finished my first game of go on a 19x19 board. I played against a 15 kyu player, and lost by about 100 points (I should have lost by about 80, but blundered stupidly near the end). I literally feel ready to throw up. I'm very kinaesthetic and apparently my body confuses playing a position I don't understand at all with being dizzy. Hopefully this will stop happening in the future.

19x19 go is huge. Probably equivalent to about 30x30 hex.

Kiseido is a nice go server. A lot of interesting stuff has been done with CGoban, and Java Web Start is actually working okay now. Web Start's proclivity to make applications look like they're subordinate to it as an architecture would make me not even consider using it, though - that's so obnoxious as to be offensive.

Compare Joel's current time giving site design with the real thing.

Sylvester's problem is cool.

Surprisingly, the rules of go are somewhat controversial.

This game is rather interesting. I figured out at least one position in which O has a simple forced win. A more interesting question is whether there are any positions in which X has a forced win. I doubt it.

I tried out my original-rules Bohemia on a 4x4 board. It's a close game, but the first player can always win. I didn't try any possibilities where the bohemian moves first and doesn't pass. Here is an interesting question - if we take a generalization of this game, in which there is a set of subsets such that the square wins if any of them becomes monochromatic, and the bohemian can pass, and the bohemian can go first, is there any such game in which the bohemian can always win, but loses by passing on the first move? Note that my new rules aren't of this form.

Joel on API Apology

Joel Spolsky has an article in which he states

All non-trivial abstractions, to some degree, are leaky.

This is overly dogmatic - for example, bignum classes are exactly the same regardless of the native integer multiplication. Ignoring that, this statement is essentially true, but rather inane and missing the point. Without abstractions, all our code would be completely interdependent and unmaintainable, and abstractions do a remarkable job of cleaning that up. It is a testament to the power of abstraction and how much we take it for granted that such a statement can be made at all, as if we always expected to be able to write large pieces of software in a maintainable manner.

There are really two separate statements here (what does 'leaky' mean, anyway?). The first one is -

All APIs contain artifacts of what they're built on top of.

As I said before, this is overly dogmatic, but mostly true. The example given, however, is extremely curious. TCP requires quite a tangent to explain ('look how smart I am, I know about TCP') and is one of the most robust, useful, and widely deployed abstractions in existence. I've spent a huge amount of time using TCP in very novel ways and haven't set a socket option in my life. Furthermore, Joel cites TCP's failure mode as an artifact of IP, when it is in fact an artifact of wires and electricity, and an extremely well done and clean one at that. Whether the connection drops due to network outage or the counterparty machine going down, you still get a single well-defined failure, with clear semantics as to what might have arrived on the other end.

TCP actually has plenty of real artifacts, such as slow start and high latency during bulk transfers, but these would clarify the distinction between the statement about artifacts and the one Joel is really trying to prove, which is -

All APIs are Broken

As I explained, the TCP example doesn't prove this at all - if a failure mode is defined in the API, then that doesn't mean the API breaks when it gets invoked. But this statement does apply to most of the other examples Joel goes on to cite - NFS is a mess, C++ strings aren't real strings, COM internals aren't properly hidden, ASP.NET doesn't have proper string mangling code for building GET parameters, etc.

You may notice that all the Microsoft tools are left to the end, right before the claim that the law of leaky abstractions is dragging us down. This essay isn't about API design, it's about software apology, an attempt to claim that all software is inherently bad hence it's okay that Microsoft's is bad. Like any good microsoftie, Joel starts with the assumption that the (very flawed) tools he uses are completely acceptable, and works backwards from there.

12 Nov 2002 (updated 12 Nov 2002 at 11:18 UTC) »
Protocol Stuff

The single sign-on protocol I gave earlier can be made secure when re-using challenge/response pairs by using HMAC instead of raw sha1 (HMAC is built using sha1). That greatly simplifies implementation on third party web sites and gets rid of a nasty denial of service attack.
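A minimal sketch of what the HMAC-based response computation might look like, in Python. The key and challenge roles here are my own assumptions for illustration, since the full protocol isn't restated in this entry; the point is only that the response is computed with HMAC-SHA1 rather than raw sha1 over the concatenated inputs.

    import hmac, hashlib, os

    secret = os.urandom(20)      # whatever shared secret the protocol uses
    challenge = os.urandom(20)   # one of the pre-fetched challenges

    # Compute the response with HMAC-SHA1 instead of sha1(secret + challenge),
    # which avoids length-extension style mischief.
    response = hmac.new(secret, challenge, hashlib.sha1).hexdigest()

    def verify(presented):
        expected = hmac.new(secret, challenge, hashlib.sha1).hexdigest()
        return hmac.compare_digest(expected, presented)

    assert verify(response)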

Bohemia

I tried playing a few games of Bohemia and was a bit disappointed. There are a few specific things I'm going for with the rules -

  • The game should be even on a medium size board, ideally in the ten to twenty on a side range. The rules as I presented them lead to a roughly even game on a 5 by 5 board, which is way too small.

  • The beginning of the game should be very positional, gradually leading to a very tactical endgame. The rules I gave are a lot more like Renju, with lots of forcing lines.

  • There should be a complicated interplay between pieces of either color, with many double-edged moves. With all the forcing moves, the square winds up emphasizing one color and the bohemian mostly plays the other, and the pieces of the bohemian's color don't generally form squares.

To try and fix those problems, I've come up with the following rule change - the square's new goal is to form a square whose corners are black on the lower left and upper right, and white on the upper left and lower right. This straightforwardly enforces a complicated interplay between both colored pieces, and since forming a clump of one color is very counterproductive, the square no longer has much benefit in playing lots of forcing moves. Also, the reduction in number of winning arrangements and greater complexity of having to play both colors probably makes the evenly matched board size larger.
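For experimenting with the new rule, here's a small checker I'd use (my own sketch; it assumes axis-aligned squares and a board indexed with row 0 at the bottom so that "lower left" reads naturally - neither assumption comes from the rules as stated).

    def square_wins(board):
        """board[r][c] is 'B', 'W', or '.'; returns True if some square has
        black on its lower-left and upper-right corners and white on the
        upper-left and lower-right ones."""
        n = len(board)
        for r in range(n):
            for c in range(n):
                for d in range(1, n - max(r, c)):
                    if (board[r][c] == 'B' and board[r + d][c + d] == 'B' and
                            board[r + d][c] == 'W' and board[r][c + d] == 'W'):
                        return True
        return False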

Further experimentation is once again necessary.

If anyone can figure out a proof that with a sufficiently large board under this new rule the square can force a win, I'd like to hear it. I don't see one immediately.

tripp: Yes, the challenge/response pairs are for the purpose of (drastically) reducing the amount of communication between blog.example.com and signon.example.net.

I avoided using public keys for several reasons. The biggest one is that if signon.example.net relied on knowing blog.example.com's public key, then if that public key were lost or compromised some very serious human intervention would be required at signon.example.net, while with the technique I gave blog.example.com would simply issue a new query to get a new list of challenge/response pairs. Getting a phone call from someone demanding that you manually change a key in a database, and having to somehow guess if they're who they claim they are or someone trying to commit fraud, is a problem you want to deal with never.

Also, the technique I gave is simpler. There's no key setup and no management queries necessary at signon.example.net, and the only libraries blog.example.com needs are string mangling and sha1, which are already available on almost all web scripting systems. Finally, public key operations at signon.example.net can result in a significant amount of CPU load at not very high levels of scaling; doing the operations for https is bad enough.

I can't think of any disadvantages to the method I gave over using public key trickery - even the extra bandwidth it uses is paltry.

Squares

I figured out a cleaner proof of my earlier theorem about monochromatic squares, based on the Hales-Jewett theorem. I looked up the section on theorems proven using Hales-Jewett in a book I have and, sure enough, I re-invented what's known as Gallai's theorem.

The lower bound on lattice edge size is two to the power of Hales-Jewett with two colors and edge length four. This is still a huge value. At some point I'll get around to throwing blackbox at the problem to try to get a reasonable guess of the actual value.
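In that spirit, here's a crude stand-in for that experiment (a simple hill climber of my own, not blackbox): it flips single cells of a random 2-coloring, trying to drive the count of monochromatic axis-aligned squares to zero, to get a feel for how large a square-free lattice can be.

    import random

    def mono_squares(grid, n):
        """Count axis-aligned squares whose four corners all share one color."""
        return sum(grid[r][c] == grid[r][c + d] == grid[r + d][c] == grid[r + d][c + d]
                   for r in range(n) for c in range(n)
                   for d in range(1, n - max(r, c)))

    def search(n, steps=50000):
        grid = [[random.randint(0, 1) for _ in range(n)] for _ in range(n)]
        best = mono_squares(grid, n)
        for _ in range(steps):
            if best == 0:
                return grid                 # found a square-free coloring
            r, c = random.randrange(n), random.randrange(n)
            grid[r][c] ^= 1                 # flip one cell
            score = mono_squares(grid, n)
            if score <= best:
                best = score
            else:
                grid[r][c] ^= 1             # flip back if it made things worse
        return None

Small boards succeed quickly; the interesting question is where the search starts failing.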

