Older blog entries for Bram (starting at number 41)

tripp: Yes, the challenge/response pairs are for the purposes of (drastically) reducing the amount of communication between blog.example.com and signon.example.net

I avoided using public keys for several reasons. The biggest one is that if signon.example.net relied on knowing blog.example.com's public key, then if that public key were lost or compromised, some very serious human intervention would be required at signon.example.net, while with the technique I gave blog.example.com would simply issue a new query to get a new list of challenge/response pairs. Getting a phone call from someone demanding that you manually change a key in a database, and having to somehow guess whether they're who they claim they are or someone trying to commit fraud, is a problem you want to deal with never.

Also, the technique I gave is simpler. There's no key setup and no management queries necessary at signon.example.net, and the only libraries blog.example.com needs are string mangling and sha1, which are already available on almost all web scripting systems. Also, public key operations at signon.example.net would create a significant amount of CPU load at not very high levels of scaling; doing the operations for https is bad enough.

I can't think of any disadvantages to the method I gave over using public key trickery - even the extra bandwidth it uses is paltry.

Squares

I figured out a cleaner proof of my earlier theorem about monochromatic squares, based on the Hales-Jewett theorem. I looked up the section on theorems proven using Hales-Jewett in a book I have and, sure enough, I re-invented what's known as Gallai's theorem.

The bound on lattice edge size this gives is two to the power of the Hales-Jewett number for two colors and edge length four. This is still a huge value. At some point I'll get around to throwing blackbox at the problem to try to get a reasonable guess of the actual value.

single sign-on protocol

Raph mentioned that we discussed single sign-on protocols, and I came up with some good ideas. I'll summarize now.

There is a single monolithic site which users maintain their login information and identity on. This is very convenient from an end user perspective, since they only have to remember one password and maintain their information in one place. It's also very appealing for web sites, since they no longer have to manage any login information or do tech support for lost passwords and the like.

Say the site which does the signon is signon.com and the site using those logins is blog.com. When a user is doing something which may require login to blog.com, they'll get a hyperlink saying 'log in', which will link to signon.com. When the user clicks on that, they'll either be prompted to set up an account on signon.com, or if they already have one and have an active session, they'll get redirected back to blog.com, where they're now logged in.

That's the end user experience, now for the technical details.

blog.com makes an http or https query in advance to get a bunch of challenge/response pairs, which are generated dynamically by signon.com. A huge number of these can be stockpiled, so the query only has to be done rarely; the only real restriction on their use is that each can only be used once. The login hyperlink goes to (or redirects to) a url on signon.com, and includes a challenge code and callback url as GET parameters. For example, it might be http://signon.com/signon?challenge=a63f34b34d&callback=http://blog.com/login.cgi. signon.com then gets the user's login from their cookie, verifies their identity (either by an active session cookie or by making them type in their password again), and redirects back to the callback, giving it GET parameters of the user's login identity and sha1(login + response). For example, it might be http://blog.com/login.cgi?user=bram&verify=ssonuoocrdua. blog.com can then easily check that verify is the correct value, and give the user a session cookie if so.
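Here's a minimal sketch in Python of the blog.com side, assuming blog.com remembers which challenge it handed out for this browser session (the pair values and function names are made up for illustration):

    import hashlib

    # Challenge/response pairs previously fetched in bulk from signon.com.
    # Each may only be used once, so a pair is discarded when consumed.
    pairs = {'a63f34b34d': '7c01fa2e55'}  # challenge -> response (made up)

    def login_link(challenge):
        # The 'log in' hyperlink handed to the user.
        return ('http://signon.com/signon?challenge=' + challenge +
                '&callback=http://blog.com/login.cgi')

    def handle_callback(challenge, user, verify):
        # signon.com redirected back with ?user=...&verify=sha1(login + response).
        response = pairs.pop(challenge, None)
        if response is None:
            return False
        expected = hashlib.sha1((user + response).encode()).hexdigest()
        return verify == expected  # if True, hand out a session cookie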

Raph suggested that for developer convenience it should be possible to send signon.com a nonce url which gets echoed back. I think this is more cleanly implemented by making signon.com notice if there are already GET parameters in the callback url, and preserving them if so. There is a special place in hell for whoever decided that the first GET parameter should have a different delimiter than the rest of them.

signon.com can avoid having to store all challenge/response pairs by keeping a secret symmetric key and generating challenges at random and setting responses to be the encryption of the challenge.
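A sketch of that trick, substituting HMAC-SHA1 for encryption as the keyed function, which serves the same stateless purpose here:

    import hashlib, hmac, os

    SECRET = os.urandom(20)  # signon.com's long-lived symmetric key

    def response_for(challenge):
        # The response is a keyed function of the challenge, so signon.com
        # can regenerate it on demand instead of storing every pair.
        return hmac.new(SECRET, challenge.encode(), hashlib.sha1).hexdigest()

    def make_pair():
        challenge = os.urandom(10).hex()
        return challenge, response_for(challenge)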

I'm very happy with this protocol. It's easy to implement on all sides, and has a smooth implementation path - web sites are immediately motivated to use it because it removes the burden of account management from them. Also, blog.com has complete control over whether all requests are done by http or https, allowing them to determine their level of security pragmatically and migrate over time. signon.com doesn't even have to configure anything new to support new sites - it's simply left running. Best of all, sites using signon.com can't compromise each other no matter how insecure they are.

The main problem with this whole scheme, of course, is that it's centralized, has a single point of failure, and generally isn't very cypherpunkly. That's explicitly the model Raph asked for, out of overwhelming engineering expediency, and I agree with that sentiment - this solution is far from perfect, but it's easy, and no one's demonstrated that decentralized single sign-on is workable.

Bohemia

Bohemia's board size problem can be fixed with an interesting version of the swap rule - one player decides the board size, and the other decides whether they want to be the square or the bohemian. Handicaps can be given with a spread on the board size. It's possible that the board size giving even chances is overly large or small; experimentation is necessary.

One dimensional Bohemia is possible as well, by having the square go for arithmetic progressions. This can be done with any number of colors and length of arithmetic progression. The small van der Waerden numbers indicate that length three and two or three colors might be a bit small, but more is probably very interesting. This has the distinction of being one of the few games on a one dimensional board which isn't completely lame.

Debian

BitTorrent (which had a new release today) has the necessary files to make debs, but isn't in debian. If any debian maintainer would like to volunteer, I'd very much appreciate it. Maintenance should be easy - the dependencies are straightforward and it's written in pure Python.

3 Nov 2002 (updated 4 Nov 2002 at 07:17 UTC) »
garym: Your description of the current state of the software industry is, unfortunately, quite accurate, and it won't get much better any time soon. The problem is that only programmers can really tell whether other programmers are any good, and we aren't going to be put in charge of the industry any time soon.

I've achieved a modest degree of success as an open source developer, but my advice for those considering it as a career is, only do it if you have to out of some immovable internal need. We are only just beginning to see the phenomenon of starving artist programmers, and it will only get more pronounced in the future, possibly even becoming standard for the industry.

I've seriously considered attempting a career as a screenwriter. Although I have no experience with it, I have a good work ethic, natural talent, and possibly most important, a willingness to write high concept work with happy endings. Unfortunately those attributes don't even vaguely guarantee a decent living in that industry, so I've shelved any such plans until I either have enough money to not be worried about failure or have come to hate the computer industry so much that I just want out.

My current long term career plan is to get good enough at financial stuff that I can start a hedge fund or set strategies for one. Unfortunately there's a lot of luck and who you know involved in that industry, but at least it consistently pays well and, unlike screenwriting, the industry has a desperate need for real talent.

For now, I'm doing okay working on networking software, which unlike any of the above I have oodles of experience in. My other career option which involves something I'm already good at is juggling, but I find that a less appealing career path than any of the above.

I haven't even dreamt of getting paid for all the random weirdness I like posting to my advogato diary. That's only possible with a tenured university position, and the more time passes the farther away I seem to be from academia.

Bohemia

Here are the rules to an abstract board game I came up with called Bohemia -

The two players are the square and the bohemian; the square moves first.

Players alternate playing either white or black pieces on an n by n grid. Either player may place either color piece. The bohemian is allowed to pass.

If four pieces of all the same color get placed such that they form the corners of a square with horizontal and vertical sides, the square player wins. If the board gets filled up without that happening, the bohemian wins.
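A sketch of the win check, assuming the position is kept as a dict from (x, y) coordinates to colors:

    def square_wins(board, n):
        # board maps (x, y) to 'white' or 'black'; check every axis-aligned
        # square by lower-left corner (x, y) and side length s.
        for x in range(n):
            for y in range(n):
                for s in range(1, n - max(x, y)):
                    corners = [(x, y), (x + s, y), (x, y + s), (x + s, y + s)]
                    colors = set(board.get(c) for c in corners)
                    if len(colors) == 1 and None not in colors:
                        return True
        return False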

I'm not sure what size board gives even chances to both players. That should be determined through experimentation.

A conjecture strongly related to this game is that for a sufficiently large board size the square must always win regardless of how the pieces are placed. This can be generalized to more than two colors, and also to lattices of side k, of which a square is the special case k=2. This is a two-dimensional version of van der Waerden's theorem, which can of course be further generalized to higher numbers of dimensions.

I can't even prove this for two colors, two dimensions, and lattices of size two. It seems likely that such a straightforward problem in Ramsey theory has already been studied though.

Update: I figured out the following very nice proof of the simplest case of my conjecture.

First, we will demonstrate that for any number of colors, a sufficiently large lattice will contain three points of the form (a, b), (a + c, b) and (a, b + c) which are all the same color. I'll call these formations Ls.

By van der Waerden's theorem, a sufficiently large 2-colored lattice will contain a monochromatic arithmetic progression of length three somewhere along its bottom row. Let us say that those points are at (a, 0), (a + c, 0) and (a + 2*c, 0). The point (a, c) must be the opposite color to avoid forming an L with (a, 0) and (a + c, 0). Likewise the point (a + c, c) must be the opposite color to the bottom row because of (a + c, 0) and (a + 2*c, 0), so it's the same color as (a, c). Now we can show by contradiction that there must be some monochromatic L - the point (a, 2*c) must either form an L with (a, c) and (a + c, c) or with (a, 0) and (a + 2*c, 0). Therefore for a sufficiently large 2-colored lattice there exist points (a, b), (a + c, b) and (a, b + c) which are all the same color.
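The base case is small enough to attack by brute force. Here's a sketch which tests whether every 2-coloring of a d by d lattice contains a monochromatic L (exhaustive over 2 ** (d * d) colorings, so only feasible for small d):

    from itertools import product

    def has_mono_L(grid, d):
        # An L is (a, b), (a + c, b), (a, b + c), all the same color.
        for a in range(d):
            for b in range(d):
                for c in range(1, d - max(a, b)):
                    if grid[a][b] == grid[a + c][b] == grid[a][b + c]:
                        return True
        return False

    def every_coloring_has_L(d):
        # Exhaustive check over all 2 ** (d * d) two-colorings.
        for bits in product((0, 1), repeat=d * d):
            grid = [bits[i * d:(i + 1) * d] for i in range(d)]
            if not has_mono_L(grid, d):
                return False
        return True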

We will now demonstrate that this is true for any number of colors by induction. Say we have shown for k colors that a lattice of size f suffices. By van der Waerden's theorem, a sufficiently large lattice colored with k+1 colors will have a monochromatic arithmetic progression of length 2*f along its bottom row; say that this is (a + x * c, 0) for 0 <= x < 2*f. Now consider the lattice formed by (a + p * c, c * (q + 1)) for 0 <= p < f and 0 <= q < f. Either some node in this lattice will be the same color as our arithmetic progression on the bottom row, in which case it forms an L with two points of that progression, or the lattice contains only k colors and hence contains an L by induction.

Now we will use the above lemma to prove our theorem. Say g is the size at which a 2-colored lattice must contain an L. Now make a lattice sufficiently large that if it's colored with 2 ** (g ** 2) colors it must contain an L, and replace each of its points with a 2-colored square lattice of side length g, with the 2 ** (g ** 2) colors corresponding to the possible 2-colorings of those blocks. The lemma gives an L of blocks which all contain the same pattern; call the offset between them d. The block at offset d in both directions from the lower-left one must either match the pattern at some point, which by construction forms a monochromatic square, or be the exact opposite of the pattern at every point, so we will consider the case where it's the opposite. Since the lower-left block is of size g, it must contain an L, whose coordinates within it we will call (a, b), (a + c, b) and (a, b + c). The fourth corner (a + c, b + c) must be the opposite color within the pattern, or the pattern itself already contains a monochromatic square. It is now straightforward to check that by construction the points (a, b), (a + d + c, b), (a, b + d + c) and (a + d + c, b + d + c) are all the same color, hence there must be a monochromatic square.

This proof is so straightforward that I'd be extremely surprised if it isn't already known, but it's pretty and non-trivial enough that I'm happy to have come up with it.

Unfortunately the bound it gives is a little bit too big to be practical.

chalst: The online sources say differently. Perhaps you're thinking of subgraph isomorphism?

ciphergoth: what's ISTR?

Linear Algebra as a Reasoning Technique

Consider a problem statement with some number of boolean variables, and constraints along the lines of 'of these literals, at least x must be true'. For example, in a problem with five variables, you might have a constraint stating 'of a, b, not c, not d, and e, at least three must be true'.

We can construct a linear relaxation of this problem by replacing the boolean variables with numbers in the range [0, 1] and replacing the constraints with linear ones; for example, the above clause would become a + b + (1 - c) + (1 - d) + e >= 3.

This allows us to do a very interesting trick. Pick a random subset of the variables, and calculate the minimum number of them which can be true (the minimum of their sum) in the linear relaxation. If the values which generate that result round off to a satisfying assignment, we're done. If not, we can take the minimum value, say it's 2.5, and deduce that since each of the underlying values is 0 or 1, the true minimum must be at least the next higher integer, in this case 3. We can then express this as a new linear constraint and add it to the list of constraints. If we do this repeatedly, we may nail down the constraints enough to force the approximation we find to yield an actual solution or show that none exists.
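Here's a sketch of that loop using scipy's linear programming routine; the constraint encoding and the choice of random subsets as objectives are my own guesses at the details:

    import random
    from math import ceil, floor
    import numpy as np
    from scipy.optimize import linprog

    def relaxation(constraints, nvars):
        # A constraint is (literals, threshold): literals are (index, sign)
        # pairs, sign +1 for v and -1 for (not v), and at least `threshold`
        # of them must be true. Encoded as rows of A_ub @ x <= b_ub.
        A_ub, b_ub = [], []
        for literals, threshold in constraints:
            row = np.zeros(nvars)
            negated = 0
            for var, sign in literals:
                row[var] = -sign        # sum of literals >= t  ->  -sum <= -t
                if sign < 0:
                    negated += 1        # each (1 - v) contributes a constant 1
            A_ub.append(row)
            b_ub.append(float(negated - threshold))
        return np.array(A_ub), np.array(b_ub)

    def add_cuts(constraints, nvars, rounds=50):
        A_ub, b_ub = relaxation(constraints, nvars)
        for _ in range(rounds):
            subset = random.sample(range(nvars), nvars // 2)
            c = np.zeros(nvars)
            c[subset] = 1.0             # minimize the sum of a random subset
            res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, 1)] * nvars)
            if not res.success:
                return 'unsatisfiable'  # infeasible relaxation: a formal refutation
            if res.fun > floor(res.fun) + 1e-9:
                # Fractional minimum: the integer minimum is at least ceil(m),
                # which is a new linear constraint (a cut).
                A_ub = np.vstack([A_ub, -np.where(c > 0, 1.0, 0.0)])
                b_ub = np.append(b_ub, -float(ceil(res.fun)))
        return res.x                    # may round off to a satisfying assignment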

I should emphasize that this is a reasoning method and not an approximation method. If this technique keeps adding constraints until there is no solution to the linear relaxation, that constitutes a formal proof that the original boolean problem has no solution. My motivation in coming up with this technique is that all exhaustive techniques for NP-complete problems I've heard of are backtracking algorithms, and I'm trying to find an exception to that rule.

If the minimum values given in the constraints are less than half the number of variables in them, then the linear relaxation can always be solved by setting all values to 1/2, and the round-up trick won't be able to get past that point. This is why this technique doesn't work for basic SAT problems. Yeah that's obvious, but it took me a few years to realize, and I haven't had access to a decent linear programming package since then, so I don't know for sure that this technique is effective on the more appropriate problems I've suggested, although it seems promising.

Hard instances of these problems can probably be generated simply by adding new random constraints until it gets to the point where there's a 50% chance of there being a solution, at which point both finding any solution and determining that there isn't one by backtracking will be hard.

23 Oct 2002 (updated 23 Oct 2002 at 08:50 UTC) »
baruch: To verify signatures you check that the pre-hashes hash to what they're supposed to; there's not much to it.

I didn't realize until after my earlier post how much hotornot resembles animal. There truly is nothing new under the sun.

Graph Isomorphism Hard. Yeah, Right.

Complexity theorists talk about graph isomorphism as if it's a hard problem, since there's no known polynomial time algorithm. I'm extremely skeptical, since although I can't prove a particular algorithm is polynomial, there are ones which seem extremely hard to fool.

Feeding my skepticism is that the existing packages don't give any instances of isomorphism which they claim to be hard. They give examples of isomorphic graphs which are tricky to find mappings for, but no non-isomorphic graphs which are supposedly hard to identify as such. Since polynomial algorithms for determining isomorphism can generally be turned into polynomial ones for finding a mapping, (by being used as a method of pruning an exhaustive search), this seems tantamount to an admission that there are no hard instances.

The obvious algorithms for checking graph isomorphism all amount to computing some value for each node, sorting those values, and comparing the sorted lists. Methods for coming up with these values include the following (all of them assume the graph has no separate islands; the case where there are islands is easy to deal with) -

  1. Calculate the minimum number of hops to each other node, and sort that list.

  2. Calculate the fraction of the time you'd spend at the given node if you did a never-ending random walk around the graph along edges.

  3. Assume that each other node is dropped independently with some probability. Calculate the value for that probability which makes the expected number of remaining reachable nodes equal to half the total. This can be computed via sampling, which introduces some randomness and uncertainty, but is still polynomial.

All of these, of course, can be used jointly. Very symmetrical graphs can sometimes look homogeneous to the above, but that can be fixed by, for each node, calculating the values of all the other nodes with that node removed and sorting the results. This can of course be done for multiple iterations, although each iteration multiplies the runtime by the number of nodes. Different iteration levels can, of course, be used jointly.
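Here's a sketch of the first method, with graphs represented as dicts from node to neighbor list:

    from collections import deque

    def hop_profile(adj, start):
        # Sorted list of hop counts from start to every reachable node
        # (method 1; with no islands, that's every node).
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adj[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        return sorted(dist.values())

    def signature(adj):
        # Per-node values, sorted; isomorphic graphs get equal signatures.
        return sorted(hop_profile(adj, node) for node in adj)

    def maybe_isomorphic(adj1, adj2):
        # Equal signatures are necessary for isomorphism, though not
        # proven sufficient.
        return signature(adj1) == signature(adj2)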

Given the above extremely powerful isomorphism checking techniques, I find it dubious that there are any hard instances of graph isomorphism. It looks rather like finding an odd perfect number - we have no proof that it doesn't exist, but the known constraints on it are quite ludicrous.

Note that for all of the above I'm talking about the problem of determining whether a mapping exists, or finding a single one, not the problem of finding all mappings. This is the problem which the interesting zero knowledge proof applies to; I'm distinctly uninterested in automorphisms.

Digital Signatures

It's possible to make digital signatures using a secure hash as the only primitive. This is very reassuring, since the sheer number of moving parts in modular group based signatures is huge.

The basic trick is to publish n pairs of hashes, where n is the number of bits in the secure hash. To sign something, you publish the pre-hash of one of each pair, picking which one based on the bits of the secure hash of the thing being signed.
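Here's a minimal sketch of this one-shot scheme in Python, with sha1 as the hash (messages are bytes):

    import hashlib, os

    N = 160                                # bits in a sha1 hash

    def H(data):
        return hashlib.sha1(data).digest()

    def bits(digest):
        return [(digest[i // 8] >> (i % 8)) & 1 for i in range(N)]

    def keygen():
        # The private key is N pairs of random pre-hashes; the public key
        # is their hashes.
        private = [(os.urandom(20), os.urandom(20)) for _ in range(N)]
        public = [(H(a), H(b)) for a, b in private]
        return private, public

    def sign(message, private):
        # Reveal one pre-hash from each pair, chosen by the message hash's bits.
        return [pair[b] for pair, b in zip(private, bits(H(message)))]

    def verify(message, signature, public):
        return all(H(s) == pair[b] for s, pair, b
                   in zip(signature, public, bits(H(message))))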

This technique can only sign one value, but can be improved considerably.

A straightforward improvement is to publish n + log(n) pre-hashes, and sign using a subset of exactly half of them, of which there are about 2^n. This doesn't change the number of things which can be signed, but it does make the keys and signatures smaller.

More things can be signed by making the public key be the root of a hash tree. This is fairly computationally intensive on the part of the signer, since the entire tree has to be generated up front, but it does allow multiple things to be signed, and doesn't increase the size of the signature much. A new restriction is that each signature carries a number, whose value isn't significant other than that the same number can't be used twice, or the system becomes insecure.
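A sketch of the tree part, assuming a power-of-two number of one-time public keys as leaves (H as in the previous sketch); a signature would also include the sibling hashes needed to climb from the used leaf to the root:

    def tree_root(leaves):
        # Fold pairs of hashes upward until one root remains; the root is
        # the published public key.
        level = [H(leaf) for leaf in leaves]
        while len(level) > 1:
            level = [H(level[i] + level[i + 1])
                     for i in range(0, len(level), 2)]
        return level[0]

    def auth_path(leaves, index):
        # The sibling hashes which let a verifier recompute the root from
        # leaf number `index` alone.
        level = [H(leaf) for leaf in leaves]
        path = []
        while len(level) > 1:
            path.append(level[index ^ 1])
            level = [H(level[i] + level[i + 1])
                     for i in range(0, len(level), 2)]
            index //= 2
        return path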

Even more things can be signed by making the first value signed be another public key. This can be repeated for any number of iterations, and results in a total number of signatures which can be done equal to the number which can be signed at each iteration to the power of the number of iterations. The algorithm for computing the later public keys must be deterministic based on the private key, to keep different public keys from getting signed by a single intermediate key.

Put together, those tricks result in performance which is quite practical for many applications. A completely excessive safety margin results in 20 kilobyte signatures. I have a functioning implementation, which unfortunately has not to my knowledge been vetted by anyone, but that should be straightforward to do.

Unfortunately the best known public key encryption algorithms built on just a secure hash are still along the lines of Merkle puzzles, and require bandwidth equal to the square root of the computational ability they're trying to defend against. Bloom filters can reduce that value by a constant factor of about three, but it's still very far from practical. It would be nice to have a complete crypto toolset which didn't use any of that modular exponentiation nobody really trusts.

21 Oct 2002 (updated 21 Oct 2002 at 07:15 UTC) »
Game Interfaces

I've been playing online games some over the last few days. A frustrating amount of time is spent trying to find games rather than playing them. A hotornot-style interface, in which the server guesses which people you'd like to play against and asks a yes/no as to whether you'd like to play them would be a vast improvement. There are several algorithms which could be used to determine selections and decide when a match happens. It's an interesting problem.

Also lacking are personalized thumbnail images like those used on livejournal and some chat sites. The more serious online communities tend to look down on images, due to the 'asl' banter flavor of them, which is unfortunate, since context greatly improves social interaction.

None of the online game sites I've used have an option for the players to replay a game and comment on it after it's done. This is one of the more social aspects of over-the-board gaming; it would be a nice addition.

Another missing feature is an integrated opening book which automatically displays your comments on the current position as play unfolds. Opening books are often viewed as cheating, but if everyone had them it would be fair, and rote memorization isn't a fundamental aspect of most games.

I'm interested in getting into hedge funds, so I'm reading up on finance. Currently I'm reading 'Options, Futures, and Other Derivatives' by John Hull. I haven't finished it yet, but I'm going to post a review anyway.

This is the classic book for beginning traders to learn the basics from. Its structure is what you would expect - very simple, clear prose, and comprehensive coverage of all the basics.

My problem with reading this book is that it keeps making me fall asleep. After a few pages I have to take a nap. After enough naps I can't sleep any more and go into a trance-like dissociated state with the words passing through my brain with no comprehension. All the formula derivation in this book is spelled out in painstaking detail, with whole sections devoted to formulas anyone with a decent mathematical background could re-derive in a few minutes. Also, key bits of math are dropped out, presumably because they're too difficult. According to the reviews on amazon those derivations are actually misrepresented and the formulas given are approximate, even assuming that the underlying model is perfect. I think all readers would be better served with a treatment which simply gave the formulas, without burying them at the end of tedious derivations which are either too hard or too boring for any given reader.

I'll post a review of another, much more promising book I'm reading in the future.

movement: I suggest you view recentlog with a threshold greater than the default of 3, and give diaries which are better than trolls but not what you want to read a rating somewhere between 3 and whatever threshold you set.

Circuit Complexity

Here's a page on circuit complexity. If only it used programming terminology instead of math notation I'd probably understand all of it.

I have a conjecture that given 2*k lists of n strings, each a small multiple of lg(n) bits long, the question of whether it's possible to select exactly one string from each list and have their xor be all zeros has O(n^(2*k)) circuit complexity and O(n^k) real complexity, give or take a few log(n)s. The faster algorithm is to make a table of all xors from the first k lists, and do look-ups for all xors of the second k lists. This is a very well-behaved tradeoff between time and memory - it exactly matches the lower bound implied by the circuit complexity.
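A sketch of the faster algorithm, with the strings represented as integers:

    from functools import reduce
    from itertools import product
    from operator import xor

    def find_zero_xor(lists):
        # lists holds 2*k lists of n integers; look for one pick per list
        # whose overall xor is zero, in roughly n ** k time and memory.
        k = len(lists) // 2
        table = {}
        for picks in product(*lists[:k]):     # all n ** k partial xors
            table[reduce(xor, picks)] = picks
        for picks in product(*lists[k:]):
            # A full xor of zero means the two halves xor to the same value.
            match = table.get(reduce(xor, picks))
            if match is not None:
                return match + picks
        return None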

If proven, this conjecture would imply that P != NP, which suggests that it's very hard to prove, but it does have a very approachable feel. I think the people searching for proofs that P = NP is undecidable are being far too pessimistic.

Computers

My new computer is now working with the pre-installed Windows XP. XP detects all the peripherals properly, and has a usable command line, so I'll probably leave it this way. The user interface differences are interesting. XP now has lots of mouseover highlighting, which is nice, but still comes with far too little pre-installed, and the pre-installed Windows Media Player and Musicmatch are garbage; I'm using Winamp.

Sadly, most Windows application UIs have gotten worse since I last used them. mIRC's new server dialog should be a huge improvement but instead makes logging into multiple servers a UI disaster, and it still doesn't have a decent custom widget for displaying chat. X-Chat is way ahead of it in many regards, but mIRC does have the very nice feature of grouping channels you're on by server and alphabetizing them; the as-opened ordering in X-Chat is ridiculous. Mozilla should open new tabs in the position after the current one instead of at the end.

Paint Shop Pro's usability has gone retrograde in a major way. They seem to be trying to compete with Photoshop, instead of being happy in the quick and easy but serviceable niche for people like me who have no aspirations of becoming graphic artists.

All the ftp clients have become almost unusable. The only one left which lets you just say 'put this file here' is, ironically, WinSCP, presumably because they haven't spent enough time destroying its UI yet.

Command line CVS, surprise surprise, didn't seem to work at all, and there aren't even any decent help pages left on it. TortoiseCVS is quite good though; I'm happy with it.

But all of those pale in importance next to stability problems. Under Windows, peripherals generally just work, which is wonderful. But CD auto-play is flaky and refuses to work even when configured properly. And there's some bug in Windows Explorer which occasionally turns the whole screen gray, which I have to control-alt-delete to get out of. Qwerty/Dvorak conversion continues to be a complete disaster. The converter is even harder to find than before, and still changes each process individually instead of system-wide, a behavior I can't even imagine wanting as a special feature, much less as the default.

Thankfully I'm behind a firewall and never going to use outlook on this machine, so the biggest security concerns aren't a big deal, but I do worry about the multiple chat clients I've got running. Security's a big problem on everything though.

Conclusion - I still hate computers. OS wars continue to be a competition of who has fewer unacceptable deficiencies.

I can't complain about what the hardware people have done though - the new machine redraws graphics almost instantly, even in Mozilla, and that's by far the most practical difference for me.

1 Oct 2002 (updated 1 Oct 2002 at 03:26 UTC) »
mwh: Done. I also put in __rsub__ and added some asserts. I've separated out the rational class and put it on my home page.

ciphergoth: The argument you put forth applies to some types of trust metrics, but not the one I gave.

Specifically, I think your argument applies to trust metrics which first compute a weight for each attester, then combine their judgements by simple averaging. The second step of the algorithm I gave is considerably more complicated. You imagine doing a random walk around the graph, each hop going to one of the nodes certified by the current node, with the chances of going to each one weighted proportionally to its confidence. The histogram of what this process results in can be averaged in any way you choose; the median is probably best.

If an attacker has compromised some nodes, their most effective attack will be to make all those nodes attest to what they want. Adding lots of nodes and certs can only asymptotically approach that amount of influence.

After thinking about it a bit, I realized that sampling works quite well as a method of computing confidences. Randomized algorithms show up in the most surprising places.
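A rough sketch of the sampled version; the graph representation, walk length, and use of the median are my guesses at the details:

    import random

    def walk(graph, start, hops):
        # graph[node] is a list of (certified node, confidence) pairs.
        node = start
        for _ in range(hops):
            certs = graph.get(node)
            if not certs:
                break
            nodes = [n for n, conf in certs]
            weights = [conf for n, conf in certs]
            node = random.choices(nodes, weights)[0]  # proportional to confidence
        return node

    def metric(graph, judgements, start, hops=10, samples=1000):
        # Median of the judgements at the walks' endpoints.
        results = sorted(judgements[walk(graph, start, hops)]
                         for _ in range(samples))
        return results[len(results) // 2]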
