Older blog entries for Bram (starting at number 37)

23 Oct 2002 (updated 23 Oct 2002 at 08:50 UTC)
baruch: To verify signatures you check that the pre-hashes hash to what they're supposed to; there's not much to it.

I didn't realize until after my earlier post how much hotornot resembles animal. There truly is nothing new under the sun.

Graph Isomorphism Hard. Yeah, Right.

Complexity theorists talk about graph isomorphism as if it's a hard problem, since there's no known polynomial time algorithm. I'm extremely skeptical, since although I can't prove that any particular polynomial algorithm works, there are ones which seem extremely hard to fool.

Feeding my skepticism is that the existing packages don't give any instances of isomorphism which they claim to be hard. They give examples of isomorphic graphs which are tricky to find mappings for, but no non-isomorphic graphs which are supposedly hard to identify as such. Since polynomial algorithms for determining isomorphism can generally be turned into polynomial ones for finding a mapping (by being used as a method of pruning an exhaustive search), this seems tantamount to an admission that there are no hard instances.

The obvious algorithms for checking graph isomorphism all amount to finding some value for each node, then sorting those values, and comparing the sorted lists. Methods for coming up with these values include the following (all of these assume there are no separate islands in the graph; the case where there are islands is easy to deal with) -

  1. Calculate the minimum number of hops to each other node, and sort that list.

  2. Calculate the fraction of the time you'd spend at the given node if you did a neverending random walk around the graph along edges.

  3. Assume that you'll drop all other nodes with some probability. Calculate the value for that probability which makes the expected number of remaining reachable nodes equal to half the total. This can be computed via sampling, which requires some randomness and uncertainty, but is still polynomial.

All of these, of course, can be used jointly. Very symmetrical graphs can sometimes look homogeneous to the above, but that can be fixed by calculating, for each node, the values of all other nodes with that node removed, and sorting the results. This can of course be done for multiple iterations, although each iteration multiplies the runtime by the number of nodes. Different iteration levels can, of course, be used jointly.
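
To make the first of those invariants concrete, here's a rough Python sketch. The function names are mine, and graphs are assumed to be connected, undirected adjacency dicts mapping each node to a list of its neighbors.

    from collections import deque

    def hop_count_signature(graph, start):
        # BFS from one node; return the sorted multiset of hop counts
        # to every other node (assumes the graph is connected).
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in graph[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        return tuple(sorted(d for n, d in dist.items() if n != start))

    def invariant_lists_match(g1, g2):
        # Compare the sorted lists of per-node signatures.  A mismatch
        # proves non-isomorphism; a match is only evidence for it.
        sig1 = sorted(hop_count_signature(g1, v) for v in g1)
        sig2 = sorted(hop_count_signature(g2, v) for v in g2)
        return sig1 == sig2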

Given the above extremely powerful isomorphism checking techniques, I find it dubious that there are any hard instances of graph isomorphism. It looks rather like finding an odd perfect number - we have no proof that it doesn't exist, but the known constraints on it are quite ludicrous.

Note that for all of the above I'm talking about the problem of determining whether a mapping exists, or finding a single one, not the problem of finding all mappings. This is the problem which the interesting zero knowledge proof applies to; I'm distinctly uninterested in automorphisms.

Digital Signatures

It's possible to make digital signatures using a secure hash as the only primitive. This is very reassuring, since the sheer number of moving parts in modular group based signatures is huge.

The basic trick is to publish n pairs of hashes, where n is the number of bits in the secure hash. To sign something, you publish the pre-hash of one of each pair, picking which one based on the bits of the secure hash to be signed.

This technique can only sign one value, but can be improved considerably.
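
Here's a rough Python sketch of the one-value scheme, using SHA-256; the key sizes, the random source, and all the names are my own illustrative choices.

    import hashlib, os

    HASH_BITS = 256

    def H(data):
        return hashlib.sha256(data).digest()

    def keygen():
        # private key: a random pre-hash for each side of each pair;
        # public key: the hashes of those pre-hashes
        priv = [(os.urandom(32), os.urandom(32)) for _ in range(HASH_BITS)]
        pub = [(H(a), H(b)) for a, b in priv]
        return priv, pub

    def hash_bits(message):
        digest = H(message)  # message is bytes
        return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(HASH_BITS)]

    def sign(priv, message):
        # reveal one pre-hash from each pair, chosen by the message hash bits
        return [priv[i][bit] for i, bit in enumerate(hash_bits(message))]

    def verify(pub, message, signature):
        return all(H(sig) == pub[i][bit]
                   for i, (sig, bit) in enumerate(zip(signature, hash_bits(message))))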

A straightforward improvement is to publish n + log(n) hashes, and sign by revealing the pre-hashes of a subset of exactly half of those, of which there are about 2^n. This doesn't change the number of things which can be signed, but does make the keys and signatures smaller.
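
One deterministic way to turn the hash into such a half-sized subset is the standard combinatorial number system; this sketch is only my illustration of 'a subset of exactly half', not a claim about how any real implementation does it.

    from math import comb

    def subset_from_hash(h, N):
        # Map an integer hash to an (N//2)-element subset of range(N);
        # h should already be reduced mod C(N, N//2).  The signer reveals
        # the pre-hashes at exactly these positions.
        k = N // 2
        subset = []
        for pos in range(N):
            if k == 0:
                break
            with_pos = comb(N - pos - 1, k - 1)  # subsets that take `pos` next
            if h < with_pos:
                subset.append(pos)
                k -= 1
            else:
                h -= with_pos
        return subset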

More things can be signed by making the public key be the root of a hash tree. This is fairly computationally intensive on the part of the signer - the entire tree has to be generated up front - but it does allow multiple things to be signed, and doesn't increase the size of the signature much. A new restriction is that each signature includes a number whose value isn't otherwise significant, but if the same number is used twice the system becomes insecure.
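
Here's a rough sketch of the hash tree part, with illustrative names only: the leaves are the individual one-time public keys, the published key is the root, and each signature additionally carries the sibling hashes proving its leaf is under the root.

    import hashlib

    def H(data):
        return hashlib.sha256(data).digest()

    def merkle_root(leaves):
        # collapse the list of one-time public keys into a single root hash
        level = [H(leaf) for leaf in leaves]
        while len(level) > 1:
            if len(level) % 2:              # duplicate the last node on odd levels
                level.append(level[-1])
            level = [H(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        return level[0]

    def merkle_path(leaves, index):
        # sibling hashes needed to show leaf `index` is under the root;
        # this is the (logarithmic) extra data carried by each signature
        level = [H(leaf) for leaf in leaves]
        path = []
        while len(level) > 1:
            if len(level) % 2:
                level.append(level[-1])
            path.append(level[index ^ 1])
            level = [H(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
            index //= 2
        return path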

Even more things can be signed by making the first value signed be another public key. This can be repeated for any number of iterations, and results in a total number of signatures which can be done equal to the number which can be signed at each iteration to the power of the number of iterations. The algorithm for computing the later public keys must be deterministic based on the private key, to keep different public keys from getting signed by a single intermediate key.

Put together, those tricks result in performance which is quite practical for many applications. A completely excessive safety margin results in 20 kilobyte signatures. I have a functioning implementation, which unfortunately has not to my knowledge been vetted by anyone, but that should be straightforward to do.

Unfortunately the best known public key encryption algorithms still are along the lines of merkle puzzles, and require bandwidth equal to the square root of the computational ability they're trying to defend against. Bloom filters can reduce the value by a constant factor of about three, but it's still very far from practical. It would be nice to have a complete crypto toolset which didn't use any of that modular exponentiation nobody really trusts.

21 Oct 2002 (updated 21 Oct 2002 at 07:15 UTC)
Game Interfaces

I've been playing online games some over the last few days. A frustrating amount of time is spent trying to find games rather than playing them. A hotornot-style interface, in which the server guesses which people you'd like to play against and asks a yes/no as to whether you'd like to play them would be a vast improvement. There are several algorithms which could be used to determine selections and decide when a match happens. It's an interesting problem.

Also lacking are personalized thumbnail images like those used on livejournal and some chat sites. The more serious online communities tend to look down on images, due to the 'asl' banter flavor of them, which is unfortunate, since context greatly improves social interaction.

None of the online game sites I've used have an option for the players to replay a game and comment on it after it's done. This is one of the more social aspects of over-the-board gaming; it would be a nice addition.

Another missing feature is an integrated opening book which automatically displays your comments on the current position as play unfolds. Opening books are often viewed as cheating, but if everyone had them it would be fair, and rote memorization isn't a fundamental aspect of most games.

I'm interested in getting into hedge funds, so I'm reading up on finance. Currently I'm reading 'Options, Futures, and Other Derivatives' by John Hull. I haven't finished it yet, but I'm going to post a review anyway.

This is the classic book for beginning traders to learn the basics from. Its structure is what you would expect - very simple, clear prose, and comprehensive coverage of all the basics.

My problem with reading this book is that it keeps making me fall asleep. After a few pages I have to take a nap. After enough naps I can't sleep any more and go into a trance-like dissociated state with the words passing through my brain with no comprehension. All the formula derivations in this book are spelled out in painstaking detail, with whole sections devoted to formulas anyone with a decent mathematical background could re-derive in a few minutes. Also, key bits of math are left out, presumably because they're too difficult. According to the reviews on amazon those derivations are actually misrepresented and the formulas given are approximate, even assuming that the underlying model is perfect. I think all readers would be better served with a treatment which simply gave the formulas, without burying them at the end of tedious derivations which are either too hard or too boring for any given reader.

I'll post a review of another, much more promising book I'm reading in the future.

movement: I suggest you view recentlog with a threshold greater than the default of 3, and give diaries which are better than trolls but not what you want to read a rating somewhere between 3 and whatever threshold you set.

Circuit Complexity

Here's a page on circuit complexity. If only it used programming terminology instead of math notation I'd probably understand all of it.

I have a conjecture that given 2*k lists of n strings, each of them a small multiple of lg(n) bits long, the question of whether it's possible to select exactly one string from each list and have their xor be all zero has O(n^(2*k)) circuit complexity and O(n^k) real complexity, give or take a few log(n)s. The faster algorithm is to make a set of all xors from the first k lists, and do look-ups for all xors from the second k lists. This is a very well-behaved tradeoff between time and memory - it exactly matches the lower bound implied by the circuit complexity.
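
Here's the faster algorithm as a rough Python sketch, with the strings represented as ints for brevity; all names are illustrative.

    from itertools import product

    def xor_selection_exists(lists):
        # Meet-in-the-middle check for the 2k-list xor problem: is there a
        # choice of one entry per list whose xor is zero?  Time and memory
        # are both roughly O(n^k) for 2k lists of n entries each.
        k = len(lists) // 2
        first_half = set()
        for combo in product(*lists[:k]):
            acc = 0
            for x in combo:
                acc ^= x
            first_half.add(acc)
        # a full selection xors to zero exactly when the xor of the second
        # half's choice already appears among the first half's xors
        for combo in product(*lists[k:]):
            acc = 0
            for x in combo:
                acc ^= x
            if acc in first_half:
                return True
        return False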

If proven, this conjecture would imply that P != NP, which suggests that it's very hard to prove, but it does have a very approachable feel. I think the people searching for proofs that P = NP is undecidable are being far too pessimistic.

Computers

My new computer is now working with the pre-installed Windows XP. XP detects all the peripherals properly, and has a usable command line, so I'll probably leave it this way. The user interface differences are interesting. XP now has lots of mouseover highlighting, which is nice, but still has far too little pre-installed, and the pre-installed windows media player and musicmatch are garbage; I'm using winamp.

Sadly, most windows application UIs have gotten worse since I last used them. Mirc's new server dialog should be a huge improvement but instead makes logging into multiple servers a UI disaster, and it still doesn't have a decent custom widget for displaying chat. X-chat is way ahead of it in many regards, but mirc does have the very nice feature of grouping channels you're on by server and alphabetizing them; the as-opened ordering in x-chat is ridiculous. Mozilla should open new tabs in the position after the current one instead of at the end.

Paint Shop Pro's usability has gone retrograde in a major way. They seem to be trying to compete with Photoshop, instead of being happy in the quick and easy but serviceable niche for people like me who have no aspirations of becoming graphic artists.

All the ftp clients have become almost unusable. The only one left which lets you just say 'put this file here' is, ironically, WinSCP, presumably because they haven't spent enough time destroying its UI yet.

Command line CVS, surprise surprise, didn't seem to work at all, and there aren't even any decent help pages left on it. TortoiseCVS is quite good though; I'm happy with it.

But all of those pale in importance to stability problems. Under Windows, peripherals generally just work, which is wonderful. But the CD auto-play is flaky and refuses to work even when configured properly. And there's some bug in windows explorer which causes it to occasionally turn the whole screen gray, which I have to control-alt-delete to get out of. Qwerty/Dvorak conversion continues to be a complete disaster. The converter is even harder to find than before, and still changes each process individually instead of system-wide, a behavior I can't even imagine wanting as a special feature, much less as default behavior.

Thankfully I'm behind a firewall and never going to use outlook on this machine, so the biggest security concerns aren't a big deal, but I do worry about the multiple chat clients I've got running. Security's a big problem on everything though.

Conclusion - I still hate computers. OS wars continue to be a competition of who has fewer unacceptable deficiencies.

I can't complain about what the hardware people have done though - the new machine redraws graphics almost instantly, even in mozilla, and that's by far the most practical difference for me.

1 Oct 2002 (updated 1 Oct 2002 at 03:26 UTC)
mwh: Done. I also put in __rsub__ and added some asserts. I've separated out the rational class and put it on my home page.

ciphergoth: The argument you put forth applies to some types of trust metrics, but not the one I gave.

Specifically, I think your argument applies to trust metrics which first compute a weight for each attester, then combine their judgements by simple averaging. The second step of the algorithm I gave is considerably more complicated. You imagine doing a random walk around the graph, with each hop going to one of the nodes certified by the current node, weighting the chances of going to each one proportionally to its confidence. The histogram of what this process results in can be averaged in any way you choose; median is probably best.

If an attacker has compromised some nodes, their most effective attack will be to make all those nodes attest to what they want. Adding lots of nodes and certs can only asymptotically approach that amount of influence.

After thinking about it a bit, I realized that sampling works quite well as a method of computing confidences. Randomized algorithms show up in the most surprising places.

lukeg's post prompted me to spend several hours yesterday doing a Python implementation of the nine nines problem. My solution contains a much better rational number class than any of the ones I found doing a quick web search.

With some help from demoncrat I came up with a much better weightings algorithm than the one I gave yesterday.

Simply set the weight of each node to the probability that there will be an uncompromised path from that node to an attester. This technique obviously meets the increasing confidence criterion I gave, and is much simpler and easier to analyze. I also find it much more intuitively compelling.

Unfortunately it might not be possible to efficiently compute, or even approximate, said probability. I hope it is.
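
Sampling does at least give an estimate; here's a rough sketch under the assumption that every node other than the starting one is independently compromised with probability 1/2, and that a compromised attester doesn't count - both of which are modelling choices I'm guessing at.

    import random

    def uncompromised_path_weight(certs, attesters, start, p=0.5, samples=20000):
        # Monte-Carlo estimate of the probability that some chain of certs
        # from `start` reaches an attester through uncompromised nodes only.
        nodes = set(certs) | {m for targets in certs.values() for m in targets}
        hits = 0
        for _ in range(samples):
            compromised = {n for n in nodes if n != start and random.random() < p}
            stack, seen = [start], {start}
            while stack:
                node = stack.pop()
                if node in attesters:
                    hits += 1
                    break
                for nxt in certs.get(node, []):
                    if nxt not in seen and nxt not in compromised:
                        seen.add(nxt)
                        stack.append(nxt)
        return hits / samples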

A probability of compromise of 1/2 seems to square with my own judgements of graphs imagining the nodes as people I know personally.

I spoke with raph for a while yesterday about trust metric criteria. It turns out all the non-definitional criteria I gave yesterday can be consolidated into a single one.

Say we're comparing graphs P and Q. For both of them, we make a list of all paths from me to an attester which don't visit the same node twice and don't pass through an attester in the middle. If the resulting lists are isomorphic, then their confidences must be the same. If a subset of P's list is isomorphic to Q's list with possibly some hops added to some entries, then P's confidence must be greater than Q's.

For example, if I cert A, A certs X, and X attests, then that must have a higher confidence than if I cert A, A certs B, B certs X, and X attests, since (A, B, X) can be transformed into (A, X) by removing the B.

This criterion is clearly reasonable - in such circumstances for every attack on A there is a strictly easier attack on B, hence A's confidence must be higher than B's. Unfortunately this criterion isn't very restrictive - both shortest path and max independent paths meet it - hence it is unlikely to directly lead to a formula.

Failing this criterion leads to an interesting type of attack. If an attacker can induce good nodes to cert other good nodes, then he can reduce their influence. This is a potentially insidious kind of attack. There is nothing obviously wrong after the attack has taken place, and even if there is, what are you going to do about it, remove the certs?

Another major problem with violating this criterion is that it strongly discourages nodes from certing others under some circumstances, since it could potentially reduce the rating strength of members of one's own clique quite substantially.

A somewhat related problem occurring in the random walk model is that it tends to look a lot like shortest path. My ratings confidences are so low that if just about anyone I've directly certified gives a rating, that will drown out all the others. This is arguably reasonable behavior, but not what should be happening when the chances of giving up at each hop are as ridiculously low as .05.

I've devised an algorithm which appears to get this problem under control. Consider the advogato diary ratings system as it exists today. Unlike the simplified model I gave yesterday, this yields not just a confidence value but a rating as well.

We will give each node a weight, in the range [0, 1]. When doing a random walk, we will pick a next hop to go to with probability proportional to its weight. If a node certifies two others with weights a and b, the probability that the next hop will go to the first one is a / (a + b) and to the second one is b / (a + b). A normal random walk corresponds to all nodes having a weight of 1.

We will do the algorithm in two parts. In the first part, we reduce the weight of each node to the probability that a random walk starting from there won't give up. This is unfortunately nonlinear, since the probability of giving up is based on the weights, but now the weights are based on the probability of giving up. Fortunately a very good approximation algorithm exists - we simply calculate the probabilities given the current weights, then recalculate again with the new weights, and repeat several times. I think this will consistently give very good values after only four iterations.

Note that all dead ends will now have probability zero, and all near dead ends will have very low probability. Also, longer paths will have lower weight than shorter paths.

Next we compute the diary rating based on the random walk method given the computed weights and with no probability of giving up at each hop. The justification for having no chance of giving up is that longer paths were already reduced in weight by the first step. A special case is when all weights are zero, in which case we have no rating.
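
Here's a rough Python sketch of the two-part algorithm; the give-up chance, the iteration and sample counts, and using the median at the end are my own guesses at reasonable choices, not part of the algorithm as such.

    import random
    from statistics import median

    def compute_weights(certs, raters, give_up=0.05, outer_iters=4, inner_iters=200):
        # Part one: iterate toward weights equal to the probability that a
        # weighted random walk from each node reaches a rater before giving up.
        # `certs` maps node -> nodes it certifies, `raters` maps rating nodes
        # to their ratings.
        nodes = set(certs) | {m for targets in certs.values() for m in targets}
        weights = {n: 1.0 for n in nodes}
        for _ in range(outer_iters):
            prob = {n: 0.0 for n in nodes}
            for _ in range(inner_iters):   # inner fixed point under current weights
                new = {}
                for n in nodes:
                    if n in raters:
                        new[n] = 1.0
                        continue
                    out = certs.get(n, [])
                    total = sum(weights[m] for m in out)
                    if total == 0:
                        new[n] = 0.0
                        continue
                    new[n] = (1 - give_up) * sum(weights[m] / total * prob[m] for m in out)
                prob = new
            weights = prob
        return weights

    def rating(certs, raters, weights, start, samples=5000, max_hops=1000):
        # Part two: weighted random walk with no per-hop give-up chance;
        # combine the sampled ratings with the median, as suggested earlier.
        results = []
        for _ in range(samples):
            node = start
            for _ in range(max_hops):      # safety cap, just for the sketch
                if node in raters:
                    results.append(raters[node])
                    break
                out = certs.get(node, [])
                total = sum(weights[m] for m in out)
                if total == 0:
                    break
                r = random.uniform(0, total)
                for m in out:
                    r -= weights[m]
                    if r <= 0:
                        node = m
                        break
        return median(results) if results else None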

Although it gets closer, this technique still doesn't meet my earlier criterion. For example, if I cert A, A certs B, B certs C, and C attests, then adding a B cert of A reduces A's weight. I think this phenomenon isn't all that big a deal in practice. Fixing it would probably have to involve making paths not visit the same node twice, and anything involving that is computationally problematic - for example, simply calculating the number of paths between two nodes which don't visit the same node twice appears very difficult.

Trust Metric Criteria

Confidence values in diary ratings are currently surprisingly low. It turns out that this is due to a serious deficiency in the way diary ratings are calculated, which I will now explain.

Consider the case of a single statement. We have a set of nodes, one of which is 'me', and a directed graph of their certifications. Some nodes 'attest to' the statement in question. We will calculate a confidence level of the statement in the range [0, 1] for each node, with special interest in our own confidence. Diary rankings are more complicated, but this simplified model is sufficient for a full exposition of the current problems.

The current advogato way of calculating confidence is as follows. Start with yourself, and repeatedly hop to a new position by randomly selecting one of the nodes the current position certifies. If you get to a node which attests to the statement, halt. If you get to a dead end, give up. Before each hop, have a 5% chance of giving up. Your confidence is the fraction of the time this procedure doesn't give up.
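
For concreteness, here's that procedure as a rough sampling sketch in Python; the names and data layout are illustrative, not advogato's actual code.

    import random

    def confidence(certs, attesters, me, give_up=0.05, samples=100000):
        # `certs` maps each node to the list of nodes it certifies;
        # `attesters` is the set of nodes attesting to the statement.
        successes = 0
        for _ in range(samples):
            node = me
            while True:
                if node in attesters:
                    successes += 1
                    break
                if random.random() < give_up:   # 5% chance of giving up before each hop
                    break
                nxt = certs.get(node, [])
                if not nxt:                     # dead end
                    break
                node = random.choice(nxt)
        return successes / samples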

While this technique seems reasonable on its surface, it has a serious deficiency.

the adding certifications criterion - Adding a certification to any graph must not reduce your confidence.

Advogato fails this criterion in even the simplest case. Consider the case where we only cert one other node, and that node attests to the statement. Our confidence will now be .95. If we add a cert to a node which doesn't attest to the statement and doesn't cert anyone else, our confidence will fall by a factor of 2. This is clearly undesirable behavior.

Here are some more criteria, some of which advogato already passes, but all of which are clearly desirable and none of which (I think) imply each other -

adding nodes - If nodes and certs are added in such a way that you have a path along new nodes which ends in a new node which attests to the statement, your confidence must go up. (This eliminates another technique.)

total strangers - If a node you have no path to certifies someone, that must not change your confidence.

extending chain - If you only have one cert and do not personally attest to the statement, your confidence must be less than the person you certed.

skepticism - If you do not personally attest to the statement your confidence must be less than 1.

terminators - If a node attests to the statement, additional certs from it to other nodes must not raise your confidence level.

redundant backlinks - If all paths from you to node B must go through node A, then B certifying A must not increase your confidence.

unreachability - If you have no path to a node which attests to the statement, your confidence must be zero.

full attestation - All nodes which attest have confidence one.

side show - If all paths through node B which arrive at a node which attests must pass through some node A both before and after B, then B's certifications must have no effect on your confidence.

That's all I can think of for now; there may be others. Ideally, we would like a set of criteria such that for any two graphs we could unambiguously prove which of them should yield a higher confidence.
