I spoke with raph for a while yesterday about trust metric criteria. It turns out all the non-definitional criteria I gave yesterday can be consolidated into a single one.
Say we're comparing graphs P and Q. For both of them, we make a list of all paths from me to an attester which don't visit the same node twice and don't pass along an attester in the middle. If the resulting lists are isomorphic, then their confidences must be the same. If a subset of P's list is isomorphic Q's list with possibly some hops added to some entries, then P's confidence must be greater than Q's.
For example, if I cert A, A certs X, and X attests, then that must have a higher confidence than if I cert A, A certs B, B certs X, and X attests, since (A, B, X) can be transformed into (A, X) by removing the B.
This criterion is clearly reasonable - in such circumstances for every attack on A there is a strictly easier attack on B, hence A's confidence must be higher than B's. Unfortunataely this criterion isn't very restrictive - both shortest path and max independent paths meet it - hence it is unlikely to directly lead to a formula.
Failing this criterion leads to an interesting type of attack. If an attacker can induce good nodes to cert other good nodes, then he can reduce their influence. This is a potentially insidious kind of attack. There is nothing obviously wrong after the attack has taken place, and even if there is, what are you going to do about it, remove the certs?
Another major problem with violating this criterion is that it strongly discourages nodes from certing others under some circumstances, since it could potentially reduce the rating strength of members of one's own clique qute substantially.
A somewhat related problem occuring in the random walk model is that it tends to look a lot like shortest path. My ratings confidences are so low that if just about anyone I've directly certified gives a rating, that will drown out all the others. This is arguably reasonable behavior, but not what should be happening when the chances of giving up at each hop are as ridiculously low as .05.
I've devised an algorithm which appears to get this problem under control. Consider the advogato diary ratings system as it exists today. Unlike the simplified model I gave yesterday, this yields not just a confidence value but a rating as well.
We will give each node has a weight, in the range [0, 1]. When doing a random walk, we will pick a next hop to go to with probability proportional to its weight. If a node certifies two others with weights a and b, the probability that the next hop will go to the first one is a / (a + b) and the second one is b / (a + b). Normal random walk corresponds to all nodes having a weight of 1.
We will do the algorithm in two parts. In the first part, we reduce the weight of each node to the probability that a random walk starting from there won't give up. This is unfortunately nonlinear, since the probability of giving up is based on the weights, but now the weights are based on the probability of giving up. Fortunately a very good approximation algorithm exists - we simply calculate the probabilities given the current weights, then recalculate again with the new weights, and repeat several times. I think this will consistently give very good values after only four iterations.
Note that all dead ends will now have probability zero, and all near dead ends will have very low probability. Also, longer paths will have lower weight than shorter paths.
Next we compute the diary rating based on the random walk method given the computed weights and with no probability of giving up at each hop. The justification for having no chance of giving up is that longer paths were already reduced in weight by the first step. A special case is when all weights are zero, in which case we have no rating.
Although it gets closer, this technique still doesn't meet my earlier criterion. For example, if I cert A, A certs B, B certs C, and C attests, then adding a B cert of A reduces A's weight. I think this phenomenon isn't all that big a deal in practice. Fixing it would probably have to involve making paths not visit the same node twice, and anything involving that is computationally problematic - for example, simply calculating the number of paths between two nodes which don't visit the same node twice appears very difficult.