Advogato: A scalable Gnutella-like protocol?

Posted 15 Feb 2001 at 02:20 UTC by Omnifarious

I've been thinking about his for awhile, and I figure it's about time to record down what I've been thinking somewhere where people will see it. Perhaps it's redundant or wrong. I hope not.

You build something that uses a distributed algorithm to build a spanning tree. The nodes near the top of the spanning tree become the servers. You build the algorithm so that parents in your spanning tree will naturally have more bandwidth than you do.

I've been thinking about this for a long while.

Building the spanning tree isn't hard. Every node just selects one and only one parent node. They tell the parent that they're a child of that parent. You prevent cycles having a parent refuse to be a parent unless it also has a parent. If it loses its connection to its parent, it tells all the children that it no longer is a parent. One node 'seeds' the network as a root by saying it can be a parent without being a parent and not looking for a parent. Eventually it can delegate roothood to a child that has proven high bandwidth. It cannot cease being a root without doing this delegation.

You can have connections to nodes that are neither parents nor children, but search requests should not be propogated to those nodes unless you have no parent. Eventually a search request will make it onto the spanning tree and be efficiently distributed.

You can eventually elect servers who are near the top of the spanning tree. Nodes should, in general, elect parents that have more bandwidth than they do. This means that nodes near the top of the spanning tree should have the most bandwidth.

There are more things to think about here. In the main, how do you prevent malicious people (it's already relatively immune to the slow and heavy handed maneuvers of the US justice system) from destroying your network? But, these are my thoughts for now.

Oh, yes..., posted 15 Feb 2001 at 02:28 UTC by Omnifarious »

I would like to implement this using my TheStreamModuleSytem, but I develop very slowly and methodically, and nobody else is helping me. *sigh*

Some thoughts, posted 15 Feb 2001 at 08:10 UTC by raph »

I've been thinking about these sorts of issues recently, possibly because I'll be spending Friday at the O'Reilly P2P conference, helping Zooko present a paper on some Advogato-inspired content filtering ideas we cooked up.

First, as the experience with IRC amply demonstrates, spanning trees are a horrible architecture for distributed networks. The main problem is that they are so vulnerable - a single node goes down, and your tree is broken. The result is all too well known on IRC as a "netsplit". In addition, a slow node can easily function as a bottleneck. Ankh has done some analysis showing that spanning trees don't necessarily optimize network traffic, either, one of the main motivations for its use in IRC.

Network architecture isn't really the main problem. The fundamental reason that Gnutella can't scale is that each search query must be broadcast to all servers. A very simple analysis shows why this can't scale.

Thus, in order to make a fully decentralized Napster-like service work, you need to do intelligent distribution of the searches. Specifically, while the search metadata needs to be distributed across all servers in the system, only a small number of servers should be needed for any one search.

Here, I'll outline a very simple approach for single-keyword searching. Assume that each server has a hash-derived ID as in Mojo Nation. Hash the keyword. All servers whose id's match the first k bits are authoritative for that keyword. If you want to query based on that keyword, you need only find a single such server and query it. If you want to publish an item containing that keyword, you need to notify all such authoritative servers.

While I think that this general approach can work well, there are a lot of problems with specifics. First, it suffers from hot spots for popular searches. God help the DSL lines of the poor users whose ID's match the hash of "britney". Also, k is a "voodoo constant" - a tunable parameter which can cause poor performance if too small (increase in network bandwidth for publishing) or too large (greater chance of not finding any server for a keyword; more tightly focussed hotspots). As I've described it, the technique is also not suitable for boolean searches, substring searches, etc., although it could be extended in this direction.

The bottom line is: we don't yet know how to build scalable, decentralized networks. It's one of the most interesting challenges facing computer science today. If I were an academic, this is probably where I would be doing my research - I believe that formal techniques of analysis are an essential tool for figuring out how to design such a network.

That said, Mojo Nation is one of the more interesting projects out there. It generally meets most of the goals for a decentralized network. It certainly has its share of problems - the "metatracker" has been a point of centralization, and while the new distributed metatracker relieves this single point of failure, the current design won't scale up to a huge network. Further, searches don't work particularly well. But I think these are problems that can be fixed, and that the fundamental architecture is sound.

I am optimistic that fully decentralized networks can be designed, implemented, and deployed. We do need to learn a hell of a lot more about them, though. The best way to do that is to build some, and study them. Go forth, and have fun! In the worst case, you'll learn quite a bit about networking.

Supporting evidence, posted 15 Feb 2001 at 11:58 UTC by dnm »

Jordan Ritter, formerly of Napster, finalized and released his analysis on Gnutella and it's problems. Bias disclosure: I know Jordan personally.

I have my own share of issues with Gnutella, but from as much as I can see (and have seen), none of them are anything that someone hasn't already said, and said better than I would have, really. One of my personal favorites is Seth McGann's paper on the potential for self-replicating Gnutella-spread virii. Bias disclosure: My current employer is OpenCola, arguably a competitor to Gnutella (although more correctly, a competitor to Gonesilent, since Gnutella isn't a company). Seth McGann and I are ex-roommates and friends.

The problem IRC has to solve is different. IRC has to maintain a lot of shared state about a channel and has to keep that state consistent, even across a network split. Also, if I'm not mistaken, IRC's topology is relatively static.

The problem of spamming out search requests is a different problem. Even if you are temporarily not part of the spanning tree, you can still send such requests out to all your neighbors. Eventually the request will find its way onto the spanning tree. Of course, you'll have a tendency not to recieve requests until you're part of the spanning tree, but given the number of connections you have and their fluid nature, it shouldn't take long to get back on it.

I think that IRC could even be redesigned with a method of quickly rebuilding the spanning tree, a different set of rules about what to do when it's broken, and end up working a lot better.

What you mention with Mojo nation ties the search method and characteristics of the network together. While DNS has had great success in this regard, I would prefer an approach that didn't do that. Search technologies are changing all the time too, and I would prefer a bit of orthogonality between search techniques and network topology maintenance.

Dynamically building a spanning tree according to a set of rules ought to leave you with good candidates for 'superservers' at the top of the tree.

spanning trees, posted 15 Feb 2001 at 20:01 UTC by Ankh »

Actually a minimum spanning tree neither minimises bandwidth usage where it matters, nor is reliable.

IRC used it, but the mathematics is wrong today. The problem is the link between the computer running the server, and the ISP router. That's usually the thinnest pipe, so that's where you have to reduce bandwidth. Every time a server is an intermediate node in the minimum spanning tree, packets are coming in to it and then going out again.

I've written a little about ideas of clouds of P2P IRC++ servers here.

suppose rather than asking for britney content, you ask for an index of britney content. still just a blob of data you're looking up, so logically it has a key and can be stored somewhere in your network. now assume your request routing architecture is decentralized, so you locate content by taking a key for it and searhing the space N-bits worth of key at a time, while you're walking towards the key, you could just as easily ask the machine you're passing by if it has seen any britney content recently. chances are it will not have much, but it might have a little sitting in its cache. when you arrive at a server storing the index of britney content, you have conveniently also built up a small list of additions to make to that index. you merge your little bit of information into the index, fetch the full index, (which gets cached along the way back) and pick your britney content out of it. if a local cache somewhere along your lookup already had the britney index, you can fetch the cached version (avoiding strain on the authoritative one) and send only your write upstream to the head of the index.

this is not the sort of approach which works on a network where you lack the ability to make updates, but such networks are imho not nearly as fun. if you really want to make non-modifiable data, all you need to do is flag your data as "don't let my owner update me", or so.

It might very well be true that a minimum spanning tree does not minimize total network bandwidth. I'd have to think about it a bit and do some detailed analysis and read some research on the topic.

OTOH, IRC is NOT applicable. The idea I proposed does not require that connections that are not part of the minimum spanning tree be dropped. In fact, it encourages occasional testing of various connections, and reselection of a new parent connection on discovery of a better path. This will result in people with low bandwidth links being pushed to the leaves of the spanning tree. It will also be fault tolerant. A network split shouldn't last very long because you just elect another link to be your 'parent' link. If you can do this quickly enough, you don't even have to inform children that you are no longer a parent.

As far as the search idea is concerned... I thought about various schemes for trying to reduce the amount of the network you have to search. I abandon them for two reasons.

One is that I feel strongly that search techniques should be independent of network topology issues. This cannot be underemphasised. Search techniques are evolving continuously. The attributes you look for change. The methods of specifying what you're looking for alter too. I even envision support for content based searches someday. I want to be able to support all of that without changes to how the network topology organizes itself.

The other is that many techniques, such as the one graydon describes, will result in incomplete or incorrect information collecting at various points and describing itself as authoritative. For a filesharing network, this is very damaging, especially if the information persists for any length of time.

One interesting idea is to have nodes that dynamically create broadcast channels and allow hosts to subscribe to them. This would make sending searches out to every host on the network VERY efficient. Sadly, a subscription to the MBone is very expensive. I've checked into it as part checking into whether I could help a friend doing video over IP research .

mbone..routing..., posted 16 Feb 2001 at 19:42 UTC by Malx »

Do not look into MBONE if you redesigning network routing from scratch. It is what you are really doing :)
You trying to make new DNS and routing over existent IP/DNS

Also - think about making you algorithm be able to work with content
So bandwith would be equal to "amount of data on selected subject on some node" (or sum of 2 points)
This would create virtual spaning tree of interested persons (servers with more data in "computers" or "health" or ... field).
Note - these subject-trees whould be independent one from another (and could be calculated automatically by persentage of word in given subject - see "similar document" algorithms)

BTW - From Gnutella stats you could see that there much more queries then new data additions - so invertion of this would save bandwidth
(do not ask but wait for all announces and save on servers only thous you have selected (interested) for server).

I like division by content type like you suggested. I had thought about that kind of scheme, and even making it announcement based, and having seperate trees for seperate subjects. That makes some kinds of searching more efficient, without precluding any particular search technique or leaving around bad/out of date data with no clear way of removing it.

Yeah, I am kind of trying to reinvent Internet routing, with one difference. You can technically have a 'connection' from any point to any point on the entire network, which most data link layers don't handle (ATM sort of being an exception, and not exactly in the data-link layer either). It's 'session layer' routing. :-).

Actually, since quick response is most important for searches, I was going to make the optimization metric for the spanning tree based on ping time, not any bandwidth measure. I think this would have the attribute of making the session layer topology more closely match the network layer topology.

Raph: the distributed content-search idea rocks.

Nejucomo and I implemented it on Mojo Nation today and committed the patch (although it is not enabled in the default install).

I think that we fixed the latter problem, the "voodoo constant" issue, by making it not a binary inclusion/exclusion decision, but instead a "handicap" which returns a scalar indicating the degree of preference you have for each Content Tracker for each query.

Mojo Nation already does (if it works) load-balancing so the hotspots should be smoothed out (by hot servers upping their "mojo price" number).

The other problem of "which (un)lucky server has a hash-collision with 'Britney'", I decided was a feature-not-a-bug. We want some (dynamic, anarchic, load-balanced) centralization of content tracking for efficiency, and if people do deadbeef work to make good ids, then that is a form of "proofs of work" which help ensure that you want a good reputation. (Because throwing out your ID and going to generate a new ID is expensive.)

Anyway, do a CVS up and look at "common/ContentHandicapper.py".

Regards,

Zooko

A scalable Gnutella-like protocol?

Posted 15 Feb 2001 at 02:20 UTC by Omnifarious

Oh, yes..., posted 15 Feb 2001 at 02:28 UTC by Omnifarious » (Journeyer)

Some thoughts, posted 15 Feb 2001 at 08:10 UTC by raph » (Master)

Supporting evidence, posted 15 Feb 2001 at 11:58 UTC by dnm » (Master)

Need to do more research in IRC, but..., posted 15 Feb 2001 at 12:09 UTC by Omnifarious » (Journeyer)

spanning trees, posted 15 Feb 2001 at 20:01 UTC by Ankh » (Master)

searching a distributed network, posted 15 Feb 2001 at 23:16 UTC by graydon » (Master)

More responses, posted 16 Feb 2001 at 14:45 UTC by Omnifarious » (Journeyer)

mbone..routing..., posted 16 Feb 2001 at 19:42 UTC by Malx » (Journeyer)

Division by content type, posted 17 Feb 2001 at 06:09 UTC by Omnifarious » (Journeyer)

Raph's distributed content-search implemented!, posted 19 Feb 2001 at 03:35 UTC by Zooko » (Master)