Is BitTorrent Evil?

Posted 10 Aug 2007 at 04:21 UTC by ncm

Never mind copyright abuse, porn, and corporate co-optation. Just looking at usage of the global set of point-to-point links ("tubes" per Sen. Stevens), BitTorrent and similar protocols draw down the available network capacity many times more than necessary to move files or frames from the machines that have them to the machines that want them. What can we do instead?

Maybe you're not convinced, because BT has always worked fine for you.

So, think about what happens when Haughty Hydra or Debbie Does Jeddah is released. You grab a .torrent and start downloading. Ten thousand other people do the same thing, all over the world. Most are in the U.S. and Europe. BT starts collecting chunks of the file from whoever has them. Some people who have them are right nearby, but most are far, far away. BT doesn't care where they are, it just takes them. However, the load it puts on the internet when it gets a chunk from across the Atlantic is appallingly greater than when it gets one from nearby.

Each time a packet goes onto a fiber it delays some other packet, and a transatlantic packet may do that dozens of times, going through very heavily-contested fibers. The same payload from the guy next door would have exactly equal value to you, but it would delay many, many fewer packets, over typically less heavily loaded, higher-capacity links. How did that chunk get across the ocean in the first place? Maybe it came from your neighbor, and went all the way across before you copied it back again.

Why should you care? You're paying a flat rate. However, your ISP isn't. They pay per packet, more or less, and extra load drives up prices that come back to you. Worse, you pay in reduced competition, because the companies that own the big tubes, er, pipes have an overwhelming advantage over the little ISPs who have to use them. Reports that 30% of net traffic is BT packets don't mean success, they mean failure: that same amount of data sent optimally might be hard to notice.

What is to be done? We need a protocol that chooses its chunks from the nearest possible source first. The longer you wait to ask for a chunk that's far away, the more likely it is that somebody nearby will have it by the time you do ask. Measuring distance is hard, but measuring latency is a pretty good substitute. Nobody can afford to measure distance to everybody else, but you can trade measurements with nearby nodes you discover.
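
Here's a minimal sketch of that rule in the abstract (my own toy code, not any existing client's; the Peer class, pick_request function, and rtt_budget_ms threshold are all invented for illustration). Measure round-trip time to each peer as a stand-in for distance, ask the nearest holder for each piece, and hold off on pieces that only distant peers currently have:

  # Toy "nearest source first" piece selection, using RTT as a distance proxy.

  class Peer:
      def __init__(self, name, rtt_ms, pieces):
          self.name = name            # peer identifier (hypothetical)
          self.rtt_ms = rtt_ms        # measured round-trip time, our distance proxy
          self.pieces = set(pieces)   # piece indices this peer advertises

  def pick_request(peers, needed, rtt_budget_ms):
      # For each piece we still need, find the nearest peer that has it,
      # then request the piece whose nearest holder is closest overall.
      candidates = []
      for piece in needed:
          holders = [p for p in peers if piece in p.pieces]
          if holders:
              nearest = min(holders, key=lambda p: p.rtt_ms)
              candidates.append((nearest.rtt_ms, nearest, piece))
      if not candidates:
          return None
      rtt, peer, piece = min(candidates, key=lambda c: c[0])
      if rtt > rtt_budget_ms:
          # Everything we could ask for right now is far away;
          # wait in the hope that a nearer peer picks it up first.
          return None
      return peer, piece

  peers = [Peer("neighbour", 12, {0, 3}), Peer("overseas", 180, {0, 1, 2, 3})]
  choice = pick_request(peers, needed={1, 3}, rtt_budget_ms=50)
  if choice:
      print(choice[0].name, choice[1])    # prints: neighbour 3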

We need to replace BitTorrent. Has anybody already invented this? Will you?


Yes and no., posted 10 Aug 2007 at 09:58 UTC by quad » (Journeyer)

Mainline Bittorrent has the Cache Discovery Protocol.

Azureus has both the Ono plugin and Vivaldi network distance calculations.

Something about the premise bothers me..., posted 11 Aug 2007 at 00:52 UTC by cdfrey » (Journeyer)

Bittorrent could be seen in the opposite light as well. Put a file on a server in North America, and now 100% of the file is passing across those poor small fibres, for foreign downloads.

At least with Bittorrent, there's a chance you can grab a piece from your next door neighbour.

I don't think that 30% means failure either. It just means people want to share files. Bittorrent is just the best way to do it currently, no matter who you are, individual or megacorporation.

I'm glad to see optimization is being looked into. Seems to be following usual software practice as well: first implement the feature, then worry about optimization.

Local throughput is higher, posted 12 Aug 2007 at 18:17 UTC by Zaitcev » (Master)

The opening blurb ("many times more than necessary") is patently false. A user of a torrent does not download the same piece twice or more, so the bandwidth use is the same as with FTP. So, opening with blatant lies ruins the article.

The locality argument would have carried some weight if the author provided measurements in support of the thesis. Otherwise one may argue that most torrent connections are local in Internet terms, because local connections offer better bandwidth. Whenever I sample my connections, the majority of users are in North America and very few are in Europe. That's because the transatlantic links are congested and so their throughput is lower. To be sure, this beneficial effect is mitigated by users throttling at the endpoint and thus distorting the picture that a node observes. So we don't know what is right. But this is just why actual research is important instead of handwaving.

When citing "reports" it helps, you know, to actually cite them. For one thing, I "heard" "reports" that 75% of traffic on an unnamed "backbone network" is BT. And which number is right? Maybe both?

Re: Local throughput is higher, posted 12 Aug 2007 at 19:43 UTC by bi » (Journeyer)

"Many times more than necessary" is not really patently false, unless you somehow manage to interpret that to mean "many times more than FTP, which as cdfrey points out probably makes packets go longer distances than necessary anyway".

(But that's just my charitable interpretation of your interpretation. A less charitable interpretation would be something like "my own argument about terrorism and left-wingers was totally busted by ncm, so I'll look for a flimsy excuse to interpret his words in the worst light possible so that I can say something bad about him.")

Anyway, the Cache Discovery Protocol does sound like what ncm wants. But I'm interested to know what further work there is in improving audio and video compression, so that people don't even have to download that much data in the first place.

Compared to what?, posted 14 Aug 2007 at 21:46 UTC by ncm » (Master)

I don't want to make assumptions about why Pete missed the point so badly. We all have lapses. Suffice to say that I never mentioned FTP. I would not have guessed anybody would see FTP as a standard of comparison for optimal use of network resources. Worst case, maybe.

To be precise, an optimal distribution mechanism would send each chunk across the transatlantic link exactly once. If a chunk is sent across five times, that's five times more than necessary. Never mind that FTP might run it across five thousand times.

(Pete, accusing people of lying just because you don't understand their argument interferes with rational discussion. "bi", we can each make up our own uncharitable interpretations; posting yours doesn't help.)

I'm very grateful to quad for pointing out CDP, Ono, and Vivaldi.

Actually, Chris, BT (at least, classical BT) isn't the current best way, from a network utilization standpoint. The current best is Akamai. There are lots of ways in which it's not good, but it's silly to argue about which legacy method is least bad. The point is to invent something better. That BT is a good starting point seems already to have been recognized.

BT Measurements ...., posted 14 Aug 2007 at 23:06 UTC by nymia » (Master)

http://www.isa.its.tudelft.nl/~pouwelse/Bittorrent_Measurements_6pages.pdf

It would be interesting to work this out, perhaps as an equation showing the relation between proximity and bandwidth utilization.

But then I thought the concept was already established decades ago? As old as the sliding window protocol?

Utilization, posted 15 Aug 2007 at 00:54 UTC by ncm » (Master)

I didn't think the problem would be so hard for people to understand.

Imagine a very simple network: five nodes, connected in a line:


  A <--> B <--> C <--> D <--> E

A has a file, and the rest each want a copy of it. If each takes a copy via FTP, it traverses AB four times, BC three times, CD twice, and DE once: 10 traversals total. What's optimal? It could traverse each link once: 4 total. What does classical BitTorrent do? Each node gets about 1/4 of its pieces from each of the other nodes, for a total cost of 7.5.

Now, imagine BC is not a single hop, but actually runs through two dozen routers.


  A <--> B <-- ... --> C <--> D <--> E

Every packet sent through BC costs 24 times as much as one carried on AB or CD. FTP costs 1+25+26+27 = 79. Optimal costs 1+24+1+1 = 27. BT costs ((1+24+25+26) + (25+24+1+2) + (26+25+1+1) + (27+26+2+1))/4 = 59.25, more than twice optimal.

Now, imagine a hundred nodes in place of A and B, and another hundred in place of C, D, and E (not all in a line). Under classical BT, each of the 200 nodes will pull about half the file across BC or CB, so the file crosses the long link about a hundred times, at a total cost of 2.4k; the other half comes from local peers, at a cost of a few hundred, say 3k total. Optimal is 224. FTP is close to BT. Improved BT+CDP or BT+CDP+Ono might be much closer to 224 than to 3000.
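
(For what it's worth, here is a throwaway script of my own that reproduces these numbers from the link weights alone; nothing in it comes from any real client, and the function names are invented for illustration.)

  # Link-traversal cost of distributing one file along a line of nodes,
  # where node 0 starts with the file and everyone else wants a copy.

  def positions(weights):
      # Cumulative positions of nodes along the line, from per-link weights.
      pos = [0]
      for w in weights:
          pos.append(pos[-1] + w)
      return pos

  def ftp_cost(pos):
      # Everyone downloads the whole file directly from node 0.
      return sum(abs(pos[i] - pos[0]) for i in range(1, len(pos)))

  def optimal_cost(weights):
      # The file crosses every link exactly once.
      return sum(weights)

  def bt_cost(pos):
      # Classical BT, idealized: each downloader takes an equal share
      # of the file from every other node, regardless of distance.
      n = len(pos)
      return sum(abs(pos[i] - pos[j]) / (n - 1)
                 for i in range(1, n) for j in range(n) if j != i)

  for weights in ([1, 1, 1, 1], [1, 24, 1, 1]):
      pos = positions(weights)
      print(weights, ftp_cost(pos), optimal_cost(weights), bt_cost(pos))
  # prints: [1, 1, 1, 1] 10 4 7.5
  #         [1, 24, 1, 1] 79 27 59.25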

"Best" and worst case, posted 15 Aug 2007 at 04:25 UTC by cdfrey » (Journeyer)

I admit I didn't define my "best" very clearly, and in the context of ncm's replies, I guess it isn't quite accurate. :-)

With a title including the word "evil", part of my gut reaction is to come to the defence of Bittorrent, since one of its good points is that it allows anyone to serve up large files if needed. That gives it a huge advantage in the "best" column in my books. Video podcasts need this sort of thing, and I'm often amazed at how infrequently bittorrent is used.

From this point of view, Akamai is not nearly as accessible.

Anyway, focusing on the technical is good. Bittorrent does expand to fill all available bandwidth, and once that occurs for a transatlantic pipe, bittorrent is still happy. (Everything else suffers.) The math would change, with B to C traffic slowing down and taking a smaller fraction of the traffic, and the clouds on each side filling in the gaps.

The old fashioned way to handle this was mirror servers. This suggests that different trackers for various geographical areas could be useful, but messy.

This also suggests that I don't have any brainy ideas that other people haven't thought of before. :-)

I have been thinking of this..., posted 15 Aug 2007 at 13:32 UTC by Omnifarious » (Journeyer)

It would be useful if an ISP could run an 'auto-tracker' or something that noticed when a few of its customers were downloading the same thing and tried to get them all to talk to each other, while it opened 2-3 links of its own to the outside, each traversing one of its major backbone links.

I remember the digital fountain people showing up at CodeCon and telling everybody how great their patented multicast based technology was and wondering why anybody was using BitTorrent or anything similar. Their patented multicast technology wouldn't have had the problem ncm is complaining about and it was quite spiffy. But, of course, their technology was patented, which relegates it to the 'useless for 15 years' heap.

Evil vs. Rude, posted 15 Aug 2007 at 19:11 UTC by ncm » (Master)

cdfrey: By the formal definitions, I suppose the title should have been "Is BitTorrent Rude?", but that would have been much less catchy. Of course, pervasively deployed mirror servers are the cooperative equivalent of Akamai et al.

The Wikipedia article claims that BitTorrent, Inc. has failed to document CDP, but who knows?

Torrent for ISPs, posted 16 Aug 2007 at 23:45 UTC by ncm » (Master)

There might be a market for "torrent-spoofers" for ISPs. Imagine if your ISP noticed torrents and inserted itself into the conversation. It could save off copies of chunks requested by subscribers and offer them to all the rest, while making those chunks apparently unavailable from upstream. Less helpfully, it could throttle outward traffic, and favor delivering to other copies of itself. I wonder if BitTorrent, Inc. is doing this.
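
(A purely hypothetical sketch of the idea, with names and structure of my own invention; I'm not describing anything BitTorrent, Inc. or any ISP actually ships. The middlebox remembers pieces its subscribers have already fetched, keyed by torrent and piece index, and serves later requests for the same piece locally.)

  # Hypothetical ISP-side piece cache ("torrent-spoofer" sketch).

  class PieceCache:
      def __init__(self):
          self.store = {}   # (info_hash, piece_index) -> piece bytes

      def fetch(self, info_hash, piece_index, fetch_upstream):
          key = (info_hash, piece_index)
          if key in self.store:
              return self.store[key]                        # served locally, no upstream traffic
          data = fetch_upstream(info_hash, piece_index)     # first subscriber pays the cost
          self.store[key] = data
          return data

  cache = PieceCache()
  upstream_calls = []
  def fake_upstream(h, i):
      upstream_calls.append((h, i))
      return b"piece-data"

  cache.fetch("deadbeef", 7, fake_upstream)
  cache.fetch("deadbeef", 7, fake_upstream)
  print(len(upstream_calls))   # prints: 1  (the second request never left the ISP)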

peers are not randomly selected, posted 26 Aug 2007 at 08:21 UTC by walken » (Master)

I'm late to the party and I have no actual experience with the bittorrent protocol. However, I'm not sure it actually works as nathan describes it.

My understanding is that bittorrent clients try to select peers that they get good throughput with. If that's correct, this should favor local peers somewhat.

In the example of the transatlantic link, sure, if the link is uncongested and you get good throughput on it, the torrent client might use it, but then does it really matter much in that case? If the link is congested though, clients will start looking for peers they have better throughput with, i.e. most likely on their side of the pond.

Now this is only my intuition, I have absolutely no idea if this works as nicely as that in practice.
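
(That intuition is easy to model with a toy of my own making, nothing from any real client: if a node keeps only the few peers that give it the best measured throughput and drops the rest, congested long-haul peers tend to fall out of the active set on their own.)

  # Toy throughput-based peer selection: keep the fastest `keep` peers.

  def select_uploaders(observed_rate_kbps, keep=4):
      ranked = sorted(observed_rate_kbps.items(), key=lambda kv: kv[1], reverse=True)
      return [peer for peer, rate in ranked[:keep]]

  rates = {"neighbour-a": 900, "neighbour-b": 750, "same-isp": 600,
           "transatlantic-1": 80, "transatlantic-2": 60}
  print(select_uploaders(rates, keep=3))
  # prints: ['neighbour-a', 'neighbour-b', 'same-isp']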

realtytrac@custhelp.com provokes a spamming impulse from me, posted 2 Sep 2007 at 20:33 UTC by badvogato » (Master)

Recently you requested personal assistance from our on-line support center. Below is a summary of your request and our response.

If this issue is not resolved to your satisfaction, you may reopen it within the next 7 days.

Thank you for allowing us to be of service to you.

Your Service Coordination Group is here to help with any questions or concerns you have about your RealtyTrac account, tools, services or information.

We are available by telephone at 1-877-888-8722 Monday - Friday 8am - 5pm (PST).

Sincerely,

The Service Coordination Group RealtyTrac, Inc. Phone: 1-877-888-8722 Fax: 1-949-861-9413

----------------------------------------------------

My initial problem is that i entered my credit card for a 7-day trial of their service and didn't realize that if i don't cancel it on my end, they'd automatically charge the membership dues... so I sent them this email:

I do NOT honestly recall ordering services that warrant these charges:

08/28/07  POS  REALTYTRAC INC 949-502-8300, CA REFID:087239820577382  -49.95

08/21/07  POS  RTI PUBLISHING 949-502-8300, CA REFID:087231273057506  -29.95

07/30/07  POS  REALTYTRAC INC 949-502-8300, CA REFID:167208820225576  -49.95

regards

your former trial-only member.

-------------------------------- after receiving their customer response, the resolution I'm contemplating is to fax them my incident report every 7 days with my computer's intelligent program. Does that seem fair enough as self-service?

moderation, posted 5 Sep 2007 at 02:50 UTC by ncm » (Master)

I'm reading badvogato's posting above (like those elsewhere) as a call for some sort of moderation capability in Advogato articles. Maybe replies should be visible or not according to whether diary postings are?
