OLS2008 - "Issues in Linux Mirroring: Or, BitTorrent Considered Harmful"As one talk I was really interested in, I went to John Hawley's talk entitled "Issues in Linux Mirroring: Or, BitTorrent Considered Harmful", as seen from the perspective of the kernel.org mirrors.
This paper was really interesting for me, both as the Gentoo releng infra liasion (I get the bits from releng onto the mirrors), as well as working for IsoHunt, since he was complaining about BitTorrent.
Before the actual material about BitTorrent, he had some harsh words about distributions and space usage, and the lack of co-ordination. Having multiple major distributions doing their releases in the same week really only hurt themselves, because the mirrors get saturated by users. Between two major distros, they use up fully half of the 5.5TiB at kernel.org, and having them doing new material at the same time just blows out the cache, even with stupid amounts of memory. (Comments were made about Mark Shuttleworth having the money to buy some boxes with TiB of RAM for kernel.org). Co-ordination between distributions is needed to resolve this issue, and the audience discussion suggested we should try the distributions@freedesktop list first, and if that's too much noise, start up a list at kernel.org instead.
Moving onto BitTorrent, he noted that in large Linux torrent swarms, the standard tracker balancing algorithms end up with a net effect that a few slow peers joining greatly slow down the swarm speed at present (based on analysis of the tracker used by Fedora for the F8 release). If mirror are performing seeding, in many cases, it will still be faster for the mirror to provide content for a given user than other client peers. If the objective is to move content as fast as possible, this is needed vs. the normal BT objective of balancing total bandwidth usage.
Issues for distributions in handling bittorrent to make life easy for mirrors, he had several complaints about the level of manual interaction needed, to which I responded with the Gentoo structure of symlink trees under experimental, which is used for mirrors to run torrents easily, as well as powering the HTTP seeding additions to the BitTorrent protocol.
In using rtorrent(libtorrent), he complained that it wasn't using sendfile at all, which had a large negative performance impact, should be tackled upstream.
The BitTorrent community also needs to look at tweaking the peer decision protocol in the announce protocols, to hand out a smarter selection of fast peers. Where fast is local (look at BGP looking-glass for clues) or is a designated fast mirror that should be used as a fast peer.
Lastly, he noted that the trackers seem to be badly run, as somebody from isoHunt, I offered to post up my own work on running effective trackers to the inter-distro discussion.