bufferbloat (and a Net Neutrality rant in the footnotes)
Jim Gettys writes about "bufferbloat", the tendency of operating systems to
keep increasing network buffer sizes so that now, with Internet links faster
than ever, browsing often seems slower than ever.
His series of articles is a great introduction to the concept that I've been
ranting to people about for years. Usually their eyes glaze over. What can
you do? But his article is a bit more beginner-level, so maybe you'll have
some idea what he's talking about.
This article, on the other hand, you probably won't.
Back in September, I was "lucky" enough to live for a month in someone
else's apartment - and that someone was a cheapskate, so they
subscribed to the cheapest possible Telus DSL plan, 256kbits/sec
(symmetrical, for a change!).
On that connection, which is at least 4x as fast as single-channel ISDN or
my old 56k modem (on which I used to multitask like crazy), it was totally
impossible to do two things at a time. Loading a single web page totally
killed interactive performance, to the point where we would have 8+ second
ping times if someone had Google Reader open in a web browser tab. That
same Google Reader that killed everything didn't even *work*, because it
opened multiple sessions at once, and I think something in there somewhere
must have had a 5-second timeout. Hilarity ensued.
Obviously, I concluded, our ISP has hard coded their DSL buffering based on
the idea that we have a fast modem, even though we don't. At 6
megabits (about 23 times faster), that maximum 8-second delay would have
been only 348 milliseconds... a mostly acceptable latency.
(Even if you have a fast DSL link, you still have the problem; most ISPs
configure their DSL connections with symmetrical buffering, ie. the same
sized buffer at each end. But even a 6 megabit downstream DSL link usually
has a much lower upstream bandwidth - a megabit if you're lucky, or much
less if you're not - so the same buffer size means a much higher delay on
upload than on download. That's why on a normal link, *uploading* stuff
kills performance much faster than *downloading* stuff. It's just
misconfigured buffers, pure and simple.)
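The arithmetic behind those delays is simple enough to sketch. This is a back-of-the-envelope model, not a measurement: it assumes the 8-second worst case corresponds to one fixed-size buffer, that "kbit" means 1000 bits, and the 1 Mbit/s upstream figure is just the "if you're lucky" case from above. The function name is made up for illustration.

```python
def queue_delay_seconds(buffer_bits, link_bits_per_sec):
    """Time for a completely full buffer to drain onto the link."""
    return buffer_bits / link_bits_per_sec

# Buffer size implied by an 8-second worst-case delay at 256 kbit/s.
buffer_bits = 8 * 256_000

links = [
    ("256 kbit/s", 256_000),
    ("1 Mbit/s upstream", 1_000_000),
    ("6 Mbit/s downstream", 6_000_000),
]
for name, rate in links:
    delay_ms = queue_delay_seconds(buffer_bits, rate) * 1000
    print(f"{name}: {delay_ms:.0f} ms")
```

The same buffer that costs a few hundred milliseconds downstream costs about two seconds on the 1 Mbit/s upstream side, which is the asymmetry described above.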
The bad news is that by the time you get into Jim's third article in
this series, he makes analyzing it all seem like rocket science.
Bufferbloat is not rocket science at all. It's one single misconfigured
number on virtually every DSL router (and now most cable modems too;
technology apparently gets worse over time) everywhere. And unfortunately,
it's often at both ends, yours and the ISP's. And even when it's your end, you
often can't reconfigure your DSL modem because it's locked down.
I heard a rumour once that ISPs do this on purpose: using a bigger
buffer lets you squeeze a couple percentage points more bandwidth in
stupid reviewers' benchmarks, and reviewers "of course" only ever download
one thing at a time. You can evaluate bandwidth objectively, but lagginess
during a download is mostly subjective. So perversely, an ISP that
lowered their buffer size to make their connections suck less would lose out
in third-party comparisons. Sigh. Having had my own products reviewed by
morons in the past, I can't even fault them if that's the case.
On the other hand, the ISP here had no incentive to tune our
cheapest-you-can-go 256kbit DSL link. In fact, they had an incentive
to make it suck as much as possible, so we'd stop being such incredible
cheapskates and actually buy one of their bigger, profitable plans.
So anyway, with all that suck out of the way, I have a message for the
world:
You can fix it, and you don't need your ISP to get it right.
The bad news is that what you do need is a miracle :) Just kidding.
Sort of. You don't technically need a miracle, but you need to figure out
Linux's tc (traffic control) tool, which is about as close as you can get.
Let me jump ahead and say that, with a lot of fiddling back in September -
let's just say I spent *most* of my September learning about this - I
managed to get my 256 kbit connection to upload/download 10 files
simultaneously while having a flawless Skype call, and my ping latency
to Google never exceeded 250 ms. Were the downloads fast? Heck no, 256
kbits still sucks. But they were each about (my_bandwidth-skype)/10, which
is to say, as good as it gets.
In case you don't already know, what you need is to insert another router
(if you don't already have one) in between your home network and your DSL
modem, and enable something called traffic shaping. On a typical DSL
modem, you mostly only need to care about upstream traffic shaping,
since that's the one that normally kills your performance (especially with
BitTorrent); if you're picky, like me, you might also play with
downstream shaping (called "policing" since it works totally
differently), though the Linux code for that is not nearly as complete
as the upstream stuff.
Here are some things I learned:
First of all, about 90% of traffic shaping setups that people try are
absolute disasters. They make things much, much worse than nothing at all.
It's a bit of a black art. It actually *shouldn't* be a black art, but
there's something a bit funny about the whole department; everybody doing it
seems to have taken user interface lessons from Cisco. The Linux 'tc' tools
(kernel and userspace) were written by some kind of super genius in Russia,
which is great as far as performance goes, but absolute death in terms of
usability and documentation. I mean, the help message from the thing is
literally a dump of its BNF grammar. If you're a Russian super genius,
that's probably all you need. If you're me, you cry.
Given that, the terrible state of the documentation - all written by other
people, Russian super geniuses don't need no stinking documentation - is
actually kind of forgivable. Except in the cases where they obviously
haven't even tried their own advice. Oh, sure, the sample scripts all *run*
without errors, but they just plain don't do the things they claim to do.
I'll get to that in a minute.
So, ok. Super complex UI, lots of fiddly magic numbers, and no
documentation. You're sort of doomed. I know, I'm sorry. That's why this
article is more of a rant than a howto. I won't keep you in suspense:
you're only halfway through this article, but it doesn't get any better.
Feel free to stop now before you get as depressed as I am.
BTW, all the "user friendly traffic shaping" scripts I could find make
horrible mistakes in configuration and do lots of weird things. If anyone
remembers the Nitix "Active Queue Management" feature and that it would do
insane things sometimes... now I know why. Because we didn't spend enough
time figuring out what the heck those scripts were doing, and realizing that
they're doing it wrong.
Actual experimental observations
With upstream shaping, you can set it to about 99% of your upstream
bandwidth and life is good. This makes sense, of course, because some
traffic can jump straight to the head of the queue, so you don't really have
latency concerns. (By the way, with an idealized perfect queue, there's no
reason to ever limit the queue size. But such a queue doesn't exist.)
With downstream policing, you have to cut bandwidth usage to 80% or less, or
the jitter will kill your latency "sometimes." (This might not matter as
much on faster-than-256k links. At 256k, even three full-sized packets
enqueued at once can cause 140ms of momentary delay. If you have ten
downloads at a time, the *average* case is about 10 packets at once, or 469ms,
and it gets worse from there.) So this 80% number isn't really fixed; it
varies based on how many connections you have going at once.
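A quick sketch of where those delay figures come from, assuming "full-sized" means a 1500-byte packet and that 256k means 256,000 bits per second (both are assumptions, and the function name is made up):

```python
MTU_BYTES = 1500  # assumed size of a full-sized packet

def serialization_ms(packets, link_bits_per_sec, packet_bytes=MTU_BYTES):
    """Milliseconds needed to push `packets` back-to-back packets onto the link."""
    return packets * packet_bytes * 8 / link_bits_per_sec * 1000

for n in (1, 3, 10):
    print(f"{n} packets at 256k: {serialization_ms(n, 256_000):.1f} ms")
```

Under those assumptions one packet costs about 47 ms, three cost about 141 ms and ten cost about 469 ms, so every extra simultaneous download adds a noticeable chunk of worst-case jitter.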
If you're massively exceeding your allotted bandwidth (that is,
always, if you have a super-slow 256k link like mine, or if you have a
normal link and you're doing a lot of torrenting/etc) then enabling ECN on
all your client machines, and using an active queue like RED on your router,
can make a huge difference versus no ECN.
ECN's biggest effect is in improving interactive traffic when there are
high-bandwidth connections around (interactive TCP responds *very very
badly* to packet drops, since it doesn't have any other packets outstanding,
so it doesn't get the instant re-ACKs caused by later packets... long
story, but losing a couple of packets can cause multi-second delays on
interactive traffic, and only millisecond delays on busy sessions). The
good news is that this means you can enable ECN on *just* your own client
that you use for interactive traffic, and your results are improved, even if
you don't want to ECN-enable every single computer on your network. (This
isn't cheating; you aren't hurting anybody else's performance by turning on
ECN for only one machine. Like magic, ECN is just always an improvement,
and always fair. Nowadays, chances are that the reasons ECN is off by
default don't apply to you anymore; try turning it on and feel the love.)
If you do traffic shaping/policing properly, there is little need to actually
*prioritize* traffic; just letting all the streams share fairly is usually
enough. I was able to have a Skype call just fine on my 256k link with 5
downloads going at once with traffic speed limiting but no
prioritization. With 10 downloads, 1/11th of my bandwidth was just too
small to get good sound quality, so prioritization was the only way. But
come on, if you have a 10 megabit link, do you *really* think that's your
problem? Plus, you can't do prioritization of your downstream - that's
something only the ISP can do, and they won't, for very good
reasons (see footnote 1). So you're lucky it turns out not to matter.
So the rule of thumb? Get your basic queuing right *first*, and worry about
prioritization *second*. A lot of people get those two mixed up.
Prioritization is worthless when your queue is doing stupid things.
Next, beyond simple idiotic oversized DSL buffers, the *real* problem with
BitTorrent is that it uses so many simultaneous TCP connections. If you
have 50 of them at once going at full speed, everything else will get 1/50
of the bandwidth at most. As far as I know, nobody has ever found a really
good trick for protecting against this. (You could group traffic by IP
instead of by IP:port, but that would penalize anybody behind a NAT, so it's
no good.)
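To make the trade-off concrete, here is a toy comparison of per-connection fairness and per-host fairness. It assumes ideal sharing in which every flow (or every host) gets an equal slice; the host names and function names are invented for the example.

```python
from collections import Counter

def per_flow_share(flows):
    """Each connection gets an equal slice; a host's share is its slice count."""
    counts = Counter(flows)
    return {host: n / len(flows) for host, n in counts.items()}

def per_host_share(flows):
    """Each distinct source address gets an equal slice, however many flows it has."""
    hosts = set(flows)
    return {host: 1 / len(hosts) for host in hosts}

# One entry per TCP connection: 50 torrent connections and one interactive session.
flows = ["torrent-box"] * 50 + ["laptop"]

print(per_flow_share(flows)["laptop"])  # about 2% of the link
print(per_host_share(flows)["laptop"])  # half of the link
```

Per-host grouping rescues the laptop here, but if "torrent-box" were really a NAT with a dozen people behind it, all of them would be squeezed into that one half.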
Linux's traffic policing (ingress) layer can't do ECN. This is a terrible,
terrible oversight. Someday I may try to fix it, if someone doesn't beat me
to it.
Some people suggest shaping the *outgoing* queue on your *local* interface
as an alternative to policing the *incoming* queue on the *remote*
interface. Same thing, right? Packets still get dropped when they exceed
the bandwidth, but now you can use all the fancy queue types, including RED,
which allows you to use ECN for incoming traffic! Except this setup results
in total crap; I wonder sometimes if the people writing these howtos even
try their own advice. When receiving packets from a slow link, you've
already paid the latency costs; if you now create a queue before
sending the packets onward locally, you're just adding a bunch more latency.
And the quality of (for example) your RED tagging varies proportionately with
the added latency; if you keep the useless queue short, the policing quality
drops because the algorithms aren't made for that. A much better solution
would be a policing ingress queue that would "virtually" drop packets using
ECN when possible.
I think you could make a "virtual RED queue" that doesn't actually delay or
even store packets - RED isn't about delaying packets anyhow, it's about
dropping them before the queue is full, and in fact it's better to
ECN tag them instead of dropping them, so why not virtually drop them when
your virtual queue is virtually full? It would be about as good as really
dropping them from a real queue that's really full, but with no added
latency and no actual packet loss. Someone could probably write a Ph.D.
thesis about this, which I'm guessing is also why nobody has implemented it.
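For what it's worth, here is a rough sketch of what such a "virtual RED queue" might look like. Everything in it is hypothetical: the class name, the thresholds, and the choice to use the instantaneous virtual backlog (real RED uses a smoothed average queue length). It only decides whether a packet should be ECN-marked; it never stores or delays anything.

```python
import random

class VirtualRed:
    """Tracks a pretend backlog that drains at `rate_bps` and marks packets RED-style."""

    def __init__(self, rate_bps, min_bytes, max_bytes, seed=0):
        self.rate = rate_bps / 8.0   # bytes per second drained from the virtual backlog
        self.min_bytes = min_bytes   # below this, never mark
        self.max_bytes = max_bytes   # at or above this, always mark
        self.backlog = 0.0           # bytes "queued" (nothing is really queued)
        self.last = 0.0              # timestamp of the previous packet
        self.rng = random.Random(seed)

    def on_packet(self, now, size):
        """Return True if this packet should be ECN-marked (or dropped if ECN is off)."""
        drained = (now - self.last) * self.rate
        self.backlog = max(0.0, self.backlog - drained) + size
        self.last = now
        if self.backlog <= self.min_bytes:
            return False
        if self.backlog >= self.max_bytes:
            return True
        # Between the thresholds, mark with linearly increasing probability.
        p = (self.backlog - self.min_bytes) / (self.max_bytes - self.min_bytes)
        return self.rng.random() < p

# Ten full-sized packets arriving at the same instant on a 256 kbit/s budget.
red = VirtualRed(rate_bps=256_000, min_bytes=3000, max_bytes=9000)
print([red.on_packet(0.0, 1500) for _ in range(10)])
```

The packet is forwarded immediately either way; only the mark changes, which is the whole point of keeping the queue virtual.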
Note that regardless of the above, an *outgoing* queue used for traffic
shaping does *not* increase latency. It always releases the packets (just
slightly less than) as fast as they can go. So even though the queue
usually has a bunch of stuff in it, there would have been a queue with stuff
waiting in it anyhow; traffic shaping just makes it smarter. This
fundamental difference is why using your outgoing queue algorithms for
ingress traffic totally doesn't work.
TBF vs. SFQ
Whatever you do, don't try to combine TBF (token bucket filter) with SFQ
(stochastic fair queuing), because the end result is dramatically bad.
Unfortunately, some of the Linux howtos actually show you how to set this
up as one of their examples! But it's fatally flawed.
A basic non-prioritizing TBF will simply drop the next packet if
there's been too much data lately; statistically speaking, the busiest
connections have the most packets, so they're the most likely ones to have a
packet dropped. Interactive sessions, with just one packet in a blue moon,
have a very low probability of dropping a packet. Pretty good, right?
That's TBF by itself. I mean, it's not perfect: even my 1/100 speed
interactive session will get screwed for 1/100 of the packet drops, even if
my busy connection is 99/100 packets. What we really want is for *all* the
packet drops to punish the high-bandwidth guy. Still, it's pretty good
considering how simple it is, and being wrong 1% of the time isn't so bad.
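A minimal token bucket, in the simplified "drop if there's been too much data lately" form described above. This is a sketch, not Linux's tbf: it ignores the bounded backlog that a real shaper keeps, and the class and parameter names are made up.

```python
class TokenBucket:
    """Lets a packet through only if enough byte-tokens have accumulated."""

    def __init__(self, rate_bytes_per_sec, burst_bytes):
        self.rate = rate_bytes_per_sec
        self.burst = burst_bytes      # bucket capacity
        self.tokens = burst_bytes     # start with a full bucket
        self.last = 0.0               # timestamp of the previous packet

    def allow(self, now, size):
        """Return True to forward the packet, False to drop it."""
        refill = (now - self.last) * self.rate
        self.tokens = min(self.burst, self.tokens + refill)
        self.last = now
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False

# 256 kbit/s is 32000 bytes/s; allow bursts of two full-sized packets.
bucket = TokenBucket(rate_bytes_per_sec=32_000, burst_bytes=3000)
print([bucket.allow(0.0, 1500) for _ in range(3)])  # third packet finds the bucket empty
```

Note that nothing in here knows which connection a packet belongs to. Whoever happens to show up when the bucket is empty loses, which is why the busy connection takes most of the drops, but not all of them.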
Now, let's look at SFQ, which is a very clever invention. SFQ works by
sorting each session into one of, say, 100 "bins." Then it feeds the first
packet in each bin to the output queue, looping through the bins in
sequence. When a new packet arrives, it sorts it into a bin, and drops the
packet only if *that* bin is full, but other packets, in other bins, don't
get dropped. The trick is that packets are sorted into bins based on the
hash of their (srcip,srcport,dstip,dstport) tuple, so on average, each
connection gets its own bin. Assuming there are a lot more bins than
connections, the end result is that you *never* sacrifice your 1/100
interactive session by accident; the only packets that ever drop are the
99/100 busy session, exactly the guy you want to slow down. Sweet! And it
really works, too... if your network device never drops packets from its own
queue, which is a safe assumption for physical devices.
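The bin mechanism as described above fits in a few lines. This is a toy model of that description, not the kernel's SFQ (which, among other things, periodically perturbs its hash); the class name, the CRC32 hash and the bin limit are all assumptions made for illustration.

```python
from collections import deque
import zlib

class ToySFQ:
    """Hash each flow into a bin, then serve the bins round-robin."""

    def __init__(self, bins=100, bin_limit=10):
        self.bins = [deque() for _ in range(bins)]
        self.bin_limit = bin_limit
        self.next_bin = 0  # where the round-robin scan resumes

    def _bin_for(self, flow):
        # flow is a (srcip, srcport, dstip, dstport) tuple
        return zlib.crc32(repr(flow).encode()) % len(self.bins)

    def enqueue(self, flow, packet):
        """Return False if the packet was dropped because its own bin is full."""
        q = self.bins[self._bin_for(flow)]
        if len(q) >= self.bin_limit:
            return False
        q.append(packet)
        return True

    def dequeue(self):
        """Return the next packet in round-robin order, or None if all bins are empty."""
        n = len(self.bins)
        for i in range(n):
            idx = (self.next_bin + i) % n
            if self.bins[idx]:
                self.next_bin = (idx + 1) % n
                return self.bins[idx].popleft()
        return None
```

As long as the two flows land in different bins, a bulk flow that overfills its own bin loses only its own packets, and the interactive flow's lone packet goes out within one round-robin pass.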
Now let's hook SFQ to a TBF instead of a physical device. SFQ cleans things
up, sorting the packets in the order A,B,A,B,A,B (where A is your
interactive session, and B is your high-bandwidth session). It feeds them
into TBF, which queues, say, 10 packets at a time. When the tokens run
out, it drops a random one of the 10 packets, and lets in a new one from the
SFQ. Which one will it choose randomly? Well, given that SFQ has inserted
a whole bunch of fairness into the queue and the packets are no longer
bursty... your interactive session has about a 50% chance of having its
packet dropped by TBF.
...blink...
OH MY GOD IT'S SO TERRIBLE.
And I'm not just making this up. I tried it. I really did. I gathered
statistics. I analyzed packet dumps. I learned what all the crazy magic
numbers did (and tc has a *lot* of crazy magic numbers). And the conclusion
was: it works just like I said just now. And it sure does *completely* tank
your interactive performance. TBF by itself is actually okay (1% error),
and SFQ is okay (~0% error) if that's your problem, but TBF+SFQ is never,
never okay (50% error).
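Those three error rates fall straight out of a toy model of "drop a random packet from whatever is queued." This is only a simulation of the explanation above, not of real tc behaviour; the 99:1 traffic mix and the 10-packet window are the illustrative numbers used above, and the function name is invented.

```python
import random

def interactive_drop_share(window, trials=100_000, seed=1):
    """Fraction of random drops that hit the interactive flow 'A' in this window."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if rng.choice(window) == "A")
    return hits / trials

# TBF alone: the queue reflects the raw traffic mix, 1 interactive packet in 100.
tbf_only = ["A"] + ["B"] * 99

# SFQ feeding TBF: SFQ has already interleaved the flows as A,B,A,B,...
sfq_then_tbf = ["A", "B"] * 5

print(f"TBF alone:   {interactive_drop_share(tbf_only):.1%}")
print(f"SFQ -> TBF:  {interactive_drop_share(sfq_then_tbf):.1%}")
```

The interleaving that makes SFQ fair is exactly what makes the downstream random drop unfair: it hands the dropper a queue in which the interactive flow is wildly over-represented.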
Two rights make a wrong? That's a new one.
Now, you could probably fix this by writing a different sort of TBF: one
that accepts packets into its queue on a particular schedule, but
assumes the guy feeding it packets will be the one doing the dropping; my
alternate-universe-TBF would never drop packets itself. This parallels my
"virtual RED queue" suggestion up above, but beware that SFQ+RED is probably
just as bad as SFQ+TBF.
By the way, just putting things in the opposite order - TBF then SFQ -
doesn't work either. The SFQ always drains instantly, and without packets
in the queue, it can't add any fairness.
Oh, and also, if you think about connecting PRIO (basic prioritization, ie.
VoIP vs. bulk) to TBF... you get exactly the same problem. TBF will end
up penalizing high-priority packets even more than it would without
PRIO! Which is why some people find that enabling packet prioritization
makes their Internet performance worse.
No, HTB is not the answer!
And before you say something, yes, I've read all the notes that HTB
(hierarchical token bucket) is the newfangled big thing that will solve all
your problems.
No it won't.
HTB is probably better than CBQ, the thing it was designed to replace, but
neither of them actually solves any of the problems I'm writing about here.
As soon as people start talking about traffic shaping, they always want to
start talking about restricting one stream to 1 megabit, while the other one
gets 5 megabits, but then we allow short bursts of xyz for not longer than
abc every pqr, and then we subdivide it like this...
It's all wankery. You're sitting at home, your Internet sucks. You do not
need to do any of those things. Moreover, even if you had the problems
those people are talking about, you would still have the latency problems
*I'm* talking about, and HTB gets you 0% closer to solving them. HTB is
good if you need it. But chances are, you absolutely don't.
The real problem with traffic shaping...
...is that it's just too fiddly. The truth of the matter is that it's
really just a few little numbers - really simple numbers, integers, like the
maximum number of packets in a queue - that make a huge difference.
Coming up with this small set of numbers requires a surprising amount of
math, and if you don't get your math exactly right, you don't just get
substandard results - you get horrifically bad results. A mis-tuned
RED queue, as I learned in my tests, is both very easy to produce, and very
destructive to use. Combining TBF, which is great, with SFQ, which is
great, results in disaster. It's really mathematically hard.
The funny thing is that, if you could just reliably get the math right, the
actual code behind traffic shaping is delightfully simple. Any tiny
little routing box with a super crappy CPU and almost no memory can do it,
and the results would be great.
It's one of those really frustrating computer science problems: most
programmers just like to get down to coding stuff. If you work a little
harder, your solution gets a little better. The world of traffic shaping
isn't like that; it's not a lot of work, but the outputs also aren't
proportional to the inputs. It's a step function from crappy to awesome,
with nothing in between to give you a hint that you're on the right track.
Footnote: net neutrality
1 I said up above that ISPs won't do packet prioritization at
their end of the link, for "good reason." That reason comes down to the
whole famous "net neutrality" debate; should ISPs prioritize traffic or is
every stream equal to every other stream? You might have heard of the net
neutrality debate, but every time I hear it, I always think: wait, which
side of this debate am I supposed to be on, anyway? I can never tell.
So far my answer is, it all sucks. In short, everyone can agree that their
phone calls should take precedence over their BitTorrent downloads. They
can probably even agree that their phone calls should take precedence over
their web browsing. And you know, most people can even agree that *other*
people's phone calls and web browsing should take precedence over their own
BitTorrent downloads. Most of it is easy; most of the people spamming you
about net neutrality only tell you about the easy bits.
But the thing nobody can agree on is... what *is* a phone call? And what
*is* web browsing? And how can your ISP tell the difference so they can
prioritize it the way we all agree is the right way?
Packet prioritization on your local network is no big deal. (It's useless,
but easy; by the way, if your wireless router has a "wireless QoS" option,
just turn it off. I guarantee you don't need it, and it definitely
doesn't do what you think it does.)
Conversely, packet prioritization on the open Internet is a giant
security hole. If you can tag a Skype session as "high priority", then
what's to stop people from tagging their browser sessions as high priority?
What's to stop some annoying hacker from blasting a million "high priority"
packets at your IP address, completely filling your queue, which prevents
any low priority packets from getting through *at all*, thus destroying your
ability to access email or the web? Nothing can prevent this but your ISP.
And so your ISP is stuck: do they honour those packet priority tags, or not?
If they do, giant security hole. If they don't, Skype calls are a bit less
clear. On the open Internet, they pick the latter because the former is
simply not acceptable to anybody.
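The starvation problem is easy to see in a toy strict-priority scheduler. This is a sketch of the general idea only (always serve the high-priority queue first), not of any particular router's implementation; the function name and packet labels are made up.

```python
from collections import deque

def strict_priority_drain(high, low, slots):
    """Send up to `slots` packets, always preferring the high-priority queue."""
    high, low = deque(high), deque(low)
    sent = []
    for _ in range(slots):
        if high:
            sent.append(high.popleft())
        elif low:
            sent.append(low.popleft())
        else:
            break
    return sent

# An attacker floods "high priority" junk; your email waits in the low queue.
flood = ["junk"] * 1000
mail = ["email"] * 5
sent = strict_priority_drain(flood, mail, slots=100)
print(sent.count("email"), "of", len(sent), "packets sent were email")
```

As long as the flood keeps the high-priority queue non-empty, the low-priority queue never gets a turn, and anyone who can set the tag can be the flood.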
Now, at least in Canada, all the ISPs are *also* in the VoIP business
nowadays. They offer special boxes that do phone calls over their Internet
link, in parallel with your cable/DSL modem that does Internet stuff. And
they prioritize *that* traffic, right? Which is totally unfair to Skype,
right? Totally unfair? Yes, absolutely!
But unfortunately, also justifiable for technical reasons. I'm not
talking about business here; that's another question. But the idea is that
you *obviously want* your phone connection to take precedence over the
Internet connection; your ISP controls their network, the cable modem,
and the phone box; and so the ISP can safely tag the packets - in a
way your cable modem can be configured not to do - so that you and anybody
else can't fake high-priority packets and cause trouble.
In short, barring insanely complicated inventions (ie. intserv, "integrated
services", the failed opposite of diffserv) you or anyone can only
securely implement prioritization on a network you completely control.
Unfortunately this gets the net neutrality police up in arms, because now
BigCo's VoIP service has more reliable performance than Skype. And oh boy,
does it ever; the sound quality is pretty much perfect, all the time. And
yeah, that helps BigCo reinforce their monopoly position, and that totally
sucks for Skype and competition and consumers. But make no mistake; if you
pass a law requiring Net Neutrality, then you'll just make the VoIP service
suck; you won't make Skype better. Or if you pass a law requiring ISPs to
implement diffserv, then the whole Internet will collapse because a few
idiots will abuse it by making their torrents look like phone calls.
And how about those special deals for YouTube and Netflix? Are those fair?
Nope. But because those companies have special arrangements with the ISPs,
the ISPs *can* prioritize those live video streams over random Internet crap
without creating a giant gaping security hole. And let's face it, average
consumers *do* want their YouTube and Netflix to be prioritized over their
random web browsing, because glitchy video sucks, and glitchy web browsing
is mostly unnoticeable. But is it fair to J. Random Bob's video sharing
site? Of course not. Is there even any way, technically, for ISPs to offer
the same service to J. Random Bob? No, and that's because the universe is
fundamentally unfair.
With net neutrality, your choice is between suck and unfairness. Both
options are terrible. And that's why neither side has won the debate and
governments can't figure out what to do.
Oddly, it doesn't even matter that ISPs stand to make a lot of money by
auctioning off priority to the highest bidder. We could deal with that;
that would just be plain simple corruption. Take a bribe, or throw someone
in jail, or whatever, but it would be over with by now.
Epilogue
There is some good news, though: I read somewhere (I can't find the link
right now) that in Internet backbone-scale tests, simply overprovisioning
bandwidth by around 1.6x (ie. 60% more bandwidth than you need) reduces
jitter and latency just as much as packet prioritization. So if available
bandwidth keeps going up at the rate it has been, unless some accursed
person invents something even more bandwidth-wasteful than video (what could
possibly do that?!) then this whole debate should end itself in a few years.
When in doubt, use brute force.
Of course, even that does you no good if your router's queue is too big.
Syndicated 2011-01-10 07:39:17 from apenwarr - Business is Programming