I got a couple of nice replies to my recent TCP ravings, one from
Grit and even a (delightfully titled!) followup
article from the author of the original piece. It's an
interesting topic.
Actually it's two topics, I think. One is how well TCP works, particularly in some specific situations like "well provisioned" high-speed LANs. The other is about the merits of transport-layer framing, in other words having TCP (or a similar protocol) track discrete "frames" rather than just a continuous stream of bytes.
For the framing issue, I think there must be some specific examples where it's useful that haven't been mentioned. Grit: why do you want the correspondence between write()s and TCP segments? What am I missing?
Here's my reply to the "TCP Apologists Considered Annoying" article (anyone's welcome to respond to this):
I disagree about pay-as-you-use. The vast majority of TCP's congestion-control
features cannot be turned off, either because they're deeply embedded in the
protocol itself or because no platform that anyone actually uses provides an
interface to disable them. Those features do get in the way, too, kicking in
and causing timeout waits even when - intuitively, to an informed human
looking at a packet trace - there's absolutely no good reason for it.
By pay-as-you-use, I mean that most congestion-control features are
only actually used when you encounter congestion. Or more accurately,
when you encounter packet loss, severe packet reordering, or large
"spikes" in delay, which TCP interprets as signs of congestion.
On a fast and reliable "specially provisioned" network, I'm assuming
these things are extremely rare. That being the case, I don't see why
TCP congestion control should cause any problems.
It would be more interesting to know what mistakes are showing up in
packet traces, and whether they're caused by TCP implementation bugs
or by network quirks that the protocol intrinsically doesn't handle
well. I'd want to determine this before concluding that TCP's
congestion control is expensive, and certainly before turning it off
or designing alternatives.
First off, there's no such thing as a "layer 7 switch". Anybody referring to
anything above layer 2 as a switch should be forced to re-take basic
networking courses until they learn to get it right.
This is just a disagreement over names. I was referring to boxes that
both switch Ethernet packets and work with the higher level protocols,
for instance to use HTTP cookies for session persistence in load
balancing. Vendors and industry press call them "layer 7 switches". I
did originally put the name in quotes, after all :-)
Secondly, transparency requires that service-consumer behavior be preserved.
If a transparent proxy wants to do unspecified "useful things" by reshaping
traffic internally, it must do so within the limitations of preserving
higher-level behavior or it's not actually transparent.
That hits the nail on the head. The reason that TCP proxies can do so much
transparently is that TCP and SOCK_STREAM leave so much freedom to the
transport, compared with e.g. a datagram protocol where application
level frames must correspond to IP packets. With streams, write()
isn't defining a frame, it's just writing the next sequence of bytes
in the stream, so there is no frame information to be preserved.
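To make the difference concrete, here's a minimal sketch in Python. It is
an illustration only: it uses a local socketpair() of Unix-domain sockets
as a stand-in for TCP and UDP, and the helper names collect_stream and
collect_datagrams are made up for this example. The stream reader only
ever sees bytes in order, while the datagram reader gets back exactly the
units that were sent.

```python
import socket


def collect_stream(chunks):
    """Send each chunk with its own call over a SOCK_STREAM pair and
    return everything the reader sees, joined into one bytes object."""
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        for chunk in chunks:
            a.sendall(chunk)
        a.close()  # end of stream: the reader's recv() will return b""
        received = []
        while True:
            data = b.recv(4096)  # may return any number of bytes, 1..4096
            if not data:
                break
            received.append(data)
        # How the bytes were split across recv() calls carries no meaning,
        # so the only sensible result is the concatenation.
        return b"".join(received)
    finally:
        a.close()
        b.close()


def collect_datagrams(chunks):
    """Send each chunk as one datagram over a SOCK_DGRAM pair and return
    the list of datagrams received, one per recv() call."""
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        for chunk in chunks:
            a.send(chunk)
        # Each recv() returns exactly one datagram, boundaries intact.
        return [b.recv(4096) for _ in chunks]
    finally:
        a.close()
        b.close()
```

With the stream, three separate sends of b"GET ", b"/ " and b"HTTP" are
indistinguishable from a single send of b"GET / HTTP". That is exactly the
freedom a proxy in the middle can use.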
What Luke seems to be missing is that message-boundary information can be more
efficiently maintained and transmitted within the transport layer than via
application-level framing. If the framing occurs at the application layer, the
transport layer is bound to transmit the framing information absolutely
verbatim; it has no flexibility. A message-boundary-preserving transport layer
can do much more intelligent things that combine this framing with chopping
stuff up into network-layer packets, flow control, retransmission,
etc.
Apparently I am missing the advantages, but then they haven't been
stated specifically. I can't tell what's being proposed from the text
above, and TCP streams already have most of the listed advantages.
Streams give tremendous flexibility to the transport layer: its only
restriction is to ultimately deliver the bytes in order. Any way it
wants to chop them up into network-layer packets for better
flow-control, retransmission, transfer efficiency, or to fit the
receiver's buffer is no problem. It can even split an
application-level frame header into pieces if that will be more
efficient - it's just stream data like any other. Intermediate proxies
are able to re-package the data when they need to, e.g. because their
MTU is different on either side or because they want to modify the
data. Even the sender and receiver can repackage the data, e.g. with Nagle's algorithm or to compress the send/receive queue.
It's also easy to put application frames on top of streams. Writing
and reading a 2- or 4- byte in-band length header is cheap and simple,
and so are other framing formats like HTTP/1.1 chunked
encoding. Taking care of byte ordering is trivial, and often necessary
for other things than framing anyway.
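Here's roughly what I mean by a length header, as a minimal sketch in
Python rather than anything definitive. The 4-byte big-endian header and
the names send_frame, recv_exact and recv_frame are my own choices for
illustration; a real protocol would also want a maximum frame size.

```python
import socket
import struct

# 4-byte unsigned length in network (big-endian) byte order.
HEADER = struct.Struct("!I")


def send_frame(sock, payload):
    """Write one frame: the length header followed by the payload."""
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_exact(sock, n):
    """Read exactly n bytes, however the transport chose to split them."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("stream closed in the middle of a frame")
        buf.extend(chunk)
    return bytes(buf)


def recv_frame(sock):
    """Read one frame and return its payload."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)
```

The reader never cares how many packets a frame arrived in, or whether the
header was split from the payload along the way. recv_exact just keeps
reading until it has the bytes it was asked for.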
In summary, streams support simple and cheap application-level framing while leaving great flexibility to the lower layers, and they support unframed byte streams too. So what significant problem would transport-level framing for a TCP-like protocol be solving, and what specific scheme might it use to do so?
Way too many times, I've seen application messages and message
separators appear on the wire as two separate packets, and TCP's
byte-stream orientation can make this hard to avoid without extra
buffering.
This is a really important point. It's one thing to respect layering,
but to completely ignore the existence of lower layers could lead to
very inefficient code. It pays to understand what's likely to happen
under the hood and give the right "hints", in much the same way it's useful to understand what compilers are going to generate from your programs.
So for instance, if a program is doing a bunch of separate write()
calls, this might cause the data to go out in separate IP packets and
Ethernet frames, when it could have all fit into one. Thoughtful use of writev(2) and TCP_CORK could help out for things like
this. It sounds like Jeff from Platypus has a lot of experience in
this area -- I don't.
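For what it's worth, here is a minimal sketch of the writev idea in Python,
where the call is exposed as os.writev on Unix. The helper name writev_all
is hypothetical, and I haven't measured any of this. The point is only that
a header and a body kept in separate buffers can be handed to the kernel in
one system call, so the kernel is free to put them in the same packet.
TCP_CORK, a Linux-specific option set with setsockopt(), is not shown.

```python
import os
import socket


def writev_all(fd, buffers):
    """Write all buffers to fd with gather writes, retrying on short writes.

    A single os.writev() call hands every buffer to the kernel at once,
    instead of one write() per buffer.
    """
    # Skip empty buffers and wrap the rest so they can be trimmed cheaply.
    views = [memoryview(buf) for buf in buffers if len(buf)]
    while views:
        written = os.writev(fd, views)
        # Drop the buffers that were written completely.
        while views and written >= len(views[0]):
            written -= len(views[0])
            views.pop(0)
        # Trim the buffer that was written only partially, if any.
        if views and written:
            views[0] = views[0][written:]
```

A caller would use it as writev_all(sock.fileno(), [header, body]) in place
of two separate write() calls. Building one combined buffer first gets the
same effect at the cost of a copy.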