(In the preview the PRE and TT tags aren't working, I hope they do in
the real page. Original version of this article is at
http://itamarst.org/multiplex/).
I'm trying to design a protocol that allows multiple channels running
over the same TCP connection. Why would we want to do that? Well, it
saves time opening
new connections, we can have multiple requests to the server waiting for
replies at once. Unlike pipelining, even if one of our requests returns
a lot
of data, we don't have to wait until all of its is sent - we can get
replies to other requests while we are getting all that data.
The sMUX protocol alows multiple channels
of
communications to be run over the same TCP connection, for
client/server
protocols. A channel allows two entities to send messages back and
forth.
The receiver gets the total message the sender sent. Therefore, unlike
a
regular protocol, we don't need to use \r\n to signify the end
of a request,
since the object that gets a message knows it got the total contents of
the message. For example, if the sender sent the message
HELLO, the receiver
will be notified it got a message of length 5 with the content
HELLO.
Protocol Definition
A message is made up of two parts - the channel ID, and the actual
data
of the message. The channel ID is two bytes chosen by the client. Each
channel
has a unique ID, so we can have 65536 channels. The message data is
some
number of bytes, the contents of which depend on the protocol.
To send a message we prepend the channel ID (two characters) to the
message
data, and the resulting string is encoded as a netstring and added to a
sending queue. At the same time the messages in the queue are taken one
by one and sent over the TCP connection.
What exactly is a netstring? To quote
Dan Bernstein's
article:
A netstring is a self-delimiting encoding of a string.
Netstrings
are very easy to generate and to parse. Any string may be encoded as a
netstring; there are no restrictions on length or on allowed bytes.
Another
virtue of a netstring is that it declares the string size up front.
Thus
an application can check in advance whether it has enough space to
store
the entire string.
For example, the string hello world! is encoded as <31
32 3a 68
65 6c 6c 6f 20 77 6f 72 6c 64 21 2c>, i.e., 12:hello
world!,. The
empty string is encoded as 0:,.
When the other side receives the message, it know how much to read
since
the message is a netstring. Once the receiver has read in the whole
message,
it extracts the message data and channel ID from the netstring and
hands
the message to the object that is using that channel.
If the server recieves a message with a channel ID that was not in
use before, this implicitly means the client has openned a new
channel.
Example
Let's look at an example - a counter application. Whenever a new
channel is opened, the server creates a new counter with value 0. The
client can then increment the counter, and the server will send back
it's new value. Let's look at an example session:
Client on channel ab sends: INCREMENT 100
Client on channel zx sends: INCREMENT 50
Client on channel ab sends: INCREMENT 20
Client on channel ab sends: VALUE?
Client on channel zx sends: VALUE?
Server on channel ab responds: 120
Server on channel zx responds: 50
So what was actually sent was:
15:abINCREMENT 100,14:zxINCREMENT 50,14:abINCREMENT
20,8:abVALUE?,8:zxVALUE?,
Notice that we do not know what order the server will respond in -
perhaps ab's response will be returned first, perhaps zx's. SO, the
response from the server can be either:
5:ab120,4:zx50,
or, alternatively:
4:zx50,5:ab120,
<h3>Techniques</h3>
Sending large messages is not a good idea, since this will prevent
other channels from sending messages while the large message is being
sent. The solution is to break up the data into multiple peices of a
small size, e.g. 10kb. If there are more pieces of the data to be sent
after this one, we prepend "M" to the data, and send that as the
message. If this is the last piece, we prepend "S" the data before
sending it as the message.
The receiver, knowing this data being sent is broken up (because it
just issued a download command for a file, for example) examines the
data of the messages as it gets them. If the data starts with "M", it
knows there are more pieces of data coming, but if it starts with "S" it
knows this is the last piece of data.
Another benefit of this technique is that it can be used to send
large amounts of data when the sender does not know how much data will
be generated (e.g. when sending the results of a long-running
program.)
Sample Implementation
You can download a sample implementation written in Python - multiplex-0.2.tgz.
First, you should indeed look at BXXP.
But second, you quickly skip past your rationale for not using multiple
TCP streams. Multiplexing reliable, sequenced streams over the same
physical connection is, after all, what TCP/IP is all about. You claim
that you want to "save time opening new connections", but I fear that
might be a case of premature optimization. Have you actually measured
the cost of opening a connection, or looked into optimizations that can
reduce that cost? Do you understand what your performance requirements
(for latency and throughput) are in the first place, and the conditions
under which this protocol will be operating?
(I'm not saying it's a bad decision, I'm just saying it's an
insufficiently justified decision.)
Either way, this is not a wheel that's worth reinventing.
Sorry for being so blunt, but your idea sucks. First, you don't save
any cost on the server, because it has to parse the packets and
distribute the data anyway. It does not matter whether the OS does it
or whether some user space program does it. If it is too slow, rewrite
the software.
Second, TCP has a large window, i.e. when you multiplex one channel for
bulk data and one channel for interactive data, the interactive data
will receive a huge latency because it has to wait for the bulk data
transfer buffer to be sent before it hits the wire.
So, the only situation where this could be any useful are several bulk
transfers and several interactive connections. Bulk transfers can
normally be pipelined just fine (saving the multiplexing overhead) and
interactive connections don't cause enough system load to warrant this
kind of complexity.
I suggest you find yourself a different problem. ;-)