Older blog entries for elanthis (starting at number 369)

Bike Show Photos

More photos of the 20th Annual Custom Bike Show in Birch Run, MI.

Facebook album

Dad’s public Picasa gallery

The show was sponsored by Bubba’s Tri-City Cycles, and the four beautiful models we took way too many pictures of are employees of M/C Leather Works.

Syndicated 2008-01-23 23:52:12 from Sean Middleditch

Bike Show - First Place in Class

We won first place in our class (Manufactured Custom) at the 20th Annual Custom Bike Show in Birch Run, MI. Pretty cool.

Some of the models there agreed to pose on our bike for us, so here’s a picture of the gorgeous Jackie from Leather Works posing on our green Big Bear Choppers Sled Pro custom bike:

Jackie on our Bike

Syndicated 2008-01-21 19:07:55 from Sean Middleditch

Hair Cut

So I just got my first real hair cut in 11 years.

Me with short hair

Yeah, it’s all short now. The picture’s lighting sucks, as my apartment’s lighting sucks. What my apartment lacks in lighting it more than makes up for with pharaoh ants (I think, hard to identify a bug that size with eyes like mine and no magnifying glass).

The short hair is kinda throwing me for a loop. I don’t have a ponytail to play with while I think anymore. :(

Syndicated 2008-01-05 22:01:30 from Sean Middleditch

Moving Monday

I’m moving to my new apartment tomorrow, on New Year’s Eve. Fun fun.

Helped my very-soon-to-be ex-roommate buy the various necessities he’ll need, like dishware and such, since I own pretty much everything in the apartment now. Also picked up some more objets d’art for the new apartment, since I’m going to have an awful lot of wall space to make interesting.

Syndicated 2007-12-31 01:15:29 from Sean Middleditch

Metacity Compositor

The new Metacity compositor in 2.21.5 is pretty sweet. It has only very simple effects (no eye-straining wobbly bullshit or workspace transition effects that just waste time), so it isn’t nearly as annoying as Compiz demos make compositing seem to be. It also made the desktop feel a little faster. The new desktop switcher is also a good deal nicer than the old one. Once an exposé-like feature is added, I think I’ll be all set with Metacity. All in all, I like it quite a bit.

Hopefully Ubuntu replaces the “low-bling desktop effects” option in the desktop effects setup with the new metacity. Compiz is a beast and it can’t even work on common setups like any machine that doesn’t support accelerated GL.

The only “bug” I’ve noticed with the new compositor is that windows sometimes get drawn to the screen before they have any content, so you see a very short flash of garbled pixel data. I’m not entirely sure if that’s a problem with Metacity, with X, or with GTK+. It’s not really critical, and I imagine it’ll be fixed in whichever layer needs it soon.

Syndicated 2007-12-22 22:40:00 from Sean Middleditch

Apartment Scare, All Good

The apartments I’m moving into, Aspen Chase in Ypsilanti, once International Place, have a very bad reputation. A very well earned, very bad reputation. The place used to be crawling with drug addicts, rapists, thieves, and other criminals, and that’s not an alarmist exaggeration. The sex offender density for International was almost four times that of the average for Ypsilanti Township.

I found out all of this after making the decision to move in. Having not yet signed the lease, I was about to back out entirely. I had friends who used to live there telling me how awful it was, apartment review sites claiming it to be the most disgusting hole to live in, and neighbors who moved to my current hooker-infested complex (no, really, I almost ran one over last night because she jumped out in front of my truck hooking - it’s a real fucking shame that it’s illegal to run them over) telling me how bad Aspen Chase was.

Having visited a few times, the place did not at all seem bad. The staff are great, the apartments are clean and recently remodeled, and I didn’t see any nasty people hanging around or hear loud noise or other problems during the few hours I’ve spent looking around with friends and family.

I decided to just call the Sheriff’s Department and ask. The officer I spoke with told me that the crime rate has dropped tremendously in the last year, and that while they do still get calls there, it’s no worse than anywhere else in the Township. To quote, “I wouldn’t discourage you from living there.”

Turns out that McKinley, the current owners, just bought the place a year ago, and have been fixing it up really nice like they tend to do. They like to buy crappy places and sink a bunch of money into them to make them nice, and I’m told by my mom that their reputation is impeccable.

That was a pretty big scare there for a few days, though. I was about to back out, and then I’d have been screwed on finding a place to live by the time my current lease is up.

On a worse note (unrelated to apartments), the small wart I’ve had on my right thumb under the nail is - two years, 14 cryo-surgeries, and one radio-surgery later - still there. The dermatologist told me that she gives up and won’t touch it anymore, and that I should look into getting laser surgery on it. I’ve had warts on my hands when I was a young kid, and a freeze or two later and they’d always be gone. I hadn’t had one in years until this one showed up, but it just will not go away. So now I get to look into laser surgery, which is usually for cosmetic purposes, which means it will be difficult to get health insurance to pay for it. It doesn’t help that my insurance expires this month, so if I don’t at least get the surgery billed in the next week or two, I’ll have to pay for a very expensive laser surgery out of pocket just to get rid of this stupid wart. I’m tempted to just leave it alone, since they’re supposed to go away in a couple years on their own, but given that it’s already been a couple years and it’s actually a little bigger than it was when I first noticed it, I’m going to do whatever I can to just get rid of it. I’m beyond sick of having it there, if for no other reason than it just makes me feel dirty for having it, even if it is where nobody ever notices it and it can’t spread easily.

Syndicated 2007-12-13 06:34:18 from Sean Middleditch

Creating Custom C++ Output Streams

In my younger, dumber days, I’d often write whole new classes in C++ when I had a need for output streaming, such as in a class handling a TELNET connection. Such classes required a ton of operator<< overloads.

I'm here today to tell you how not to do that, but to instead write a stream class that will work with all of the standard C++ output stream features. You might even learn a thing or two about input streams, but output streams are going to be my focus.

Note that I’m only going to touch on the basics here. C++ streams have a lot of features, most of which you won’t need to customize in the majority of circumstances, so I’m ignoring those topics.

Introduction

Streaming output in C++ is accomplished by using the << operator on a std::ostream object, such as cout.

One of the primary reasons to use a stream instead of directly writing bytes to a file is that streams allow for formatting and buffering. Formatting allows you to do something like the following:

cout << hex << 123 << endl;

That will write out 123 in hexadecimal, or 7b. Without streaming, you’d have to create a byte buffer, format 123 yourself into that buffer, and then call system facilities like write() to get your output on the screen. Kind of a pain.

C programmers will be familiar with the printf family of functions. These functions serve a very similar role to the C++ streaming facilities. The above line of code, in C, could be written:

printf("%x\n", 123);

The C++ streams offer several very distinct advantages over the printf family of functions, however. The first, and most widely known, is type safety. If the printf call had used the %s formatter instead of %x, then the program would likely have just crashed. The second advantage is that C++ streams have built-in support for user-controllable buffering. Buffering allows output to be stored up and sent to the OS facilities in larger chunks, which can both improve performance as well as allow for some special tricks which we’ll explore later. To provide user-controlled buffering in C, new functions which use printf functions internally must be created, and gracefully dealing with buffer overflows (that is, neither crashing nor losing output) can be a serious pain. A final advantage is that C++ streams maintain state, allowing you to more easily output a large number of identically formatted values without respecifying the format for each and every one.

The printf family of functions do have some advantages. They are, in most implementations, significantly faster than their C++ counterparts. Additionally, sometimes that “advantage” of C++ streams of maintaining state can actually be a problem, particularly if you set some state and never unset it. For example, the C++ example up above sets the number output format to hexadecimal, but never reverts it to decimal. All other output on cout will be formatted to hexadecimal until reset.
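To make the sticky-state point concrete, here is a minimal sketch. It writes into a std::ostringstream rather than cout so the result can be inspected, and the function name format_demo is purely illustrative.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Formats three numbers to show that std::hex stays in effect
// until std::dec switches the stream back to decimal.
std::string format_demo() {
  std::ostringstream os;
  os << std::hex << 123 << ' ';  // hexadecimal: 7b
  os << 255 << ' ';              // still hexadecimal: ff
  os << std::dec << 255;         // back to decimal: 255
  return os.str();
}
```

Calling format_demo() yields the string "7b ff 255": the second number is still printed in hexadecimal even though hex was only streamed once.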

Since this is an article about C++ streams and not C string formatting, we’re going to assume that you actually want to use C++ streams and not printf, so let’s get on to the meat of creating a custom output stream.

Ostream and Streambuf

The std::ostream class does all the meat of formatting your streams. It stores the output state (like whether numbers should be displayed as decimal or hexadecimal) and processes your values to convert them into the correctly formatted output. This class is really the true core of all output formatting.

There’s absolutely no need to derive a class from std::ostream, either. The ostream class handles the formatting, but it doesn’t itself actually do anything with the output. Sure, there are derived classes like std::ofstream and std::ostringstream in the standard library, but these classes don’t actually change the behavior of std::ostream in any way. They are merely convenience wrappers that make use of a derived std::streambuf class.

All the actual work of outputting formatted data is performed by std::streambuf. Every ostream has a streambuf object associated with it. When you stream data to an ostream object, it formats the data and passes the results on to its streambuf. The streambuf then does the actual interesting work of writing the result out to your screen, into a file, or into an internal buffer. When you want to change the behavior of an output stream, what you actually need to do is make a new streambuf child class.

The std::ofstream class, for instance, creates a new std::filebuf object and associates it with the opened file. The ofstream class also offers a few other convenience methods on top of the base std::ostream, but all of these methods actually interact with the filebuf object.

To associate a streambuf object with an ostream, you can call the ostream::rdbuf() method. If called with no arguments, it returns the current streambuf. If called with a pointer to a streambuf, it sets that as the current streambuf. You can also pass a pointer to a streambuf to the constructor for an ostream. For example, let’s mimic ofstream using just ostream and filebuf.

filebuf file;
file.open("myfile.txt", ios::out);
ostream os(&file);
os << "Hello, world!" << endl;

That code behaves identically to code that uses ofstream. The only difference there is that the methods like open and close must be called on the filebuf object instead of the ostream object.

There is one catch to be wary of. The ostream class will not manage the memory for its streambuf object. That’s fine for the example above, but if you had created the filebuf object using the new operator, you would have to remember to delete the pointer yourself when you’re done.
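The two forms of rdbuf() can also be used to swap a stream’s buffer temporarily. The following is a small sketch, not anything from the standard library itself: capture_cout is a made-up helper that points cout at the buffer of a std::ostringstream, writes something, and then puts the original buffer back.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// Temporarily redirects std::cout into a string, then restores it.
std::string capture_cout() {
  std::ostringstream capture;
  // rdbuf(ptr) installs a new streambuf and returns the old one
  std::streambuf* old = std::cout.rdbuf(capture.rdbuf());
  std::cout << "captured " << 42;
  // always restore: cout owns neither buffer, and capture's
  // buffer dies when this function returns
  std::cout.rdbuf(old);
  return capture.str();
}
```

Because the ostream never owns its streambuf, forgetting the restore step would leave cout pointing at a destroyed buffer.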

Buffered and Unbuffered Output

There are two kinds of output you can perform using the std::streambuf class: buffered and unbuffered. Buffered output is when all data is stored temporarily in a buffer. The data is only sent to the actual output destination when the buffer fills up, or when the output stream is flushed. Unbuffered output sends all data to the output destination immediately.

When you create a new streambuf instance, it is by default unbuffered. If you wish to make it buffered, you must create a buffer for it, and then tell the streambuf about your buffer using the setp method, which is protected. So, to create a buffered streambuf object using a 100 character buffer:

class mybuf : public streambuf {
public:
  mybuf () {
    setp(buffer, buffer + sizeof(buffer));
  }

private:
  char_type buffer[100];
};

And voila! Your streambuf descendant is now buffered using your 100 character array. That’s all there is to it.

Note that buffer memory is not managed by the streambuf class. If you allocate a buffer with new, you are responsible for deleting it.
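To see the buffer actually filling up, here is a small sketch along the same lines. The class name peekbuf and its pending() accessor are illustrative additions; pbase() and pptr() are the standard protected accessors for the start of the buffer and the current write position.

```cpp
#include <cassert>
#include <ostream>
#include <streambuf>
#include <string>

// A buffered streambuf that lets us look at what has been stored.
class peekbuf : public std::streambuf {
public:
  peekbuf() { setp(buffer, buffer + sizeof(buffer)); }

  // everything between pbase() and pptr() is pending output
  std::string pending() const {
    return std::string(pbase(), pptr());
  }

private:
  char buffer[100];
};

// Streams a few values and reports what landed in the buffer.
std::string peek_demo() {
  peekbuf buf;
  std::ostream os(&buf);
  os << "abc" << 7;
  return buf.pending();
}
```

Nothing is written anywhere else: the formatted characters simply sit in the array until something (a sync or overflow override) decides what to do with them.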

Custom Unbuffered Streams

You want to write a log stream facility that sends output both to cerr (standard error output) as well as a file, mylog.txt. It’s easy enough to do either - just stream your data to either cerr or an ofstream - but you’d rather not write each stream command twice. Writing a very simple streambuf class that performs both for you is, thankfully, quite easy.

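A virtual method called xsputn is called on a streambuf whenever there is a block of characters to write. The function takes a pointer to an array of characters and the length of the array, and is expected to return the number of characters it was able to write. Since we’re just passing this on to a couple other streams we just return the length of the buffer given us. Single characters, such as the newline that endl writes, don’t go through xsputn on an unbuffered streambuf; they land in the overflow method instead, whose default implementation just reports failure. So we override overflow as well and have it hand the character to xsputn.

#include <fstream>
#include <iostream>
using namespace std;

// your log file, lazily declared as a global
ofstream logfile;

// logbuf forwards all output to cerr and logfile
class logbuf : public streambuf {
private:
  // write a string s of length n to standard
  // error and a log file
  streamsize xsputn (const char_type* s, streamsize n) {
    cerr.write(s, n);
    logfile.write(s, n);
    return n;
  }

  // single characters (such as the newline from
  // endl) arrive here, since we have no buffer
  int_type overflow (int_type c) {
    if (c != traits_type::eof()) {
      char_type ch = traits_type::to_char_type(c);
      xsputn(&ch, 1);
    }
    return traits_type::not_eof(c);
  }
};

int main () {
  // open our log file
  logfile.open("mylog.txt", ios::app);

  // create our log stream
  ostream log(new logbuf());

  // be friendly
  log << "Hello, world!" << endl;
  return 0;
}

That’s the gist of what you need, and nothing more. Pretty simple, eh? We could improve things a little further. For example, our logbuf object is leaked - we never delete it. That isn’t really vital for this example, since the memory is reclaimed when the program exits anyhow, but we should handle it properly anyway. More importantly for our little example, however, we don’t control buffering properly. Our logbuf is unbuffered, but logfile is buffered (cerr flushes itself after every write by default). We would expect a flush or endl on our log stream to flush the underlying streams too, so we override the sync method, which is called whenever the stream is flushed:

class logbuf : public streambuf {
private:
  // flush both cerr and logfile; return 0 to
  // indicate there was no error, but we're
  // too lazy to check for errors ourselves
  int sync () {
    cerr.flush();
    logfile.flush();
    return 0;
  }

  // write a string s of length n to standard
  // error and a log file
  streamsize xsputn (const char_type* s, streamsize n) {
    cerr.write(s, n);
    logfile.write(s, n);
    return n;
  }

  // single characters (such as the newline from
  // endl) arrive here, since we have no buffer
  int_type overflow (int_type c) {
    if (c != traits_type::eof()) {
      char_type ch = traits_type::to_char_type(c);
      xsputn(&ch, 1);
    }
    return traits_type::not_eof(c);
  }
};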

There, flushing is now supported!
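The same idea can be written without globals, which also makes it easy to check. The sketch below is a variation, not the class from above: teebuf and its two ostream references are made-up names, and it overrides overflow() as well as xsputn() because single characters (formatted digits, the newline from endl) reach an unbuffered streambuf through overflow().

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <streambuf>
#include <string>

// Forwards everything written to it to two other streams.
class teebuf : public std::streambuf {
public:
  teebuf(std::ostream& a, std::ostream& b) : a_(a), b_(b) {}

protected:
  // bulk writes, e.g. string literals
  std::streamsize xsputn(const char_type* s, std::streamsize n) {
    a_.write(s, n);
    b_.write(s, n);
    return n;
  }

  // single characters arrive here because there is no buffer
  int_type overflow(int_type c) {
    if (!traits_type::eq_int_type(c, traits_type::eof())) {
      char_type ch = traits_type::to_char_type(c);
      a_.put(ch);
      b_.put(ch);
    }
    return traits_type::not_eof(c);
  }

  // a flush on our stream is passed on to both targets
  int sync() {
    a_.flush();
    b_.flush();
    return 0;
  }

private:
  std::ostream& a_;
  std::ostream& b_;
};
```

Constructing a teebuf over two ostringstream objects and streaming "x=" << 5 << endl through an ostream attached to it leaves "x=5" followed by a newline in both strings.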

I’m going to take this moment to explain the char_type and streamsize types used above. streambuf::char_type is a typedef for the actual character type in use, which for streambuf would be char. The wstreambuf type is identical to streambuf, except it works with wchar_t (wide character support, for unicode), and char_type is different for that class. The streamsize type is similar in purpose to size_t, except that it is signed - it’s just a typedef for the particular type of integer your STL implementation chose, and using streamsize makes your code portable.

Custom Buffered Streams

Unbuffered streams are great, but they’re not always what you’re looking for. Say that you are writing a network stream. You don’t want to call send() over and over for performance reasons; you’d rather buffer up your output and send it all at once. Once we set the buffer with setp, the streambuf class will do all the work of putting characters into the buffer and protecting against overruns. We don’t need to implement our own xsputn at all, since the default implementation does exactly what we want. We can simply override the sync method to take our buffer contents, write them to the socket, and then clear the buffer.

class sockbuf : public streambuf {
public:
  // initialize our sockbuf with a socket
  // descriptor, and setup a new buffer
  sockbuf (int _sockfd) : sockfd(_sockfd) {
    char_type* buf = new char_type[1024];
    setp(buf, buf + 1024);
  }

  // free our buffer
  ~sockbuf () {
    delete[] pbase();
  }

private:
  // dump our buffer to the socket and clear
  // the buffer
  int sync () {
    // for brevity's sake, not doing proper error
    // handling; we return a non-zero value(error)
    // if we failed to send the full buffer contents
    int ret = send(sockfd, pbase(), pptr() - pbase(), 0);
    if (ret != pptr() - pbase())
      return 1;
    // reset the buffer
    setp(pbase(), epptr());
    return 0;
  }

  // our socket descriptor
  int sockfd;
};

We’ve got a few new functions there. The pbase() method returns a pointer to the beginning of the buffer. The pptr() method returns the current position of the stream in the buffer. So, the number of characters in the buffer is equal to pptr() minus pbase().

Unfortunately, this little class has a problem. Our buffer is only 1024 characters long, and the data is only written out when the flush method is called on the ostream using this streambuf. When that buffer fills up, any further data we try to stream is just lost. It would be ideal if we could instead grow the buffer or try to flush the data we already have in the buffer. Let’s try growing the buffer.

When our buffer fills up, the overflow() method is called. This method takes the character that didn’t fit into the buffer as a parameter, and returns either EOF to indicate failure or any other value to indicate success. We’re just going to grow our buffer by 1024 elements and then call the standard sputc() function to add the character into the buffer.

class sockbuf : public streambuf {
public:
  // initialize our sockbuf with a socket
  // descriptor, and setup a new buffer
  sockbuf (int _sockfd) :
      sockfd(_sockfd), buf(0), buflen(1024) {
    buf = new char_type[buflen];
    setp(buf, buf + buflen);
  }

  // free our buffer
  ~sockbuf () {
    delete[] buf;
  }

private:
  // dump our buffer to the socket and
  // clear the buffer
  int sync () {
    // for brevity's sake, not doing proper error
    // handling; we return a non-zero value(error)
    // if we failed to send the full buffer contents
    int ret = send(sockfd, pbase(), pptr() - pbase(), 0);
    if (ret != pptr() - pbase())
      return 1;
    // reset the buffer
    setp(pbase(), epptr());
    return 0;
  }

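  // we ran out of space, so grow
  // the buffer
  int overflow (int c) {
    // allocate a new buffer on the heap and copy
    // our current data into it, then swap it with
    // the old buffer (memcpy is from <cstring>)
    int used = pptr() - pbase();
    char_type* newbuf = new char_type[buflen + 1024];
    memcpy(newbuf, buf, used);
    delete[] buf;
    buf = newbuf;
    buflen += 1024;

    // tell streambuf about the new buffer and
    // restore the write position
    setp(buf, buf + buflen);
    pbump(used);

    // now we need to stuff c into the buffer
    if (c != EOF)
      sputc(c);
    return 0;
  }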

  // our socket descriptor
  int sockfd;

  // our buffer
  char_type* buf;
  unsigned long buflen;
};

And there we have it. Our sockbuf class can now buffer up data until flushed without losing any data.
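The other option mentioned earlier, flushing what we already have when the buffer fills up, can be sketched as follows. To keep the example free of socket code, this version appends to a std::string instead of calling send(); flushbuf, sink_ and the deliberately tiny 8 character buffer are all illustrative choices.

```cpp
#include <cassert>
#include <ostream>
#include <streambuf>
#include <string>

// Buffered streambuf that empties itself into a string "sink"
// whenever the buffer fills up or the stream is flushed.
class flushbuf : public std::streambuf {
public:
  explicit flushbuf(std::string& sink) : sink_(sink) {
    setp(buffer_, buffer_ + sizeof(buffer_));
  }

protected:
  // hand the pending characters to the sink and reset the buffer
  int sync() {
    sink_.append(pbase(),
                 static_cast<std::string::size_type>(pptr() - pbase()));
    setp(pbase(), epptr());
    return 0;
  }

  // buffer full: flush first, then store the character that did not fit
  int_type overflow(int_type c) {
    sync();
    if (!traits_type::eq_int_type(c, traits_type::eof()))
      sputc(traits_type::to_char_type(c));
    return traits_type::not_eof(c);
  }

private:
  std::string& sink_;
  char buffer_[8];  // tiny on purpose, so overflow() gets exercised
};
```

With this approach the buffer never grows: streaming sixteen characters through it triggers one overflow along the way, and a final flush delivers the rest, so nothing is lost.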

Complex Example

Alright, let’s go ahead and totally abuse the system now. We want to use our log class from before, but we want all of our log lines to include a date and time at the start as well as a log priority, but we don’t want to have to stream the time to log over and over. We’d like to be able to write code like this:

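log << DEBUG << "Starting up" << endl;
log << ERROR << "Something broke" << endl;
log << "Back to normal" << endl;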

There are a few important things going on here. First, we are setting the priority by streaming out a special priority value (e.g., DEBUG, ERROR). Note that on the third line we didn’t stream a priority, but the ERROR priority from the prior line didn’t carry over. It’s a piece of state that we’ll reset when a flush (or endl) occurs on the stream.

We could do this as an unbuffered stream. We would just write an xsputn method that wrote the time and then the log message. However, think what would happen in this example:

log << "The user " << username << " logged in" << endl;

We’d actually end up with the time and priority printed four times in the single line: once before “The,” once before the user’s name, once before “ logged,” and then a final time just before the newline. We will need to buffer our output and then only write the time and the priority once for each flush.

We’re also going to assume that flush won’t be called on our log directly, but instead we’ll always use endl. That way we know that our stream will contain a newline and we don’t need to worry about one ourselves.

We’re also going to actually override ostream this time. We actually need to do that to get the priority feature to work, plus it’s kind of a pain to have to create an ostream object and then call rdbuf() on it all the time, and a custom ostream allows us to hide that in the constructor.

First, the priorities. This will just be a simple enum.

enum LogPriority {
  INFO, // regular unimportant log messages
  DEBUG, // debugging fluff
  ERROR, // it's dead, jim
};

Because it’s an enum, we can create a special operator<< overload for it.

Our logbuf derived from streambuf should be fairly old news by this time. In fact, it's identical to our sockbuf above, except when sync() is called we spit out the time and the log message to cerr and a logfile ofstream, which this time around we'll make a member. We also keep the current priority level, which defaults to INFO.

class logbuf : public streambuf {
public:
  // create a buffer and initialize our logfile
  logbuf (const char* logpath) :
      priority(INFO), buf(0), buflen(1024) {
    // create our buffer
    buf = new char_type[buflen];
    setp(buf, buf + buflen);

    // open the log file
    logfile.open(logpath, ios::app);
  }

  // free our buffer
  ~logbuf () {
    delete[] buf;
  }

  // set the priority to be used on the
  // next call to sync()
  void set_priority (LogPriority p) {
    priority = p;
  }

private:
  // spit out the time, priority, and the
  // log buffer to cerr and logfile
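  int sync () {
    // nifty time formatting functions
    // from the C standard library (<ctime>)
    time_t t = time(0);
    tm* tmp = localtime(&t);
    char timebuf[128];
    strftime(timebuf, sizeof(timebuf),
      "%Y-%m-%d %H:%M:%S", tmp);

    // now we stream the time, then the
    // priority, then the message; the exact
    // prefix layout here is an assumption, and
    // the priority is shown as its numeric value
    streamsize len = pptr() - pbase();
    cerr << timebuf << " [" << priority << "] ";
    cerr.write(pbase(), len);
    cerr.flush();
    logfile << timebuf << " [" << priority << "] ";
    logfile.write(pbase(), len);
    logfile.flush();

    // reset the buffer and the priority
    setp(pbase(), epptr());
    priority = INFO;
    return 0;
  }

  // our log file, priority, and buffer
  ofstream logfile;
  LogPriority priority;
  char_type* buf;
  unsigned long buflen;
};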

Well, there’s that. Now we need our customized ostream-derived class so that we can stream the LogPriority values and get the desired behavior. We’ll keep a logbuf object as a member variable, which we’ll set up as the streambuf for ostream in the constructor.

class logstream : public ostream {
public:
  // we initialize the ostream to use our logbuf
  logstream (const char* logpath) :
    ostream(&buf), buf(logpath) {}

  // set priority
  void set_priority (LogPriority pr) {
    buf.set_priority(pr);
  }

private:
  // our logbuf object
  logbuf buf;
};

// set the priority for a logstream/logbuf
// this must be a global function and not a
// member to work around C++'s type
// resolution of overloaded functions
logstream& operator<< (logstream& ls, LogPriority pr) {
  ls.set_priority(pr);
  return ls;
}

And there you have it! You can now easily create and use logstreams. You can even have multiple such streams, giving each its own log file.

int main () {
  logstream log("logfile.txt");

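  log << DEBUG << "Starting up" << endl;
  log << ERROR << "Something broke" << endl;
  log << "Back to normal" << endl;

  return 0;
}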

Closing Thoughts

C++ streams are fairly simple to implement. We took a few liberties in the examples above with sockets and memory handling, but the core concepts of writing a custom output streambuf are there.

The complete source to the logstream example is available here.

Syndicated 2007-12-10 23:34:28 from Sean Middleditch

LLVM Development

So, I decided to go ahead and sink into LLVM and Clang. Feels good to be back in the wider development community, even if time spent on Clang is eating up paid-time for my work projects.

The Clang code is huge, and the subject material is complicated, but the code is surprisingly clean and the comments are generally fairly useful. It’s completely awesome compared to hacking on “professional” PHP scripts where the original coders didn’t understand basic concepts or understand how to write useful comments or function names.

Granted, the few tiny patches I’ve sent in to Clang have so far been not quite right, but I’m still learning the guts of how a C compiler works. There’s a big gap between understanding the various effects on code generation between using a short and a long and understanding how the compiler actually generates the code. For example, the bug I’m currently working on has to do with padding between struct fields, which is something I knew about and something I’ve worked with before (reordering fields to reduce the total amount of padding), but making a compiler track that padding, calculate the correct amount based on type and architecture, and so on isn’t something I’ve ever needed to know before. Writing a generic interpreted scripting engine on a custom byte-code VM and writing a standards compliant and system ABI compatible C compiler are worlds apart.

Still, actually learning how Clang works is fairly easy, if time consuming. It’s huge, but it’s well written.

I look forward to submitting a patch for the struct padding issue I’m running into, and maybe even having that patch do everything correctly. Which might be hard, given I can only test on a small handful of architectures (x86, amd64, ppc32).

Syndicated 2007-12-10 17:11:00 from Sean Middleditch

How To Write a TELNET Server or Client

Introduction

TELNET is a protocol designed way back when dinosaurs still roamed the earth, chasing cavemen and operating large mainframe computers connected to remote line printers. The protocol doesn’t see much use in mainstream computing, although it’s still popular on various IBM mainframe installations and, which you the intrepid reader of my humble blog are more interested in, text-based Multi-User Dungeons and similar online games.

Modern TELNET clients are vastly different than the line printers of yore. We have graphical terminals in which our screen can update instantly, and the drawing cursor can move about freely painting characters anywhere it wants in an assortment of colors and styles. Line printers, on the other hand, simply printed horizontally, occasionally chugging down a line and continuing onward. We generally don’t care about those anymore, though. Very few people writing a TELNET server today are expecting their client to be using a line printer. Most of those writing TELNET apps today are probably writing MUDs, or modifying MUDs, or writing a client for MUDs.

“So,” you ask, “how does one write a TELNET server or client, anyway?” The answer is thankfully quite simple. I am going to assume that a familiarity with the basic networking APIs is already possessed, but if not, a few minutes on Google should help.

Basic Concepts

For the most part, TELNET is simply nothing more than sending characters back and forth between the client and the server. Old line printers and networks were half-duplex, meaning that only one side could send data at a time, and the other side had to wait for permission to send. While the protocol still technically uses those rules, they are ignored by all MUD software, as well as most general TELNET software, so we won’t worry about those. A server which simply sends ASCII text to its client and receives ASCII text in response is a completely functional TELNET server, and a client likewise is the same in reverse.

There is more to TELNET, however. TELNET offers a variety of options, which range from options like enabling full-duplex mode (not really necessary these days) up to controlling the display of what a user types on his screen. TELNET does not control things like cursor positioning or text color, however. Those are a separate protocol, which I’ll touch on briefly later in this article.

From the perspective of MUD developers, possibly the most interesting feature of TELNET is the ability to control the display of what a user types. Normally, when a user types a key in his TELNET client, it is immediately displayed on his screen, and then sent to the server. This makes typing nice and quick. However, sometimes the server wants more advanced control of the display of input, such as to synchronize it with its own rendition of the screen… or to suppress it entirely, such as when a user is entering his password.

TELNET makes use of a simple control code scheme. In a way, you can think of this as being analogous to the \ escape sequences found in almost all programming languages. For example, \n produces a newline in a string, while \\ must be used in order to create a single backslash in the string. TELNET does the same thing, except instead of a \, it uses a special value called IAC (Interpret As Command), which is equal to the number 255. (You might note that 255 is the largest integer that can be stored in a single 8-bit byte.)

When operating in half-duplex mode, for example, one end of the communication must send the GA (go ahead) signal to let the other end know that it can begin sending. This is done by sending two bytes over the network pipe, IAC GA (255 249). If the client used the interrupt key (control-C), the client might send the interrupt signal to the server by sending the two bytes IAC IP (255 244).
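In code, these two-byte commands amount to very little. The sketch below uses the numeric values given above; the helper name telnet_command is made up for illustration, and actually writing the bytes to a socket is left out.

```cpp
#include <cassert>
#include <string>

// TELNET command bytes, using the values quoted in the text
const unsigned char IAC = 255;  // Interpret As Command
const unsigned char GA = 249;   // Go Ahead
const unsigned char IP = 244;   // Interrupt Process

// Builds a two-byte command such as IAC GA or IAC IP.
std::string telnet_command(unsigned char cmd) {
  std::string out;
  out += static_cast<char>(IAC);
  out += static_cast<char>(cmd);
  return out;
}
```

The string returned by telnet_command(GA) holds the bytes 255 and 249, ready to be handed to whatever send routine the program uses.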

Normally TELNET is not 8-bit clean. That means that the plain text data sent between the two ends can only be 7-bit ASCII values. It is possible to put TELNET into binary mode, however, which allows the use of 8-bit values. In this case, it may be necessary to send the value 255, but not have it interpreted as a TELNET command. Just like the \ escape sequence, this is done by doubling up the special character. So, to send the value 255 and not have it processed specially, send the two bytes IAC IAC (255 255).
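The doubling rule is easy to apply to outgoing data. Here is a minimal sketch; escape_iac is a hypothetical helper name, and it only covers the sending side.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Doubles every 255 byte so binary data passes through TELNET
// without being mistaken for the start of a command.
std::string escape_iac(const std::string& data) {
  const char IAC = static_cast<char>(255);
  std::string out;
  out.reserve(data.size());
  for (std::size_t i = 0; i < data.size(); ++i) {
    out += data[i];
    if (data[i] == IAC)
      out += IAC;  // IAC IAC means "a literal 255"
  }
  return out;
}
```

The receiving side does the reverse: when it sees IAC followed by another IAC, it passes a single 255 on to the application.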

Option Negotiation

Being able to send 8-bit data over the TELNET connection is pretty handy. It lets you support non-ASCII character encodings, like UTF-8 or ISO-8859-1. You know how to properly send the character value 255 without confusing TELNET, but the client or server you’re talking to keeps doing funny things when you send it values over 127 (that is, any value that doesn’t fit in 7 bits). That’s because first you must negotiate the BINARY option with the remote end of the connection.

This is where TELNET option negotiation comes in. TELNET has four special codes for negotiating options: WILL, WONT, DO, and DONT. These commands are a little different compared to normal commands. They start with the special IAC value just like other TELNET commands, but they are three bytes long instead of two. The third byte sent is the option code for the option you are negotiating. The BINARY option is code 0.

So, what do those four negotiation commands mean, exactly? Each has two meanings, based on context. For example, the WILL negotiation either means, “I am willing to use this option, if you are,” or it can also mean, “I am acknowledging your request to begin using this option.” Let’s say that you are writing a client that wishes to enable BINARY mode. You must ask the server if it would like to do so, by sending it the sequence IAC WILL BINARY. The server will then respond with one of two commands: either IAC DO BINARY (“I accept”) or IAC DONT BINARY (“I refuse”). If the server accepted, your client is now free to send 8-bit data to the server.

However, the server is not at this point permitted to send 8-bit data to the client. The server might request it in the same fashion as the client, but with the roles reversed. On the other hand, the client could request that the server start sending 8-bit data by telling the server to enable the BINARY option. This is done by using the DO command, by sending the bytes IAC DO BINARY. The server will then respond with either IAC WILL BINARY (“I accept”) or IAC WONT BINARY (“I refuse”).
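As a rough sketch, the three-byte sequences and the accept/refuse pairing might look like this in code. The numeric values for WILL, WONT, DO and DONT (251 through 254) come from the TELNET specification, RFC 854, rather than from the text above; the function names negotiate and reply_to are invented for this example, and the sketch ignores the looping problem that naive replies can cause.

```cpp
#include <cassert>
#include <string>

// Negotiation command bytes (RFC 854) and the BINARY option code
enum {
  BINARY = 0,
  WILL = 251,
  WONT = 252,
  DO = 253,
  DONT = 254,
  IAC = 255
};

// Builds the three-byte sequence IAC <verb> <option>.
std::string negotiate(int verb, int option) {
  std::string out;
  out += static_cast<char>(IAC);
  out += static_cast<char>(verb);
  out += static_cast<char>(option);
  return out;
}

// Picks the reply to an incoming WILL or DO: WILL is answered with
// DO or DONT, and DO is answered with WILL or WONT.
std::string reply_to(int verb, int option, bool supported) {
  if (verb == WILL)
    return negotiate(supported ? DO : DONT, option);
  if (verb == DO)
    return negotiate(supported ? WILL : WONT, option);
  return std::string();  // WONT and DONT get no reply in this sketch
}
```

So negotiate(WILL, BINARY) produces the bytes 255 251 0, and a peer that supports the option would answer with reply_to(WILL, BINARY, true), which is IAC DO BINARY.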

All TELNET option negotiation works this way. One end either advertises that it is capable of using the option with WILL or requests the other end to use the option with DO, and the other end responds in the affirmative or negative. However, things can get a little more complicated. Let’s say we have a naive client talking to a naive server. The client wants to enable BINARY mode, so it sends IAC WILL BINARY. The server accepts, and responds with IAC DO BINARY. The client, however, being naive and incomplete, doesn’t know if the server is acknowledging a prior request or initiating a new request. The client assumes it might be initiating a new request, and sends the appropriate response IAC WILL BINARY. The server, also naively written, believes the command to be a new request, and responds with IAC DO BINARY. The client and server are now sending back these two commands over and over, eating up bandwidth and not really accomplishing much.

For this reason, a complete TELNET implementation must track the state of each option for both the local end and the remote end. Each option has three states: enabled, disabled, or unknown. This can be implemented with two 256-element arrays containing an enum denoting the enabled/disabled/unknown state of the option. All elements in both arrays are initialized to unknown. An option with a value of unknown is effectively disabled, but there is more to it, and yes, the local option set also needs the unknown state. Say that you have a client talking to a buggy server that requests the BINARY option be enabled, but doesn’t actually support the option and gets into the infinite loop described above. The server sends the IAC DO BINARY sequence. Your improved, non-naive client looks at its local option array and sees that the BINARY option is set to unknown. The client now enables the option and responds with IAC WILL BINARY. The buggy server responds with IAC DO BINARY. The client sees, however, that the option is already enabled. Therefore, it does not need to send a response. This effectively breaks the loop caused by the buggy server. Additionally, the client can look at the array of server options and effectively knows not to send a request or response to a server option that is already set to enabled or disabled.
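Here is a rough Python sketch of that bookkeeping, covering only the case of receiving IAC DO. The class and method names (OptionTable, receive_do) and the idea of passing in a set of supported options are assumptions made for the example, not anything a real library defines.

```python
from enum import Enum

IAC, DONT, DO, WONT, WILL = 255, 254, 253, 252, 251
BINARY, ECHO = 0, 1


class State(Enum):
    UNKNOWN = 0
    DISABLED = 1
    ENABLED = 2


class OptionTable:
    """Hypothetical per-connection option state: two 256-element arrays."""

    def __init__(self, supported):
        self.supported = set(supported)
        self.local = [State.UNKNOWN] * 256    # options on our end
        self.remote = [State.UNKNOWN] * 256   # options on the other end

    def receive_do(self, option: int) -> bytes:
        """Handle IAC DO <option>; return reply bytes, or b"" for no reply."""
        if self.local[option] is State.ENABLED:
            return b""                        # already on: staying quiet breaks the loop
        if option in self.supported:
            self.local[option] = State.ENABLED
            return bytes([IAC, WILL, option])
        self.local[option] = State.DISABLED
        return bytes([IAC, WONT, option])     # unsupported: always refuse


table = OptionTable({BINARY})
first = table.receive_do(BINARY)    # IAC WILL BINARY
second = table.receive_do(BINARY)   # b"": the buggy server's repeat is ignored
```

Handling WILL, WONT, and DONT follows the same pattern against the appropriate array.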

Now, in practice, it is usually not necessary to use all those arrays of flags. Most MUD servers and clients do not, and they work just fine. This is in particular because most of the options used in MUDs are “one way” options; that is, only one end of the connection ever requests them. A MUD client generally never sends an initial IAC DO ECHO (which would tell the server to echo everything the client sends back to the client), so when the client receives an IAC WILL ECHO it knows that the server is requesting to enable the option itself, and the command is not a response acknowledging that the client begin echoing data back to the server (which would pretty badly break things for the client). So long as the client never talks to a really broken server, a client could get by with just a handful of flags for the options it supports. The same goes for the server. Just be careful for any options that both the server and client use (like BINARY) to make sure you only respond when the other end is requesting the option, and not when the other end is acknowledging the option, and your application will work just fine so long as the other end isn’t totally broken.

There is a general rule that will help for implementations that don’t use the full 256-element arrays. When receiving a request to enable an option your application does not support, always refuse. When receiving a request to disable an option your application does not support, don’t respond at all.
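That rule is small enough to write down directly. This is only a sketch, and reply_for_unsupported is a made-up name; it assumes the caller has already decided that the option is one the application does not support.

```python
IAC, DONT, DO, WONT, WILL = 255, 254, 253, 252, 251


def reply_for_unsupported(command: int, option: int) -> bytes:
    """Reply to a negotiation command for an option we do not support."""
    if command == DO:      # "please enable this on your end": refuse
        return bytes([IAC, WONT, option])
    if command == WILL:    # "I want to enable this on my end": refuse
        return bytes([IAC, DONT, option])
    return b""             # DONT or WONT: the option is already off, say nothing
```

Staying silent on DONT and WONT matters: answering them is one of the easy ways to start a negotiation loop.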

Let’s take a brief look at that ECHO option. ECHO is option code 1. For MUDs, and in truth most TELNET applications, the server is the only end that ever performs echoing. If a client echoed back everything the server sent to it then it would probably result in another infinite loop. The server would say something, the client would send it back, the server would interpret that as a command and say something back (possibly just an “unknown command” error), and the client would echo that back to the server, which would interpret it as a command… bad stuff.

However, it’s generally pretty nice when the user types something in and whatever he typed shows up on his screen. A TELNET client will generally always prefer this, and by default it will print anything the user types on the user’s screen. A server will sometimes want to disable this, most commonly when it is requesting a password. However, TELNET has no option for “hide the user’s input.” Instead, we have to use a sneaky trick. If the server sends an IAC WILL ECHO, that means that it is willing to echo back everything the user types. Pretty much all clients will agree to this, and they will respond with an IAC DO ECHO. At this point, the client no longer prints the keys the user types in. The client is expecting the server to do this itself. However, nothing actually requires the server to do so. It could echo back the user’s input after transforming it (turning it into stars), echo it verbatim, or just echo nothing. When the server is finished retrieving the user’s password, it then tells the client that it no longer wants to echo by sending IAC WONT ECHO. The client then acknowledges this with IAC DONT ECHO, and will start displaying what the user types in again.
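On the server side, the trick boils down to two small byte strings. The sketch below is an illustration only: the function names are invented, and appending CR LF after the password is my own assumption, on the grounds that the user's Enter key was never echoed and the cursor is still sitting on the prompt line.

```python
IAC, DONT, DO, WONT, WILL = 255, 254, 253, 252, 251
ECHO = 1


def password_prompt(prompt: str) -> bytes:
    """Bytes a server sends to show a prompt and turn off the client's local echo."""
    return prompt.encode("ascii") + bytes([IAC, WILL, ECHO])


def password_done() -> bytes:
    """Bytes a server sends once the password has arrived, restoring local echo."""
    # Assumption: move to a fresh line, since the user's Enter was not echoed.
    return bytes([IAC, WONT, ECHO]) + b"\r\n"
```

Between the two, the server simply reads the password line and echoes nothing (or stars, if you are feeling fancy).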

Note: the Windows TELNET client is notoriously broken in its handling of ECHO. The client will gladly accept when the server sends IAC WILL ECHO, but when the server sends IAC WONT ECHO, the Windows client will not start echoing local characters again. Also note that, unlike almost every other client, the Windows client only operates in character mode. That means that each character is sent to the server as it is typed, while most clients only send whole lines. There are ways to tell a client to go into character mode or into line mode, but the Windows client only supports character mode.

So, now you have option negotiation working, as well as 8-bit support with proper escaping. However, you’ve heard about this NAWS thing, which lets your client tell the server how big the display window is so that the server can do fancy layout. NAWS is option code 31. A server that wants window size information will send an IAC DO NAWS, and a client which supports it will respond with IAC WILL NAWS. But… now what?

Sub Options

Option negotiation is only capable of enabling or disabling an option. However, some options, like NAWS, control features which need to be able to send more complex data using the protocol. The NAWS feature needs a way for the client to tell the server the number of rows and columns in the client’s display.

For features like these, TELNET uses the SB command, which is called a “sub option.” SB is code 250. This command is rather special. It starts with three bytes: IAC, SB, and then the option code, such as NAWS. It is then followed by an arbitrary number of bytes, which we’ll call the payload. The end of the sub option is marked with the two byte sequence IAC SE. SE is code 240. So, what do those bytes between the initial three byte sequence and the ending two byte sequence mean? Well, it depends on the option.

NAWS sends two 16-bit integers as its sub option payload. Each integer is in network byte order. The first integer is the number of columns (width), and the second integer is the number of rows (height). So, a client with 80 columns and 24 rows would, after the NAWS option has been enabled with option negotiation, send the byte sequence IAC SB NAWS 0 80 0 24 IAC SE.

One must be careful when writing code to handle sub options. A very large number of MUD servers and clients do not do this properly. Let us pretend, for a moment, that a user has some particularly large terminal… say, 255 columns and 61440 rows. The NAWS sub option byte sequence would be IAC SB NAWS 0 255 240 0 IAC SE. However, remember that IAC is 255 and SE is 240. That means that the bytes are equivalent to IAC SB NAWS 0 IAC SE 0 IAC SE. See the problem? Correctly implemented software will parse that as a sub option with a single byte in its payload, followed by a zero byte and then an IAC SE sequence, which is illegal. Plus, the NAWS sub option would be the wrong size, which is also illegal. The correct thing for the client to do is to escape that byte equal to 255 with a double IAC sequence, just like the \\ escape. So the correct thing for the client to send would be IAC SB NAWS 0 IAC IAC 240 0 IAC SE. While it looks like the payload is 5 bytes, the server would convert the IAC IAC into a single byte equal to 255 in the buffer it stores the sub option payload in. However, many incorrectly written MUD servers do not do this; after receiving IAC SB NAWS, they then look for exactly 4 bytes for the payload (ignoring the values of those bytes, even if they contain 255), and then immediately expect IAC SE (sometimes they don’t even check that they actually get IAC SE, they simply read in two bytes and call it done). It is thus impossible to write a client that will be able to handle this situation both with correct servers (which require that the IAC be escaped) and incorrectly written servers (which require that the IAC not be escaped).
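For the client side, a minimal Python sketch of building a correctly escaped NAWS sub option might look like this. The function name naws_suboption is made up for the example; struct.pack with the "!HH" format produces two unsigned 16-bit integers in network byte order.

```python
import struct

IAC, SB, SE = 255, 250, 240
NAWS = 31


def naws_suboption(columns: int, rows: int) -> bytes:
    """Build IAC SB NAWS <width> <height> IAC SE with payload IACs doubled."""
    payload = struct.pack("!HH", columns, rows)
    # Any payload byte equal to 255 must go out as IAC IAC.
    escaped = payload.replace(bytes([IAC]), bytes([IAC, IAC]))
    return bytes([IAC, SB, NAWS]) + escaped + bytes([IAC, SE])


normal = naws_suboption(80, 24)      # IAC SB NAWS 0 80 0 24 IAC SE
huge = naws_suboption(255, 61440)    # IAC SB NAWS 0 IAC IAC 240 0 IAC SE
```

Note that only the payload is escaped; the framing bytes at either end are real commands and must stay as they are.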

Fortunately, the scenario is rather unlikely to occur. There is little benefit in a client that displays 61440 lines, even if your screen could somehow handle it. Furthermore, while the proper escaping of IAC bytes within a sub option payload is essential for some options, almost none of those are used in MUDs and thus they should never be sent to those poorly written MUD servers. However, if you’re writing new software, even for a MUD, it is a very good idea to correctly process all sub option commands. The correct way to handle NAWS is to use a buffer to read in the sub option payload (performing IAC escaping as you do so), and once the IAC SE is read, to then check that the payload buffer has exactly 4 bytes in it before processing the command.
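A rough Python sketch of that buffer-then-validate approach follows. It assumes the caller hands it bytes starting at the IAC SB of a sub option; read_suboption and parse_naws are hypothetical names, and a real implementation would fold this into its main TELNET state machine rather than work on a standalone buffer.

```python
import struct

IAC, SB, SE = 255, 250, 240
NAWS = 31


def read_suboption(data: bytes):
    """Parse one sub option from the start of data.

    Returns (option, payload, bytes_consumed), or None if the closing
    IAC SE has not arrived yet.
    """
    if len(data) < 3:
        return None
    if data[0] != IAC or data[1] != SB:
        raise ValueError("data does not start with IAC SB")
    option = data[2]
    payload = bytearray()
    i = 3
    while i < len(data):
        if data[i] != IAC:
            payload.append(data[i])
            i += 1
        elif i + 1 >= len(data):
            return None                   # lone IAC at the end: wait for more
        elif data[i + 1] == IAC:
            payload.append(IAC)           # IAC IAC is one literal 255
            i += 2
        elif data[i + 1] == SE:
            return option, bytes(payload), i + 2
        else:
            raise ValueError("unexpected command inside sub option")
    return None


def parse_naws(payload: bytes):
    """Return (columns, rows), checking the size only after unescaping."""
    if len(payload) != 4:
        raise ValueError("NAWS payload must be exactly 4 bytes")
    return struct.unpack("!HH", payload)
```

Fed the escaped sequence for the 255 by 61440 terminal, read_suboption produces a 4-byte payload, and parse_naws turns it back into the original dimensions.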

Alright, so now you have all the low-level TELNET machinery working, and you’re even supporting cool things like window size notification. Now there’s that tricky deal with actually displaying and sending text properly. See, while I wasn’t lying when I said TELNET just sends raw text back and forth for input and output, there are a few tricks to how that text is interpreted, especially if you want to support fancy colors and stuff.

Newlines

Welcome to the shortest section of this article! TELNET newlines are expected to be the two-byte sequence CR LF. That’s byte values 13 and 10, or \r and \n. Just sending \n by itself or just \r or sending \n \r may cause some funny things to happen.

When reading and writing text files, a newline is usually represented by just a plain LF, or \n. Even on systems that store a CR LF sequence in text files, like Windows, the standard file I/O facilities will automatically translate back and forth between \n and \r \n when reading and writing text files. However, when you are displaying text to a terminal, even on systems like UNIX, the terminal might be in a mode in which a solo LF (line feed) only does what it was originally meant to: cause the cursor (the print head in old line printers) to move down a line, but not return to the start of the line. The CR (carriage return) character tells the cursor (or print head) to return to the beginning of the current line. So, in order to move down to the beginning of the next line, you’d need to send \r \n (or \n \r).

TELNET, being a protocol designed specifically for driving those old line printers, works the same way. Even on modern systems, many clients will treat a solo LF as just a line feed, and many servers will not recognize a solo \n as being the end of a command. So, if you’re writing a server and your output only uses a \n for newlines, be sure to translate those into \r \n (CR LF) when you send the data to the client. If you’re writing a client, be sure to send \r \n whenever the user hits enter.
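For a server whose output uses bare \n internally, the translation can be a one-liner. This sketch assumes the text may already contain some \r \n pairs, so it normalizes those first to avoid producing \r \r \n; the function name is made up.

```python
def to_telnet_newlines(text: str) -> str:
    """Convert every newline in text to the CR LF pair TELNET expects."""
    # Collapse existing CR LF pairs first so they are not doubled up.
    return text.replace("\r\n", "\n").replace("\n", "\r\n")


greeting = to_telnet_newlines("Welcome!\nWhat is your name?\n")
```

Do this at the point where data goes out on the socket, so the rest of the server can keep using plain \n everywhere.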

Now, getting newlines to work properly is thrilling and all, but even more thrilling is that there’s not much more to say about TELNET itself. Sure, there are some extra commands to learn (the GA command used in half-duplex mode, which TELNET is in by default, can be handy to learn about, especially since MUDs use it for some fun tricks), and some other options that can be useful, but there’s no more actual protocol machinery to learn about. Now it’s on to colors and cursor control, which isn’t actually a part of TELNET at all.

ANSI Terminal Escapes

First, let’s look at terminal types. See, a long time ago (and, actually, right now, too) there were a bazillion different line printer and graphical terminal products on the market. Infuriatingly, pretty much every single one had its own proprietary protocol for controlling its special features. Even two terminals made by the same company would often have different (though usually similar) protocols for controlling color codes, cursor positioning, and other features. Even today, the text console on Linux uses a slightly different protocol than the text console on various other UNIX and UNIX-like operating systems, which themselves are all different from each other. Graphical terminal emulators, which is what most of us are using (and which includes your average MUD client), can have their own protocols, too. The modern xterm variation (xterm is a standard graphical terminal emulator for Linux/UNIX systems) is very slightly different than the popular terminal emulator I use on my Linux desktop, for example.

In order to properly handle all of these different terminals, a TELNET server would need to ask the client what kind of terminal they are using (yes, there is an option in TELNET for this), and then consult a library that maps common operations, like “clear the screen,” into the proper sequence of control codes for that particular terminal type. If you’re writing a real TELNET server, you’re going to need to get familiar with the termcap and/or terminfo libraries, as these provide those services.

MUD servers and clients, however, don’t need to care about such things. See, all modern terminal types, while they have slight differences, are based off of the ANSI terminal specification. This specification includes a number of common control codes, like setting terminal color, clearing the screen, or moving the drawing cursor to a specific position in the terminal window. A MUD server need only support these ANSI terminal codes, and can safely assume all clients support them as well. Most regular TELNET clients, running on modern terminal emulators, will do so. Most users of MUDs use a specialized MUD client which will interpret the control codes itself, and then translate those into whatever commands are appropriate for the display, so even a MUD client running on some non-standard terminal will still be compatible with a MUD server that only uses ANSI control codes. So, we’re just going to talk about ANSI control codes from now on.

Now that we’ve gone through three paragraphs of boring and mostly useless exposition, let’s get to the meat of things!

All control codes begin with the ESC character (\e, or 27). In general, the ESC character will be followed by a [ (left-hand square bracket), and then possibly by a command payload, followed by an ASCII letter denoting the actual escape command. For example, to clear the screen, you might use ESC [ 2 J ESC [ H. The J command does various screen-related actions, and the payload 2 is what tells the J command to clear the screen. That technically just clears the screen, though, leaving the cursor at whatever position on the screen it was already at. The H command tells the cursor to return to the upper left corner of the screen (Home).

Setting the color, and an assortment of other visual display settings, is done with the m command (Mode). The payload for the m command is one or more numeric values, separated by semicolons. The value 0 means “reset the display settings to the default.” The value 31 means “set the text color to red.” So, to display the phrase “Red Baron” with the word “Red” in the color red and the rest in the default color, the server would send ESC [ 31 m R e d ESC [ 0 m _ B a r o n. (The _ represents a space.)

Remember that you can include multiple values in your payload for the m command. If you want to display something in green (code 32) and wanted to make sure that all other display mode settings, like background colors, were disabled, you would send ESC [ 0 ; 3 2 m.

You can set the cursor position using that H command we saw before. Simply provide the row and column, separated by a semi-colon, in the payload. So, to move the cursor to the second row at column 20, send ESC [ 2 ; 2 0 H.
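The sequences above are easy to generate with a few helpers. Here is a small Python sketch; the helper names (sgr, move_to) and constants are invented for the example, and the spaces shown between characters in the prose are only there for readability, so they do not appear in the real output.

```python
ESC = "\x1b"   # the ESC character, code 27

RESET, RED, GREEN = 0, 31, 32


def sgr(*codes: int) -> str:
    """Build an m command with one or more semicolon-separated values."""
    return ESC + "[" + ";".join(str(code) for code in codes) + "m"


def move_to(row: int, column: int) -> str:
    """Build an H command that moves the cursor to the given row and column."""
    return f"{ESC}[{row};{column}H"


CLEAR_SCREEN = ESC + "[2J" + ESC + "[H"   # clear, then send the cursor home

red_baron = sgr(RED) + "Red" + sgr(RESET) + " Baron"
plain_green = sgr(RESET, GREEN)           # ESC [ 0 ; 3 2 m
second_row = move_to(2, 20)               # ESC [ 2 ; 2 0 H
```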

That’s pretty much the gist of ANSI control codes. You can find a fairly complete list of codes here.

Remember that not all commands have a [ after the ESC character. A decent strategy for parsing these control codes on the client end is to look at the first character after each ESC. If the character is a [ then keep buffering input until a letter character is received, then process the buffer. If the character after the ESC is not a [, then immediately process the command.
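As a sketch of that strategy in Python, the function below splits a string into plain text and escape sequences. It is deliberately simple: split_ansi is a made-up name, it works on a complete string rather than a live stream, and it just drops an escape sequence that is cut off at the end of the input, where a real client would keep it buffered until more data arrives.

```python
ESC = "\x1b"


def split_ansi(text: str):
    """Split text into ("text", s) and ("escape", s) tokens."""
    tokens = []
    plain = []
    i = 0
    while i < len(text):
        if text[i] != ESC:
            plain.append(text[i])
            i += 1
            continue
        if plain:                             # flush text seen before the ESC
            tokens.append(("text", "".join(plain)))
            plain = []
        if i + 1 >= len(text):
            break                             # lone ESC at end of input
        if text[i + 1] != "[":
            tokens.append(("escape", text[i:i + 2]))   # two-character command
            i += 2
            continue
        j = i + 2                             # buffer until an ASCII letter
        while j < len(text) and not (text[j].isascii() and text[j].isalpha()):
            j += 1
        if j >= len(text):
            break                             # incomplete sequence
        tokens.append(("escape", text[i:j + 1]))
        i = j + 1
    if plain:
        tokens.append(("text", "".join(plain)))
    return tokens


tokens = split_ansi("\x1b[31mRed\x1b[0m Baron")
```

A client would then look at the final letter of each escape token to decide what to do, and a server that only wants to strip codes can simply keep the text tokens.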

For MUD servers, it is a good idea to also include a basic ANSI control code parser, solely for the purpose of stripping such codes out of input sent by users. While you’re at it, be sure to strip out lone CR or LF characters not part of a newline, the BEL character (\a), and other control codes. Imagine a user who sends a command line to your server with “say ” followed by a couple dozen BEL characters in it - every other player in the room will be treated to a long series of annoying beeps (if their client supports it, which some do). You can just strip out every non-printable character, which is any code less than 32 (or just use the C isprint macro from ctype.h). On a similar note, remember to always escape IAC bytes in your output that aren’t meant to be a part of a TELNET command, otherwise malicious users might find interesting ways to break other users’ clients using commands that let them send text to other players.
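Both precautions are short in Python. This sketch takes the blunt approach of dropping every byte below 32 from a single, already-split input line; the function names are invented. Note that removing the ESC byte this way leaves the rest of a color code (something like [31m) behind as harmless visible text, which is where a proper ANSI parser does a tidier job.

```python
IAC = 255


def strip_control(line: bytes) -> bytes:
    """Drop every byte below 32 (ESC, BEL, stray CR or LF, ...) from one input line."""
    return bytes(b for b in line if b >= 32)


def escape_iac(data: bytes) -> bytes:
    """Double every 255 byte so clients do not read it as a TELNET command."""
    return data.replace(bytes([IAC]), bytes([IAC, IAC]))


cleaned = strip_control(b"say \x07\x07hi\x1b[31m")   # beeps and ESC removed
safe = escape_iac(b"look \xff\xf4 here")             # 255 becomes 255 255
```

Run strip_control on what users send in, and escape_iac on any user-supplied text just before it goes out to other players.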

And that’s all, folks!

Syndicated 2007-12-10 04:48:22 from Sean Middleditch

Squirrel Wants To Be Lua

Squirrel is another language I took a look at this morning. Squirrel is essentially an offshoot of Lua, being written by a games developer who was dissatisfied with some of Lua’s shortcomings in older (pre-5.1) Lua releases.

The biggest change one will notice between current Lua and Squirrel is that Squirrel has a built-in class mechanism. Unfortunately, the class system is single-inheritance only with no mixins or interface support, so developing larger applications would not be overly easy to do with Squirrel. This is probably just fine given that Squirrel seems geared more towards embedding than application authoring, just like its conceptual ancestor, Lua.

I think the language comparison page for Squirrel (which only compares against Lua) best explains Squirrel. I quote:

Lua has an established and growing set of 3rd party libraries. That’s the biggest problem with Lua spin-offs : you trade in compatibility with everything written for Lua (see http://lua-users.org/wiki/LibrariesAndBindings) for some syntactic sugar and a feature or two that will be implemented in some future Lua version anyway…

True enough, while Lua might not be intended for developing complete applications, it’s got enough addons and extensions to make it possible. Squirrel is severely lacking in such things. There isn’t any standard way to do networking, for example, which is a big requirement for this project. I could write a core C module that embeds Squirrel and adds the extra routines I need, but then I might as well just use a more well known language like JavaScript on SpiderMonkey, or just embed Lua.

One thing I do however very much like about Squirrel over both JavaScript and Lua is that variables aren’t automatically declared in the global namespace. With either JS or Lua, if you mistype a variable name in an assignment, you not only silently get a new variable, but it’s a global variable. Yuck! In Squirrel, assigning to an undeclared variable results in an error.

If you’re looking for a Lua-like runtime to embed with a syntax closer to JavaScript or C++, check Squirrel out; it’s just what you’re looking for. If that syntax isn’t important to you, though, just use Lua instead.

Syndicated 2007-12-05 22:26:25 from Sean Middleditch
