Older blog entries for elanthis (starting at number 362)

LLVM Development

So, I decided to go ahead and sink into LLVM and Clang. It feels good to be back in the wider development community, even if time spent on Clang is eating into the paid time for my work projects.

The Clang code is huge, and the subject material is complicated, but the code is surprisingly clean and the comments are generally fairly useful. It’s completely awesome compared to hacking on “professional” PHP scripts where the original coders didn’t understand basic concepts or know how to write useful comments or function names.

Granted, the few tiny patches I’ve sent in to Clang have so far been not quite right, but I’m still learning the guts of how a C compiler works. There’s a big gap between understanding how choosing a short over a long affects code generation and understanding how the compiler actually generates that code. For example, the bug I’m currently working on has to do with padding between struct fields. I knew about that padding and have worked with it before (reordering fields to reduce the total amount of padding), but I’ve never before needed to know how a compiler tracks that padding, calculates the correct amount based on type and architecture, and so on. Writing a generic interpreted scripting engine on a custom byte-code VM and writing a standards-compliant, system-ABI-compatible C compiler are worlds apart.
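Here is a small C illustration of the padding and the field-reordering trick. The struct names are made up for the example. On common ABIs such as x86 and amd64, where int is 4 bytes with 4-byte alignment, the first layout typically occupies 12 bytes and the reordered one 8. The exact numbers depend on the target, which is precisely what the compiler has to work out.

```c
#include <stddef.h>
#include <stdio.h>

/* On a typical target with a 4-byte, 4-aligned int: 3 bytes of padding after a
   so that b is aligned, then 3 bytes of tail padding after c so the struct size
   is a multiple of int's alignment. */
struct padded {
    char a;
    int  b;
    char c;
};

/* Same fields, largest first: a and c sit next to each other and share one
   (2-byte, on such a target) tail pad. */
struct reordered {
    int  b;
    char a;
    char c;
};

/* Prints the layouts the current compiler and target actually chose. */
void print_layouts(void)
{
    printf("padded:    size %zu, b at offset %zu\n",
           sizeof(struct padded), offsetof(struct padded, b));
    printf("reordered: size %zu, b at offset %zu\n",
           sizeof(struct reordered), offsetof(struct reordered, b));
}
```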

Still, actually learning how Clang works is fairly easy, if time-consuming. It’s huge, but it’s well written.

I look forward to submitting a patch for the struct padding issue I’m running into, and maybe even having that patch do everything correctly. Which might be hard, given I can only test on a small handful of architectures (x86, amd64, ppc32).

Syndicated 2007-12-10 17:11:00 from Sean Middleditch

How To Write a TELNET Server or Client

Introduction

TELNET is a protocol designed way back when dinosaurs still roamed the earth, chasing cavemen and operating large mainframe computers connected to remote line printers. The protocol doesn’t see much use in mainstream computing. It is still popular on various IBM mainframe installations and, of more interest to you, the intrepid reader of my humble blog, in text-based Multi-User Dungeons and similar online games.

Modern TELNET clients are vastly different from the line printers of yore. We have graphical terminals in which our screen can update instantly, and the drawing cursor can move about freely, painting characters anywhere it wants in an assortment of colors and styles. Line printers, on the other hand, simply printed horizontally, occasionally chugging down a line and continuing onward. We generally don’t care about those anymore, though. Very few people writing a TELNET server today are expecting their client to be using a line printer. Most of those writing TELNET apps today are probably writing MUDs, or modifying MUDs, or writing a client for MUDs.

“So,” you ask, “how does one write a TELNET server or client, anyway?” The answer is thankfully quite simple. I am going to assume that you are already familiar with the basic networking APIs, but if not, a few minutes on Google should help.

Basic Concepts

For the most part, TELNET is nothing more than sending characters back and forth between the client and the server. Old line printers and networks were half-duplex, meaning that only one side could send data at a time, and the other side had to wait for permission to send. The protocol still technically uses those rules, but all MUD software and most general TELNET software ignore them, so we won’t worry about them. A server which simply sends ASCII text to its client and receives ASCII text in response is a completely functional TELNET server. The same is true of a client in reverse.

There is more to TELNET, however. TELNET offers a variety of options. They range from enabling full-duplex mode (not really necessary these days) to controlling the display of what a user types on his screen. TELNET does not control things like cursor positioning or text color, however. Those are handled by a separate protocol, which I’ll touch on briefly later in this article.

From the perspective of MUD developers, possibly the most interesting feature of TELNET is the ability to control the display of what a user types. Normally, when a user types a key in his TELNET client, it is immediately displayed on his screen, and then sent to the server. This makes typing nice and quick. However, sometimes the server wants more advanced control of the display of input, such as to synchronize it with its own rendition of the screen… or to suppress it entirely, such as when a user is entering his password.

TELNET makes use of a simple control code scheme. You can think of it as analogous to the \ escape sequences found in almost all programming languages. For example, \n produces a newline in a string, while \\ must be used to create a single backslash in the string. TELNET does the same thing, except that instead of a \ it uses a special value called IAC (Interpret As Command), which is equal to the number 255. (You might note that 255 is the largest integer that can be stored in a single 8-bit byte.)

When operating in half-duplex mode, for example, one end of the communication must send the GA (Go Ahead) signal to let the other end know that it can begin sending. This is done by sending two bytes over the network pipe, IAC GA (255 249). If the user pressed the interrupt key (control-C), the client might send the interrupt signal to the server by sending the two bytes IAC IP (255 244).
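The command codes can be collected into a C enum, including the ones this article uses later. The values are the ones RFC 854 assigns. The helper name `telnet_command` is just an illustration, not part of any particular library. The two bytes it produces can be written to the connection’s socket as-is.

```c
#include <stddef.h>
#include <stdint.h>

/* TELNET command codes as assigned by RFC 854. */
enum telnet_cmd {
    TELNET_SE   = 240,  /* end of sub option */
    TELNET_IP   = 244,  /* interrupt process */
    TELNET_GA   = 249,  /* go ahead */
    TELNET_SB   = 250,  /* start of sub option */
    TELNET_WILL = 251,
    TELNET_WONT = 252,
    TELNET_DO   = 253,
    TELNET_DONT = 254,
    TELNET_IAC  = 255   /* interpret as command */
};

/* Writes the two-byte command IAC <cmd> into out; returns the number of bytes written. */
size_t telnet_command(uint8_t out[2], uint8_t cmd)
{
    out[0] = TELNET_IAC;
    out[1] = cmd;
    return 2;
}
```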

Normally TELNET is not 8-bit clean. That means that the plain text data sent between the two ends can only be 7-bit ASCII values. It is possible to put TELNET into binary mode, however, which allows the use of 8-bit values. In this case, it may be necessary to send the value 255, but not have it interpreted as a TELNET command. Just like the \ escape sequence, this is done by doubling up the special character. So, to send the value 255 and not have it processed specially, send the two bytes IAC IAC (255 255).
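Here is a minimal sketch of that escaping on the sending side. It assumes the outgoing data is a plain byte buffer, and the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Copies len bytes of application data from in to out, doubling every 255 (IAC)
   byte so the receiver reads it as a literal 255 rather than the start of a command.
   out must have room for 2 * len bytes. Returns the number of bytes written. */
size_t telnet_escape(const uint8_t *in, size_t len, uint8_t *out)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        out[n++] = in[i];
        if (in[i] == 255)
            out[n++] = 255;     /* IAC IAC means one data byte of value 255 */
    }
    return n;
}
```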

Option Negotiation

Being able to send 8-bit data over the TELNET connection is pretty handy. It lets you support non-ASCII character encodings, like UTF-8 or ISO-8859-1. You now know how to properly send the character value 255 without confusing TELNET. Even so, the client or server you’re talking to keeps doing funny things when you send it values over 127 (that is, any value that doesn’t fit in 7 bits). That’s because you must first negotiate the BINARY option with the remote end of the connection.

This is where TELNET option negotiation comes in. TELNET has four special codes for negotiating options: WILL, WONT, DO, and DONT (251, 252, 253, and 254 respectively). These commands are a little different from normal commands. They start with the special IAC value just like other TELNET commands, but they are three bytes long instead of two. The third byte sent is the option code for the option you are negotiating. The BINARY option is code 0.

So, what do those four negotiation commands mean, exactly? Each has two meanings, based on context. For example, the WILL negotiation either means, “I am willing to use this option, if you are,” or it can also mean, “I am acknowledging your request to begin using this option.” Let’s say that you are writing a client that wishes to enable BINARY mode. You must ask the server if it would like to do so, by sending it the sequence IAC WILL BINARY. The server will then respond with one of two commands: either IAC DO BINARY (“I accept”) or IAC DONT BINARY (“I refuse”). If the server accepted, your client is now free to send 8-bit data to the server.

However, the server is not at this point permitted to send 8-bit data to the client. The server might request it in the same fashion as the client, but with the roles reversed. Alternatively, the client could ask the server to start sending 8-bit data by telling the server to enable the BINARY option. This is done with the DO command, by sending the bytes IAC DO BINARY. The server will then respond with either IAC WILL BINARY (“I accept”) or IAC WONT BINARY (“I refuse”).
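The sketch below builds the three-byte negotiation commands and picks the reply that accepts or refuses a request. The codes are the RFC 854 values (WILL 251, WONT 252, DO 253, DONT 254). The function names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

enum { IAC = 255, WILL = 251, WONT = 252, DO = 253, DONT = 254 };
enum { OPT_BINARY = 0 };

/* Writes the three-byte negotiation IAC <verb> <option> into out; returns 3. */
size_t telnet_negotiate(uint8_t out[3], uint8_t verb, uint8_t option)
{
    out[0] = IAC;
    out[1] = verb;
    out[2] = option;
    return 3;
}

/* Affirmative answer to a request (verb must be WILL or DO):
   WILL is answered with DO, DO is answered with WILL. */
uint8_t telnet_accept(uint8_t verb) { return verb == WILL ? DO : WILL; }

/* Refusal of a request (verb must be WILL or DO):
   WILL is answered with DONT, DO is answered with WONT. */
uint8_t telnet_refuse(uint8_t verb) { return verb == WILL ? DONT : WONT; }
```

A client asking for BINARY in its own direction would send the bytes produced by `telnet_negotiate(buf, WILL, OPT_BINARY)`, which are 255 251 0.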

All TELNET option negotiation works this way. One end either advertises with WILL that it is capable of using the option, or asks the other end with DO to use the option, and the other end responds in the affirmative or negative. However, things can get a little more complicated. Let’s say we have a naive client talking to a naive server. The client wants to enable BINARY mode, so it sends IAC WILL BINARY. The server accepts, and responds with IAC DO BINARY. The client, however, being naive and incomplete, doesn’t know if the server is acknowledging a prior request or initiating a new request. The client assumes it might be a new request, and sends the appropriate response, IAC WILL BINARY. The server, also naively written, believes the command to be a new request, and responds with IAC DO BINARY. The client and server now send these two commands back and forth over and over, eating up bandwidth and not really accomplishing much.

For this reason, a complete TELNET implementation must track the state of each option for both the local end and the remote end. Each option has three states: enabled, disabled, or unknown. This can be implemented with two 256-element arrays containing an enum denoting the enabled/disabled/unknown state of the option. All elements in both arrays are initialized to unknown. An option with a value of unknown is effectively disabled, but there is more to it, and yes, the local option set also needs the unknown state.

Say that you have a client talking to a buggy server that requests the BINARY option be enabled, but doesn’t handle negotiation properly and gets into the infinite loop described above. The server sends the IAC DO BINARY sequence. Your improved, non-naive client looks at its local option array and sees that the BINARY option is set to unknown. The client enables the option and responds with IAC WILL BINARY. The buggy server responds with IAC DO BINARY. The client sees, however, that the option is already enabled. Therefore, it does not need to send a response. This effectively breaks the loop caused by the buggy server. Additionally, the client can look at the array of server options and knows not to send a request or response for a server option that is already set to enabled or disabled.
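Here is a sketch of the state-tracking approach for the local side, that is, handling incoming DO and DONT. The struct layout and the `can_local` table of supported options are assumptions for illustration. A WILL/WONT handler for the `remote` array would mirror this with the verbs swapped.

One choice goes beyond the description above. A repeated DO for an option that was already refused is also left unanswered, which keeps a buggy peer from looping on refusals too.

```c
#include <stddef.h>
#include <stdint.h>

enum { T_WILL = 251, T_WONT = 252, T_DO = 253, T_DONT = 254, T_IAC = 255 };

/* The three states; OPT_UNKNOWN is 0 so a zero-initialised table starts out unknown. */
typedef enum { OPT_UNKNOWN = 0, OPT_ENABLED, OPT_DISABLED } opt_state;

/* Per-connection option tables (layout is an assumption for this sketch). */
struct telnet_opts {
    opt_state local[256];          /* options this end performs: answered with WILL/WONT */
    opt_state remote[256];         /* options the peer performs: answered with DO/DONT */
    unsigned char can_local[256];  /* hypothetical: nonzero if we support performing the option */
};

static size_t put3(uint8_t *out, uint8_t verb, uint8_t opt)
{
    out[0] = T_IAC;
    out[1] = verb;
    out[2] = opt;
    return 3;
}

/* Handles an incoming IAC DO <opt> or IAC DONT <opt> for a local option.
   Writes any reply (3 bytes) to out and returns its length, or 0 if nothing should be sent. */
size_t on_do_dont(struct telnet_opts *t, uint8_t verb, uint8_t opt, uint8_t *out)
{
    opt_state *s = &t->local[opt];

    if (verb == T_DO) {
        if (*s == OPT_ENABLED)
            return 0;                      /* already on: an acknowledgement or a repeat */
        if (t->can_local[opt]) {
            *s = OPT_ENABLED;              /* accept the request */
            return put3(out, T_WILL, opt);
        }
        if (*s == OPT_DISABLED)
            return 0;                      /* already refused once: don't feed a loop */
        *s = OPT_DISABLED;                 /* unknown and unsupported: refuse once */
        return put3(out, T_WONT, opt);
    }

    /* T_DONT: only an option that is actually on needs to be turned off and acknowledged. */
    if (*s == OPT_ENABLED) {
        *s = OPT_DISABLED;
        return put3(out, T_WONT, opt);
    }
    *s = OPT_DISABLED;                     /* unknown or already off: stay silent */
    return 0;
}
```

With the buggy server from the example, the first IAC DO BINARY gets an IAC WILL BINARY back. The second one finds the option already enabled and gets no reply at all.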

Now, in practice, it is usually not necessary to use all those arrays of flags. Most MUD servers and clients do not, and they work just fine. This is largely because most of the options used in MUDs are “one way” options; that is, only one end of the connection ever requests them. A MUD client generally never sends an initial IAC DO ECHO, which would ask the server to echo everything the client sends back to the client. So when the client receives an IAC WILL ECHO, it knows that the server is offering to enable the option itself and is not acknowledging some earlier request from the client. So long as the client never talks to a really broken server, it could get by with just a handful of flags for the options it supports. The same goes for the server. Just be careful with any options that both the server and client use (like BINARY). Make sure you only respond when the other end is requesting the option, and not when it is acknowledging the option. Do that, and your application will work just fine so long as the other end isn’t totally broken.

There is a general rule that will help for implementations that don’t use the full 256-element arrays. When receiving a request to enable an option your application does not support, always refuse. When receiving a request to disable an option your application does not support, don’t respond at all.
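For the flag-light approach, this rule is enough to answer anything the application doesn’t support. Here is a sketch; the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

enum { IAC = 255, WILL = 251, WONT = 252, DO = 253, DONT = 254 };

/* Answers a negotiation about an option this application does not support.
   Requests to enable (WILL, DO) are refused; requests to disable (WONT, DONT)
   get no reply, since the option is already off. Returns the reply length. */
size_t refuse_unsupported(uint8_t verb, uint8_t opt, uint8_t out[3])
{
    switch (verb) {
    case WILL: out[1] = DONT; break;   /* "I will" is answered with "please don't" */
    case DO:   out[1] = WONT; break;   /* "please do" is answered with "I won't" */
    default:   return 0;               /* WONT or DONT: stay silent */
    }
    out[0] = IAC;
    out[2] = opt;
    return 3;
}
```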

Let’s take a brief look at that ECHO option. ECHO is option code 1. For MUDs, and in truth most TELNET applications, the server is the only end that ever performs echoing. If a client echoed back everything the server sent to it then it would probably result in another infinite loop. The server would say something, the client would send it back, the server would interpret that as a command and say something back (possibly just an “unknown command” error), and the client would echo that back to the server, which would interpret it as a command… bad stuff.

However, it’s generally pretty nice when the user types something in and whatever he typed shows up on his screen. A TELNET client will generally always prefer this, and by default it will print anything the user types on the user’s screen. A server will sometimes want to disable this, most commonly when it is requesting a password. However, TELNET has no option for “hide the user’s input.” Instead, we have to use a sneaky trick. If the server sends an IAC WILL ECHO, that means that it is willing to echo back everything the user types. Pretty much all clients will agree to this, and they will respond with an IAC DO ECHO. At this point, the client no longer prints the keys the user types in. The client is expecting the server to do this itself. However, nothing actually requires the server to do so. It could echo back the user’s input after transforming it (turning it into stars), echo it verbatim, or just echo nothing. When the server is finished retrieving the user’s password, it then tells the client that it no longer wants to echo by sending IAC WONT ECHO. The client then acknowledges this with IAC DONT ECHO, and will start displaying what the user types in again.
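On the server side, the password trick comes down to two three-byte sequences placed around the prompt. ECHO is option 1, and the function names are hypothetical. While the client’s echo is off, the server simply doesn’t send the password characters back (or sends stars instead).

```c
#include <stddef.h>
#include <stdint.h>

enum { IAC = 255, WILL = 251, WONT = 252, OPT_ECHO = 1 };

/* Before the password prompt: IAC WILL ECHO. Clients typically answer IAC DO ECHO
   and stop printing what the user types. Returns the number of bytes written. */
size_t hide_input(uint8_t out[3])
{
    out[0] = IAC;
    out[1] = WILL;
    out[2] = OPT_ECHO;
    return 3;
}

/* After the password has been read: IAC WONT ECHO. Clients answer IAC DONT ECHO
   and resume local echo. Returns the number of bytes written. */
size_t show_input(uint8_t out[3])
{
    out[0] = IAC;
    out[1] = WONT;
    out[2] = OPT_ECHO;
    return 3;
}
```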

Note: the Windows TELNET client is notoriously broken in its handling of ECHO. The client will gladly accept when the server sends IAC WILL ECHO, but when the server later sends IAC WONT ECHO, the Windows client will not resume echoing local characters. Also note that, unlike almost every other client, the Windows client only operates in character mode. That means that each character is sent to the server as it is typed, while most clients only send whole lines. There are ways to tell a client to go into character mode or into line mode, but the Windows client only supports character mode.

So, now you have option negotiation working, as well as 8-bit support with proper escaping. However, you’ve heard about this NAWS thing, which lets your client tell the server how big the display window is so that the server can do fancy layout. NAWS is option code 31. A server that wants window size information will send an IAC DO NAWS, and a client which supports it will respond with IAC WILL NAWS. But… now what?

Sub Options

Option negotiation is only capable of enabling or disabling an option. However, some options, like NAWS, control features that need to send more complex data using the protocol. The NAWS feature needs a way for the client to tell the server the number of rows and columns in the client’s display.

For features like these, TELNET uses the SB command, which is called a “sub option.” SB is code 250. This command is rather special. It starts with three bytes: IAC, SB, and then the option code, such as NAWS. It is then followed by an arbitrary number of bytes, which we’ll call the payload. The end of the sub option is marked with the two-byte sequence IAC SE. SE is code 240. So, what do those bytes between the initial three-byte sequence and the ending two-byte sequence mean? Well, it depends on the option.

NAWS sends two 16-bit integers as its sub option payload. Each integer is in network byte order. The first integer is the number of columns (width), and the second integer is the number of rows (height). So, a client with 80 columns and 24 rows would, after the NAWS option has been enabled with option negotiation, send the byte sequence IAC SB NAWS 0 80 0 24 IAC SE.

One must be careful when writing code to handle sub options. A very large number of MUD servers and clients do not do this properly. Let us pretend, for a moment, that a user has some particularly large terminal… say, 255 columns and 61440 rows. The NAWS sub option byte sequence would be IAC SB NAWS 0 255 240 0 IAC SE. However, remember that IAC is 255 and SE is 240. That means that the bytes are equivalent to IAC SB NAWS 0 IAC SE 0 IAC SE. See the problem? Correctly implemented software will parse that as a sub option with a single byte in its payload, followed by a zero byte and then an IAC SE sequence, which is illegal. Plus, the NAWS sub option would be the wrong size, which is also illegal.

The correct thing for the client to do is to escape the byte equal to 255 with a double IAC sequence, just like the \\ escape. So the client should send IAC SB NAWS 0 IAC IAC 240 0 IAC SE. While it looks like the payload is 5 bytes, the server would convert the IAC IAC into a single byte equal to 255 in the buffer it stores the sub option payload in. However, many incorrectly written MUD servers do not do this. After receiving IAC SB NAWS, they read exactly 4 bytes for the payload, ignoring the values of those bytes even if they contain 255. They then immediately expect IAC SE. Some don’t even check that they actually get IAC SE; they simply read in two bytes and call it done. It is thus impossible to write a client that handles this situation with both correct servers (which require that the IAC be escaped) and incorrectly written servers (which require that the IAC not be escaped).
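Here is a sketch of a NAWS encoder that escapes correctly. It assumes the caller supplies a 13-byte buffer: 3 header bytes, at most 8 escaped payload bytes, and 2 trailer bytes. The names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

enum { IAC = 255, SB = 250, SE = 240, OPT_NAWS = 31 };

/* Appends one payload byte at out[n], doubling it if it equals IAC.
   Returns the new length. */
static size_t put_escaped(uint8_t *out, size_t n, uint8_t b)
{
    out[n++] = b;
    if (b == IAC)
        out[n++] = IAC;
    return n;
}

/* Builds IAC SB NAWS <cols> <rows> IAC SE, with both 16-bit values in network
   byte order (high byte first) and any 255 in the payload escaped.
   out needs room for 13 bytes. Returns the length written. */
size_t naws_encode(uint8_t *out, uint16_t cols, uint16_t rows)
{
    size_t n = 0;
    out[n++] = IAC;
    out[n++] = SB;
    out[n++] = OPT_NAWS;
    n = put_escaped(out, n, (uint8_t)(cols >> 8));
    n = put_escaped(out, n, (uint8_t)(cols & 0xFF));
    n = put_escaped(out, n, (uint8_t)(rows >> 8));
    n = put_escaped(out, n, (uint8_t)(rows & 0xFF));
    out[n++] = IAC;
    out[n++] = SE;
    return n;
}
```

With 80 columns and 24 rows this produces the 9-byte sequence IAC SB NAWS 0 80 0 24 IAC SE. With 255 columns and 61440 rows it produces the 10-byte escaped form IAC SB NAWS 0 IAC IAC 240 0 IAC SE.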

Fortunately, the scenario is rather unlikely to occur. There is little benefit in a client that displays 61440 lines, even if your screen could somehow handle it. Furthermore, proper escaping of IAC bytes within a sub option payload is essential for some options, but almost none of those are used in MUDs. They should therefore never be sent to those poorly written MUD servers. However, if you’re writing new software, even for a MUD, it is a very good idea to correctly process all sub option commands. The correct way to handle NAWS is to read the sub option payload into a buffer, undoing any IAC escaping as you go. Once the IAC SE is read, check that the payload buffer has exactly 4 bytes in it before processing the command.
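The buffered approach can be sketched as a byte-at-a-time state machine. This is an illustration, not a complete TELNET implementation. It recognises sub options and records NAWS reports, and it skips over negotiation commands. It also drops ordinary text where a real program would pass it on. The 64-byte payload limit is an arbitrary assumption.

```c
#include <stddef.h>
#include <stdint.h>

enum { IAC = 255, SB = 250, SE = 240, WILL = 251, DONT = 254, OPT_NAWS = 31 };

enum parse_state { ST_DATA, ST_IAC, ST_NEG, ST_SB_OPT, ST_SB_DATA, ST_SB_IAC };

struct telnet_parser {
    enum parse_state state;
    uint8_t opt;            /* option code of the sub option being read */
    uint8_t buf[64];        /* unescaped payload; 64 bytes is an arbitrary limit */
    size_t len;
    int overflow;           /* set if the payload did not fit in buf */
    int cols, rows;         /* last valid NAWS report, or -1 if none yet */
};

void parser_init(struct telnet_parser *p)
{
    p->state = ST_DATA;
    p->len = 0;
    p->overflow = 0;
    p->cols = p->rows = -1;
}

static void sb_append(struct telnet_parser *p, uint8_t c)
{
    if (p->len < sizeof p->buf)
        p->buf[p->len++] = c;
    else
        p->overflow = 1;
}

/* Called once IAC SE arrives: only now is the payload size checked. */
static void sb_finish(struct telnet_parser *p)
{
    if (p->opt == OPT_NAWS && !p->overflow && p->len == 4) {
        p->cols = (p->buf[0] << 8) | p->buf[1];
        p->rows = (p->buf[2] << 8) | p->buf[3];
    }
    /* a NAWS payload of any other size is malformed and is ignored */
}

/* Feeds one byte received from the network into the parser. */
void parser_feed(struct telnet_parser *p, uint8_t c)
{
    switch (p->state) {
    case ST_DATA:
        if (c == IAC)
            p->state = ST_IAC;
        /* otherwise c is ordinary text for the application (not handled here) */
        break;
    case ST_IAC:
        if (c == SB)
            p->state = ST_SB_OPT;
        else if (c >= WILL && c <= DONT)
            p->state = ST_NEG;      /* WILL/WONT/DO/DONT: one option byte follows */
        else
            p->state = ST_DATA;     /* IAC IAC (a literal 255) or a two-byte command */
        break;
    case ST_NEG:
        p->state = ST_DATA;         /* the option byte; negotiation is not handled here */
        break;
    case ST_SB_OPT:
        p->opt = c;
        p->len = 0;
        p->overflow = 0;
        p->state = ST_SB_DATA;
        break;
    case ST_SB_DATA:
        if (c == IAC)
            p->state = ST_SB_IAC;
        else
            sb_append(p, c);
        break;
    case ST_SB_IAC:
        if (c == IAC) {             /* IAC IAC inside a sub option is one literal 255 */
            sb_append(p, IAC);
            p->state = ST_SB_DATA;
        } else if (c == SE) {
            sb_finish(p);
            p->state = ST_DATA;
        } else {
            p->state = ST_DATA;     /* illegal IAC sequence: abandon the sub option */
        }
        break;
    }
}
```

Fed the escaped sequence IAC SB NAWS 0 IAC IAC 240 0 IAC SE, the parser records 255 columns and 61440 rows. The unescaped version is rejected because its payload is only one byte long.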

Alright, so now you have all the low-level TELNET machinery working, and you’re even supporting cool things like window size notification. Now there’s that tricky deal with actually displaying and sending text properly. See, while I wasn’t lying when I said TELNET just sends raw text back and forth for input and output, there are a few tricks to how that text is interpreted, especially if you want to support fancy colors and stuff.

Newlines

Welcome to the shortest section of this article! TELNET newlines are expected to be the two-byte sequence CR LF. That’s byte values 13 and 10, or \r and \n. Sending just \n by itself, just \r, or \n \r may cause some funny things to happen.

When reading and writing text files, a newline is usually represented by just a plain LF, or \n. Even on systems that store a CR LF sequence in text files, like Windows, the standard file I/O facilities will automatically translate back and forth between \n and \r \n when reading and writing text files. However, when you are displaying text to a terminal, even on systems like UNIX, the terminal might be in a mode in which a solo LF (line feed) only does what it was originally meant to: cause the cursor (the print head in old line printers) to move down a line, but not return to the start of the line. The CR (carriage return) character tells the cursor (or print head) to return to the beginning of the current line. So, in order to move down to the beginning of the next line, you’d need to send \r \n (or \n \r).

TELNET, being a protocol designed specifically for driving those old line printers, works the same way. Even on modern systems, many clients will treat a solo LF as just a line feed, and many servers will not recognize a solo \n as being the end of a command. So, if you’re writing a server and your output only uses a \n for newlines, be sure to translate those into \r \n (CR LF) when you send the data to the client. If you’re writing a client, be sure to send \r \n whenever the user hits enter.
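Here is a sketch of the output translation for a server whose internal strings use a bare \n. The function name and the caller-sized buffer are assumptions.

```c
#include <stddef.h>

/* Copies the NUL-terminated string in to out, turning each "\n" that is not
   already preceded by "\r" into "\r\n" (CR LF). out must hold at least
   2 * strlen(in) + 1 bytes. Returns the length written, excluding the NUL. */
size_t to_crlf(const char *in, char *out)
{
    size_t n = 0;
    char prev = '\0';

    for (; *in; prev = *in++) {
        if (*in == '\n' && prev != '\r')
            out[n++] = '\r';
        out[n++] = *in;
    }
    out[n] = '\0';
    return n;
}
```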

Now, getting newlines to work properly is thrilling and all, but even more thrilling is that there’s not much more to say about TELNET itself. Sure, there are some extra commands to learn (the GA command used in half-duplex mode, which TELNET is in by default, can be handy to learn about, especially since MUDs use it for some fun tricks), and some other options that can be useful, but there’s no more actual protocol machinery to learn about. Now it’s on to colors and cursor control, which isn’t actually a part of TELNET at all.

ANSI Terminal Escapes

First, let’s look at terminal types. See, a long time ago (and, actually, right now, too) there were a bazillion different line printer and graphical terminal products on the market. Infuriatingly, pretty much every single one had its own proprietary protocol for controlling its special features. Even two terminals made by the same company would often have different (though usually similar) protocols for controlling color codes, cursor positioning, and other features. Even today, the text console on Linux uses a slightly different protocol than the text console on various other UNIX and UNIX-like operating systems, which themselves are all different from each other. Graphical terminal emulators, which is what most of us are using (and which includes your average MUD client), can have their own protocols, too. The modern xterm variation (xterm is a standard graphical terminal emulator for Linux/UNIX systems) is very slightly different than the popular terminal emulator I use on my Linux desktop, for example.

In order to properly handle all of these different terminals, a TELNET server would need to ask the client what kind of terminal they are using (yes, there is an option in TELNET for this), and then consult a library that maps common operations, like “clear the screen,” into the proper sequence of control codes for that particular terminal type. If you’re writing a real TELNET server, you’re going to need to get familiar with the termcap and/or terminfo libraries, as these provide those services.

MUD servers and clients, however, don’t need to care about such things. See, all modern terminal types, while they have slight differences, are based off of the ANSI terminal specification. This specification includes a number of common control codes, like setting terminal color, clearing the screen, or moving the drawing cursor to a specific position in the terminal window. A MUD server need only support these ANSI terminal codes, and can safely assume all clients support them as well. Most regular TELNET clients, running on modern terminal emulators, will do so. Most users of MUDs use a specialized MUD client which will interpret the control codes itself, and then translate those into whatever commands are appropriate for the display, so even a MUD client running on some non-standard terminal will still be compatible with a MUD server that only uses ANSI control codes. So, we’re just going to talk about ANSI control codes from now on.

Now that we’ve gone through three paragraphs of boring and mostly useless exposition, let’s get to the meat of things!

All control codes begin with the ESC character (\e, or 27). In general, the ESC character will be followed by a [ (left-hand square bracket), and then possibly by a command payload, followed by an ASCII letter denoting the actual escape command. For example, to clear the screen, you might use ESC [ 2 J ESC [ H. The J command does various screen-related actions, and the payload 2 is what tells the J command to clear the screen. That technically just clears the screen, though, leaving the cursor at whatever position on the screen it was already at. The H command tells the cursor to return to the upper left corner of the screen (Home).
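As a C string, the clear-and-home sequence is just those bytes, with ESC written as \x1b (portable, unlike the GCC-specific \e). The macro and function names are hypothetical. Writing to a FILE * assumes the connection is wrapped in a stdio stream, for example with POSIX fdopen().

```c
#include <stdio.h>

/* ESC [ 2 J clears the screen; ESC [ H then moves the cursor to the top-left corner. */
#define ANSI_CLEAR_HOME "\x1b[2J\x1b[H"

/* Writes the clear-and-home sequence to a stream. */
void ansi_clear_home(FILE *out)
{
    fputs(ANSI_CLEAR_HOME, out);
}
```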

Setting the color, and an assortment of other visual display settings, is done with the m command (Mode). The payload for the m command is one or more numeric values, separated by semicolons. The value 0 means “reset the display settings to the default.” The value 31 means “set the text color to red.” So, to display the phrase “Red Baron” with the word “Red” in red and the rest in the default color, the server would send ESC [ 3 1 m R e d ESC [ 0 m _ B a r o n. (The _ represents a space.)

Remember that you can include multiple values in your payload for the m command. If you want to display something in green (code 32) and want to make sure that all other display mode settings, like background colors, are reset, you would send ESC [ 0 ; 3 2 m.
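An m sequence can be built with snprintf. Here is a sketch with a hypothetical helper name. The standard foreground colors are 30 through 37, where 31 is red and 32 is green.

```c
#include <stddef.h>
#include <stdio.h>

/* Formats ESC [ 0 ; <color> m into buf: reset all display attributes, then set
   the foreground color. Returns snprintf's result, i.e. the length of the full
   sequence excluding the terminating NUL. */
int ansi_reset_fg(char *buf, size_t size, int color)
{
    return snprintf(buf, size, "\x1b[0;%dm", color);
}
```

For the “Red Baron” example, the complete C string would be "\x1b[31mRed\x1b[0m Baron".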

You can set the cursor position using that H command we saw before. Simply provide the row and column, separated by a semi-colon, in the payload. So, to move the cursor to the second row at column 20, send ESC [ 2 ; 2 0 H.
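The cursor-position sequence can be formatted the same way. Rows and columns count from 1, so the top-left corner is row 1, column 1. The helper name is hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Formats ESC [ <row> ; <col> H into buf. Returns snprintf's result. */
int ansi_goto(char *buf, size_t size, int row, int col)
{
    return snprintf(buf, size, "\x1b[%d;%dH", row, col);
}
```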

That’s pretty much the gist of ANSI control codes. You can find a fairly complete list of codes here.

Remember that not all commands have a [ after the ESC character. A decent strategy for parsing these control codes on the client end is to look at the first character after each ESC. If the character is a [ then keep buffering input until a letter character is received, then process the buffer. If the character after the ESC is not an [, then immediately process the command.
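That parsing strategy can be sketched as a function that simply removes escape sequences from a string. A server wants exactly that for cleaning user input. The sketch follows the rule described here: after ESC [, skip everything up to and including the first letter; after ESC followed by anything else, skip one character. It does not implement the full ECMA-48 grammar, and the name is hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

/* Copies in to out, dropping ANSI escape sequences. out must be at least as
   large as in. Returns the length of the result. */
size_t ansi_strip(const char *in, char *out)
{
    size_t n = 0;

    while (*in) {
        if (*in != '\x1b') {
            out[n++] = *in++;          /* ordinary character */
            continue;
        }
        in++;                          /* skip ESC itself */
        if (*in == '[') {
            in++;
            while (*in && !isalpha((unsigned char)*in))
                in++;                  /* numeric payload and separators */
            if (*in)
                in++;                  /* the command letter */
        } else if (*in) {
            in++;                      /* single-character command after ESC */
        }
    }
    out[n] = '\0';
    return n;
}
```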

For MUD servers, it is a good idea to also include a basic ANSI control code parser, solely for the purpose of stripping such codes out of input sent by users. While you’re at it, be sure to strip out lone CR or LF characters that are not part of a newline, the BEL character (\a), and other control characters. Imagine a user who sends your server a command line with “say ” followed by a couple dozen BEL characters. Every other player in the room will be treated to a long series of annoying beeps (if their client supports it, which some do). You can just strip out every non-printable character, which is any code less than 32 plus 127 (DEL), or just use the C isprint macro from ctype.h. On a similar note, always escape IAC bytes in your output that aren’t meant to be part of a TELNET command. Otherwise, malicious users might find interesting ways to break other users’ clients through commands that let them send text to other players.
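Here is a sketch of the control-character filter. It works in place on a NUL-terminated line, and the name is hypothetical. It drops everything below 32 plus DEL, which also removes the ESC that starts any ANSI sequence. Bytes 128 and up are kept so UTF-8 input survives. By contrast, isprint() in the default "C" locale would reject those too.

```c
#include <stddef.h>

/* Removes control characters from a line of user input in place: every byte
   below 32 (including BEL, ESC, and stray CR/LF) and DEL (127).
   Returns the new length. */
size_t strip_controls(char *s)
{
    size_t r, w = 0;

    for (r = 0; s[r]; r++) {
        unsigned char c = (unsigned char)s[r];
        if (c >= 32 && c != 127)
            s[w++] = s[r];
    }
    s[w] = '\0';
    return w;
}
```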

And that’s all, folks!

Syndicated 2007-12-10 04:48:22 from Sean Middleditch

Squirrel Wants To Be Lua

Squirrel is another language I took a look at this morning. Squirrel is essentially an offshoot of Lua, written by a game developer who was dissatisfied with some of Lua’s shortcomings in older (pre-5.1) Lua releases.

The biggest change one will notice between current Lua and Squirrel is that Squirrel has a built-in class mechanism. Unfortunately, the class system is single-inheritance only, with no mixins or interface support, so developing larger applications in Squirrel would not be especially easy. This is probably just fine given that Squirrel seems geared more towards embedding than application authoring, just like its conceptual ancestor, Lua.

I think the language comparison page for Squirrel (which only compares against Lua) best explains Squirrel. I quote:

Lua has an established and growing set of 3rd party libraries. That’s the biggest problem with Lua spin-offs : you trade in compatibility with everything written for Lua (see http://lua-users.org/wiki/LibrariesAndBindings) for some syntactic sugar and a feature or two that will be implemented in some future Lua version anyway…

True enough, while Lua might not be intended for developing complete applications, it’s got enough addons and extensions to make it possible. Squirrel is severely lacking in such things. There isn’t any standard way to do networking, for example, which is a big requirement for this project. I could write a core C module that embeds Squirrel and adds the extra routines I need, but then I might as well just use a more well known language like JavaScript on SpiderMonkey, or just embed Lua.

One thing I do very much like about Squirrel over both JavaScript and Lua, however, is that variables aren’t automatically declared in the global namespace. With either JS or Lua, if you mistype a variable name in an assignment, you not only silently get a new variable, but it’s a global variable. Yuck! In Squirrel, assigning to an undeclared variable results in an error.

If you’re looking for a Lua-like runtime to embed with a syntax closer to JavaScript or C++, check Squirrel out; it’s just what you’re looking for. If that syntax isn’t important to you, though, just use Lua instead.

Syndicated 2007-12-05 22:26:25 from Sean Middleditch

Io Language

Next up on my language tour is Io, a tiny interpreted pure-OO language. Io is small, really small. Lua is a bit larger, actually.

Io has some admirable design goals. From the front page of the Io website:

Io is a small, prototype-based programming language. The ideas in Io are mostly inspired by Smalltalk (all values are objects), Self (prototype-based), NewtonScript (differential inheritance), Act1 (actors and futures for concurrency), LISP (code is a runtime inspectable/modifiable tree) and Lua (small, embeddable).

I can definitely feel the impact of those languages on the design of Io, Smalltalk especially. Everything is an object and all operations are simply messages (methods) sent to objects. The syntax is just a little foreign for a C weenie like me, but it’s not too far out there, and I could get used to it quite quickly. It’s nowhere near as foreign-feeling as Objective-C, which I feel is a disgusting monstrosity of language design gone wrong. (Seriously, use Smalltalk, or use C. Don’t even get me started on Objective-C++. That language is God’s punishment for the Sins of Mankind.)

I downloaded and built Io, and started working at getting a sample project up and running. Io is a minimal language, so addons are necessary for a lot of things, such as networking. That’s where I hit the snag - the Sockets addon appears to be wholly undocumented on the Io website, and looking at the list of available methods is leaving some questions. Searching around on the net for examples isn’t bringing much up. Io is not really in use for any large production apps yet, so it’s still got a lot of rough edges in the documentation and examples areas.

Like I said with Pike, life is too short to deal with that sort of thing. I’d love to give Io a spin, but not for the project I’m on now. I’ve bookmarked it and plan on taking another look at it in 6-12 months on my next project. Maybe it’ll be ready for some serious use then; the development is active and the community seems fairly healthy, so I expect it’ll grow up pretty quickly.

Syndicated 2007-12-05 21:04:01 from Sean Middleditch

JewelScript Is No Jewel In The Rough

I decided to look up some non-mainstream languages before moving on to the list of Big Popular languages I wanted to try for this new project. I took a look at a few that were outright unsuited, and then found and spent some time looking at JewelScript. Like Pike, JewelScript is an interpreted OO language with a syntax very reminiscent of C/C++.

JewelScript has a lot of nice features. The syntax is familiar, but the addition of coroutines and a ‘var’ type in addition to the static typing make certain classes of application a lot easier to write than C++ does. Unfortunately, JewelScript also seems to be so heavily based on C++ that some of the painful parts of C++ programming are firmly a part of JewelScript programming.

The biggest turn-off here is the reference system. In JewelScript, all variables are copy-by-value, just like C++. If you want two variables to refer to the same object, you must declare one of the variables as an explicit reference to the other. That’s not so bad, really, until you get to function arguments. Just like in C++, you end up having to declare many function arguments as references solely for performance reasons, and not because the argument actually needs reference semantics. Also just like C++, that can result in programming errors (a function can accidentally modify the caller’s object through the reference), so JewelScript has a const reference type, which is a slightly different set of semantics but at least allows you to get decent performance without opening yourself up to programming mistakes.

Really, though, if I felt like declaring 90% of my function parameters with a logically unnecessary const and an equally unnecessary & just to work around the performance problems of the language design, I’d have stuck with C++. JewelScript could potentially fix this behavior with the simple addition of copy-on-write behavior for objects and other “fat” datatypes. That gives the programmer the full performance benefits of a const reference without the overhead of manually declaring const references when copy-by-value semantics were what they wanted in the first place.
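To make the copy-on-write idea concrete, here is a minimal single-threaded C++ sketch (not anything JewelScript actually provides). The names `Cow`, `Inventory`, `count_items`, and `demo_copy_on_write` are hypothetical. Copies share one payload through a reference count, and only the first write pays for a real copy, so a parameter can be declared plain by-value without a `const&`. The `use_count()` check is not thread-safe; a real language runtime would need atomic or otherwise synchronized ownership tracking.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Copy-on-write handle (single-threaded sketch): copies share one payload
// until one of them asks for write access.
template <typename T>
class Cow {
public:
    explicit Cow(T value) : data_(std::make_shared<T>(std::move(value))) {}

    // Read access never copies the payload.
    const T& get() const { return *data_; }

    // Write access clones the payload first if another handle shares it.
    T& mut() {
        if (data_.use_count() > 1) {
            data_ = std::make_shared<T>(*data_);
        }
        return *data_;
    }

    // True when both handles point at the same underlying payload.
    bool shares_with(const Cow& other) const { return data_ == other.data_; }

private:
    std::shared_ptr<T> data_;
};

using Inventory = Cow<std::vector<std::string>>;

// Declared by value, as the programmer naturally would; the copy only costs
// a reference-count bump because the function never writes to it.
std::size_t count_items(Inventory items) { return items.get().size(); }

// Usage: a plain by-value copy stays cheap until the first write.
void demo_copy_on_write() {
    Inventory a(std::vector<std::string>{"sword", "shield"});
    Inventory b = a;              // shares a's vector, no deep copy
    assert(b.shares_with(a));
    assert(count_items(a) == 2);

    b.mut().push_back("helm");    // first write triggers the real copy
    assert(!b.shares_with(a));
    assert(a.get().size() == 2);
    assert(b.get().size() == 3);
}
```

The trade-off is one pointer indirection and a reference count per "fat" value, in exchange for value semantics that read like the code the programmer wanted to write in the first place.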

JewelScript also lacks a comprehensive standard library. That’s fine in many respects, but combined with a too-C++-like language design, it makes JewelScript a poor choice for my project. However, if you want a language to embed in C++ that offers a very familiar syntax, JewelScript might be just what you’re looking for.

Syndicated 2007-12-05 18:18:08 from Sean Middleditch

Passing on Pike

I’m starting a new project, and I decided to give the Pike language a try. It looks like a nice language for an old C/C++ hold out like me. Statically typed but still pretty flexible, very C++-ish in syntax, has a decently sized standard library, and not too slow for an interpreted language. Bonus points for having implementations of a ton of application network protocols in the standard library, including the ones I needed.

Sadly, it just isn’t meant for me. The language debugging facilities are atrocious. If you thought C++ template instantiation errors were hell, you’ll not be too pleased with the average Pike backtrace or compilation error. It gives way too much information about things that don’t matter and nothing useful on the actual error itself. For example, if you pass the wrong argument type to a function, you’d expect something like “Argument 2 (client) expects string, got int.” Instead, you get a huge line detailing the entire signature of the function, and then a second line detailing the entire signature of the function call, leaving you to scan through and find the differences.

That wasn’t going to sour the deal for me, though. I’m used to C++, so huge and nearly useless error messages are something I can deal with. Forging on, I found some oddities in the standard library that are just not working out well for me. For example, the String module includes a trim_all_whites function. Why isn’t this just trim? Extra typing is half the reason I wanted to avoid using C++ itself. The HTTP implementation forces a ton of extra string copies all over the place. The TELNET implementation is one of the most awkward protocol handler classes I’ve ever seen, plus it seems to be rather buggy. These are all relatively minor things. Silly function names I can learn to live with, and it’s not like I’m not up to writing an HTTP or TELNET protocol handler that more closely meets my needs.

The real kicker, however, is the total lack of certain features… or possibly just the lack of documentation on using those features. The official Pike documentation is almost entirely lacking in examples, many functions and classes are undocumented (some of which have a nice Fixme comment in the docs, while others are just blank), and I simply can’t figure out how to do some things that I’d really expect out of a language like Pike. I’m fairly sure Pike can do them, I just can’t figure out how.

Life is too short to spend a ton of time trying to figure out undocumented features of a language, so I’m passing on Pike for now. It’s a shame, because I like the Pike language itself; I’m just not willing to deal with an idiosyncratic and partially undocumented standard library if I don’t have to.

There are some other languages that are on my list of Things To Try, so I’ll report back on those when I get the chance to play with them a bit.

Syndicated 2007-12-05 07:50:50 from Sean Middleditch

Irritating Java Environment

Debian/Ubuntu has what I think is a pretty dumb Java environment.

Basically, JAR files are not automatically found in /usr/lib/java/ and JNI libraries are not automatically found in /usr/lib/jni, requiring you to write a goofy little shell script for every Java app that sets these paths if you need them. Any Java app that uses external JAR files or JNI libraries (e.g., SWT) is instantly made non-portable by the fact that you have to set weird system-specific path settings instead of just being able to run java -jar myapp.jar.

The justification for this seems to be, “well, users might have multiple JVMs, and /usr/bin/java alternative might not be set to the most complete/featureful one, and since we only support software packaged officially for Debian**, we just recommend that packagers include scripts that set the specific JVM and classpath and so on they need, and never ever use the essentially useless /usr/bin/java command.”

Here’s an idea: make /usr/bin/java a system wrapper around the chosen alternative that automatically sets things up so the JAR files listed in an app’s manifest are found in the Debian-specific paths, and so that the library search path includes the Debian-specific /usr/lib/jni directory for loading JNI shared objects. Then shit will actually work. For the users who set their java alternative to point to some incomplete or non-functional JVM, tell them to kiss your ass and install a JVM that will actually work.

** And this, folks, is still the #1 usability killer in Linux. If it isn’t part of the pre-selected set of almost certainly out of date software packages shipped by the specific version of the specific distribution you’re running, the software is a complete and total bitch to install and use, even when that software happens to be something designed from the ground-up to be portable between distros (or even OSes) in binary format. Packaging systems, for all their benefits, are to many non-technical users just one gigantic artificial barrier to ease of use. The Microsoft software installation model, for all its flaws, actually freaking works when it comes time to install something released after the OS install CD you have was shipped. Linux is the easiest OS in the world to use, so long as you only use it for the things the distro package set says you can.

Syndicated 2007-12-03 03:50:07 from Sean Middleditch

Security Hole of the Day

So a major games site many of us geeks might frequent has a fun security hole. I couldn’t remember the login for my account, but whenever I entered the wrong email and password combo, I noticed it set a ?login=false query parameter in the resulting URL. Sure enough, changing the false to true results in my being logged in, with a user that has a blank name (“Welcome, !”) and no email.

The worst part is, I have no way to log in other than using said hole, since there is no “forgot password” link or any other way that I can possibly figure out to get into my account, so I had to make a new one (which is free, just validates email). I suppose it’s not that serious of a problem since user accounts really don’t do anything critical other than provide marketing details to the company and allow forum posting, but I’m still pretty sure that they don’t want people bypassing their “subscriber only content” restrictions.

The site isn’t even written in PHP, I think. The URLs don’t give any indication of the language, but I vaguely recall seeing ASP-ish traceback errors a few months ago when something else was broken on the site. The Good Samaritan part of me wants to kindly inform the site operators of their blunder, but given the lawsuit happy and technologically ignorant business types running things in many companies, I’d probably just get a felony charge for “hacking” for my effort. :/ So instead I just hope they realize it on their own, fix it, and maybe add in that “forgot password” link in the process - I liked my old username a lot better than my new one.

It could be worse, I suppose. One of the sites I got to clean up last year, aside from its bazillion other horrendously broken design points and its code comments all written in (broken) Portuguese, did the classic ?admin=true authentication check. At least it wasn’t a JavaScript routine with the username and password stored in the HTML.
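Both holes come down to the same mistake: deciding who the user is from a flag the browser controls. The sketch below contrasts that with a server-side session lookup. It is written in C++ only to match the other example here; the real site’s language and code are unknown, and `Params`, `SessionStore`, `current_user`, `demo_auth`, and the cookie name `session` are all hypothetical. It assumes session tokens are long random values created by the server after a successful password check; token generation, password hashing, and cookie flags are not shown.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Query-string or cookie values, already parsed into name -> value pairs.
using Params = std::map<std::string, std::string>;

// BROKEN: trusts a flag the browser controls. Editing the URL from
// ?login=false to ?login=true "logs you in" (same idea as ?admin=true).
bool is_logged_in_broken(const Params& query) {
    auto it = query.find("login");
    return it != query.end() && it->second == "true";
}

// Hypothetical server-side session table: session token -> user name.
// Only the server adds entries, after verifying the password.
struct SessionStore {
    std::map<std::string, std::string> sessions;

    std::optional<std::string> user_for(const std::string& token) const {
        auto it = sessions.find(token);
        if (it == sessions.end()) return std::nullopt;
        return it->second;
    }
};

// Safer: identity comes from state the server holds, looked up by the
// session cookie. Query parameters play no part in deciding who you are.
std::optional<std::string> current_user(const SessionStore& store,
                                        const Params& cookies) {
    auto it = cookies.find("session");
    if (it == cookies.end()) return std::nullopt;
    return store.user_for(it->second);
}

// Usage: the forged query flag fools the broken check but not the safe one.
void demo_auth() {
    SessionStore store;
    store.sessions["3f9c1a7e"] = "elanthis";  // placeholder, not a real random token

    Params forged_query{{"login", "true"}};
    assert(is_logged_in_broken(forged_query));        // the hole: anyone gets in
    assert(!current_user(store, forged_query));       // no session cookie, no user

    Params cookies{{"session", "3f9c1a7e"}};
    assert(current_user(store, cookies).value() == "elanthis");
}
```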

Syndicated 2007-12-01 05:29:17 from Sean Middleditch

Apartment Found, FLOSS Work, Language Design, Rambling

Apartment hunting is over already.

Looks like I’ll be living in Aspen Chase, off Golfside between Clark and Washtenaw, right next to WCC and US-23 and I-94, plus right near all the cool shopping places and restaurants and Ann Arbor.

I move in at the end of the month. Thinking about throwing an apartment warming / alcohol cabinet stocking party sometime in early-mid January.

So, now I need to ratchet up how much I work (or where I work). I’m also really interested in getting back into FLOSS work. I’m digging through some Ghostscript bounties (two birds, one stone) and seeing if there’s anything a newcomer to the project without much 2D compositing experience can tackle. A few look applicable.

After that, I’m unsure. My three favorite things to work on are games, low-level infrastructure and language tools, and usability. FLOSS games don’t interest me much; I’m not sure why, but for some reason Open Source just doesn’t seem to work as well for game projects as it does for everything else. Possibly because artists/designers/musicians aren’t as into the Give It Away For Free thing as programmers are.

So, for low-level stuff, I was at first thinking X and drivers. Then I thought that tends to be a pain without a second set of hardware, plus there’s the chance of breaking hardware (hopefully rare, but still a possibility, as I’ve heard), and I don’t really have the cash for spare hardware, graphics cards, etc., and I need this one machine to keep working perfectly for my regular job. So maybe that’ll be an option down the road when I have more spare cash.

That pretty much leaves general desktop app work, or work on lower-level desktop code like HAL or D-BUS. Now, while I’m a fan of GNOME’s desktop design, I actually really dislike their underlying frameworks. I mean, OO in C certainly works, but… damn is it ugly. Writing desktop software is a very high-level thing to do, and really would be better with a high-level language. Sadly, C# is dead for political reasons, C++ isn’t really all that great (but it certainly blows C out of the water - compare the pleasure that is the Qt API to the glib/gtk API), Java might very well become a good choice soon what with it being Free and IcedTea coming along, but then I’m not a huge fan of Java (C# is Java “done right,” but see afore-mentioned political issues), and so on. Vala looks like a fun project (language design, low-level framework… my favorite areas) so I might look into that very soon. I’m specifically not fond of how it just translates to C (there are several very good reasons why C++ no longer does that), so maybe giving it an LLVM backend would be spiff, plus I’ve really been wanting to play with LLVM anyway. Actually, working on the clang frontend for LLVM is another option.

There is then always the part of me that just wants to do something new and exciting, but that’s… difficult. Not so much in writing it, but in finding something new and exciting and actually worthwhile. I mean, doing all the web work I do, I’d love to have a language dedicated solely to doing web work. PHP, Java, C#, Ruby, Perl… all of these are extremely general-purpose languages that have libraries for working on the web, but they still make things more complicated than you really need. (Ruby on Rails does purportedly make things very easy, but then, you’re not so much coding Ruby as you are in a specialized dialect built on top of Ruby - plus, having hacked on the Ruby interpreter in years past, I’m not a fan of the underlying technology, unless Matz and co have done some serious work on it in the last few years… maybe I should take a look.) Really, 90% of what a web app does is spit out HTML and run SQL queries. Those two things should be SUPER easy, and the easiest way to do them should also be both the most efficient and the most secure way to do them. Just makes sense. I have ideas on how to do this, so it’s tempting to write mod_languagethatdoesnotsucklikephp… but that gets back to whether the project would really get used much and really be worthwhile or just be yet another niche language used by three people in tiny projects nobody’s ever heard of.

I’m equally tempted to do a more low-level language. D is a neat language, but the design is a little… fluid. Plus its standard library sucks, and of the two competing projects to write a new one, both feature new ways of sucking as well as little chance of ever actually being “standard” (not that anything in D is standard, since it’s just a dump of whatever features the lead developers think are cool at the moment). I like C, I really do, and I really hate the way that C# and Java force OOP down your throat even for things that aren’t best modeled by OOP, or for things where their object model is not quite the best fit. It would be nice to do C with an enhanced type system that makes OOP possible and easy, but makes other styles of OOP also easy, as well as providing much better high-level data structures than C++ does. Basically, I’d like a language that has high-level features, but also allows low-level programming, unlike Java or C#, which put everything on their custom managed runtime. To be completely honest, neither Java nor C# really helps all that much with being portable except for trivial programs, and the security benefits of managed runtimes aren’t nearly as useful as advertised except for applet-like situations (seriously, it’s not really that much harder to write secure code in C than in any managed language, from Python to C# - buffer overflows and other memory-address-based attacks are less likely, but that’s hardly the sole kind of security hole around). But still, there are a bazillion “a better C/C++” projects out there, and even if I do make The Best(tm), how useful is that really going to be in the grand scheme of things?

It thus seems best to focus on something that people will actually use, instead of yet another quasi-academic intellectual-masturbation sort of project. GNOME and LLVM are my top choices. LLVM is a little more up my alley, but GNOME work can be fun. It’s been years since my last patch to GNOME, too. Maybe it’s time to rectify that sad fact.

Not seeing any of that likely until January, though - need to earn some raw cash now and get ready to move into said new apartment in a month. My rent will be going up by $325/month, plus I won’t be splitting utilities or Internet anymore. Yay fun.

Syndicated 2007-11-30 22:23:53 from Sean Middleditch

Apartment Hunting

My lease is up in a few months, and combined with some general life-style differences with my roommate (who is not a bad guy in general, we just have some vastly different habits), I’ve decided it’s time to start looking for a new place to live.

I’ve decided to try moving out on my own, which will be a first for me. I lived with my parents, then with a roommate, so having a place all to myself will be quite new. I have a feeling I will get massively bored very quickly.

Granted, the apartment I’m currently aiming for is also in the middle of Eastern Michigan University campus, and we’ve already met a number of attractive young ladies around my age living in the area, so maybe I won’t be _that_ bored. :)

The real kicker though is going to be rent and utilities. My current apartment is only $550/month, and I only pay half of that. I also only pay half of the utilities and Internet access. Anyone who’s done a move-out knows that reducing the number of people living in an apartment does not result in an equivalent reduction in utility costs, either. When I moved out of my parents’ house, their utility bills actually went up - you’d figure the removal of 3-4 computers would have brought a significant drop.

So, I’m looking to be spending some $550-$650 per month in rent, plus around $100/month in utilities, plus another $30-$50 in Internet access, plus I won’t be able to bum groceries off my roommate anymore. This is going to get expensive, quick. I’m going to need to work on my finance management quite a bit, if this year is any indication.

Being self-employed (although I’m really only working for one guy, I do so as a sub-contractor, not an actual employee) I have to pay the employer’s share of taxes, as well as find my own health care, and I don’t get any 401k or anything. This year I made $14k more than last year. However, for the last two years running, I managed to increase my total savings and stocks by $6k and $4k, but this year I managed to increase it only by a grand total of $361.00, and I’m actually going to be short in my taxes fund come Winter tax time, so that’ll turn to a negative in about a month. I am somewhat boggled as to how I managed to spend around $18k more this year than last year. I mean, a few years of that kind of money saved could buy me a decent house. In cash.

Somewhere in the back of my head I know where all the money went, but turning that into a list of things I can cut is proving rather challenging. Rent and groceries ate up about $6k of it. The trip to Japan last Spring ate up another $2k. The damn Wii and Wii games probably took up another $1k, and movies and books ate up another $2k. The rest probably went to eating out so often, since I only know how to cook one meal and, while it is my favorite, it’s not something I really want to eat every day, or even every week for that matter.

So, for the upcoming year, aside from finding more/better work, I need to cut back my spending. And that’s after taking into account the fact I’ll be paying more than twice as much in rent and utilities soon. I need to start cooking at home more often; that’s way cheaper than eating out at expensive restaurants for breakfast, lunch, and dinner. That should be relatively easy to start doing. I need to spend less on entertainment, which I guess I could replace by getting involved in Open Source stuff again, or maybe just hanging out with people more often doing things that don’t require buying stuff. Maybe I could stop bitching about FLOSS software issues and start working on it more often again. (I started writing a patch for PCRE for a bug I filed, but of course I picked something that requires extensive internal changes to the system, and isn’t really the best first patch for someone who’s never worked on the codebase before - why do I always bite off more than I can chew? Oh well, at least I’m learning how PCRE works very, very quickly.) I’m planning another overseas trip, so that expense will be back, but this time I can hopefully not spend nearly as much.

Unfortunately, looking at it, all of the “low hanging fruit” in my budget is going to add up to about the same amount as the additional rent and utilities I’m going to be paying. At best, unless I start making more money, I’m going to be stuck at the end of next year with no additional cash in savings. And that won’t work for me - I really wanted to have enough for a down payment on a house by this time, but I missed that mark by about $5k. Which I could have easily saved up in 10 months in years prior. :/

Making more money is something that should be easy, even in my current job. I just have to work more hours. They’re available to me. I just… well, as yesterday’s whine-fest indicated, working more hours at that job is likely going to lead to brain hemorrhaging or something. I need to start pulling in additional jobs, or get one of the half-dozen “cool idea” projects I have off the ground. Sadly, most of those projects involve needing a lot of help. My roommate has a very nifty idea that could easily be making hundreds of thousands of dollars a year, but we need artists (our last one spontaneously joined the navy without telling us), and we need time to get this up and running - it’s going to be a massive web (ugh) project. Nothing nearly as complicated as the work we’ve done in the past, and totally within our abilities, but still something that will take a month or two of non-paid dedicated work on both our parts to get up, fully featured, bug-free, and ready for customers. Maybe I should look into getting investors? My friend Scott has told me that it’s not nearly as complicated as it sounds, so long as you know how to pitch it (and I don’t).

Still, at the very least, having my own place to myself will reduce at least some of my stress (no more crappy repetitive rock music we’ve both heard 1,000,000 times playing from the moment my roommate gets home until 1am in the morning, for example) and just make me feel a bit better about my place in life (my current apartment complex is skanky hooker central, and I am NOT just being colorful with my language).

Plus, you know, being somewhere that I might meet girls that aren’t over 40 and having sex with strangers for a living could improve the quality of my life and stress levels quite a bit. It’s been waaaaay too long since I’ve dated last. Not so much as a single date since Laura and I split up, actually, but then, when I live where I’m at and I’m not in school, it’s not like it was really likely I’d ever meet anyone to date. I suppose there are always SCA chicks (“If you can’t get laid in the SCA, you just don’t want any”), but given what I’ve met there… pass. Standards and morals are such a bother sometimes. :) [I am sure there are some very nice girls in the SCA who are available and my age - I just haven’t met any of them yet.]

Actually, that reminds me of my other huge expense: the SCA. As soon as Yurii actually gets back to me (it’s been over a month since I last heard from him… getting worried), I have $1400 waiting to send him for the last few bits of essential armor I need to have a complete combat kit. $1400. On top of the $800 or so I’ve already spent on armor, plus the gas and hotel rooms and such for trips to SCA events (only been to two, unfortunately), plus the money I need to spend soon on clothing and such… not a cheap hobby. But it is fun, and the people are some of the best I’ve ever met. Five years of playing Kanar (a LARP), I met a handful of kick-ass people, but most of them were just selfish/skanky/jerks. Not all of them (those Kanar friends of mine reading this - obviously you’re in the Kick-Ass People group, otherwise you wouldn’t be a friend :p ), but most of them just aren’t even remotely the kind of people I’d want to spend time with. The people I’ve met in the SCA on the other hand… there’ve been a few jerks I’ve run into, but most of these people are just awesome. I do wish I could find a way to hang out with some of those cool Kanar people _without_ having to go to Kanar, though. I feel a bit sad that I’m pretty sure I won’t see some of those people again, unless I maybe run into them at Grace’s or if I actually stop being a loser and go to one of Craig’s get-togethers (which I feel really bad for never going to, because I really do want to, I like Craig and his friends quite a bit). :/

I’m rambling. Bah. Off to work.

Syndicated 2007-11-29 17:20:10 from Sean Middleditch
