Older blog entries for slamb (starting at number 36)

richdawe:

No POSIX system supports non-blocking disk I/O through the O_NONBLOCK interface. Some support it through a separate asynchronous I/O interface, but due to its complexity and non-portability, few programs actually use it. Also, it doesn't support an async open(2).

djb argues, like you do, that they should support this through a more normal mechanism.

Some software uses elaborate workarounds - pools of threads that exist just to open files, mmap(2) them (for zero-copy), use mincore(2) to determine if they are in memory, and do reads on them if not. All that work so the main thread doesn't have to block. If the system you/djb propose existed, it would be unnecessary. (There would also have to be a decent sendfile(2) to use zero-copy, though.)
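
The mincore(2) piece of that dance looks roughly like this (a minimal sketch, assuming a page-aligned mmap(2)ed region; the name pages_resident and the thread-pool handoff are mine, not any particular server's):

    #include <sys/mman.h>
    #include <unistd.h>
    #include <vector>

    // True iff every page of [addr, addr+len) is resident, so copying out
    // of the mapping shouldn't fault and block on disk I/O.
    bool pages_resident(void *addr, size_t len) {
        long page = sysconf(_SC_PAGESIZE);
        size_t npages = (len + page - 1) / page;
        if (npages == 0)
            return true;
        // Linux declares the vector as unsigned char*; some BSDs use char*.
        std::vector<unsigned char> vec(npages);
        if (mincore(addr, len, &vec[0]) != 0)
            return false;                // be pessimistic on error
        for (size_t i = 0; i < npages; ++i)
            if (!(vec[i] & 1))           // low bit set == page is in core
                return false;
        return true;
    }

When this returns true, the main thread serves bytes straight out of the mapping; when it returns false, the request goes off to a worker thread that can afford to page-fault.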

framework

...is moving along. This is my embedded-code-in-XML files thing. It's like JSP except that it's tweaked for XML - programs pump out SAX streams rather than text. More correct XML (checks validity at compile-time, avoids escaping issues like XSS attacks), better performance (can save unnecessary serialization/parsing steps), etc.

My big news is that I'm doing this as an Independent Programming Project now for 3 semester-hours. So rather than interfere with my coursework, it is my coursework. So I'm able to devote more time to it.

I've already factored out a lot of the Java-producing code into a separate class based on an abstract one. The idea is that I'll soon also have implementations for Python (jython) and BeanShell.

The two things holding me back are:

  • Whitespace in <xfp:logic> blocks. It's horribly messed up, so Python will choke on it. This is the easy one; it's only because of laziness I haven't fixed it yet.
  • The "logicsheets". I stole this idea from Apache Cocoon and now I want my money back; it sucks. You can create libraries for XSPs (their format) or XFPs (my format) through XSLT files called "logicsheets" that are applied to the X[SF]P before compilation. It's a really easy extension mechanism to implement, which is why I stole it. (And presumably why they came up with it.) They can insert logic blocks or whatever. But...

    • They mess with whitespace (Python problems again)
    • They're language-specific (need separate versions for Python, Java, BeanShell, whatever)
    • They're horribly complicated to make (bizarre <xslt:template> rules dealing with literal and non-literal versions of the same thing)
    • Pumping the page first through XSLT means you get no org.xml.sax.Locator. So you can't find the position in the original file in case you have an error to report. Makes debugging an X[SF]P not fun.
    • They just plain suck

    So I'm going back to something more similar to JSP's tag libraries. I half-designed something that lets them have their own stack (to communicate with something like TagSupport.findAncestorWithClass) and a stack to modify the ContentHandler for snaring body content as arguments. But I realized this gives no way to pass a non-string argument to them; that's no good. Need to tweak the design.

I'm making baby steps toward having a JUnit test suite. But nothing that tests the meat of the code yet.

Modifying the syntax. For now, I support {}-bracketed Java expressions in attributes like XSLT. This saves you from doing the whole <xfp:attribute> thing when you just want <a href="thread.do?id={threadId}">. I may go all-out and support the terse JSP 2.0 Expression Language syntax in the Java version; there's an easy API for it. In the Python and BeanShell versions, I think I'll stick with a single language instead.

I'm also going to improve performance. For now, it creates (and GCs) a lot of objects on each hit: org.xml.sax.helpers.AttributesImpl objects, StringBuffers, Strings, char[] arrays. There's no reason except laziness. I can make the char[] arrays once and reference them instead; I just need to maintain an additional mapping during the compile. And I can reuse StringBuffers and AttributesImpl objects. Maybe these are what's causing the horrible performance problems I've noticed. framework really should be fast; it's a good design, I swear.

Eventually I'll write documentation, answering such important questions as:

  • Why would I use framework?
  • How would I use framework?
  • Why was framework designed as it is?

but for now, any spectators will have to be content with my short webpage and notes in diary entries like this.

richdawe

Interesting. djb describes that problem:

The problem with O_NONBLOCK is that it's too heavy: it hits an entire ofile, not just an fd. For example, suppose I set O_NONBLOCK on a pty on stdout, and another program has the same stdout. That other program then has O_NONBLOCK set too. Many programs can't handle O_NONBLOCK and will break horribly in this situation.

This is the first time I've actually seen it come up.

Open-source cell phone UI

In my last entry, I mentioned the idea of making a J2ME-compliant cell phone "game" that is actually a better UI to the phone. The idea is dead, I'm afraid. It seems phones do not really provide an API to the phone bits. Which makes sense, I suppose - a malicious program could run up a huge bill. So I looked around a bit more and discovered that the LG phones have relatively good UIs. Not as good as I'd like but far better than the rest. I've bought an LG VX3200, with Verizon service. It works fairly well.

OpenSSL gotchas

A while ago, I complained a lot about OpenSSL. I've softened on it, but anyone who uses it should know about some counterintuitive things it does by default and how to turn them off. I do this at SSL_CTX initialization:

    /*
     * Enable partial writes.
     * Otherwise, SSL_write may return error with part of the data written; it
     * will then expect the next call to have identical arguments (it will
     * resume the write). That's completely against our calling conventions
     * for Stream::write.
     */
    SSL_CTX_set_mode(me, SSL_MODE_ENABLE_PARTIAL_WRITE);

    /*
     * Enable auto-retry.
     * Otherwise, SSL_read and SSL_write may return WANT_xxx in blocking mode.
     * We don't throw IOBlockErrors when blocking; that's just weird.
     */
    SSL_CTX_set_mode(me, SSL_MODE_AUTO_RETRY);

    /*
     * Tell SSL to always read as much into its buffer as possible, to
     * avoid unnecessary system calls.
     */
    SSL_CTX_set_read_ahead(me, 1);

Also, very important: if you use it in multiple threads, you're expected to supply locking callbacks. Otherwise, it is not threadsafe! You need to use CRYPTO_set_locking_callback and CRYPTO_set_id_callback. You should also check how this interacts with whatever SSL-using libraries you may be using: libpq, neon, etc. And each thread should perform ERR_remove_state(0) at thread exit time.
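
A minimal sketch of those callbacks with pthreads (the names ssl_locks and ssl_thread_setup are mine; error checking omitted):

    #include <openssl/crypto.h>
    #include <pthread.h>

    // One mutex per lock OpenSSL asks for, allocated once at startup.
    static pthread_mutex_t *ssl_locks;

    static void locking_cb(int mode, int n, const char *, int) {
        if (mode & CRYPTO_LOCK)
            pthread_mutex_lock(&ssl_locks[n]);
        else
            pthread_mutex_unlock(&ssl_locks[n]);
    }

    static unsigned long id_cb() {
        // Assumes pthread_t converts cleanly to an integer, as on most platforms.
        return (unsigned long) pthread_self();
    }

    void ssl_thread_setup() {
        int n = CRYPTO_num_locks();
        ssl_locks = new pthread_mutex_t[n];
        for (int i = 0; i < n; ++i)
            pthread_mutex_init(&ssl_locks[i], 0);
        CRYPTO_set_id_callback(id_cb);
        CRYPTO_set_locking_callback(locking_cb);
    }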

Lastly, if you're doing non-blocking IO, you have to realize that SSL_write can return with an SSL_ERROR_WANT_READ and SSL_read can return with an SSL_ERROR_WANT_WRITE. That's because it can renegotiate at any time, so a logical read operation can involve an actual write operation, or vice versa. You have to take care to add the SSL descriptor to the correct FD_SET in select (or whatever mechanism you use).
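
In select terms, that means something like this (a sketch; in a real event loop you'd record the wanted direction and return to the loop rather than call select inline):

    #include <openssl/ssl.h>
    #include <sys/select.h>

    // Called when the descriptor polls readable. After a failed SSL_read,
    // wait on the direction OpenSSL actually wants, not the logical
    // direction of the call, then retry the same SSL_read.
    void handle_readable(SSL *ssl, int fd) {
        char buf[4096];
        int n = SSL_read(ssl, buf, sizeof(buf));
        if (n > 0)
            return;                     // got data; process it elsewhere
        fd_set readfds, writefds;
        FD_ZERO(&readfds);
        FD_ZERO(&writefds);
        switch (SSL_get_error(ssl, n)) {
        case SSL_ERROR_WANT_READ:
            FD_SET(fd, &readfds);       // read blocked on a real read
            break;
        case SSL_ERROR_WANT_WRITE:
            FD_SET(fd, &writefds);      // renegotiating: read blocked on a write
            break;
        default:
            return;                     // real error or EOF; handle elsewhere
        }
        select(fd + 1, &readfds, &writefds, 0, 0);
        // ...then call SSL_read again with the same arguments.
    }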

Programming

...goes slowly. Life interferes, and I haven't been feeling motivated. Maybe things will pick up soon.

15 Jul 2004 (updated 15 Jul 2004 at 20:46 UTC) »
Open-source cell phone UI

Today I got an Audiovox CDM8410, after my Nokia 3360 had problems with the LCD. I'm disappointed in its user interface:

  • It's hard to read the important information off the display. They've got a small clock on a multicolored background, a tiny signal strength indicator, and a tiny battery life indicator. My old Nokia 3360 has these in huge black letters on a white background. Consequently, the new one is unreadable without the backlight on. The old one's backlight is for dark rooms only.
  • You can put five phone numbers under one name...but they have to be labelled "General", "Mobile", "Office", "Pager", and "Fax". I want custom names like "Mom's House", "Apartment", "Hospital". I have friends with three jobs. Which one gets the "office" label? So I'm stuck with listing out all of the phone numbers individually.
  • The phone book's find feature sucks. I have 80+ numbers in my phone. Being able to get to a number quickly is the phone's most important UI task, and they screwed it up! (And this phone is the best of the three I examined, by far!) I picked it because you could get to Find with one button press. But I missed that it's a substring match, not a prefix one. (If I type in "C", I want all the names that start with "C". Not everyone who has a "C" anywhere in their name; combined with the above, this means a search for "C" shows everyone with a cell phone. Not useful.) And it has a delay before it actually starts searching. And if you backspace or hit an invalid character by mistake, it gets out of "Find" completely. Frustrating.

The old phone's interface was actually a little better. Cell phones are getting worse...the designers are too busy focusing on digital cameras, color displays, Bluetooth, games, and other crap to design phones. My new phone might not be the best available, but I think they all have these essential problems.

I might be able to sneak a good UI onto a cell phone. They sell them with J2ME and such for games now. If I take this back and exchange it for such a phone, I might be able to pull this off. Here goes:

Write a J2ME "game" called GoodUI. It would be an open-source interface to the cell phone. A main screen that shows the clock, antenna status, and battery status in huge letters. A phone book that doesn't suck. It just depends on having a decent API to the phone. In addition to needing enough storage space for its own address book, my game would need access to the real phone bits. It'd make calls, receive calls, get the antenna status, and get the battery status. Store a call log, etc. You'd have to play the game every time you start your cell phone, but after that you'd have a real UI.

21 May 2004 (updated 24 May 2004 at 09:12 UTC) »

Scott's Laws of Email Filtering. Amendments, anyone?

  1. Don't ever silently send email to /dev/null.

    (For bram, among others.) If you mess up your rules, you could drop important emails. You'll be unhappy, the sender will be unhappy, everyone will be unhappy. If you drop an email, let the sender know.

  2. Never send a bounce or reply to a suspected worm/virus email.

    The reason is simple: the address is usually forged. So if it is indeed a worm or virus sending it out, your bounce will go to some poor person whose mailbox is probably overflowing with these messages.

    Together with #1, the only option this leaves is to send a 5XX (permanent) rejection during the SMTP transaction. If it's a worm, it'll be connected directly to your SMTP server, so it will swallow the rejection and the address's real owner is never bothered. If it is a real person, their end of the SMTP connection will pop up a dialog box (if it's a mail client connected directly) or generate a bounce (if it's an intermediate SMTP server). They know and can resend.

    The ones that are especially annoying to me are the challenge/response replies with two choices: "I sent this message and it's not spam" or "I never sent this message". I consider this quite rude. They're expecting me to filter their mail for them every time a mail is sent, instead of spending a little bit of their time to set up rules that would save me the trouble most of the time. (Of course, I always hit the button that will send the junk mail on to them; I'm not a turn-the-other-cheek sort of guy.)

  3. If you send automated email replies or forwards, filter likely worm/virus emails first.

    ...and then follow #1 and #2 with those messages, of course. Same reason as before.

  4. Keep your rejection messages polite.

    If an actual person is reading the rejection, you probably wanted to get the email. Be polite and tell the person how to get around the block.

If you follow these rules, you can be more aggressive in your filtering without too many consequences. For example:

  • I reject all messages with a subject line of "Thank you!". I'd like to think people might legitimately want to thank me from time to time. Perhaps they'd say so in the subject line. But the worst thing that happens is they get a polite reply asking them to change the subject line and resend. With a copy of the message included, should they not have saved it. It wastes a little bit of their time, but this is rare enough that I don't feel so bad about it.
  • I reject almost any MIME attachment that could be Windows malware. Even .zip files now. I politely ask them to arrange with me another way to send it. (Sending binaries through email isn't such a good idea anyway; it's indirect, and base64 bloats files by a third.)

richdawe:

mailman is the wrong layer to solve that. SMTP servers (Postfix, sendmail, qmail) can do some filtering based on the From: address. Mine is set up so that it rejects anything claiming to be from slamb.org unless sent from localhost or with SASL authentication. If gnu.org did the same, it would know the email from libtool@gnu.org is bogus.

And SPF, which you mentioned, could do the same across mailservers. I don't see a need for a mailing list-specific variant.

12 May 2004 (updated 12 May 2004 at 18:15 UTC) »
Nafai77

What you're missing is boost::shared_ptr<> and related classes. In my not-so-humble opinion, everyone who uses C++ should be using these classes. They prevent problems with exception safety. This code:

Object *temporary = new Object();
if (foo) {
    bar(); // throws an exception on error
}
delete temporary;

...leaks memory when bar() throws. If I had used a smart pointer (perhaps boost::scoped_ptr<Object> temporary(new Object());), the delete would happen for me whenever I leave this scope, regardless of how. Less code, less possibility of error.

Using objects on the stack for their constructors/destructors like this is called Resource Acquisition Is Initialization (RAII), the guard pattern, or the monitor pattern. You can use it for acquiring/releasing mutexes, printing to debug logs when going into/out of a scope, etc. It's simple and powerful.
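
For example, a mutex guard takes only a few lines (a sketch against raw pthreads; might_throw is a stand-in for whatever your critical section does):

    #include <pthread.h>

    void might_throw();  // hypothetical; anything that can raise an exception

    // Locks in the constructor, unlocks in the destructor, so the mutex is
    // released on every exit path, including a thrown exception.
    class Guard {
    public:
        explicit Guard(pthread_mutex_t &m) : m_(m) { pthread_mutex_lock(&m_); }
        ~Guard() { pthread_mutex_unlock(&m_); }
    private:
        Guard(const Guard &);             // non-copyable
        Guard &operator=(const Guard &);
        pthread_mutex_t &m_;
    };

    pthread_mutex_t account_mutex = PTHREAD_MUTEX_INITIALIZER;

    void transfer() {
        Guard g(account_mutex);
        might_throw();  // the guard still unlocks if this throws
    }                   // ~Guard runs here on the normal path, too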

11 May 2004 (updated 11 May 2004 at 19:54 UTC) »

I wrote more sigsafe assembly and released 0.1.3 today. (Nine supported systems now. Only six more to go before I run out of machines I have access to.) I'm having fun pretending to be a low-level coder. Some musings...

Delay slots

sparc and PA-RISC have delay slots. They're a consequence of pipelining: the processor is fetching the next instruction while executing the previous one. When jumping, some processors throw this instruction out, since the jump changes their idea of the next instruction. Sparc doesn't. The position after the jump is called a delay slot. The instruction is already in the pipeline, so it's just executed anyway. For unconditional jumps (or calls or returns), no real problem. For conditional ones, you've got this weird instruction beyond the jump that's executed in both branches.

There seem to be three approaches to dealing with this:

  • use a NOP - easy but inefficient. You're doing nothing useful in either case. The processor might as well have dumped the pipeline.
  • move an instruction from earlier to immediately after the branch. This is the most efficient approach, but it only works if that instruction wasn't necessary to decide whether the branch should be taken.
  • do something useful to one branch (preferably the common one) that isn't harmful to the other. (Like moving something into a register one needs and the other doesn't care about. Or using a trap-if-equal after a branch-if-not-equal: the instruction is a NOP in one case but not the other.)

In my code, I have a mix of all three. I'm perversely proud of the one where I have a delay slot that I also sometimes jump into. The assembler thought that was a mistake; I wrote in a .empty directive to tell it that I know what I'm doing and have analyzed all the different ways to arrive at that instruction. I felt smart when it actually worked.

It seems like delay slots are most efficient when both branches are equally likely. If one is much more likely, you can more often do something useful if you know that instruction will only happen in that branch. I think they also don't help when you've got a ridiculous 20-stage pipeline (Pentium IV). Clearly having a 20-instruction delay slot would end in tears.

sparc also has this weird "BN" - branch never - instruction. It's a NOP with a delay slot, I guess. I don't see the point; they might as well have a "COME FROM" instruction.

I guess every instruction set has its oddities. In PPC code, you can enforce ordering by sprinkling "EIEIO"s throughout the code. (Old MacDonald had a LWZ...EIEIO.) They've got some ridiculous reverse-engineered acronym expansion for it, like "Enforce In-Order Execution of IO."

ia64

Writing my ia64 code was a mind-bending experience. In the end, it wasn't as different from the other platforms as I'd hoped, though. My code is probably less than perfect, but I think ia64's instruction set just isn't optimized for this sort of thing. It's got some weird things I couldn't really take advantage of:

  • they have a lot of registers. 128 general registers (64 bits each, plus a NaT bit I'll describe below). Nice if you need them, but my code doesn't.
  • they have two (in the current system) sets of three different instruction units. The instructions are written in bundles of three (for different types of execution units) and then broader groups. All the instructions in the group have to be independent of each other. If you've got several things going on, you can keep all the execution units busy. They've got examples in the documentation where they're executing several independent iterations of a loop at once. But if you've just got one thing to do which depends on earlier steps, it doesn't help. I end up feeling guilty about groups that do nothing but load a single value from memory, even though I think it's unavoidable.
  • They have "speculative loads" to avoid making the processor wait for the (comparitively slow) memory system. You tell it "I might need this value in a few cycles" with a ld.s and it attempts to load it. If it is paged out or the machine's not in a good mood, it won't be loaded. Instead, it will set a NaT (not a thing) value associated with that word. Later, you do a chk.s to say "I really need that value now" and it will jump to "recovery code" you specify if the NaT value is set. (Typically, the recovery code does a non-speculative load and jumps back.) But I think at most I could do a load one cycle early with this, at the expense of code size/complexity. Pass.
  • They also have predicated instructions - every instruction is associated with a predicate bit, which says if it should actually be executed or not. So you can follow two different paths in the code without jumping. I'm at least taking advantage of that, sort of.

There are some other weird things, too. It seems like ia64 is optimized for doing heavy-duty computation directly within loops. They can't be in a separate function, or you can't take advantage of the EPIC features well. (Unless it gets inlined, of course.) And you either need a really smart person hand-writing the assembly or a very smart optimizing compiler. (And gcc seems to not qualify.) So it seems like those of us writing mundane code with every-day compilers get left behind. Also, stuff like the optimal number of simultaneous loop iterations is defined by the memory latency and number of execution units, so you would need multiple compiles for different Itanium machines. I could see a really smart JIT having a field day, but otherwise optimal code will never happen. Maybe this is why people call it the Itanic. Lots of complex features no one will ever really use fully.

sigsafe

While I should be doing homework, I've been porting sigsafe to more platforms. Now it basically works on Linux/x86, Linux/alpha, Tru64/alpha, and Darwin/ppc. The surprising thing is that seems to be the easy part, architecture/OS-specific assembly and all. What I don't have working on most of those platforms is the race condition checker. They all behave differently:

  • Linux/x86 works fine.
  • Linux/alpha works...on a uniprocessor machine. On SMP, the traced process dumps core on SIGTRAP occasionally. Very rare vs. the number of steps, but I do enough steps that it happens most times you run it. It seems my tested code is race-free, but the race checker is not. I think the problem is with the kernel's ptrace system call. I'll have to see about reporting it, but it might be a known problem, as the machines I have access to are a few releases behind. I don't have root access to try newer kernels.
  • Darwin/ppc doesn't work. The ptrace system call seems to be totally broken. They've got a Mach tracing API also, but I don't understand it yet.
  • Tru64/alpha doesn't work. Dies on SIGTRAP. But consistently; no race condition. Probably my code's fault.
  • FreeBSD/x86 won't compile my code; no CLD_xxx constants for siginfo_t's si_code. But I think I don't need them; I can get everything from waitpid with some reworking.

The good news is that the race condition checker isn't strictly necessary. I can accomplish the same thing with gdb or dbx. I'm not going to try inserting a signal at every possible instruction boundary like the automated system would, but I know where the critical points tend to be. If I see that it does the right things at _sigsafe_XXX_minjmp minus one instruction, _sigsafe_XXX_minjmp, _sigsafe_XXX_maxjmp, and _sigsafe_XXX_maxjmp+1, I can be pretty confident in its correctness.

The FreeBSD/x86 port...might work, race condition checker aside. I tried using it with -pthread but when read started returning EWOULDBLOCK I remembered that this is a user-mode threading library. That's no good. I tested with LinuxThreads on 4.X; it also failed. LinuxThreads' sigaction is returning -1/EINVAL for some reason. I didn't try 5.X with the new KSE-based threading library, since I don't have a 5.X machine. I released a new version with the FreeBSD/x86 support in the hopes of luring someone to test it with a newer system. No takers so far. But my friend who runs an ISP downstairs said he has some spare machines; I'll install 5.X on one of them. I bet it will work; it's not so different from Linux/x86.

Still have Solaris/sparc, Solaris/x86, HP-UX/PA-RISC, Linux/ia64, and HP-UX/ia64 to try.

I won't do NetBSD or OpenBSD for a while. They still have only user-mode threads. I'd have to go deep into their code to implement sigsafe, and when I'd get done, user-level threading would still suck. So I'll just wait them out. NetBSD has real pthreads in -CURRENT. (I could also do a single-threaded implementation. But my code is thread-happy, so it wouldn't be much good to me.)

6 Apr 2004 (updated 6 Apr 2004 at 09:10 UTC) »

After a lot of frustration in school, I need motivation. I'm trying to think of the ways I've actually managed to apply my education to programming. (Question: how often do you really put your formal CS education (not software engineering skills) to use? What have you managed to do with it?)

Here's a simple one: it's made me understand limiting cases. I blame learning proofs by induction in Discrete Structures and looking at the extremes throughout two years of physics classes. A couple (overly) simple examples to show what I'm talking about:

  • I used to be quite sloppy about comparisons with intervals: [a, b] vs. [a, b) vs. (a, b] vs. (a, b). I just take a moment now every time and really don't have off-by-one errors anymore. [1] In fact, it's hard for me to understand how this used to be a problem.
  • I write better contracts for my functions, so there aren't as many branches to screw up in the calling code. Like my Timespan and Time types have real infinity values now, with the expected behavior of comparisons/addition/etc. (sketched below). Now Condition::wait(Time) waits forever on Condition::wait(Time::tINFINITY). If I'd chosen 0 (a horrible but common choice), I'd have to special-case that in anything that wants to know if I'd return for sure before a certain time. In general, I think when there's a lot of "... except when ..., then ..." in the contract, the designer didn't think well enough about limiting cases.
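
Here's the shape of that infinity (a sketch, not my actual Time class; tINFINITY is the name my code uses, but everything else here is simplified illustration):

    // A Time with an explicit infinity, so callers never need a magic zero.
    class Time {
    public:
        static const Time tINFINITY;
        explicit Time(long usec) : usec_(usec), infinite_(false) {}
        bool operator<(const Time &o) const {
            if (infinite_) return false;   // infinity is never less than anything
            if (o.infinite_) return true;  // anything finite is less than infinity
            return usec_ < o.usec_;
        }
        bool isInfinite() const { return infinite_; }
    private:
        Time() : usec_(0), infinite_(true) {}  // only for tINFINITY
        long usec_;
        bool infinite_;
    };
    const Time Time::tINFINITY = Time();

Condition::wait can then check isInfinite() once and block indefinitely, and every caller comparing deadlines gets sensible answers for free.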

Another way my education has helped: I can at least understand papers like Processing XML Streams with Deterministic Automata. I want to start reading more papers like that. Eventually, I hope to be the one writing them. That one shows off well what a scientist can do that a tinkerer can't. So much code dealing with XML is slow, and they've managed to find a way to evaluate certain XPath expressions that's fast on streaming data, and they proved it. [2]

I think I also need to make sure I get enough out of my CS degree that I can continue learning throughout my career. I know about so many resources to learn software engineering: syntax, APIs, testing, documentation, revision control, new languages, etc. I've been learning those on my own forever. I can't say the same for CS.

[1] - Okay, one exception: if my definitions are just wrong. To me a "termination date" was the first day someone didn't work for us. Thus they worked for us on [started, terminated). I wrote the whole personnel system this way before we realized my definition was different from everyone else's. Oops. At least I was internally consistent.

[2] - Well, the actual proof isn't in there, just a description of what it proves and the importance. Need to scour their site more for the actual proof, as the references in the paper are broken. (Proof in [?]; great.)

3 Apr 2004 (updated 3 Apr 2004 at 16:46 UTC) »

I'm doing a lot of atoms++ design work now. If you're not interested in my musings on designing a fairly broad C++ library, just skip this whole entry.

Non-blocking IO...

...is hard. I'm working on a simple example now - a proxy between two TCP streams. Code here. It's not even close:

  • It never times out connections. I need to add timeouts to the pollers and push it through the right levels of code.
  • It's single-threaded. I want to make it fit into the ThreadPool class based on Jeff Darcy's notes on non-blocking server design. Then it will scale with the number of processors. A bit silly for a server as non-CPU intensive as this one, but it will be a good proof of concept.
  • It can't handle edge-driven polling methods yet, which are supposedly superior.
  • It doesn't flush the other stream's input buffer when one stream is closed, whether it's on failure or success.
  • It can block on close, since it doesn't do the shutdown(2) dance: shut down the write half, wait for the peer to do likewise (or give up eventually and set SO_LINGER to 0), and then close. (See the sketch after this list.)
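
The dance, roughly (a sketch; real code needs a timeout around the drain loop):

    #include <sys/socket.h>
    #include <unistd.h>

    // Graceful close: send our FIN, then drain until the peer's FIN arrives,
    // so close() can't block and no in-flight data is silently dropped.
    void graceful_close(int fd) {
        shutdown(fd, SHUT_WR);                  // done writing; peer sees EOF
        char buf[4096];
        while (read(fd, buf, sizeof(buf)) > 0)
            ;                                   // discard until read returns 0
        // On a timeout, real code would set SO_LINGER to 0 and force a reset.
        close(fd);
    }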

If I'm going to make non-blocking IO practical in atoms++, I need an API that makes all of these things relatively easy. proxy.cc should still be readable when I'm done. If it's not, I've failed as an API designer.

Unicode strings

I want good Unicode support in my code, which I think means:

  1. storing stuff in UTF-8. That's the only Unicode encoding with decent backward compatibility; with anything else, I'd frequently need to convert. There are supposed disadvantages to this approach, but I don't see them:
    • Variable-width characters are awkward to deal with. AFAIK, they only really cause problems when trying mutable operations on a character-by-character basis. I'll just have operations like tolower() operate on entire strings as in Java.
    • UTF-8 is half as efficient as UTF-16 for storing Japanese. Well, it's twice as efficient for English, and that's what I speak. Seriously, this isn't a consideration either way - I don't understand having so much text in memory or on disk that it would be significant. Don't these people have images? audio files? movies?
  2. verifying the UTF-8 as it comes in from the disk or network. Otherwise, there are different possible (overlong) encodings for '/' and such, which causes security problems. (See the sketch after this list.)
  3. providing bi-directional character iterators. Some code like boost::regex wants these. (This isn't so hard with UTF-8, as long as they're read-only. Which they should be anyway - see SGI's reasons for hating std::string.)
  4. supporting the collation algorithms, character classes, etc. This is hard code (or at least incredibly complex code), so I won't be implementing it myself. (Don't think the standard C++ stuff guarantees a version of this available for UTF-8. Though I'll be checking that out more to be sure.)
  5. providing a good formatting mechanism. I want at least a type-safe printf which supports argument reordering. boost::format does this much, though it only works for std::basic_string<> (doh!). Ideally, it would also:
    • support more thorough stuff like ICU's formatting: message formatting (a la printf), choice formatting ("are none", "is one", "are N"), number/decimal formatting, date/time formatting, etc. But maybe I can borrow stuff from ncm's locale facets in the C++ standard, even using UTF-8 without std::string. Have to look into it more.
    • allow me to defer picking a locale until the last possible second. I'd like to have locales associated with streams. _("msg") would return a LocalizableString that has a UString getLocalized(Locale) and sendLocalized(Locale,Stream). Likewise for the formatter. Nothing knows what the final language is until it's actually sent to a stream. So servers could support well people who speak different languages from each other.
    • for efficiency, can dump its output directly to a Buffer and return the length (in bytes, not characters) it added. (The count lets me do something like Java's writeUTF efficiently if I reserve some space ahead to fill in the length. Not netstrings quite as efficiently, unfortunately. The byte count is variable width (no leading 0's allowed!), so I'll need to memmove() it over in the most general case.)
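
For #2 in the list above, the check is mostly about rejecting overlong forms (a simplified sketch; the function name is mine):

    #include <cstddef>

    // Validates a UTF-8 buffer, rejecting overlong encodings (e.g. the
    // two-byte 0xC0 0xAF form of '/'), UTF-16 surrogates, and anything
    // past U+10FFFF.
    bool valid_utf8(const unsigned char *s, size_t len) {
        size_t i = 0;
        while (i < len) {
            unsigned char c = s[i++];
            if (c < 0x80)
                continue;                          // plain ASCII
            size_t n;                              // continuation bytes expected
            unsigned long cp, min;                 // code point and its legal minimum
            if (c < 0xC0)      return false;       // stray continuation byte
            else if (c < 0xE0) { n = 1; cp = c & 0x1F; min = 0x80;    }
            else if (c < 0xF0) { n = 2; cp = c & 0x0F; min = 0x800;   }
            else if (c < 0xF8) { n = 3; cp = c & 0x07; min = 0x10000; }
            else               return false;       // 0xF8-0xFF: never valid
            if (len - i < n)   return false;       // truncated sequence
            while (n--) {
                unsigned char cc = s[i++];
                if ((cc & 0xC0) != 0x80) return false;
                cp = (cp << 6) | (cc & 0x3F);
            }
            if (cp < min)                     return false;  // overlong
            if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogate
            if (cp > 0x10FFFF)                return false;  // out of range
        }
        return true;
    }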

std::string is out. It's useless for UTF-8, as far as I can tell. I think I'm going to end up basing it on glib. glib does so much I don't need, but it beats out ICU in that:

  • it's already on a lot of systems
  • I'll possibly use it anyway for my XML support, as libxml and libxslt are glib-based. (Xerces-C and Xalan-C don't seem to be as broadly used.)
  • the internal store is UTF-8. ICU uses UTF-16.
  • the API for this stuff is dirt simple.

I'll have to implement much of #5 on my own, I think. Even if I do use the locale facets, that doesn't do the printf()-like stuff, and there will be a lot of glue code to write. This is one area where ICU would be better. But I still think glib's the better choice.

Update: murrayc mentioned Glib::ustring. I'd already seen it (and it is nice), but he also pointed out that as of version 2.4, gtkmm was split into glibmm and gtkmm, so this class doesn't require GTK+. So I might use it - glibmm isn't such a huge extra requirement once I already have glib, even though the C++ wrapper class wouldn't be too hard to write myself. If it did the formatting for me, too, it'd be a no-brainer.

