Older blog entries for raph (starting at number 338)

Urgh, haven't updated in a while. Last weekend, we went to a Quaker retreat at the beautiful Ben Lomond Quaker Center, and then spent the next few days at an Artifex staff meeting.

RH 9, fonts

My laptop's hard drive failed (another quality product from IBM :). This time around, I decided to install RH 9 from scratch on the new drive (it was Debian before).

So far, I like it. I miss apt-get, but more stuff seems to just work. Also, the antialiased fonts are a nice big jump. I am sad that support for subpixel positioning isn't there yet, though. In general, you can get away with integer quantization on the widths when you're doing imprecise text layout (GUI labels and HTML rendering, as opposed to, say, PDF), but there are still definitely cases where the spacing gets wonky.
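Here's a toy illustration of why (made-up numbers, not how any particular engine works): quantizing each advance width to an integer accumulates error across a run of glyphs, while quantizing the accumulated positions keeps the error under a pixel.

#include <math.h>
#include <stdio.h>

int main(void)
{
    double advance = 6.4;   /* hypothetical fractional glyph advance */
    double exact = 0.0;
    int per_glyph = 0, per_position;
    int i;

    for (i = 0; i < 10; i++) {
        exact += advance;
        per_glyph += (int)floor(advance + 0.5);  /* quantize each width */
    }
    per_position = (int)floor(exact + 0.5);      /* quantize the position */

    /* Prints: exact 64.0, per-glyph 60, per-position 64 */
    printf("exact %.1f, per-glyph %d, per-position %d\n",
           exact, per_glyph, per_position);
    return 0;
}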

As far as I know, there is only one text rendering engine that does antialiasing, hinting (specifically, contrast enhancement by subtly altering the position of stems), and subpixel positioning: Acrobat (Reader) 5. Mac OS X does AA and subpixel, while RH 9 (by means of FreeType 2) does AA and hinting. I'm looking forward to the first free software implementation of all three.

At the staff meeting, we decided not to move forward with our funded project to integrate FreeType as the font renderer for Ghostscript, concentrating instead on improving the existing renderer. I'd still like to see the FT integration happen, though. The best outcome, I think, would be to recruit a volunteer from the free software community to take over this project.

Fansubs

I've discovered anime fansubs. These are basically Japanese anime shows, with English language subtitles added, then encoded (usually to MPEG-4) and distributed over the Internet. Their legal status is murky at best, but a sane code of ethics prevails: fansubbers release shows that have not been licensed to English-speaking markets. Under this code, everybody wins. Copyright owners of shows don't lose revenue directly, because there isn't any from those shows. Indeed, it's likely that the popularity of the fansubs fuels interest in official licensing. And, of course, viewers win because of access to great shows like Hikaru no Go, which would otherwise not be available, or only with great difficulty. Alan has started watching Naruto (I still read the subtitles aloud to him, but I'm sure his reading speed will catch up soon), and enjoys the insight into Japanese culture as well as the ninja-themed action-adventure storyline.

The best of the fansubbers do really good work on the translation, subtitling, and other stuff; arguably much better than many "official" versions. I think Hollywood could learn much from their example.

People leaving

I've been feeling a bit down the past few days. The departure of some good people from Advogato is probably a factor.

I want this to be a good site, and bring people together as part of a community. I know I can't please everybody all the time. But I am wondering if there are some basic things I can do to make this a more congenial place.

First, I deliberately chose a light touch for applying the trust metrics. They're basically opt-in, especially the more recent diary ratings. I had thought that users of this site would have a fairly thick skin, and that simply giving people the tools to filter out stuff they didn't want to see would be sufficient. But perhaps that assumption isn't right. Maybe the default should be to present the recentlog with a trust metric computed relative to a "seed", so that most people wouldn't see low-ranked entries unless they deliberately chose to seek them out.
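As a toy sketch (not mod_virgule's actual code; the types, the source of the ratings, and the threshold are all invented), the default filtering could be as simple as:

typedef struct {
    const char *author;
    const char *html;
    double rating;  /* trust-metric rating relative to the seed */
} entry;

/* Keep only entries rated at or above the threshold; a threshold of
   zero would reproduce today's unfiltered recentlog. */
int filter_recentlog(const entry *in, int n, double threshold,
                     const entry **out)
{
    int i, kept = 0;

    for (i = 0; i < n; i++)
        if (in[i].rating >= threshold)
            out[kept++] = &in[i];
    return kept;
}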

I've been thinking of doing something like that for an RSS feed of the recentlog anyway, as there aren't good client-side tools for filtering those. So the question is: would stronger filtering bring back the people who left? Is this an important goal, in any case?

There's always been a tension between people hosting their blogs elsewhere and having them hosted here. As a blog host, this site has been fairly minimalist, although I can definitely see adding the really important features over time. But perhaps it's more distributed, more Web-like, for each person to be responsible for their own blog hosting, and to use other tools to integrate blogs from disparate servers. In the meantime, I think our recentlog provides a useful and interesting mix of individual postings and communal discussions.

Inline functions

After some more thinking, I don't really like the DEF_INLINE macro I wrote about last time. The simplest approach, I think, is to define "inline" to the compiler's own inlining keyword, or, if the compiler simply doesn't support inlining, to the empty string, so that each .c file that includes the .h with the inline functions gets its own static copy. An interesting question: are there any compilers in widespread use today that don't support inlining? Certainly none of the ones I use.
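As a minimal sketch (reusing the detection ladder from the entry below; max_int is just a placeholder function):

/* Map "inline" to the compiler's keyword, or define it away. */
#if defined(__GNUC__)
#define inline __inline__
#elif defined(_MSC_VER)
#define inline __inline
#elif !defined(__STDC_VERSION__) || (__STDC_VERSION__ < 199901L)
#define inline  /* no inlining: each .c file gets its own static copy */
#endif

/* In a header, inline functions can then be written directly: */
static inline int max_int(int a, int b)
{
    return a > b ? a : b;
}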

Inline functions in C

I was looking again at the question of how to do inline functions in C. There's a relatively recent page by one Richard Kettlewell, but it doesn't quite explain how to do inline functions really portably, in particular so that code will work correctly on compilers which don't support inline functions.

I think this goal is achievable without too much pain. Here's a sketch.

1. Guess conservatively whether the compiler has an inlining keyword, and map "inline" to it. Here's a sketch:

#if defined(__GNUC__)
#define inline __inline__
#elif defined(_MSC_VER)
#define inline __inline
#elif !defined(__STDC_VERSION__) || (__STDC_VERSION__ < 199901L)
/* No known inline keyword: disable inlining entirely. */
#define DISABLE_INLINING
#define inline
#endif

An enhancement would be to use autoconf to detect the compiler's support for inline and record the result in a "config.h" file or the like. See, though, this mail for a cautionary note.

2. Set up a DEF_INLINE macro. This macro makes an inline definition when available, otherwise a standard function declaration:

#ifdef DISABLE_INLINING
/* Headers get a plain declaration; one .c file emits the real body. */
#define DEF_INLINE_BODY(proto, body) proto { body }
#define DEF_INLINE(proto, body) proto;
#else
/* Every includer gets its own static inline definition. */
#define DEF_INLINE_BODY(proto, body) static inline proto { body }
#define DEF_INLINE DEF_INLINE_BODY
#endif

3. Then, in your .h files, define your inline functions as follows:

DEF_INLINE( void unref(obj *o),
    if (--o->refcnt == 0) destroy(o);
)

4. Now for the slightly tricky part. In one .c file, include the following fragment:

#undef DEF_INLINE
#define DEF_INLINE DEF_INLINE_BODY
#include "yourfile.h"

Note that if yourfile.h has include guards, this fragment must contain its first inclusion in the .c file, or the guard will suppress the function bodies. A possible enhancement is to set DEF_INLINE back to its old value afterwards, but if yourfile.h is the last included file, it's just more code.
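Putting the pieces together, a complete example might look like this (hypothetical file and type names; "inline_compat.h" is assumed to hold the step 1 ladder and the step 2 macros):

/* obj.h */
#ifndef OBJ_H
#define OBJ_H

#include "inline_compat.h"  /* steps 1 and 2 */

typedef struct obj obj;
struct obj { int refcnt; };

void destroy(obj *o);

DEF_INLINE( void unref(obj *o),
    if (--o->refcnt == 0) destroy(o);
)

#endif /* OBJ_H */

/* obj.c -- the one .c file that emits real function bodies when
   inlining is disabled.  inline_compat.h is included first so its
   include guard keeps obj.h from redefining DEF_INLINE behind our
   back; this must also be the first inclusion of obj.h. */
#include "inline_compat.h"
#undef DEF_INLINE
#define DEF_INLINE DEF_INLINE_BODY
#include "obj.h"

#include <stdlib.h>

void destroy(obj *o)
{
    free(o);
}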

This approach has the advantage that you can disable inlining with a simple switch. This might be handy for debugging (for example, setting breakpoints in the inlined function), and also to measure the performance improvement actually achieved.

The drawback, of course, is the use of the C preprocessor. It's too easy to try to create your own language using C macros, and it's often a bad idea. One concrete limitation here: a function body containing an unparenthesized comma gets split across macro arguments, and C89 has no variadic macros to absorb it.

If you only care about GCC, MSVC, and C99-compliant compilers, you can get by with just step 1. In fact, it'll probably still work on older compilers, but with the downside of replicating the "inlined" functions in every .c file that includes them.

It's a shame that such a simple and valuable feature is such a mess when it comes to standards and actual implementations. Ah well.

RSS export

Advogato diaries now have RSS feeds. Here's mine. I've checked it with some validation services, but don't really know how well it works.

The DRM struggle

The recent Xbox hack provides further evidence for a widely held belief in hacker circles: real DRM is technologically impossible, at least without huge improvements in the ability to produce bug-free software. Zooko writes passionately that the bad guys may well be winning anyway.

Indeed, the important distinction here is whether freely accessing digital content is convenient or merely possible. If a DRM scheme makes access inconvenient for most people, then it has largely succeeded in its economic goals.

Technologists, and free software hackers in particular, should be careful not to underestimate the importance of convenience. I think that over the next few years, one of the most compelling applications for free software will be a media player that just works, especially one that doesn't stumble over harebrained DRM schemes. The underlying technology is mostly here now, including the ability to efficiently move bulk files around over consumer Internet connections. But unless it's easy enough for your mom to use, it won't have much impact.

Essence of XML

I got my POPL proceedings a few days ago, and found Jerome Simeon and Phil Wadler's The Essence of XML to be one of the most interesting papers. They write:

So, the essence of XML is this: the problem it solves is not hard, and it does not solve the problem well.

In particular, the paper is about type systems for XML, which include DTD's (widely recognized as underpowered), XML Schema, and other proposals. The paper goes into XML Schema in some detail. A central observation is that, while XML Schema was obviously intended to provide for unambiguous schemas, it fails to achieve this goal.

The W3C produces mediocre standards. Not so bad as to be unworkable, but certainly not crisp and beautiful either. In many ways, this is better than anarchy, because at least they are standards, and there is a ton of code out there to deal with them. Lisp S-expressions may be prettier, but there are still plenty of details you have to nail down for compatibility, including choice of charset, dealing with string quoting, and so on.

mod_virgule

By popular demand, I've started hacking up RSS export for diaries. It's a little harder than I thought it would be. There are two bits to get past to make it validate, then perhaps some impedance mismatch. One of the bits is conversion from ISO-8601 dates (which is what mod_virgule uses internally) to RFC 822. Another is conversion of relative URL's within diary entries to absolute. Both of these are SMOP's.
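For the date conversion, here's a minimal sketch (my illustration, not the actual mod_virgule code; it assumes timestamps of the form "2003-03-28 06:37:00", already in UTC, and uses fixed English day and month names, since RFC 822 requires those regardless of locale):

#include <stdio.h>

static const char *days[] =
    { "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat" };
static const char *months[] =
    { "Jan", "Feb", "Mar", "Apr", "May", "Jun",
      "Jul", "Aug", "Sep", "Oct", "Nov", "Dec" };

/* Sakamoto's day-of-week formula, valid for Gregorian dates. */
static int day_of_week(int y, int m, int d)
{
    static const int t[] = { 0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4 };

    if (m < 3)
        y -= 1;
    return (y + y / 4 - y / 100 + y / 400 + t[m - 1] + d) % 7;
}

/* "2003-03-28 06:37:00" -> "Fri, 28 Mar 2003 06:37:00 GMT" */
int iso_to_rfc822(const char *iso, char *buf, size_t len)
{
    int y, mo, d, h, mi, s;

    if (sscanf(iso, "%d-%d-%d %d:%d:%d", &y, &mo, &d, &h, &mi, &s) != 6)
        return -1;
    snprintf(buf, len, "%s, %02d %s %d %02d:%02d:%02d GMT",
             days[day_of_week(y, mo, d)], d, months[mo - 1],
             y, h, mi, s);
    return 0;
}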

The impedance mismatch is that Advogato diary entries don't have a designated title field, and the "description" can be very long. Many people (myself included) follow the convention of titles in <b> tags, but it seems dangerous to rely on this for structural information.

Probably the thing to do is just export the full entry as the RSS <description> for now, and gradually move to the option of more structural markup. I've been wanting to do something similar to provide summaries of mid-rated diaries in the recentlog anyway.

Thanks to dyork's reminder, I've updated the Wiki intermap. Also, I see that Gary wrote some code toward displaying multiple entries from a single poster in the recentlog. Hopefully, we'll be able to get at least the minimal amount of maintenance done soon.

Life

The last couple of weeks have been really hard on my productivity. I feel like I've been getting behind on a bunch of things, including design and coding work on Fitz, the IJS 1.0 spec, a command-line version of the trust metric, and other things.

I'm feeling a bit more productive now and hope to catch up over the next few weeks.

Proofs

During times of stress, I find it comforting to muse on proofs. The idea of mathematical certainty is soothing to me.

Much of my thinking is directed towards a scheme for portable and modular proofs. For one, there are many different axiom systems, of various strengths. Most proof systems simply choose one. The problem with this is that proofs can be ported to a system with a stronger axiom system, but not in general to a weaker one.

Further, if you have a minimalist set of axioms (such as second order arithmetic or Zermelo-Fraenkel set theory), then you want to construct a rich library of objects (many flavors of numbers, sequences, functions, etc.) on top of it. In many cases, there will be more than one viable construction (for example, reals can be infinite binary expansions, Dedekind cuts, or Harrison's clever HOL construction). Proofs shouldn't depend on the details of the construction used. A proof over the reals should go through exactly the same no matter which construction of the reals undergirds it.

So I've been thinking along the ideas of modules and interfaces. The axioms of complex arithmetic would be one example of an interface. A proof over complex numbers imports this interface. A module representing a construction of complex numbers would import the HOL primitive axiom interface and export the complex number interface.
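To make this concrete, here's a minimal sketch of the idea in Lean-style notation (my own rendering, not a worked-out design):

-- An "interface": operations over an abstract carrier, plus axioms.
structure ComplexLike (C : Type) where
  add : C → C → C
  mul : C → C → C
  add_comm : ∀ a b : C, add a b = add b a
  -- ... the remaining field axioms would go here

-- A proof module "imports" the interface by abstracting over it; it
-- goes through unchanged for any construction of the complex numbers.
theorem add_swap {C : Type} (I : ComplexLike C) (a b : C) :
    I.add a b = I.add b a :=
  I.add_comm a b

-- A construction module would "export" the interface by exhibiting a
-- concrete instance, e.g. ComplexLike (Real × Real) for the usual
-- pairs-of-reals representation.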

Each module can be checked individually, to make sure that the exports are justified in terms of the imports. Then, you can check a whole pile of modules, by instantiating the abstract interfaces in each "import" and "export" with concrete replacements. For example, the abstract addition function is replaced with a definition appropriate to the chosen construction. The whole thing checks if each import (after instantiation) is satisfied by a matching export (again, after instantiation) from a previous module (i.e., no import cycles allowed).

Thus, you could fairly easily check a proof over complex numbers in any one of the axiom schemes powerful enough to represent them. Just supply the construction appropriate to the primitive axioms.

Not all proofs will check in all axiom systems, of course. In general, a proof module should be conservative in what it imports, so that it will check in the largest range of axiom systems. This principle also ensures that proofs can be ported to the widest range of other systems.

I hope to write about these ideas in more detail, including why it's important. It's obvious to me, but other people seem to need convincing. That sounds quite a bit like Ivan Sutherland's recipe for successful research: do something you think is easy, but everybody else thinks is hard.

28 Mar 2003 (updated 28 Mar 2003 at 06:37 UTC) »
Other blog: Notes on the "Saddam prepares to flee to Syria" hoax
Mandrake 9.1 torrent

If you want the Mandrake 9.1 ISOs:

Joining the torrent is especially appreciated if you have good bandwidth and are not behind a NAT. In the early phases of the torrent, downloads will be a little slow (20kB/s), but it should pick up in a couple of hours. If you can leave your BitTorrent application open even after the download is complete, that would help even more.

Feel free to spread the word; the more people who join the torrent, the better it goes.

This can be seen as a trial run for the Red Hat 9 ISO release.

RH 9 torrents

Red Hat has announced that ISO's will be available to paying customers on March 31, and on their public FTP server a week later. I consider this a fabulous opportunity to bring BitTorrent to the public attention by showing what a good job it can do with the hosting. Relying on public mirrors will be frustrating, tedious, and probably slow. BitTorrent can deliver excellent latency, bandwidth, and reliability. Are you interested in helping Bram set this up?

MPEG2 interlacing

graydon wrote me with a link to the MPEG2 work of Billy Biggs. His observations match what I've seen as well.

Billy observes that the interlacing codes on DVD's often don't seem to make much sense. In particular, source frames sometimes get sliced so that a single frame on the DVD interleaves fields from two source frames. The main point of the RFF flag is to give the encoder enough slack that source frame boundaries and DVD frame boundaries can line up.

There are two ways to look at the RFF flag. You could consider it a form of semantic information, identifying for each "picture" (meaning field or frame) whether it's interlaced or not, and if telecined, where the frame boundaries are. Alternatively, you can see the MPEG2 sequence on a DVD as nothing more than a compressed NTSC video source, with 2 fields in each frame, 29.97 fps. In the latter view, what the RFF flag buys you is a better compression rate. Duplicated fields need only get encoded once, and you don't have to DCT-encode frames with lots of high-spatial-frequency interlace patterns. Both help quite a bit.

Yet, DVD's have enough bits on them that most movies don't need to squeeze every last drop out of the compression. Thus, I'd guess that a lot of DVD's get encoded using heuristics to guess the RFF flags. So what if the heuristic gets a few frames wrong? It still plays fine, and hardly makes a dent in the overall compression ratio.

The problem, of course, is when people use the RFF flags for something other than plain NTSC out. Examples include progressive-scan TV's (becoming popular now), playback on computer monitors, and of course transcoding. There, incorrect RFF flags can cause serious artifacts. Even so, since most DVD's get them mostly right, it's probably reasonable to use them even in these applications.

However, free tools (at least the ones I've seen) don't even do a reasonable job coping with mixed interlacing patterns.

  • transcode, in its default mode, assumes a frame rate of 23.976 fps, and, if the source exceeds that frame rate, it drops frames. With incorrect RFF flags on input, the result is motion artifacts.

  • mpeg2dec, in most modes, simply ignores the RFF flag info. Similarly, the "object-y" libvo API has no way to get at the RFF flags. The new "state machine-y" API lets you get at them from the info->display_picture structure.

  • The yuv4mpeg format has no way to represent RFF flag info on the decoded sequence.

  • mpeg2enc, the tool used for most MPEG2 encoding, can only encode at a constant frame rate, or with uniform 3:2 pulldown. It cannot generate a sequence with arbitrary RFF flags.

I have hacked up these tools to provide reasonable pulldown when transcoding to SVCD. I instrumented mpeg2dec to output a file with one byte per frame, containing the RFF and TFF flags. Then, I hacked up mpeg2enc to get its RFF and TFF flags from this file, rather than cycling RFF on:off:on:off as is the standard behavior when the -p (--3-2-pulldown) option is set. The resulting files have good A/V sync and no motion artifacts, but the setup is awkward at best, and when the source contains long runs of 29.97 fps frames, mplex complains of underruns. I set the (compile-time) option to ignore these, though, and the DVD player seems to handle them just fine.
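The flag file itself is trivial. Here's a sketch of what the two ends of the pipe might look like (illustrative, not the actual patch; the bit layout within the byte is my assumption):

#include <stdio.h>

/* One byte per frame: bit 0 = RFF, bit 1 = TFF. */
void write_flags(FILE *f, int rff, int tff)
{
    fputc((rff ? 1 : 0) | (tff ? 2 : 0), f);
}

int read_flags(FILE *f, int *rff, int *tff)
{
    int c = fgetc(f);

    if (c == EOF)
        return -1;
    *rff = c & 1;
    *tff = (c >> 1) & 1;
    return 0;
}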

The tools need to get fixed. I'm posting this largely to encourage that. However, it's not easy to just fix the code, as the real problem is in the interface between encoder, decoder, and intermediate processing. These tend to be all separate processes, connected with pipes. If that's going to continue, then the tools need to agree on a way to get pulldown flags into the yuv4mpeg format. The other reasonable approach is to try to knit the modules together as shared libraries rather than pipes. That seems to be the approach taken by OpenShiiva.

Other blog: Indymedia, news.google.com, Balochistan post, Reaching out

Home media

One of the big, huge potential killer apps for free software is to run home media centers, such as the ones that VIA is pushing with their C3 chip. With good support for non-DRM audio and video, and good p2p networking (such as BitTorrent), such a system could be overwhelmingly better than the crippled alternatives put out by mainstream corporations.

To this end, OpenShiiva looks particularly interesting. I haven't tried it yet, but it looks like it's addressing both quality and UI. The free MPEG2 tools I've played with so far have serious deficiencies in both departments.

One specific problem is that no free MPEG2 encoder I've seen can handle video sequences with mixed 29.97 fps video and 24 fps 3:2 pulldown. The MPEG2 spec allows such mixing freely, through the "repeat first field" (RFF) flag, which is independently settable for each frame. If it toggles on:off:on:off, it's 3:2 pulldown. If it's always off, it's 29.97 fps. Many DVD's mix the two, for example splicing a video-source animated logo onto the front of a 24 fps movie.
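The field arithmetic is easy to see in a toy example (my own illustration): each coded frame displays two fields, plus one more when RFF is set, so the classic on:off:on:off cycle stretches 4 film frames into 10 fields, exactly the 24 to 60 fields/s ratio of 3:2 pulldown.

#include <stdio.h>

int main(void)
{
    int rff[4] = { 1, 0, 1, 0 };  /* one 3:2 pulldown cycle */
    int i, fields = 0;

    for (i = 0; i < 4; i++)
        fields += 2 + rff[i];     /* 2 fields, plus 1 if RFF is set */

    /* Prints 10: 4 film frames -> 10 fields = 5 video frames,
       i.e. 24 fps film plays as 30 (really 29.97) fps video. */
    printf("%d\n", fields);
    return 0;
}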

Part of the problem is that the yuv4mpeg format (used as the input to mpeg2enc) loses the RFF flag information. Thus, as you dig into the source code of tools such as transcode and mencoder, you tend to see a lot of crude hacks to work around the problem.

I've hacked up my local version of mpeg2enc to preserve the RFF flags from the source stream, with good results, but unfortunately the patches aren't general enough for production use; among other things, mplex'ing the resulting stream can result in underruns depending on the exact frame rate.

I have some notes on the pulldown issue that I'm planning on publishing as a Web page. Are there any MPEG2 encoder hackers who care?

Being nice

Zaitcev: passions are running high, on all sides. I ask you (and all Advogato posters) to be respectful of other people's opinions. There is no question that Bush is responsible for death and destruction on a large scale. Whether it's legal according to international law is one question. Whether it improves the situation for the Iraqi people is another (I honestly hope it does). Reasonable people can, and do, differ on these questions.

