Older blog entries for rmathew (starting at number 157)

ECJ for GCJ: Still in limbo
It has been almost a month since Tom formally proposed integrating ECJ in GCJ to the GCC Steering Committee (SC). There has been no word from the SC yet on this request. However, the SC did ask the GCC developers to avoid gratuitously including source code from external projects in GCC. One consequence of this for GCJ was the removal of fastjar from the GCC source tree. I'm not sure if the SC's decision was coincidental or in fact a result of deliberations triggered by Tom's request.

More Front Ends in GCC
One of the great advantages of structuring a compiler so that the front-end, the middle-end and the back-end are relatively independent is that if you write M front-ends and N back-ends, you get M*N compilers "for free", assuming you have a good enough intermediate representation in the middle-end. This idea was discussed as far back as the 1950s and UNCOL was an ambitious effort towards this goal. GCC is a stellar example of such a compiler - it supports C, C++, Java, Ada, etc. "out-of-the-box" and can target a whole bunch of platforms. You implement a language front-end for GCC and you immediately have a compiler for that language for a whole lot of platforms; you implement a target back-end for GCC and you immediately have compilers for several languages for that platform. Of course, this is grossly oversimplified: you usually have to port the language runtime to each platform too, and your language might strain the GCC intermediate representation or expose latent bugs in the middle-end, making the effort rather difficult. But the overall idea still remains valid.

The GNU Pascal Compiler (GPC) guys recently proposed an integration of GPC with GCC (in the same source repository, but on a different branch - weird). Some day, the GCC Scheme Compiler (GSC) guys, the PL/I for GCC guys, etc. might also want to integrate their front-ends with GCC. Having more front-ends in the GCC source tree itself means that middle-end changes do not inadvertently break these front-ends, latent middle-end bugs and unwarranted assumptions are exposed, general GCC enhancements are automatically applied, etc. So it's a good thing for GCC, in a way.

However, I personally think it is not a good idea. The GCC mainline is already quite bloated with a number of languages and runtimes, and building all of the languages and their runtime libraries (thank you Sun for regularly increasing the bloat in the "standard" Java runtime with every release of the JDK) takes quite a while even on a decent system. Having more languages and their runtimes within GCC will only exacerbate this issue. I personally also feel (though I have no real practical experience in this area) that having to cater to so many languages does not let the optimisers make the assumptions they would need to perform stronger optimisations. A recurring problem in this area is the folding of constants, where languages like Java specify a bit too much with respect to what can be folded and how it should be folded.

On a slightly different note, the GSC guys have also created a "Hello World" front-end for GCC that shows you how to build a front-end for GCC for your favourite language.

On an entirely different note, I have ended up writing 3,000 lines of text in the user manual of a 4,000 line programme (both rough "wc -l" figures)! Either the manual is unnecessarily verbose or the programme is too complex.

Tatjana van Vark
Tatjana van Vark looks like an amazing Dutch inventor and machinist. Just look at her works and you'll know what I mean. My favourites were the oscilloscope that she created when she was just 14 years old and an Enigma-like "Coding Machine". I just wish they had put up some more information about the devices than just the pictures. Another of those humbling experiences for yours truly.
27 Mar 2006 (updated 27 Mar 2006 at 07:20 UTC)
GNU Texinfo
I wanted to write the user manual for a small personal project that I have been working on in my free time. I wanted the manual to be available in both HTML and PDF and to look good in either case. I considered both GNU Texinfo and DocBook for this purpose, but settled on Texinfo simply because it is installed by default on almost all Linux systems and because GCC and many other Free Software projects use it for their documentation. This way, I can easily contribute to the GCC/GCJ documentation without having to learn a new documentation system, should I wake up one morning with the sudden urge to do so.

Texinfo proved very simple to learn and produces fairly good looking HTML and PDF files (although some people prefer texi2html to "makeinfo --html" for HTML output). It can also output DocBook XML files, though I don't know how good the output is since I don't know the DocBook system yet. I am very happy with the tool so far. I haven't learnt a whole lot of Texinfo yet, but since when has that stopped me from making a fool of myself?

There are still some warts that I see with the Texinfo system though:

  • Info is a nice format/tool and I use it a lot under Linux, but you have to go through so many unnecessary hoops in Texinfo to properly support it. Why do you have to explicitly declare nodes and menus? Why can't Texinfo automatically derive these from the chapters and sections in the document in case they haven't been specified explicitly?

  • Creating an index and bibliography is unnecessarily painful. LaTeX has far better support for these things via auxiliary tools.

  • As with TeX/LaTeX, inserting images is so painful. It could be one of the reasons why so many Free Software manuals do not bother to include figures at all.

  • Texinfo ostensibly focusses on content rather than presentation, but many presentation-related tags and conventions creep in.

  • Support for mathematical symbols is rather weak. Things look good only in the TeX output. The HTML output should be using MathML instead of just showing the text as-is. I don't particularly like MathML since it makes writing even simple things so tedious (TeX is so much better at this), but it's still a standard, as unfortunate as that situation might be.

  • A lot of things work well only for English documents and it does not seem well-suited to writing documents in other languages. As an aside, I personally cringe when I have to write tags spelt assuming American English (as with HTML, Java, etc.) not British English.

These rants aside, I am still sticking with Texinfo for the documentation for my little projects, though for "paper-like" stuff, I'm going to prefer LaTeX.

Miscellaneous
Steve Yegge is now on Blogger for those of you who can't seem to have enough of his rants.

Ranjit Madampath pointed me to a rather hilarious entry on Frameworks in the Joel on Software discussion group.

Planet Scheme used to be available as planet-scheme.yi.org, but it seems to be dead now. I used to like reading the aggregated weblogs of a lot of smart Scheme hackers, the weblog of José Antonio Ortega Ruiz in particular.

Assembly Language
saju's first post made me recall some of the things I miss in C which were so simple in x86 assembly language. For example, while doing fixed-point arithmetic with 32-bit operands, it was immensely useful that the CPU could hold the result of a multiplication (using MUL) in the EDX:EAX 64-bit register combination without overflow and that the same register combination could be used in a following division (using DIV), neatly separating the quotient and the remainder. The ADC ("Add with Carry") instruction was similarly useful for neatly handling overflows in addition. I don't know if you can achieve this in C without resorting to inline assembly. Readers of Michael Abrash's "Graphics Programming Black Book" and Oldskool PC coders will immediately realise what I'm talking about.
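
On the question of doing this in C without inline assembly, here is a minimal sketch, assuming a C99 compiler that provides <stdint.h> and its 64-bit integer types. The helper names mul_div_u32 and add_u64_parts are made up for illustration and come from no library. Whether the compiler actually turns these into a single MUL, DIV or ADC depends on the compiler, the target and the optimisation level, so treat it as an approximation of the assembly idiom rather than an equivalent.

```c
#include <assert.h>
#include <stdint.h>

/* Computes (a * b) / c using a 64-bit intermediate product, so the
   multiplication itself cannot overflow. The remainder is stored in
   *rem. Hypothetical helper name; not from any library. */
uint32_t mul_div_u32(uint32_t a, uint32_t b, uint32_t c, uint32_t *rem)
{
    uint64_t wide = (uint64_t)a * b;   /* full 64-bit product, like EDX:EAX */
    uint64_t quot;

    assert(c != 0);
    quot = wide / c;
    assert(quot <= UINT32_MAX);        /* the quotient must fit in 32 bits */
    *rem = (uint32_t)(wide % c);
    return (uint32_t)quot;
}

/* Adds two 64-bit values held as 32-bit halves, propagating the carry
   from the low words into the high words by hand (the job ADC does).
   Hypothetical helper name; not from any library. */
void add_u64_parts(uint32_t a_hi, uint32_t a_lo,
                   uint32_t b_hi, uint32_t b_lo,
                   uint32_t *sum_hi, uint32_t *sum_lo)
{
    uint32_t lo = a_lo + b_lo;         /* wraps modulo 2^32 */
    uint32_t carry = (lo < a_lo);      /* wrapped if and only if a carry occurred */

    *sum_hi = a_hi + b_hi + carry;
    *sum_lo = lo;
}
```

Calling mul_div_u32(a, b, c, &rem) gives the quotient of a*b/c along with the remainder in one go, which is the part fixed-point code usually cares about. The carry trick in add_u64_parts relies on unsigned arithmetic wrapping around, which C guarantees for unsigned types.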

Miscellaneous
Firefox leads to a breakup. I don't know whether I should feel sorry for the bloke who was dumped or the lady who had to change her email address possibly after being bombarded with tonnes of silly emails. I do know that I found this bug report rather funny.

As I had feared, I performed miserably in the qualifying round of the Google Code Jam India 2006. Good luck to the people who moved on to the next round.

20 Mar 2006 (updated 20 Mar 2006 at 10:23 UTC)
Miscellaneous Readings
Some random stuff to do when you're bored:
Wi-Fi
I bought a D-Link DI-524 wireless router the other day to set up a little Wi-Fi network at home so that Anusha and I can surf the Internet simultaneously using our broadband connection instead of one patiently waiting for (or cursing) the other - she on her laptop, me on my desktop PC.

What surprised me was how cheap the equipment was (Rs 2,600/- after 4% VAT) and how easy it was to set up. There was a slight complication due to the Huawei SmartAX MT880 ADSL modem-cum-router we were using for our BSNL DataOne broadband connection and the assumptions made by the wireless router, but that is easy to resolve if you know the basics of IP (Internet Protocol). It was also relatively easy to secure the access point.

Of course, this adds a few more cables to the jungle of cables behind my PC that had already made cleaning difficult and any expansion a chore.

By the way, I have been seeing the prices of networking equipment (modems, switches, wireless routers) dropping drastically over the last year here in Bangalore, possibly because broadband has become quite affordable and because more and more people have a laptop or two.

Steve Yegge's Blog Articles
reddit.com regulars would surely have noticed several articles from Steve Yegge's blogs bubbling up with a lot of moderation points. I must admit that I spent more than a couple of hours reading many of his articles. As with Joel Spolsky, I might not agree with everything he says, but I have to say that he writes fairly well most of the time (though he is a bit verbose and somewhat incoherent at times).
GCJ and ECJ
Tom has asked the GCC Steering Committee to provide their verdict on the proposed use of the Eclipse compiler for Java in GCJ. This follows his earlier proposal to abandon GCJX for GCJ and adopt ECJ instead. As of this writing, there has been no response from the GCC SC yet.

A Philistine Watches "2001: A Space Odyssey"
We watched Stanley Kubrick's "2001: A Space Odyssey" yesterday. I was terribly disappointed by this movie: most of the scenes were excruciatingly long, the music (when it was present) seemed mostly arbitrary for the scene in question, the "star gate" scene seemed amateurish and long (and looked as if it was designed to induce a headache), the actors were mostly expressionless, etc. On the positive side, I admired the special effects (awesome for 1968) and was pleased to see how they were shown in a matter-of-fact manner instead of the in-your-face style so common these days. I also liked the main musical theme that recurs throughout the movie.

The painfully long shots reminded me of the "art movies" we had to see in our childhood. At that time, the state television channel Doordarshan (literally "tele vision" in Hindi) was the only thing we could watch on TV. They used to show a movie every Sunday afternoon in one of the regional Indian languages. Being a Malayalee family, we used to watch every such Malayalam movie out of sheer loyalty. Unfortunately for us, Malayalam (like Bangla, but unlike other languages like Tamil, Telugu, Marathi, etc.) seemed to be blessed with a lot of award-winning directors who insisted on making "meaningful cinema", which was anything but meaningful to the vast majority of the population. It was very painful to sit through such movies.

I still remember a particularly painful scene from one such movie (whose name I cannot recall). The first shot shows an empty and untarred village road receding into the distance. After quite a while you notice a small speck on the horizon, very slowly increasing in size, until you can make out that it is a man on a bicycle slowly approaching your viewpoint. He finally passes your viewpoint after about five long and painful minutes. The next shot shifts the viewpoint so that now you see the same cyclist slowly pedal his way through the same road away from you till he again becomes a small speck on the horizon and till you admire the empty road for quite a while again. This shot lasts another five painful minutes. This scene makes you wonder what the point of the director was. Was it to drain all remaining enthusiasm for the movie from the viewer so that he does not apply much thought to the rest of the movie? Was it to filter the true admirer of meaningful cinema, who is masochistic enough to sit through such scenes, from the wannabes? Was it simply to fill up an extra reel of celluloid? Needless to say, after about 10 or 15 of such movies, our family lost all enthusiasm to watch Malayalam movies aired by Doordarshan. Only the advent of cable television brought relief and the ability to watch normal Malayalam cinema on TV.

Back to "2001: A Space Odyssey". In a couple of shots, there is this chorus of male noises in the background that has been warped to sound somewhat like the collective humming of a swarm of bees. That bit is rather painful on the ear as is the very shrill noise emitted by the black monolith on the moon when it is unearthed by humans. I personally also found some bits of well-known western classical music compositions a bit weird and out-of-place for the respective scenes.

The point of this long rant is that I believe that Kubrick could have so easily made this movie much shorter, much more bearable and much more accessible without losing anything of the story. Such a disappointment.

Google Code Jam India 2006
Google Code Jam India is back. It was quite popular here in India the last time around. I still haven't decided whether I should participate. I haven't been participating in TopCoder matches for a while now and even while I was, my rating was steadily and embarrassingly declining with every match. I can blame it on a brain that deteriorates with age or more honestly admit that even though I like coding and computer science in general, I'm not really as good at it as I would like to believe.
Tar Formats
GNU tar can create archives in various formats, and recent versions create them in the POSIX-2001 format. Unfortunately, while this format is the most flexible and is standardised, it is not yet supported by most of the installations out there. When you distribute archives in this format, users with older versions of tar (even GNU tar before version 1.14) will see "weird" folders like PaxHeaders.1640 extracted along with the ordinary contents of the archive and will get error messages like "unknown file type `x'".

I was bitten by this problem when an archive that I had created on my home PC, using GNU tar 1.15.1 on Linux, would not extract cleanly on different systems elsewhere. It seems that the "v7" format is the most portable at the moment, though it has severe problems with long file names and large files. My project does not have long file names or huge files, so I can use this format for the time being to avoid these problems. The long-term solution, however, is to encourage everyone to use a tar programme that can handle the far better POSIX-2001 format.
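
The `x' in that error message is the type flag byte of a tar header block, which the POSIX-2001 (pax) format uses to mark its extended headers. To make that concrete, here is a minimal sketch in C, assuming the standard 512-byte header layout with the type flag at byte 156 and the "ustar" magic at byte 257. The function name tar_block_is_pax_header is made up for illustration, and the check skips things a real tool would do, such as verifying the header checksum.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TAR_BLOCK_SIZE      512
#define TAR_TYPEFLAG_OFFSET 156   /* single-byte type flag */
#define TAR_MAGIC_OFFSET    257   /* "ustar" magic in POSIX headers */

/* Returns 1 if the given 512-byte header block looks like a pax
   extended header ('x' for one file, 'g' for the whole archive),
   0 otherwise. Hypothetical helper name; not from any library. */
int tar_block_is_pax_header(const unsigned char *block)
{
    unsigned char type;

    assert(block != NULL);

    /* Old v7 headers carry no magic, so they never match here. */
    if (memcmp(block + TAR_MAGIC_OFFSET, "ustar", 5) != 0)
        return 0;

    type = block[TAR_TYPEFLAG_OFFSET];
    return type == 'x' || type == 'g';
}
```

An older tar that does not know these type flags has no idea what to do with such a block, which would explain both the error message and the stray PaxHeaders folders.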
