16 Dec 2003 trs80   » (Apprentice)

Yet another rant I've been meaning to write for a while concerns the loss of electronic content. Umberto Eco touched on this slightly in his article "Vegetal and mineral memory: The future of books", where he says:

[Books] do not suffer from power shortages and black-outs, and they are more resistant to shocks.
However, the main thrust of his article is that the concept of the book as a closed text is not going to go away any time soon (advances in hypertext [1] notwithstanding). The core point of my rant was going to be how much data goes missing through laziness, incompetence and plain misfortune - contradicting an article I read recently (sorry, no cite) which claimed that people who really needed their data would invest the time and money to recover it. That completely ignores the fact that the data might be unrecoverable, assuming it was saved in the first place (you often don't know you need data until you need it). The conclusion would be that distributed backups (whether via CVS, rsync, FTP, BitTorrent or your P2P client of choice) should really be used for all data, no matter how (seemingly) unimportant - and RAID 1 or 5 for quick recovery from inevitable drive failures. Fortunately, Jason Scott (maintainer of textfiles.com) has written the rant for me. He also makes some points about the two most common mistakes of archives: too much presentation (and perfection) with not enough content, and prevention of copying.
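The "back up everything, however unimportant" discipline doesn't need fancy tools. As a minimal sketch (not any particular tool's behaviour - just an rsync-style "copy what changed" loop over hypothetical paths), something like this mirrors a directory tree to a backup location:

```python
import os
import shutil
import filecmp

def mirror(src, dst):
    """Copy files from src into dst, skipping files that are already
    byte-for-byte identical - a crude stand-in for `rsync -a src/ dst/`.
    (Deletions are not propagated; this only adds and updates files.)"""
    for root, dirs, files in os.walk(src):
        rel = os.path.relpath(root, src)
        target = os.path.join(dst, rel)
        os.makedirs(target, exist_ok=True)
        for name in files:
            s = os.path.join(root, name)
            d = os.path.join(target, name)
            # Copy only if missing or contents differ (shallow=False compares bytes)
            if not os.path.exists(d) or not filecmp.cmp(s, d, shallow=False):
                shutil.copy2(s, d)
```

Point it at a second disk or a network mount from cron and you have the cheap half of the argument; the "distributed" half is just running it against more than one destination.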

[1] Ftrain, which I have just spent an hour or so exploring, is at the cutting edge of hypertext and knowledge relationships (or something). The same software is now used at the Harper's website to provide a more convenient way of using Harper's hypertextual books (to give their Indices another name). As a side note, the Harper's website is currently rather new, and while the skeleton is there via the Weekly Reviews and Indices, there's very little meat in the way of Features taken from recent issues as yet. Another mild criticism of both Ftrain.com and Harpers.org is that you occasionally wish for another degree of freedom (another type of relationship between objects), but this is minor given how rich the sites already are.

info and why it sucks

haruspex: I think info sucks because the standard info browser is very Emacs-like; to (mis)quote a friend: "To get to the next-up section in an info document, press Left-Meta-Ctrl-J" (size, unmemorability and uselessness of keystrokes exaggerated) [2]. I can't search the entire file for a particular phrase (which I can while viewing a man page using $PAGER [3]), the navigation is quite cumbersome, and it's not structured in the traditional man page format [4]. Diving down into a section requires finding a "menu" (if I recall the terminology correctly) in the text and picking out the bit you want - you can't just hit "next" and sequentially page through the entire document. On several occasions this has led to me missing entire subsections about the topic I was looking at.

Front ends like yelp or khelp do remove some of the interface issues, but the core problems with the format remain. If the official client included a search function, an automagically generated document index (or even the ability to easily page through the document), or the ability to view an entire top-level section at a time (so one could read it like a chapter in a book), it would be much more usable. The fragmentation of topics enforced by the info format often makes it very hard to gain an understanding of the big picture.
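The whole-file search, at least, is easy to bolt on from outside: an info file is mostly plain text, with nodes separated by a \x1f byte and introduced by a "File: ..., Node: ..." header line. A toy sketch (assuming that layout, and ignoring compressed files and tag tables) that reports which nodes contain a phrase:

```python
import re

def search_info(text, phrase):
    """Return the names of the nodes in an info file's text that contain
    phrase (case-insensitive) - the whole-document search the standard
    info browser lacks. Nodes are separated by a \x1f byte, each headed
    by a 'File: ..., Node: ...' line."""
    hits = []
    for node in text.split("\x1f"):
        if phrase.lower() in node.lower():
            m = re.search(r"Node:\s*([^,\n]+)", node)
            if m:
                hits.append(m.group(1).strip())
    return hits
```

That it takes a dozen lines of script to do what the browser should do out of the box is rather the point.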

Yes, info dates from 1985, when terminals were 80x24 at 9600bps and SGML was the province of book-editing geeks, but that doesn't mean it shouldn't be updated to current standards. Compare a standard HOWTO from TLDP to an info file - while HOWTOs are often divided into many subsections, they come as one chapter per HTML page. HOWTOs are also far more plentiful than info documents, which should give you a hint as to their relative popularity and usefulness. Maybe an info-to-HTML (or DocBook) converter, in the style of the HOWTOs, would be the best solution.
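Such a converter wouldn't be hard to start on, either. Continuing the same assumption about the node layout, a toy sketch that dumps every node of an info file into one browsable HTML page (a real converter would also rewrite *note cross-references into hyperlinks and respect the menu hierarchy):

```python
import html
import re

def info_to_html(text):
    """Render the nodes of an info file as a single HTML page, one
    anchored section per node, so the whole document can be read and
    searched in a browser. Nodes are separated by \x1f bytes and headed
    by a 'File: ..., Node: ...' line; nodeless chunks (the file header,
    tag tables) are skipped."""
    parts = ["<html><body>"]
    for node in text.split("\x1f"):
        m = re.search(r"Node:\s*([^,\n]+)", node)
        if not m:
            continue
        name = html.escape(m.group(1).strip())
        parts.append('<h2 id="%s">%s</h2>' % (name, name))
        parts.append("<pre>%s</pre>" % html.escape(node))
    parts.append("</body></html>")
    return "\n".join(parts)
```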

gilbou: running info foo where foo has a man page but no info document will pull up the man page in the info browser, which means that if you can stand the info browser for reading man pages, it will always default to the most useful type of documentation - the info pages if they exist (and which are presumably more up to date), or the man page otherwise. But your point about an unfragmented interface to reference documents is very important.

In particular, info pages often cover a whole system of programs, whereas there's one man page per executable (in theory). There are also man pages for file formats (section 4 or 5) and for system calls and library functions (sections 2 and 3) - finding the exact syntax of qsort() is much easier with man 3 qsort than by digging through an info file for its definition, and likewise for quickly finding the syntax of a program (e.g. route). Perl has man pages, Python and PHP have HTML manuals - I can use either equally well, but info is HTML's bastard child (ok, it precedes HTML), with a too-rigid structure of hyperlinks that prevents it from being truly useful. To put it another way: man pages are for reference, while info is for documentation - it's perfectly alright for a man page to refer you to a (HTML, book, plain text, whatever) manual for a more complete discussion of the system, whereas info tries to be both quick reference and complete manual and fails at both.

[2] I like vi (well, vim) because I can use as many or as few of its features as I like, which is also why I like Python. As for console web browsers, I'll take w3m over lynx any day - mmm, image support in xterms and framebuffer consoles.

[3] $PAGER is always set to less for me.

[4] Common man page sections: the NAME of the program, a SYNOPSIS of options at the top, a brief DESCRIPTION, a full list of OPTIONS with a detailed description of how the program works, EXAMPLES, the ENVIRONMENT variables the program understands, the FILES it uses, what BUGS it has, some other man pages to SEE ALSO, a brief HISTORY of the utility, and the AUTHORS of the man page. While this is not a complete list, and some variation occurs, this format makes it very easy to find your way around a man page.

Ok, that was a rant (although not that incoherent), and I didn't actually fire up info to confirm my prejudices^Wpoints, but it's 5am and I should be sleeping.
