Older blog entries for fraggle (starting at number 46)

What's so bad about shell scripts?

I think that by now almost all smart people have realised that scripting using the Bourne shell is a bad idea for anything more complicated than simply automating what you would otherwise type by hand. I mostly avoid writing shell scripts, preferring to write scripts in either Ruby or Python. However, the ability to write shell scripts is still a useful skill; there are certain situations where writing a shell script really is the easier thing to do - mostly situations that revolve around executing commands, or where a shell script is the "standard" thing to do (init.d scripts, for example). It's also useful to be able to debug shell scripts that other people have written. So I recently set about honing my shell scripting skills.

To this end, I wrote a script called branch_helper, which automates some of the drudgery of managing Subversion branches. The main aim was to make maintenance of Strawberry Doom easier, as it is developed as a branch within the Chocolate Doom repository and needs periodic updates.

The result is probably as complicated a shell script as I am ever going to write; certainly the most complicated that I am ever going to want to write. The process did, however, give me deeper insight into why shell scripts, as a "programming language", are quite so unscalable and only suitable for very simple tasks.


  • One of the most fundamental drawbacks of shell scripts is the lack of a proper list construct. Almost all programming languages give you arrays of some form or other; the closest that you can get in a shell script is "a string containing a list of items separated by spaces". While this sort of suffices for some situations, the most obvious drawback is that you can't put items in the "list" that themselves contain spaces. The result is that almost all semi-complex shell scripts break if you try to use them with files or directories that contain a space. To demonstrate this, try running a configure script in Cygwin from a directory containing a space (e.g. "Documents and Settings").

    Bash has arrays as an extension, but obviously that won't work in other Bourne shells. However, the standard Bourne shell does have one type of list - namely, the list of arguments to a function. It's sometimes possible to make use of this if you structure the script in the right way; there's a short sketch of the trick after this list.

  • Semi-related to the first problem is the way variables are expanded. command "$arg" and command $arg have different meanings, for example: they expand into exactly one argument or (potentially) several arguments, respectively. One useful habit when trying to write "correct" shell scripts is to continually ask yourself: "what would happen if this variable contained a space?"

  • The inability to easily "return" useful information from a function is another annoying drawback. Every function acts as a "mini-subprogram", which is rather aesthetically pleasing in a way, and actually incredibly useful in some situations. However, it suffers from the fact that the only result a Unix program can return is a single 8-bit value (the exit code).

    The result is that the typical way to pass a value back from a function to its caller is to do something slightly hideous like this:

    result=`myfunction "$arg1" "$arg2"`


  • You can also get all kinds of insidious "gotchas" from the fact that the shell will sometimes fork. For example, the following two fragments give different output:

    result=0
    
    while true; do
        result=1
        break
    done
    
    echo $result

    and

    result=0
    
    echo broken | while true; do
        result=1
        break
    done
    
    echo $result
    

    (In the latter, the loop runs in a separate process, so the "result" variable is set in that separate process, and the value is lost when the loop finishes.)

  • This is actually another manifestation of the previous problem: handling error situations can be problematic. The simple requirement of "check that a program runs correctly; if it fails, exit the script with an error" can be surprisingly tricky to achieve. As the shell can fork to run different parts of the script (especially if you use the backticks trick to pass values back from functions), the "exit" command does different things in different places. In the main script, "exit" exits the script; but in a section of code that has been forked off into a separate process, it only exits that other process.

    I wrote a function called "error" to exit with an error, and used it to check that functions run correctly and, if they don't, chain back up to the top and exit properly (a fuller sketch appears after this list). So in the end, calling a function looks like this:

    result=`myfunction "$arg1" "$arg2"` || error


  • Portability issues. This isn't so much of a problem nowadays, because you can pretty much rely on bash being installed on most systems and take advantage of its extensions. However, if you really do want to write a properly "portable" Bourne shell script, there are some things that can catch you out. For example, bash lets you define functions using "function myfunction() {", but this isn't supported elsewhere. Similarly, when doing comparisons, bash accepts e.g. "[ "$value" == "shoes" ]" in addition to the standard syntax, which is "[ "$value" = "shoes" ]".

    Some very old systems have quirky interpreters that mean you have to do tricks like "[ "x$value" = "xshoes" ]": without the "x", an empty "value" can leave the test looking like "[ = shoes ]", which is a syntax error.
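
As a rough illustration of the function-argument trick (a made-up example, not code from branch_helper), the positional parameters can stand in for a list whose items may contain spaces:

    # "$@" expands to one word per argument, so items containing
    # spaces survive intact.
    print_each() {
        for item in "$@"; do
            echo "item: $item"
        done
    }

    # The positional parameters also serve as a crude mutable list:
    set --                                 # clear the "list"
    set -- "$@" "Documents and Settings"   # append an item
    set -- "$@" "notes.txt"
    print_each "$@"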
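
And a minimal sketch of the "error" pattern (again hypothetical; branch_helper itself differs). Because a function called with backticks runs in a subshell, its "exit" only terminates that subshell, so the caller has to check the exit status and re-raise:

    error() {
        echo "error: aborting" >&2
        exit 1
    }

    # "Returns" the number of lines in a file by echoing it to stdout.
    count_lines() {
        [ -f "$1" ] || error    # exits only the backticks subshell...
        wc -l < "$1"
    }

    result=`count_lines /etc/passwd` || error   # ...so re-raise here
    echo "line count: $result"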



All in all, some rather nasty quirks that rapidly turn into gigantic annoyances when you try to do anything complicated. That's not to say, however, that shell scripting is completely without merit.

Syndicated 2008-06-19 21:25:47 from fragglet

Valgrind with autotools

Automake helpfully provides the ability to run tests with "make check" - you can give it a list of test programs to run, and it will go through each in turn and check that they exit with a success status (0). However, when running test cases for stuff written in C, it's nice to run them in Valgrind - that way, you can pick up on any memory leaks or other subtle memory errors that you wouldn't otherwise notice.

Automake allows you to set a variable called "TESTS_ENVIRONMENT" that is prefixed to all your test commands, so you can run your tests in valgrind with something like:

make check TESTS_ENVIRONMENT=valgrind

Unfortunately, this isn't perfect. First of all, it's rather tedious having to type that every time you want to run some tests; secondly, it doesn't automatically fail the tests when valgrind finds problems (by default, valgrind just reports errors and exits with the test program's own exit status).

So I wrote some automake magic to make it all a bit more streamlined. Firstly, a --enable-valgrind flag to configure, to run tests with valgrind. It's then a simple matter of tweaking Makefile.am to set TESTS_ENVIRONMENT when valgrind is enabled. Finally, a short wrapper script around valgrind that fails the test on any valgrind error output. I run with the -q (quiet) option to hide the normal valgrind blurb.
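
The wrapper only needs a few lines. Here's a minimal sketch of the idea (not my exact script; it assumes the test program itself writes nothing to stderr):

    #!/bin/sh
    # Run the test under valgrind.  With -q, valgrind prints nothing
    # unless it finds an error, so any output on its stderr means the
    # test should fail.  (The test's stdout is discarded; automake
    # only cares about the exit status.)
    output=`valgrind -q "$@" 2>&1 >/dev/null`
    status=$?
    if [ -n "$output" ]; then
        echo "$output" >&2
        exit 1
    fi
    exit $status

Makefile.am then points TESTS_ENVIRONMENT at this script when --enable-valgrind was given to configure.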

One important thing is to ensure that the tests are real executables and not magic libtool wrapper scripts (libtool generates these if you build against a .la file); Valgrind gets confused otherwise.

All in all, fairly straightforward. I guess autotools isn't always such a pain after all.

Syndicated 2008-06-10 00:20:25 from fragglet

John McCain is ooooold

List of inventions that presidential candidate John McCain is older than: the jet engine, nylon, the ballpoint pen, the helicopter, the microwave oven, holograms, nuclear weapons, the transistor, the Rubik's Cube, communications satellites, Velcro, the contraceptive pill and light-emitting diodes - plus every computer ever made, including every computer program ever written, every programming language ever designed, computer networks, video games and anything else based on a computer whatsoever.

His lifetime spans the entirety of World War II, the founding of the United Nations, the entirety of the Cold War including the construction and demolition of the Berlin Wall, the first man in space and the space race that followed, the American civil rights movement and all rock music ever made.

Syndicated 2008-06-03 13:42:26 from fragglet

Scientology "war"

Ah, Internet drama. So a bunch of kids have decided to "destroy" the Church of Scientology by DDoS'ing the Scientology website and making lots of prank calls to the various church buildings. Now, I'm thoroughly anti-Scientology and think that it's an incredibly dangerous and subversive cult; however, the rhetoric being thrown around by the members of "Anonymous" is almost as hilarious as the idea that a multi-million dollar business is going to be "destroyed" by a few kids ordering pizzas to the Scientology buildings and flooding their website off the Internet.


Perhaps the most stupid part of this whole affair is that it's just about the worst possible action to take. Scientology likes to smear its critics as "suppressive persons", effectively labelling them as hopelessly mentally ill people with anti-social and destructive tendencies. By "attacking" Scientology, the members of "Anonymous" fit themselves exactly into the role in which the Scientologists would like to portray them: "The antisocial personality supports only destructive groups and rages against and attacks any constructive or betterment group". Now it's easy for Scientology to dismiss any Internet criticism as having been concocted by antisocial "suppressives".


While people continue to believe in Hubbard's teachings, Scientology will continue to exist. The way to destroy Scientology is to destroy those beliefs, to show the lies that the church propagates and all the crazy stories about aliens found in the upper levels. The greatest weapon against Scientology is the truth, and the Internet is the most effective way to disseminate it. Of course, now, the church has an excuse to get more of its members running censorship software - "protect yourself from dangerous Internet subversives, out to destroy Scientology!". David Miscavige himself couldn't have come up with such an effective scheme.


There is obviously a large group of people participating in the "war". What a shame that so much energy has been put towards such an utterly counterproductive effort.

Syndicated 2008-01-25 13:26:06 from fragglet

Macbook Air

The minimum price for a Macbook Air is £1199. For this, you get a slow processor; 2 gig of RAM with no option to upgrade, ever; mono speakers (although I guess it doesn't need decent speakers, since there is no DVD drive to watch movies on anyway); a tiny (and slow) hard drive (just in case you thought you could download movies to watch instead); no Ethernet port; and a single USB port, just to fuck you over in case you thought you could plug in a USB Ethernet dongle and external USB hard drives and DVD drives to work around the above inadequacies.


The best part of all is that if you pay £2000, you can get the higher spec model, which has a slightly faster processor and even less storage.

Syndicated 2008-01-24 10:41:54 from fragglet

EU trolling

One of the features of the EU treaty being signed today is that it gives the Charter of Fundamental Rights of the EU legal force. I noticed this in Article 41:

4. Every person may write to the institutions of the Union in one of the languages of the Treaties and must have an answer in the same language.

Some things to consider:
  • Institutions of the European Union are obliged to respond to questions in any of the languages of the EU with replies in the same language.
  • The EU parliament is an "Institution of the European Union".
  • Therefore, MEPs are members of an "institution of the EU".
  • Does this mean that I can write to random MEPs (Robert Kilroy-Silk, for example) in random EU languages (Hungarian, for example), and they are obliged to reply to me in the same language?

I see great potential here for foreign language-based trolling.

Syndicated 2007-12-13 15:26:36 from fragglet

Offensive scrabble words

Ubisoft recently sparked some outrage by including the word "Lesbo" in their Nintendo DS version of Scrabble, which some people found offensive.


I decided to do some minor research; here is a list of several more words present in Scrabble DS:

Cursing: Asshole, Cunt, Fuck, Jism, Mofo, Shit, Wank

Homophobic: Fag, Ponce, Poof, Poon

Racist: Cracker, Dago, Gook, Jew (as a verb, meaning to haggle), Jigaboo, Kike, Raghead, Spic, Wog, Yid

There were many more racist terms, but some of them seemed to be obscure words from particular dialects that I've never even heard before. Ubisoft certainly used a comprehensive dictionary!

Syndicated 2007-10-05 23:19:08 from fragglet

Psychic debugging

< AlexMax_> Oh fuck yes
< AlexMax_> my bash kung fu is still strong
< AlexMax_> heh this is getting messy, windows svn doesnt like being
            called from a shell script so now I'm using the batch file to
            update and shell script for everything else
< AlexMax_> heaven forbid anyone else try to replicate what I'm doing
< AlexMax_> OK this is really weird
< AlexMax_> If I put in a command at the bash command line, it runs fine
< AlexMax_> but if i put in that same command into a shell script, the
            command acts like it doesnt recognize the paramitors
<@fraggle> sh != bash
< AlexMax_> I'm using winbash
< AlexMax_> sh is winbash
< AlexMax_> wait a minute
<@fraggle> do you have #!/bin/sh at the top of your file?
< AlexMax_> what?
< AlexMax_> No, but why should i have to, I involke it using sh
            autobuild.sh
< AlexMax_> actually fuck
<@fraggle> try bash autobuild.sh
< AlexMax_> yeah, i could have sworn bash and sh were the same on this
            system
<@fraggle> i think it can behave differently depending on whether you
           invoke it as sh or bash
< AlexMax_> i know that sh and bash are usually distinct on linux
< AlexMax_> but i just remembered that sh is the msys sh and bash is
            winbash
<@fraggle> your bash kung foo may be strong but my psychic debugging
           powers are stronger

Syndicated 2007-10-02 21:44:34 from fragglet

CCTV cameras and Big Brother

I saw, linked from Slashdot, that "This is London" is reporting that despite tens of thousands of CCTV cameras, 80% of crime remains unsolved.

First of all, the article analyses "crime clearup rate", which is not a measure of the amount of crime, but of how much crime is solved. So what it is really claiming is that "CCTV cameras do not help police to solve crimes". It's important to make this distinction, because it's easy to misinterpret this as meaning "CCTV cameras do not deter criminals", which, indeed, is what the submitter to Slashdot thought.

Secondly, the figures themselves are used in a way that is practically meaningless. "Police in [District X] only have a clearup rate of 20%, despite [N] cameras!". Now, I'm not discounting that there may be a relationship between CCTV cameras and crime clearup rate, but I'm sure there are plenty of other factors that are likely to be much more significant when comparing clearup rates between districts - the number of police officers, their competence, and the actual crime rates in those districts, for example. We're also given no indication of what a "good" crime clearup rate is supposed to be, or how those rates have changed over time since the introduction of CCTV.

I'm always skeptical about stories about CCTV cameras (especially ones where they are described as a "publicly funded spy network"), because a lot of people seem to have an irrational fear of them. Whenever CCTV is mentioned, cries of "Big Brother" and "invasion of privacy" abound. Big Brother and George Orwell form an interesting parallel to Godwin's Law: Any discussion regarding CCTV cameras will inevitably descend into comparisons with Big Brother. "Big Brother" has become a reason unto itself to bash CCTV: a book exists, depicting a dictatorial world, and it features CCTV, therefore CCTV is bad.

Similarly, I'm not quite sure how filming a public place constitutes an invasion of privacy; nobody that I've talked to has yet been able to answer this. If there were a policeman standing on the street in the place of the camera, would that also constitute an "invasion of privacy"? The funniest answer I've had so far is that people would no longer be able to commit minor crimes that they were previously free to commit.

Of course, I don't believe that there are no potential issues whatsoever surrounding the use of CCTV cameras, but I really detest the sensationalism and irrational paranoia that surrounds them.

Syndicated 2007-09-21 10:34:24 from fragglet

It's the bandwidth, stupid: part 2

Bill Dougherty has posted part 2 of his "It's the latency, stupid" article. Sadly, it is filled with as many factual errors as the previous one.

Where do I start? First of all, HTTP: "HTTP 1.1 signals the web server to use gzip compression for file transfers". This is simply wrong. Go and read the HTTP/1.1 specification: although gzip is mentioned, there is no requirement that an HTTP/1.1 server use gzip compression. I'd say that no browser that uses HTTP/1.0 has shipped in at least five years, so this is a totally irrelevant suggestion to make. Even then, switching to HTTP/1.1 does not magically add gzip compression: it's up to the web server to optionally send you compressed data instead of the normal uncompressed data, and 99+% of servers will not do this.

Using HTTP/1.1 CAN provide an advantage, but for reasons entirely unrelated to compression. The major difference between HTTP/1.0 and 1.1 is that HTTP/1.1 can reuse an existing connection to retrieve more files, whereas HTTP/1.0 closes the connection as soon as a download completes. This matters because of the way TCP's congestion control algorithms work: they start off with a small window size, which is increased over time in order to determine the available bandwidth of the channel. With HTTP/1.0, this process restarts for every file downloaded; HTTP/1.1 lets you reuse an existing connection that has already settled at a reasonable window size. This is important for modern websites that have lots of images and other embedded content. As I mentioned before, though, this is utterly irrelevant advice, because all modern browsers already use HTTP/1.1 by default.
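
You can watch the connection reuse happen with curl, which speaks HTTP/1.1 by default (a rough illustration; the exact verbose output varies between curl versions):

    # Two downloads over one TCP connection: the -v output shows curl
    # re-using the existing connection for the second URL.
    curl -v http://example.com/ http://example.com/ >/dev/null

    # The same downloads over HTTP/1.0 (-0): the server closes the
    # connection after each response, so curl has to reconnect.
    curl -v -0 http://example.com/ http://example.com/ >/dev/null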

Then Bill comes up with this gem: "One effective method is to change the protocol. Latency is a problem because TCP waits for an acknowledgement". This is also wrong. He seems to be under the mistaken impression that TCP is a stop-and-wait protocol: that each packet is sent, an acknowledgement waited for, and then the next packet sent. What actually happens is that TCP sends a whole window of packets across the channel, and as the acknowledgement for each packet is received, the next packet is sent. To use the trucks analogy again: imagine twenty trucks, equally spaced, driving in a circle between two depots, carrying goods from one depot to the other. Latency is not a problem, just as the distance between the depots is not a problem: provided that you have enough trucks, the transfer rate is maintained. The TCP congestion control algorithms automatically determine "how many trucks to use".

TCP will, admittedly, restrict the rate at which you can send data. Suppose, for example, that you're writing a sockets program and sending a file across a TCP connection: you cannot send the entire file at once. After you have written a certain amount of data into the pipe, you cannot write any more until the receiving end has read some of it. But this is a good thing! What is happening here is called flow control. You physically cannot send data faster than the bandwidth of the channel will support: if you're using a 10KB/sec channel, you can't push 50KB/sec of data across it. All TCP is doing is limiting you to sending data at the physical limit of the channel.

"If you control the code, and can deal with lost or mis-ordered packets, UDP may be the way to go". While this is true, it's misleading and potentially really bad advice, certainly to any programmers writing networked applications. If your application mainly involves transfer of files, the best thing to do is stick with TCP. The reason is that TCP already takes care of these problems: they've been thoroughly researched and there are many tweaks and optimisations that have been applied to the protocol over the years. One important feature is the congestion control algorithms, that automatically determine the available bandwidth. If you don't use these kind of algorithms, you can end up the kind of collapse that Jacobson describes in his original paper on network congestion. If you use UDP, you're forced to reinvent this and every other feature of TCP from scratch. As a general rule of thumb, it's best to stick with TCP unless there is some specific need to use UDP.

Finally, I'd like to examine his list of "tricks that network accelerators use":

"1. Local TCP acknowledgment. The accelerator sends an ack back to the sending host immediately. This ensures that the sender keeps putting packets on the wire, instead waiting for the ack from the actual recipient". This is nonsense. TCP keeps putting packets onto the wire in normal operation. It doesn't stop and wait for an acknowledgement. TCP acknowledgements should already be being transmitted correctly If you're interfering with the normal transmission of acknowledgements, all you're doing is breaking the fundamental nature of how the protocol and the sliding window algorithm work.

"2. UDP Conversion. The accelerators change the TCP stream to UDP to cross the WAN. When the packet reaches the accelerator on the far end, it is switched back to TCP. You can think of this a tunneling TCP inside of UDP, although unlike a VPN the UDP tunnel does not add any overhead to the stream." I fail to see what possible advantage this could bring.

"3. Caching. The accelerators notice data patterns and cache repeating information. When a sender transmits data that is already in the cache, the accelerators only push the cache ID across the WAN. An example of this would be several users accessing the same file from a CIFS share across your WAN. The accelerators would cache the file after the first user retrieves it, and use a token to transfer the subsequent requests." This is useful in the very specific case of CIFS, because SMB has known performance issues when running over high latency connections - it was designed for use on LANs, and the protocol suffers because of some assumptions that were made in its design. This doesn't apply, however, to the majority of other network protocols.

"4. Compression. In addition to caching, network accelerators are able to compress some of the data being transmitted. The accelerator on the other end of the WAN decompresses the data before sending it to its destination. Compressed data can be sent in fewer packets, thus reducing the apparent time to send." Amusingly, what this actually does is decrease the bandwidth used, and has nothing to do with latency.

Syndicated 2007-06-06 02:55:11 from fragglet

