ah, time for my yearly smattering of blog posts
Annnnnnnnnnd we're back.
Today's hot tip:
@filearray = <FILE>;
...is a bad idea if <FILE> is rilly large!
Thank you good night.
(No Perl jokes, please.)
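For what it's worth, the usual alternative is to walk the file one line at a time, so only the current line sits in memory. Here's a rough sketch of the same idea in shell rather than Perl. The function name count_long_lines and its length threshold are made up for illustration.

```shell
# Count lines longer than a given length without loading the whole file.
# count_long_lines and its arguments are hypothetical names.
count_long_lines() {
    file=$1
    limit=$2
    count=0
    # read -r consumes one line per iteration; the "|| [ -n ... ]" part
    # also handles a final line that lacks a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        if [ "${#line}" -gt "$limit" ]; then
            count=$((count + 1))
        fi
    done < "$file"
    echo "$count"
}
```

Memory use stays flat no matter how big the file gets, because nothing is kept around after each line is examined.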
disabling the caches in Linux
Sometimes you'd like to disable the file cache (and other caches) in Linux for performance testing reasons. Well, now you can, or at least you can empty them on demand.
To drop pagecache only, enter (as root): echo 1 > /proc/sys/vm/drop_caches
Writing 2 drops dentries and inodes only, and 3 drops pagecache, dentries, and inodes. You can easily whip together a little script to drop caches regularly enough to simulate running with no caches.
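Here's one way such a little script might look. It's a sketch, not a tuned benchmark harness: the function names, the optional second argument (there only so you can point it at a scratch file), and the 5-second default interval are all my own assumptions.

```shell
# Write a drop_caches mode (1, 2, or 3) to the kernel's control file.
# The second argument exists only so the function can be pointed at a
# scratch file; it defaults to the real path. Needs root on a real system.
drop_caches() {
    mode=$1
    target=${2:-/proc/sys/vm/drop_caches}
    case "$mode" in
        1|2|3) ;;
        *) echo "drop_caches: mode must be 1, 2, or 3" >&2; return 1 ;;
    esac
    # Flush dirty pages first; drop_caches only discards clean cache entries.
    sync
    echo "$mode" > "$target"
}

# Drop all caches every INTERVAL seconds until interrupted.
drop_caches_forever() {
    interval=${1:-5}
    while :; do
        drop_caches 3 || return 1
        sleep "$interval"
    done
}
```

Run drop_caches_forever in a root shell while your test runs and kill it with Ctrl-C afterwards. A shorter interval gets you closer to "no caches" at the cost of more overhead from the script itself.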
(This information is also under "drop_caches" in linux/Documentation/filesystems/proc.txt.)
"You wouldn't like me when I'm angry..." OR "Dr. Jekyll and Jiminy Cricket"
I'm a collector of odd facts. I remember hearing this one some time ago, about how grasshoppers morph into brutal locusts under certain conditions, but I find that the "facts" you hear by word of mouth often aren't factual. (Unfortunately, a lot of the facts in those 1001 Fascinating Fact books are wrong too... I've thought of writing a 1001 Fascinating Falsehoods [you thought were facts!], but I'll just have to add that to my to-do list.)
remote administration in the post-onboard-serial era
Please Don't Take Away My Serial Port!:
I've been dismayed lately at the lack of serial ports on COTS desktop PCs. I understand eliminating things like parallel ports, game ports, even audio line in... but serial ports are special... and expansion devices are not as good as on-board hardware.
If you ever end up using the PC as a server that you have to administer remotely, you may need a serial console so you can see what's going on at boot time, or to manually select a kernel in the bootloader -- say if your testing kernel fails. When most people need a serial port on a serial-less machine (e.g., to access the serial port on a router from a laptop) they get a serial expansion card or a USB serial dongle. It seems like this should work. But don't count on it.
grub and expansion devices:
Expansion serial ports are barely passable as a solution for Linux/grub users -- and in my opinion are not acceptable for real production systems. In particular, the bootloader and kernel will not be able to send output to expansion devices until after the hardware has been properly recognized and the bootloader or kernel is prepared to handle it. During the time prior to initialization of each system (early in the bootloader and early in the kernel startup sequence), you'll be flying blind.
However, grub does not understand USB *anyway*, so unless you are lucky enough to have a BIOS which supports USB serial ports as native devices (like it does for keyboards and mice), your USB dongle will not allow you to control your bootloader. Period. This is because making a USB serial port work requires a functional USB subsystem, which is more than a bootloader is supposed to handle. As of now, it's not clear if or when grub will support USB. So laptops are screwed.
But a desktop machine can use a PCI card, right? You'd like to think that, wouldn't you? Unfortunately, grub only knows the standard I/O ports (addresses and IRQs) for COM 1-4 (units 0-3 in grub parlance) -- which means if your PCI serial card appears at a different address, grub will not be able to use it. There is code in the pipe for PCI expansion serial ports in grub, but I'm not sure of its status. It doesn't work in my Hardy Heron Ubuntu, although I'm hopeful that this will work reliably in the future. (If so, then PCI cards could be a good solution for "desktop" PCs.)
The Linux Kernel and Expansion Devices:
OK, so grub support is thin. How about the kernel?
The outlook is better here, inasmuch as there is some hope. The kernel has driver software for using these devices, but again, until your device is initialized, you won't see anything. Getting it initialized early enough to use as a console may require you to use a ramdisk to boot. While most distros already do this, there are valid reasons for wanting to avoid using a ramdisk. Too bad.
If you don't mind using a ramdisk, it may work if you're using a stock kernel and your devices are common enough. If they're uncommon, you may be required to compile alternative drivers into a custom ramdisk, including extra USB stuff. Building custom ramdisks for your hardware is never as simple as it should be, and is a ticking time bomb for maintenance. At some point, the update path will change, you'll get a new kernel but the ramdisk won't be properly modified, you reboot, and suddenly you don't have a console anymore.
But let's say you go this way (out of necessity or because you feel a need for more stress in your life) and you get the ramdisk set up with appropriate drivers for your devices. Now that the kernel can see your hardware, you need to create the appropriate device nodes. In the old days, that meant creating each device node by hand and setting permissions on it. But nowadays, devices are created automagically. This is a good thing in general, but it can also be a pain when you only have to fiddle with it every two years.
Anyway, you'll have to monkey around to get your event system (e.g., udev or the older hotplug or what have you) to set up the devices for you properly, including proper permissions. You'll also have to add inittab or event.d rules to start a getty process on the new port if you want a terminal on the console and add rules in securetty if you want to log in as root. Many HOWTOs covering this information are online.
Short version: Expansion COM ports are flaky PITAs. If your computer just had a single built-in COM port, it would be easy-peasy, stable and un-cheesy. (Hey HW guys, if you're listening, a serial device could just be an on-board device with bare pins and we can add a ribbon cable if we need it! Everybody wins!)
There is some good news, however. If you can get grub installed, you can make temporary kernel selections before reboot -- without modifying your menu.lst file. The basic idea is that you set up a "fallback" priority for the kernels and then temporarily select your testing kernel as a "one time boot" using the command grub-set-default. Ok -- things are looking brighter. But what if your kernel panics? Now it's hung. Ah, but you can make Linux reboot upon panic by adding the line "kernel.panic = 15" to /etc/sysctl.conf. (Before it panics!)
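The sysctl half of that is easy to script. Below is a minimal sketch that makes sure the config file ends up with exactly one kernel.panic line. The function name set_panic_reboot and the optional file argument are my own inventions, so treat it as an illustration rather than a packaged tool.

```shell
# Ensure "kernel.panic = <seconds>" appears exactly once in a sysctl file.
# set_panic_reboot is a hypothetical helper; the file defaults to
# /etc/sysctl.conf but can be overridden for testing.
set_panic_reboot() {
    seconds=$1
    conf=${2:-/etc/sysctl.conf}
    scratch=$(mktemp) || return 1
    # Keep every line except an existing kernel.panic setting. The pattern
    # deliberately does not match related keys such as kernel.panic_on_oops.
    grep -v '^[[:space:]]*kernel\.panic[[:space:]]*=' "$conf" \
        > "$scratch" 2>/dev/null || true
    echo "kernel.panic = $seconds" >> "$scratch"
    # Overwrite in place so the original file keeps its owner and mode.
    cat "$scratch" > "$conf"
    rm -f "$scratch"
}
```

After running set_panic_reboot 15 as root, sysctl -p loads the new value without waiting for the next boot.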
... But You Still Want That Smart Power Strip:
Of course, if your kernel deadlocks, you're screwed. The kernel's so hung it can't even reboot itself. Time to use that Linux-friendly IP-enabled power strip you've been meaning to buy for a while. I found the above device being sold under a different label on eBay. The interface isn't the best, but it works, it doesn't require any special software, it has four independent outlets, and it's relatively cheap.
Now, if your machine goes dark when you're remotely testing that bleeding edge kernel, just reboot it! If the disk isn't scrogged, it will happily fall back to the known good kernel and you're golden!
Abstraction and Trade-offs: The Devil You Know
Computer Security is kind of like a lot of things in life, where when you finally figure out the "secret" it's kind of a letdown. We want to find silver bullets for our problems, like a vegetable that is minus 1000 calories per serving (yet is packed with vitamins!), and we want to find that one get rich quick scheme or magic pill that actually works. We want that to be true so badly that whole industries are dedicated to coming up with new schemes, pills, and kitchen gadgets (SlapChop!).
We want to find the winning strategy, but sometimes we find out the answer is some disappointing koan like "the only winning move is not to play." With security, we want to come up with some scheme where we can make our networks and machines bulletproof to any attack, in any scenario. But the more you work on it and think about it, the more you're faced with economics. We still have to play the game, but life is just too short and you don't have enough resources to make everything perfect. It's almost certainly impossible even if you did have unlimited resources. (As Steven Wright said, "You can't have everything -- where would you put it?")
So what's most important to you? Are you more worried about electronic attacks? Or are you more worried about physical attacks? If you're worried about physical attacks, are you worried about your neighbor? The government? Aliens? Are you more worried about confidentiality or availability? Etc., etc.
If you're more worried about physical attacks, you'll probably spend a lot of time on physical security -- sensors, bunkers, acid-spitting robots. But if you're only interested in electronic security, you probably won't spend any money on additional physical security. (Most of us worry about malware and identity fraud, but how many of us have put extra locks on our doors because of it?)
At some point, you have to sit down and work it out. What threats do I think are most likely to succeed? What's my worst case scenario and how much would it cost me if it happened? Which countermeasures would be worthwhile and which ones would be a waste of time? So we make trade-offs. We don't reinforce the door to the server room, but we do encrypt the backups. We run a firewall on our router, but we still use wifi at home. We get a battery backup, but we don't get an emergency air conditioner or a halon fire suppression system.
Abstraction is just one more tool in the toolbox. And just like any tool, it has strengths and weaknesses. Yes, abstraction embeds weaknesses at levels you may not be able to control, but it also keeps you from reimplementing the wheel every day. It saves time. It makes code simpler. You have to ask yourself: "What's more likely to cause problems: an imperfect standard (with well-understood flaws that can be designed around), or a homemade solution likely to be full of unknown problems which are potentially worse?" In most cases, the right choice is to use well-known albeit imperfect systems because the alternative is so much scarier.
For me, the lesson is two-fold. First, make smart trade-offs. You're going to make trade-offs one way or the other -- if you don't know what they are, you could be making bad decisions. Second -- and this is true for everyone from the hobby hacker to (especially) people on standards task forces -- use your influence to develop and choose good abstractions. Please.
[Acknowledgment: Bruce Schneier talks a lot about trade-offs. I'm certainly not trying to parrot him, but the reality of trade-offs has been impressed upon me through several recent experiences, so it's on my mind. I started this long post because I wanted to talk about the problem of embedding flaws in layers through abstraction. But the truth is that abstraction is almost certainly worth the risk -- because ultimately, it's a trade-off.]
Abstraction -- Friend or Foe?
Abstraction is one of the pillars of software and systems engineering. It allows designers and developers to worry about their own specific problem (e.g., writing application A) without having to understand all the details of the system. It's almost impossible to imagine building a complicated system like an operating system or local area network without relying on abstraction to make the problem manageable. In the CS world, you're taught that abstraction, not the dog, is Man's best friend.
But abstraction comes at a cost. It can be less efficient, of course; the abstraction gives flexibility, at the cost of cycles or memory. These days, most of us are willing to pay an efficiency cost if it means quicker development time, etc. But most people don't think about the inherent security cost of using abstractions.
Abstraction is powerful specifically because it hides the details of one system from another system, exposing only an interface. But the devil is in the details, as usual. Using an abstraction means trusting the abstraction to have whatever properties are important to you, including security properties like confidentiality and authenticity. I could provide a messaging abstraction for you that I advertise to be secure, but in reality isn't. Depending on the kind of application, you might not have any control over the abstraction or even the ability to verify my claims. You might not even have a choice about the abstraction to begin with.
Consider the typical networking stack for LANs. Your network application relies on all the abstractions in your app (libraries), and every abstraction underneath it -- network layer, transport layer, etc. ARP poisoning attacks target ARP, the protocol that maps IP addresses to Ethernet (MAC) addresses, which affects your machine's ability to communicate with the Internet. In essence, it's trivial to use ARP poisoning to sniff portions of a LAN, or perform data modification, even on switched networks. There are very few common methods to deal with the possibility of ARP poisoning on your average corporate network, so you just have to live with the fact that it's vulnerable.
But if Internet traffic is vulnerable, then so is DNS, and from there you're lost. Thanks to abstraction, you're actually helpless to replace those layers in the system. You might be able to replace your application libraries, but you can't easily get rid of ARP.
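About the most you can do from a single host is watch for symptoms. One common sign of ARP poisoning is several IP addresses suddenly resolving to the same MAC address. Here's a rough sketch that checks for that, assuming the Linux /proc/net/arp table layout. The function name find_shared_macs is made up, and a hit is only a hint, since shared MACs can have innocent explanations (proxy ARP, for one).

```shell
# Print each MAC address that more than one IP address currently maps to.
# find_shared_macs is a hypothetical helper; the table path can be
# overridden so the function can be tried on a saved copy.
find_shared_macs() {
    table=${1:-/proc/net/arp}
    # Column 1 is the IP address and column 4 the hardware address.
    # Skip the header row and incomplete entries (all-zero MAC).
    awk 'NR > 1 && $4 != "00:00:00:00:00:00" {
             n[$4]++
             ips[$4] = ips[$4] " " $1
         }
         END {
             for (mac in n)
                 if (n[mac] > 1)
                     print mac ":" ips[mac]
         }' "$table"
}
```

No output means no duplicates were seen at that moment. It says nothing about what happened five minutes ago, which is rather the point of the post: the layer underneath you isn't yours to fix.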
Tomorrow: Tradeoffs: Maybe Abstraction Is Worth It Anyway...
Throbbing You Blind
So, I couldn't leave well enough alone. I was also too burned out on work to make progress, so I went back and revisited the N throbber, using the simpler technique (shrink at RGB, then index colors), and also added an alpha channel. I think it looks a lot better.
But I couldn't stop there -- I did what I probably should have done first -- I made a throbbing "F" for Firefox in the style of the N throbber. See yesterday's post for instructions to install it.
I've always wanted to get the old-timey Netscape "N" throbber working in Firefox, just for some of that retro feel. I found an old copy of the "N" throbber in my files and updated it for use with Firefox.
Here are some instructions for changing your throbber.
And here is an archive with the necessary files for that 90s feel. (Soul Asylum MP3s not included.)
The original throbber was 16 colors (EGA) and 30x30, but Firefox uses 16x16, 20x20, and 24x24 throbbers. Scaling the old dithered throbber looked awful, so I converted the GIF up to RGB, replaced the dithering with a solid color and made resized versions. However, the original GIF had an EGA palette, so converting the resized GIFs looked awful -- what I needed were more shades of the colors used in the image. With a limited palette, I could get several shades of each color within the 8-bit limit (DOOM used a similar trick). So, I stretched and blurred the RGB image and converted it back to 8-bit to make a reasonable indexed colormap (here are my source files). Then, I applied that map to my resized GIFs. Now you can party like it's 1995!
IT'S NOT A RACE CONDITION
Here's another in an embarrassing series of stupid programming mistakes:
I don't know about you, but I have a bad habit: when I encounter a strange bug in my code and I'm not sure how it could have happened, or it involves some kind of event that happened or didn't happen when I thought it was supposed to, and especially if it involves data corruption, I start thinking it's a race condition. Which, of course, reminds me of a certain House, M.D. meme.
It COULD be a race condition. It's possible. Especially if I'm testing on a multi-core machine. (And these days, aren't they all?) It's also more likely to be a race if it locks the machine, or if it's unpredictable. But every time -- every single time -- I have thought that an elusive bug was a race condition, it wasn't. It was an ordinary, mundane, bone-headed move on my part.
Allow me to digress for a moment. In religious circles, you hear people talk about "sins of commission" versus "sins of omission." In the former, it's something you've done, such as stealing, murdering, or coveting your neighbor's gorgeous donkey. Sins of omission are the things you haven't done, but should have, such as failing to honor your father and mother or not loving your neighbor as much as you love yourself. It's easier to recognize the wrong things we do than to recognize the right things we fail to do. It's similar in programming. Most mundane bugs are "wrong things we do" -- erroneous code we wrote into the program. Most race conditions (in my experience) are the result of necessary things we failed to put into the program (locking and/or synchronization).
It's fundamentally hard to pore over your own work to find mistakes. You kind of have to pretend you didn't write it -- or assume that it's wrong -- because if you'd known it was broken, you wouldn't have written it that way to begin with. It can also be hard to swallow your pride and admit that your work is also the most likely source of error. Subconsciously choosing to assume that the problem is a race somehow saves face (even while it dooms you to hours of adding specious locks and fruitless poking around).
In my experience at least, you're only fooling yourself. Next time, unless you're absolutely positive it's a race -- assume that it's not, and start looking for assumptions you made in your own code. Maybe you calculated that pointer incorrectly. Or maybe your loop is exiting earlier than it should. That innocuous helper function that you skimmed may be stabbing you in the back.
Remedial Coding: Never. Assume. Anything.
I have friends who are great programmers, technologists, and scientists because they have a really remarkable clarity of thought and methodicity that translates well into programming. Programming is easy for them because they naturally structure problems the way programs are written. I'm not one of those people. My talents are more intuitive than analytical. I tend to think out loud. But that kind of approach can get you into trouble when you're doing a kernel project with a 20-minute compile-test cycle.
Writing code is like playing Operation. Anyone can do it. (Ok, 4 and up due to the choking hazard.) The question is, how many times are you going to get electrocuted in the process? Core dumps, compilers, stack traces, debuggers and print statements will shock you every time you make a mistake -- just like that buzzer and red light. But eventually, you'll get all the little bones out of the patient -- it just might be ugly along the way.
Anyway, in the past, my "fly by the seat of your pants" approach has made my programming a little like playing Operation while riding in the back of a truck. The process is painful, the end result can really be a mess, and it takes 10 times longer than it should. I try to be halfway methodical, but that just ends up being a waste of time. So, I'm trying to replace these sloppy habits with useful, meaningful structure.
My first hard-earned lesson is this: don't assume anything. You know how they say, "Don't assume, it makes an ass out of you and me?" Well, when you're programming, and you assume things, it only makes an ass out of you. I do this all the time though. I'll be writing some code, and I'll see a function named "does_stuff()" that gets called, and I'll think to myself "Oh, I know what that does." And then I spend an hour debugging my code before I think, "I guess I better figure out what that little function does..." I call these things "grey boxes". They're like black boxes, except less scary. You think you understand them, which gives you a comforting yet false sense of understanding.
I know what you're thinking. You tell me, "But does_stuff() could be doing anything! How can I know what it actually does?" Well, I'd like to remind you that it is a PROGRAM executing on a COMPUTER. Chances are, its behavior is totally deterministic. You could take half a day peppering your code with print statements until it narrates its behavior to you in the Queen's English, or you could take 5 minutes to actually look at the function. You'll probably learn something useful. Yes, sometimes it's not worth going down every rabbit hole until you understand the context surrounding it... but at some point, you will need to know what it's doing. Otherwise, you're betting the correctness of your program on something you assumed to be true. And the more code you write around grey boxes, the harder it will be for you to figure out later. You might even forget that you assumed things to begin with!
Assuming things is more pernicious than simply guessing about what does_stuff() does. It's not just functions that abstract away behavior. Function pointers, the meaning of variables, macros, goto labels, structures, methods -- in short anything that is not explicit or obvious -- could be playing games with your head. Functions snicker at programmers who naively assume things about their behavior. So do yourself a favor. Any time you hear the voice in your head say "I think that does/means X..." take 5 minutes and figure out what it ACTUALLY does. You'll thank yourself later.