Older blog entries for slamb (starting at number 57)

24 Mar 2007 (updated 24 Mar 2007 at 10:17 UTC) »
make oddities

I'm trying to correctly express the dependencies for running yacc, which produces multiple targets from a single invocation. Let's start with a rule from racoon2's lib/Makefile.in:


.y.c:
        $(YACC) $(YFLAGS) $<
        mv -f y.tab.c cfparse.c

There are three problems with this:

  • it hardcodes the filename cfparse.c in what's supposed to be a generic suffix rule,
  • it uses an intermediate file with a generic name (y.tab.c), which causes problems when run in parallel. I use make -j4, so this recipe runs concurrently if there are multiple .y files, and in a non-obvious case I'll mention below,
  • it doesn't mention the y.tab.h target that other files (cftoken.o) depend on.

As far as I can see, there's no way to express the generic intermediate file problem to make. The best you could do is serialize the invocations with lockfile(1). But it's not universally available, and if I'm going to try convincing a project to switch to tools that aren't universally available, I might as well push for bison, which produces uniquely named output files directly. Now my rule can look like this:


%.tab.c %.tab.h: %.y
	$(BISON) $(YFLAGS) $<

This works under GNU make, and almost works under BSD make. The problem there is that it's run twice. Easy to see with a test file:


.PHONY: all
all: foo.out1 foo.out2


%.out1 %.out2: %.in
	lockfile -r0 mylock
	touch $*.out1 $*.out2
	sleep 1
	rm -f mylock

clean:
	rm -f mylock foo.out[12]

It works with gmake but fails with bsdmake. And here's something odd: replace that pattern rule with a static one...


foo.out1 foo.out2: foo.in
	lockfile -r0 mylock
	touch foo.out1 foo.out2
	sleep 1
	rm -f mylock

...and GNU make fails, too. A non-intuitive difference between static and pattern rules: a static rule with multiple targets on a line says that each target is made by a similar command (but with a different $@), while a pattern rule with multiple targets says that all of them are made by the exact same invocation.

What about cheating by making one target depend on another?


%.tab.c: %.y
	$(BISON) $(YFLAGS) $<


%.tab.h: %.tab.c

I thought about it, but there's no guarantee the targets are produced in a particular order, and if they come out in the order opposite to the dependency I've written, make will see %.tab.h as out of date and rebuild. It might end up doing that over and over again if my choice is consistently wrong.

I guess what I can do is make cftoken.o depend on cfparse.tab.c, as a surrogate for cfparse.tab.h. It's silly, but it works.

My conclusion: make sucks, and GNU make sucks a little less than most. But I guess I already knew that from Recursive Make Considered Harmful. (Besides making the argument you'd expect from the title, it has some good points, like how GNU make's reparsing of changed include files puts it a step above the rest.)

lkcl, re: message passing

First, you're wrong about rename(): it's not atomic. There is an intermediate state that other processes can see. From the Linux manpage:

However, when overwriting there will probably be a window in which both oldpath and newpath refer to the file being renamed.

(The atomicity people say it provides is carefully limited: at every point, newpath refers to either the old file or the new file. That's generally all you need.)
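
For illustration, here's the usual write-then-rename idiom that relies on exactly that limited guarantee. This is a minimal sketch of my own, not anyone's production code, and the filenames are hypothetical:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    /* Write the new contents to a temporary file first... */
    int fd = open("config.tmp", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }
    const char data[] = "new contents\n";
    if (write(fd, data, sizeof(data) - 1) < 0) { perror("write"); return 1; }
    if (fsync(fd) < 0) { perror("fsync"); return 1; } /* flush before the rename */
    if (close(fd) < 0) { perror("close"); return 1; }

    /* ...then rename over the old name. A reader opening "config" sees
     * either the complete old file or the complete new one, never a
     * half-written file - even though both names may briefly refer to
     * the new file while the rename is in progress. */
    if (rename("config.tmp", "config") < 0) { perror("rename"); return 1; }
    return 0;
}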

Second, your statement that Linux lacks message passing and atomic operations is false. Linux has many forms of message passing between processes/threads - pipes/FIFOs/sockets, POSIX message queues, etc. - and they are entirely suitable for the task pphaneuf asked about. I'm not sure what atomicity guarantee he'd need that Linux doesn't provide. You call write() to send a byte on a wake-up pipe, and that byte is either in the buffer or it's not. There's no intermediate state; therefore, it is atomic. What more do you want? Block until the other process/thread has actually grabbed the byte? Why? You can build that if desired.
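
To sketch what I mean by the wake-up pipe (the classic self-pipe pattern; the real event loop around it is elided, and I've compressed both roles into one process for brevity):

#include <poll.h>
#include <stdio.h>
#include <unistd.h>

static int wakeup_pipe[2];

/* Called from another thread (or even a signal handler) to wake the
 * event loop. The one-byte write either lands in the pipe's buffer or
 * it doesn't; there is no intermediate state visible to the reader. */
static void wake(void) {
    char c = 0;
    (void) write(wakeup_pipe[1], &c, 1);
}

int main(void) {
    if (pipe(wakeup_pipe) < 0) { perror("pipe"); return 1; }

    wake(); /* pretend some other thread just asked for a wake-up */

    struct pollfd pfd = { .fd = wakeup_pipe[0], .events = POLLIN };
    if (poll(&pfd, 1, -1) > 0 && (pfd.revents & POLLIN)) {
        char c;
        (void) read(wakeup_pipe[0], &c, 1); /* drain the wake-up byte */
        printf("woken up\n");
    }
    return 0;
}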

Third, your mention of microkernels is a non sequitur. I find Tanenbaum's work, and the L4 project you're alluding to by mentioning those universities, quite interesting. However, microkernels are not necessary to provide any particular IPC facility for userspace processes to communicate amongst themselves.

9 Feb 2007 (updated 9 Feb 2007 at 08:43 UTC) »
ncm, here are five tidbits you probably didn't know about me:

  1. I first learned about the Internet by dialing ISCABBS through my dad's dual-speed 300/1200 baud modem when I was 10.

  2. My official major in college was electrical engineering, then computer science, but my favorite classes and professors were in physics. If I hadn't already loved computing for nine years before finding the physics-for-majors classes, I'd be on a different path today.

  3. When I drove from Iowa City to the Bay Area with a carload of belongings, I showed up at the doorstep of good friends I met through ISCABBS. I'd never been to Northern California or met them in person, and I lived with them for the next year.

  4. I was once nearly thrown in a Tanzanian jail. The Arusha and Serengeti offices of Tanzanian Immigration and Customs disagreed on the legality of my crossing the border by bicycle from Kenya, so far from an official border post. I probably shouldn't have taken the advice of a Kenyan who'd spent three days in Egyptian military custody for entering without a visa.[*]

  5. During that crossing, my friends and I went through a dozen inner tubes and two dozen patches before running out twelve kilometers from our destination and walking. Three days cycling from Karen, Kenya to Loliondo, Tanzania - much of it on a rocky, hilly, windy earthen path that diverged wildly from our map - will do that. We had to bum a 450 km ride to Arusha where we could get replacements. Probably the stupidest thing I've ever done, but the trip was one of the greatest experiences of my life. The people were kind, generous, and amazing in the truest sense of the word.

[*] "What did you see?" "Camels and sand. Camels and sand. What do you see?" But all worked out for him, too - he met his wife in Egypt.

5 Feb 2007 (updated 5 Feb 2007 at 22:00 UTC) »
ncm, fejj

The data structure you're talking about is called a Bloom filter. It seems to be one of those structures I see and think "oh, that'll come in handy sometime," but then it never does. The nastiest limitation is that you can't ever remove anything from it. So if your web crawler should eventually rescan the same URL (as I hope it would), it'd be unsuitable for hashing URLs. I confess that I don't really understand what the interview question is getting at.
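
For reference, here's a minimal sketch of the structure; the filter size, hash count, and FNV-1a mixing below are arbitrary choices of mine, not tuned for anything:

#include <stdint.h>
#include <stdio.h>

#define FILTER_BITS (1 << 20) /* 1 Mbit; arbitrary */
#define NUM_HASHES 4          /* arbitrary */

static uint8_t filter[FILTER_BITS / 8];

/* FNV-1a, salted to simulate a family of hash functions. */
static uint32_t hash(const char *key, uint32_t seed) {
    uint32_t h = 2166136261u ^ seed;
    for (; *key; key++) {
        h ^= (uint8_t) *key;
        h *= 16777619u;
    }
    return h % FILTER_BITS;
}

static void bloom_add(const char *key) {
    for (uint32_t i = 0; i < NUM_HASHES; i++) {
        uint32_t bit = hash(key, i);
        filter[bit / 8] |= (uint8_t) (1 << (bit % 8));
    }
}

/* Returns 0 for "definitely not seen," 1 for "probably seen." Note
 * there's no bloom_remove(): a set bit may be shared with other keys,
 * so clearing it could create false negatives. */
static int bloom_query(const char *key) {
    for (uint32_t i = 0; i < NUM_HASHES; i++) {
        uint32_t bit = hash(key, i);
        if (!(filter[bit / 8] & (1 << (bit % 8))))
            return 0;
    }
    return 1;
}

int main(void) {
    bloom_add("http://example.com/");
    printf("%d %d\n", bloom_query("http://example.com/"),
           bloom_query("http://example.org/"));
    return 0;
}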

1 Feb 2007 (updated 1 Feb 2007 at 07:18 UTC) »
nconway, re: pgmemcache

Interesting project, but I don't understand how to use it correctly with transactions.

I can see that you and/or Sean Chittenden put some thought into this - your README.pgmemcache suggests using an AFTER UPDATE trigger, which runs within the transaction making the change. It further suggests deleting the stale entry from the cache rather than replacing it, because the latter is not "transaction safe". By that, I assume you mean that if the transaction gets rolled back, the cache will incorrectly hold the new value.

But how do records get added? If they're added on retrieval (as would make sense for a cache), it seems like there's still a potential problem:


conn a          conn b
------          ------
                begin work;
begin work;
update;
(remove entry)
commit work;
                select;
                (add entry)

PostgreSQL has MVCC, so I think this schedule is possible, even at set transaction isolation level serializable. At the end, the cache is stale and there doesn't seem to be a mechanism to correct it.

The readme linked to this presentation (PDF), which suggests an ON COMMIT trigger. I think an AFTER COMMIT trigger could mostly prevent this from happening, provided that the add does nothing if there's already an entry in place. But it introduces new problems:

  1. if there's a transient network outage affecting the memcached server, the entry can't be replaced, and it's too late to fail the transaction;
  2. other connections may briefly see a stale entry. It will usually be corrected quickly, so it may not be a problem for many applications;
  3. if memcached's LRU ejection happens between the replace and the conditional add, the stale entry will be added back in.

The only easy solution I see is to lock the row on any cache miss or update, keeping your current procedure otherwise. I guess that's not so bad, really: no real performance hit, just select ... for share instead of a plain select.
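
Here's a hedged sketch of what the cache-miss path might look like with that row lock, via libpq; the connection string, table, and column are hypothetical, and the actual memcached call is elided:

#include <stdio.h>
#include <libpq-fe.h>

int main(void) {
    PGconn *conn = PQconnectdb("dbname=test"); /* hypothetical */
    if (PQstatus(conn) != CONNECTION_OK) {
        fprintf(stderr, "%s", PQerrorMessage(conn));
        return 1;
    }
    PQclear(PQexec(conn, "BEGIN"));

    /* FOR SHARE takes a row-level lock, so a concurrent transaction's
     * UPDATE (and the cache removal in its AFTER UPDATE trigger) must
     * wait until we commit. That closes the window in which we'd add a
     * value the other transaction is about to make stale. */
    PGresult *res = PQexec(conn,
        "SELECT name FROM users WHERE id = 1 FOR SHARE");
    if (PQresultStatus(res) == PGRES_TUPLES_OK && PQntuples(res) == 1) {
        const char *name = PQgetvalue(res, 0, 0);
        printf("would cache: %s\n", name); /* memcached add goes here */
    }
    PQclear(res);

    PQclear(PQexec(conn, "COMMIT"));
    PQfinish(conn);
    return 0;
}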

ekashp, re: advogato recentlog quality

There are two things you can do to make advogato's recentlog more focused on open source development:

  • Set a good example. Talk about something you've contributed to open source recently. Maybe people will get the hint and do something trendy like syndicating http://their.blog.here/?tag=opensource instead of http://their.blog.here/, thus keeping their open source stuff while suppressing their animé commentary.
  • Use advogato's "interestingness" metric. It will suppress entries under a threshold, and it will even have some effect on entries you haven't rated at all. I don't know the exact implementation, but for entries you haven't rated, the metric's supposed to guess based on ratings from people whose diaries you've rated highly. My recentlog says "9 entries suppressed at threshold 3" at the bottom, and the first number will be higher next time the metric is recomputed.

4 Jan 2007 (updated 4 Jan 2007 at 23:07 UTC) »
apenwarr, re: scheduling

Thanks for your entry on Schedulator a while back. At work, we have been struggling with scheduling. I'd suggested making more use of bugzilla to track issues and tasks instead of just bugs. Then we had some management changes. Finally, we saw your entry. Looks like it was the tipping point. Your code is useless to us (for starters, we use Linux, Bugzilla, MediaWiki, and Python rather than Windows, FogBugz, GratefulTavi, and C#), but the documentation is wonderful.

We're looking at using bugzilla's existing time tracking features and adding better reporting. Your slip factor, automatic due dates, and load factors will feature prominently, as will Joel's original vs. current estimates. (Our existing schedules have had a certain "we've always been at war with Eurasia" feel, so there's no way to review historical slip factors.) We have a lot of other balls in the air, but hopefully we'll be sending some patches back upstream eventually.

1 Jan 2007 (updated 1 Jan 2007 at 21:10 UTC) »

The spammers are getting more subtle. Case in point: 733tguru. Legitimate[*] post about MySQL/threading performance on Linux vs. FreeBSD, with a few spammy links stuck in the middle.

[*] - though stolen, badly formatted, and rather outdated; it refers to FreeBSD 5.x as under development. I'm running 6.1-RELEASE.

lkcl, SQLite

when you get up to 30mbyte files, such as those used by embedded.org, the access time will drive you nuts.

People use SQLite databases much larger than that without problem, so your blanket statement is false - there's more to it than that. You said this is a monotone database? Maybe it's not normalized well, it isn't creating the right indexes, its queries are ill-considered, or it's doing unusual things like creating huge blobs in the database. (The last seems likely, since it's a version control system. I don't know if anyone has bothered to optimize this case in SQLite.) Has anyone benchmarked an equivalent PostgreSQL database for comparison?

Admittedly, there are ways SQLite could be causing your performance problem. I don't think the query planner is too sophisticated. And worse, SQLite locks the whole database file rather than supporting any finer-grained concurrency, though there's talk on their wiki of implementing it.

p.s. salmoni: if you get a crash in the middle of a write to a flat-file, then, just like ms-access, your data is corrupted. don't use it.

I assert otherwise, with the exception that your application corrupting SQLite's data structures can of course make anything happen. It sounds like you've encountered a problem with a pretty complex application that happens to use SQLite. Link to your bug report with a small demo application and I'll believe you when you say that SQLite is at fault.

27 Nov 2006 (updated 27 Nov 2006 at 22:56 UTC) »
salmoni, SQLite

SQLite's reliability is quite good. If you set pragma synchronous = full (the default in SQLite 3.0) and pragma fullfsync = 1 (for OS X, where they've decided to make fsync(2) dangerously worthless and nonstandard), SQLite is an ACID database. All bets are off if you corrupt its memory, which is possible since it runs in-process. Otherwise, it's the real deal.
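
For the record, here's a minimal sketch of setting those pragmas through SQLite's C API; the database filename is hypothetical:

#include <stdio.h>
#include <sqlite3.h>

int main(void) {
    sqlite3 *db;
    if (sqlite3_open("test.db", &db) != SQLITE_OK) {
        fprintf(stderr, "%s\n", sqlite3_errmsg(db));
        return 1;
    }

    /* synchronous = full: fsync at the right moments, so a committed
     * transaction survives a crash. fullfsync = 1: on OS X, use
     * F_FULLFSYNC so the data actually reaches the platter. */
    char *err = NULL;
    if (sqlite3_exec(db,
            "PRAGMA synchronous = FULL; PRAGMA fullfsync = 1;",
            NULL, NULL, &err) != SQLITE_OK) {
        fprintf(stderr, "%s\n", err);
        sqlite3_free(err);
        return 1;
    }

    sqlite3_close(db);
    return 0;
}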
