Everything you never wanted to know about file locking
(Foreshadowing: I found a bug in MacOS X 10.6's fcntl(F_SETLK) locking that
could cause corruption of sqlite databases. To see if your system has the
bug, compile and run locky.c.)
I've never previously had the (mis)fortune of writing a program that relied
on file locking. Well, I've used databases and gdbm and whatnot, and they
"use" file locking, but they've always hidden it behind an API, so I've
never had to actually lock my own files.
I heard file locking is terribly messy and arcane and varies wildly between
operating systems, even between different Unix systems; even between
different versions of the same system (like Linux). After some
experimentation, I can confirm: it really is that arcane, and it really is
that variable between systems, and it really is untrustworthy. I'm normally
a pessimist when it comes to computers, but Unix file locking APIs have
actually outdone even my own pessimism: they're worse than I ever imagined.
Other than simple lockfiles, which I won't go into (but which you might just
want to use from now on, after reading this article :)), there are three
Unix file locking APIs: flock(), fcntl(), and lockf().
flock() locking
flock() is the simplest sort of lock. According to my Linux man page, it
dates back to BSD. It is *not* standardized by POSIX, which means that some
Unix systems (probably SysV-related ones, I guess) don't support flock().
flock() locks an entire file at a time. It supports shared locks (LOCK_SH:
multiple people can have the file locked for read at the same time) and
exclusive locks (LOCK_EX: only one person can make an exclusive lock on the
file; shared and exclusive locks are not allowed to coexist). If you
learned about concurrency in textbooks, flock() locks are "reader/writer"
locks. A shared lock is a reader lock, and an exclusive lock is a writer
lock.
According to the Linux man page for flock(), flock() does not work over NFS.
Upgrading from a shared flock() lock to an exclusive one is racy. If you
own a shared lock, then trying to upgrade it to an exclusive lock, behind
the scenes, actually involves releasing your lock and acquiring a new one.
Thus, you can't be guaranteed that someone else hasn't acquired the
exclusive lock, written to the file, and unlocked it before your attempt at
upgrading the lock returns. Moreover, if you try to upgrade from shared to
exclusive in a non-blocking way (LOCK_NB), you might lose your shared lock
entirely.
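To make the shared/exclusive rules concrete, here's a minimal sketch using python's fcntl.flock(). It assumes a system where flock() is the real thing (not emulated with fcntl()), and it uses two separate open()s of one temp file to stand in for two lockers. The helper names try_flock and demo are mine, purely for illustration.

```python
import fcntl
import os
import tempfile


def try_flock(fd, mode):
    """Attempt a non-blocking flock(); return True if the lock was granted."""
    try:
        fcntl.flock(fd, mode | fcntl.LOCK_NB)
        return True
    except OSError:  # EWOULDBLOCK: someone else holds a conflicting lock
        return False


def demo():
    results = []
    with tempfile.NamedTemporaryFile() as f:
        # Two independent open()s of the same file act as two lockers.
        fd1 = os.open(f.name, os.O_RDWR)
        fd2 = os.open(f.name, os.O_RDWR)
        try:
            results.append(try_flock(fd1, fcntl.LOCK_SH))  # first reader
            results.append(try_flock(fd2, fcntl.LOCK_SH))  # readers coexist
            results.append(try_flock(fd2, fcntl.LOCK_EX))  # refused: fd1 is still reading
            fcntl.flock(fd1, fcntl.LOCK_UN)                # first reader leaves
            results.append(try_flock(fd2, fcntl.LOCK_EX))  # writer gets in
        finally:
            os.close(fd1)
            os.close(fd2)
    return results


if __name__ == "__main__":
    print(demo())
```

On Linux with native flock(), I'd expect this to print [True, True, False, True]. Notice what the sketch does *not* check: whether fd2 still holds its shared lock after the failed upgrade. That's exactly the part you can't count on.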
Supposedly, flock() locks persist across fork() and (unlike fcntl locks, see
below) won't disappear if you close unrelated files. HOWEVER, you
can't depend on this, because some systems - notably earlier versions of
Linux - emulated flock() using fcntl(), which has totally different
semantics. If you fork() or if you open the same file more than once,
you should assume the results with flock() are undefined.
fcntl() locking
POSIX standardized the fcntl(F_SETLK) style of locks, so you should
theoretically find them on just about any Unix system. The Linux man page
claims that they've been around since 4.3BSD and SVr4.
fcntl() locks are much more powerful than flock() locks. Like flock(),
there are both shared and exclusive locks. However, unlike flock(), each
lock has a byte range associated with it: different byte ranges
are completely independent. One process can have an exclusive lock on byte
7, and another process can have an exclusive lock on byte 8, and several
processes can have a shared lock on byte 9, and that's all okay.
Note that calling them "byte ranges" is a bit meaningless; they're really
just numbers. You can lock bytes past the end of the file, for example.
You could lock bytes 9-12 and that might mean "records 9-12" if you want,
even if records have variable length. The meaning of a fcntl() byte range
is up to whoever defines the file format you're locking.
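Here's a rough sketch of independent byte ranges, using python's fcntl.lockf() (which, as far as I can tell, takes fcntl()-style locks). The parent locks "byte 7" of an empty file, then forks a child to probe bytes 7 and 8. The helper name other_process_can_lock and the fork-and-exit-code trick are just my way of asking a second process; nothing about them is standard.

```python
import fcntl
import os
import tempfile


def other_process_can_lock(path, offset):
    """Fork a child that tries a non-blocking exclusive lock on one byte."""
    pid = os.fork()
    if pid == 0:
        status = 1
        try:
            fd = os.open(path, os.O_RDWR)
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, 1, offset)
            status = 0  # lock granted
        except OSError:
            status = 1  # refused: another process holds a conflicting lock
        finally:
            os._exit(status)  # never return into the parent's code
    _, wait_status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(wait_status) == 0


def demo():
    with tempfile.NamedTemporaryFile() as f:
        # The file is empty, so "byte 7" is past EOF. The lock is still valid.
        fcntl.lockf(f.fileno(), fcntl.LOCK_EX, 1, 7)
        return (other_process_can_lock(f.name, 7),
                other_process_can_lock(f.name, 8))


if __name__ == "__main__":
    print(demo())
```

On a Linux box I'd expect (False, True): byte 7 is taken, byte 8 is free, and nobody cares that neither byte actually exists in the file.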
As with all the locks discussed in this article, these byterange locks are
"advisory" - that is, you can read and write the file all day long even if
someone other than you has an exclusive lock. You're just not supposed to
do that. A properly written program will try to acquire the lock first, at
which time it will be "advised" by the kernel whether everything is good or
not.
The locks are advisory, which is why the byte ranges don't have to refer to
actual bytes. Whoever acquires the lock can interpret it however they
want.
fcntl() locks are supposedly supported by NFSv3 and higher, but
different kernels do different random things with it. Some kernels just
lock the file locally, and don't notify the server. Some notify the server,
but do it wrong. So you probably can't trust it.
According to various pages on the web that I've seen, fcntl() locks don't
work on SMB (Windows file sharing) filesystems mounted on MacOS X. I don't
know if this is still true in the latest versions of MacOS; I also don't
know if it's true on Linux. Note that since flock() doesn't work on NFS,
and fcntl() doesn't work on SMB filesystems, there is no locking method that
works reliably on all remote filesystems.
It doesn't seem to be explicitly stated anywhere, but it seems that fcntl()
shared locks can be upgraded atomically to fcntl() exclusive locks. That
is, if you have a shared lock and you try to upgrade it to an exclusive
lock, you can do that without first releasing your shared lock. If you
request a non-blocking upgrade and it fails, you still have your shared
lock.
(Trivia: sqlite3 uses fcntl() locks, but it never uses *shared* fcntl()
locks. Instead, it exclusively locks a random byte inside a byterange.
This is apparently because some versions of Windows don't understand shared
locks. As a bonus, it also doesn't have to care whether upgrading a lock
from shared to exclusive is atomic or not. Update 2010/12/13: specifically,
pre-NT versions of Windows had LockFile, but not LockFileEx.)
fcntl() locks have the handy feature of being able to tell you which pid
owns a lock, using F_GETLK. That's pretty cool - and potentially useful for
debugging - although it might be meaningless on NFS, where the pid could be
on another computer. I don't know what would happen in that case.
fcntl() locks have two very strange behaviours. The first is merely an
inconvenience: unlike nearly everything else about a process, fcntl() locks
are not shared across fork(). That means if you open a file, lock some byte
ranges, and fork(), the child process will still have the file, but it won't
own any locks. The parent will still own all the locks. This is weird, but
manageable, once you know about it. It also makes sense, in a perverse sort
of way: this makes sure that no two processes have an exclusive lock on the
same byterange of the same file. If you think about it, exclusively locking
a byte range, then doing fork(), would mean that *two* processes have the
same exclusive lock, so it's not all that exclusive any more, is it?
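You can watch the fork() behaviour with a small experiment. In this sketch (the function name child_owns_parent_lock is made up), the parent locks a byte and forks, and the child asks for the same lock through the very same inherited fd. If the lock had been inherited, the request would trivially succeed, since re-locking a range you already own is allowed.

```python
import fcntl
import os
import tempfile


def child_owns_parent_lock():
    """Return True if a forked child appears to own the parent's lock."""
    with tempfile.NamedTemporaryFile() as f:
        fd = f.fileno()
        fcntl.lockf(fd, fcntl.LOCK_EX, 1, 0)  # parent locks byte 0
        pid = os.fork()
        if pid == 0:
            status = 1
            try:
                # Same fd, inherited across fork(), but not the lock itself.
                fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, 1, 0)
                status = 0  # granted: the child would have to own the lock
            except OSError:
                status = 1  # refused: the lock belongs to the parent only
            finally:
                os._exit(status)  # never return into the parent's code
        _, wait_status = os.waitpid(pid, 0)
        return os.WEXITSTATUS(wait_status) == 0


if __name__ == "__main__":
    print(child_owns_parent_lock())
```

I'd expect False here on any system with POSIX-style fcntl() locks: the child has the file, but the parent kept the lock.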
Maybe you don't care about these word games, but one advantage of this
absolute exclusivity guarantee is that fcntl() locks can detect deadlocks.
If process A has a lock on byte 5 and tries to lock byte 6, and process B
has a lock on byte 6 and tries to lock byte 5, the kernel can give you
EDEADLK, which is kind of cool. If it were possible for more than one
process to own the same exclusive locks, the algorithm for this would be
much harder, which is probably why flock() locks can't do it.
The second strange behaviour of fcntl() locks is this: the lock doesn't
belong to a file descriptor, it belongs to a (pid,inode) pair. Worse, if
you close *any* fd referring to that inode, it releases all your locks on
that inode. For example, let's say you open /etc/passwd and lock some
bytes, and then call getpwent() in libc, which usually opens /etc/passwd,
reads some stuff, and closes it again. That process of opening /etc/passwd
and closing it - which has nothing to do with your existing fd open on
/etc/passwd - will cause you to lose your locks!
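If you don't believe me, here's a sketch that reproduces it on a temp file instead of /etc/passwd. The parent takes a lock, asks a forked child whether the lock is visible, then does an innocent open-and-close of the same file and asks again. The helper other_process_can_lock is a hypothetical name of my own; the open()/close() in the middle stands in for whatever library call you didn't know was touching your file.

```python
import fcntl
import os
import tempfile


def other_process_can_lock(path, offset):
    """Fork a child that tries a non-blocking exclusive lock on one byte."""
    pid = os.fork()
    if pid == 0:
        status = 1
        try:
            fd = os.open(path, os.O_RDWR)
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, 1, offset)
            status = 0  # lock granted
        except OSError:
            status = 1  # refused: another process holds a conflicting lock
        finally:
            os._exit(status)  # never return into the parent's code
    _, wait_status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(wait_status) == 0


def demo():
    with tempfile.NamedTemporaryFile() as f:
        fcntl.lockf(f.fileno(), fcntl.LOCK_EX, 1, 0)
        before = other_process_can_lock(f.name, 0)  # we hold the lock
        # An "unrelated" open and close of the same file, in our own process.
        with open(f.name, "rb"):
            pass
        after = other_process_can_lock(f.name, 0)   # did our lock survive?
        return before, after


if __name__ == "__main__":
    print(demo())
```

On Linux I'd expect (False, True): the child is refused at first, and then, after we close an fd we never locked anything through, it walks right in.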
That behaviour is certifiably insane, and there's no possible justification
for why it should work that way. But it's how it's always worked, and POSIX
standardized it, and now you're stuck with it.
An even worse example: let's say you have two sqlite databases, db1 and db2.
But let's say you're being mean, and you actually make db1 a hardlink to
db2, so they're actually the same inode. If you open both databases in
sqlite at the same time, then close the second one, all your open sqlite
locks on the first one will be lost! Oops! Except, actually, the sqlite
guys have already thought of this, and it does the right thing. But if
you're writing your own file locking routines, beware.
So anyway, beware of that insane behaviour. Also beware of flock(), which
on some systems is implemented as a call to fcntl(), and thus inherits the
same insane behaviour.
Bonus insanity feature: the struct you use to talk to fcntl() locks is
called 'struct flock', even though it has nothing to do with flock(). Ha
ha!
lockf() locking
lockf() is also standardized by POSIX. The Linux man page also mentions
SVr4, but it doesn't mention BSD, which presumably means that some versions
of BSD don't do lockf().
POSIX is also, apparently, unclear on whether lockf() locks are the same
thing as fcntl() locks or not. On Linux and MacOS, they are documented to
be the same thing. (In fact, lockf() is a libc call on Linux, not a system
call, so we can assume it makes the same system calls as fcntl().)
The API of lockf() is a little easier than fcntl(), because you don't have
to mess around with a struct. However, there is no way to query a lock to
find out who owns it.
Moreover, lockf() may not be supported by pre-POSIX BSD systems, it seems,
so this little bit of convenience also costs you in portability. I
recommend you avoid lockf().
Interaction between different lock types
...is basically undefined. Don't use multiple types of locks - flock(),
fcntl(), lockf() - on the same file.
The MacOS man pages for the three functions proudly proclaim that on MacOS
(and maybe on whatever BSD MacOS is derived from), the three types of locks
are handled by a unified locking implementation, so in fact, you *can* mix
and match different lock types on the same file. That's great, but on other
systems, they *aren't* unified, so doing so will just make your program fail
strangely on other systems. It's non-portable, and furthermore, there's no
reason to do it. So don't.
When you define a new file format that uses locking, be sure to document
exactly which kind of locking you mean: flock(), fcntl(), or lockf(). And
don't use lockf().
Mandatory locking
Stay far, far away, for total insanity lies in wait.
Seriously, don't do it. Advisory locks are the only thing that makes any
sense whatsoever. In any application. I mean it.
Need another reason? The docs say that mandatory locking in Linux is
"unreliable." In other words, mandatory locks aren't as mandatory as they're
documented to be. "Almost mandatory" locking? Look. Just stay away.
Still not convinced? Man, you really must like punishment. Look, imagine
someone is holding a mandatory lock on a file, so you try to read() from it
and get blocked. Then he releases his lock, and your read() finishes, but
some other guy reacquires the lock. You fiddle with your block,
modify it, and try to write() it back, but you get held up for a bit,
because the guy holding the lock isn't done yet. He does his own write() to
that section of the file, and releases his lock, so your write() promptly
resumes and overwrites what he just did.
What good does that do anyone? Come on. If you want locking to do you any
good whatsoever, you're just going to have to acquire and release your own
locks. Just do it. If you don't, you might as well not be using locks at
all, because your program will be prone to horrible race conditions and
they'll be extra hard to find, because mandatory locks will make it *mostly*
seem to work right. If there's one thing I've learned about programming,
it's that "mostly right" programs are *way* worse than "entirely wrong"
programs. You don't want to be mostly right. Don't use mandatory locks.
Bonus feature: file locking in python
python has a module called "fcntl" that actually includes - or rather, seems
to include - all three kinds of locks: flock(), fcntl(), and lockf(). If
you like, follow along in the python source code to see how it works.
However, all is not as it seems. First of all, flock() doesn't exist on all
systems, apparently. If you're on a system without flock(), python will
still provide a fcntl.flock() function... by calling fcntl() for you. So
you have no idea if you're actually getting fcntl() locks or flock() locks.
Bad idea. Don't do it.
Next is fcntl.fcntl(). Although it pains me to say it, you can't use this
one either. That's because it takes a binary data structure as a parameter.
You have to create that data structure using struct.pack(), and parse it
using struct.unpack(). No problem, right? Wrong. How do you know what the
data structure looks like? The python fcntl module
docs outright lie to you here, by providing an example of how to build
the struct flock... but they just made assumptions about what it looks
like. Those assumptions are definitely wrong if your system has 64-bit file
offset support, which most of them do nowadays, so trying to use the example
will just give an EINVAL. Moreover, POSIX doesn't guarantee that struct
flock won't have other fields before/after the documented ones, or that the
fields will be in a particular order, so even without 64-bit file offsets,
your program is completely non-portable. And python doesn't offer any other
option for generating that struct flock, so the whole function is useless.
Don't do it. (You can still safely use fcntl.fcntl() for
non-locking-related features, like F_SETFD.)
The only one left is fcntl.lockf(). This is the one you should use. Yeah,
I know, up above I said you should avoid lockf(), because BSD systems might
not have it, right? Well yeah, but that's C lockf(), not python's
fcntl.lockf(). The python fcntl module documentation says of fcntl.lockf(),
"This is essentially a wrapper around the fcntl() locking calls." But
looking at the source, that's not quite true: in fact, it is *exactly* a
wrapper around the fcntl() locking calls. fcntl.lockf() doesn't call C
lockf() at all! It just fills in a struct flock and then calls fcntl()!
And that's exactly what you want. In short:
- in C, use fcntl(), but avoid lockf(), which is not necessarily the same thing.
- in python, use fcntl.lockf(), which is the same thing as fcntl() in C.
(Unfortunately, although calling fcntl.lockf() actually uses fcntl() locks,
there is no way to run F_GETLK, so you can't find out which pid owns the
lock.)
Bonus insanity feature: instead of using the C lockf() constants (F_LOCK,
F_TLOCK, F_ULOCK, F_TEST), fcntl.lockf() actually uses the C flock()
constants (LOCK_SH, LOCK_EX, LOCK_UN, LOCK_NB). There is no conceivable
reason for this; it literally just takes in the wrong constants, and converts
them to the right ones before calling fcntl().
So that means python gives you three locks in one! The constants from
flock(), the functionality of fcntl(), and the name lockf(). Thanks,
python, for making my programming world so simple and easy to unravel.
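In practice, that three-in-one call looks something like the sketch below: a small context manager around fcntl.lockf() that takes a byte range, using the flock()-style constants. The name locked_range and its parameters are my own invention, not anything the fcntl module provides, and this version only does blocking locks.

```python
import contextlib
import fcntl
import tempfile


@contextlib.contextmanager
def locked_range(f, start, length, exclusive=True):
    """Hold a lock on bytes [start, start+length) for the duration of a with block."""
    # flock()-style constant names, passed to a function called lockf()...
    mode = fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH
    fcntl.lockf(f, mode, length, start)  # blocks until the lock is granted
    try:
        yield f
    finally:
        fcntl.lockf(f, fcntl.LOCK_UN, length, start)  # release the same range


def demo():
    with tempfile.TemporaryFile() as f:
        with locked_range(f, 0, 16):
            f.write(b"guarded update\n")
            f.flush()
        f.seek(0)
        return f.read()


if __name__ == "__main__":
    print(demo())
```

Remember that all the fcntl() caveats still apply: the lock belongs to your process, not to f, so closing any other fd on the same file inside the with block quietly drops it.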
Epilogue
I learned all this while writing a program (in python, did you
guess?) that uses file locking to control concurrent access to files.
Naturally, I wanted to pick exactly the right kind of locks to solve my
problem. Using the logic above, I settled on fcntl() locks, which in my
python program means calling fcntl.lockf().
So far, so good. After several days of work - darn it, I really hate
concurrent programming - I got it all working perfectly. On Linux.
Then I tried to port my program to MacOS. It's python, so porting was
pretty easy - i.e., I didn't change anything - but the tests failed. Huh?
Digging deeper, it seems that some subprocesses were acquiring a lock, and
sometime later, they just didn't own that lock anymore. I thought it might
be one of the well-known fcntl() weirdnesses - maybe I fork()ed, or maybe I
opened/closed the file without realizing it - but no. It only happens when
*other* processes are locking byteranges on the same file. It appears the
MacOS X (10.6.5 in my test) kernel is missing a mutex somewhere.
I wrote a minimal test case
and filed a bug with Apple. If you work at Apple, you can find my bug
report as number 8760769.
Dear Apple: please fix it. As far as I know, with this bug in place, any
program that uses fcntl() locks is prone to silent file corruption. That
includes anything using sqlite.
Super Short Summary
- don't use flock() (python: fcntl.flock()) because it's not in POSIX and it doesn't work over NFS.
- don't use lockf() (python: does not exist) because it's not in BSD, and probably just wraps fcntl().
- don't use fcntl() (python: fcntl.lockf()) because it doesn't work over SMB on MacOS, and actually, on MacOS it doesn't work right at all.
Oh, and by the way, none of this applies on win32.
Are we having fun yet? I guess lockfiles are the answer after all.
I bet you're really glad you read this all the way to the end.
Syndicated 2010-12-13 11:03:10 (Updated 2010-12-14 04:06:09) from apenwarr - Business is Programming