12 Nov 2012
(updated 30 Jan 2013 at 13:44 UTC)
Gold linker + CentOS5 NFS client + Solaris 10 NFSd = ballache
I've spent a day and a half being completely bewildered by a weird NFS bug: ELF binaries (but not other files) written to an NFS mount show up on remote hosts with the correct file size but consisting entirely of zero bytes. It only happens when they are written from CentOS5 hosts, not from Solaris, Fedora or RHEL6 hosts.
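For anyone wanting to check for the same symptom, here is a minimal sketch of a test for "right size, all zeros", meant to be run on the remote host that sees the bad file. The function name is_all_nul and the chunk size are my own choices for illustration, nothing more.

```python
import os


def is_all_nul(path, chunk_size=1 << 20):
    """Return True if the file is non-empty and every byte in it is NUL.

    Hypothetical helper: reads the file in chunks so large binaries
    do not have to fit in memory.
    """
    if os.path.getsize(path) == 0:
        # An empty file is a different problem from the one described here.
        return False
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                # Reached end of file without seeing a non-zero byte.
                return True
            if chunk.count(b"\0") != len(chunk):
                return False
```

A healthy ELF binary starts with the bytes 0x7f 'E' 'L' 'F', so this returns False for it almost immediately.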
I eventually narrowed it down to the Gold linker, which writes its output files using mmap. The CentOS5 2.6.18 kernel has a bug when writing files to NFS mounts through mmap.
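To show what "writing a file using mmap" means in practice, here is a rough sketch of the general pattern: size the file, map it shared, copy the bytes into the mapping, flush. This is not Gold's actual code, and write_via_mmap is a name I made up. On a local filesystem it just works; the interesting test is to point it at an NFS mount from an affected client and then read the file from another host.

```python
import mmap
import os


def write_via_mmap(path, data):
    """Write data to path through a shared memory mapping instead of write().

    Hypothetical helper illustrating the mmap output pattern in general.
    """
    if not data:
        raise ValueError("cannot map a zero-length file")
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        # The file must already have its final size before it is mapped.
        os.ftruncate(fd, len(data))
        with mmap.mmap(fd, len(data), flags=mmap.MAP_SHARED,
                       prot=mmap.PROT_READ | mmap.PROT_WRITE) as m:
            m[:] = data  # store into the mapping; no write() call is made
            m.flush()    # ask the kernel to write the dirty pages back
    finally:
        os.close(fd)
```

Because the size is set up front and the contents only arrive through the mapping, a file that has the right size but no data is at least consistent with page writeback going wrong, though that is my guess rather than something I have confirmed.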
There was a very similar RHEL4 bug that should be fixed in my kernel, but for some reason the kernel-2.6.18-redhat.patch file in the SRPM comments out the fix. I don't know why.
Maybe this post will show up for anyone else searching for the symptoms, because I didn't have much luck searching the web for it.
My solution is to avoid Gold on CentOS5 (since we can't easily stop using NFS, unfortunately) but I wish I could get that day of my life back.
The distcc FAQ (search for "Files written to NFS filesystems are corrupt") mentions this problem and refers to a post to the distcc list and a post to the linux-nfs list, where a workaround using the no_subtree_check option for nfsd is given. That assumes the NFS server is Linux, though, and mine isn't.