Good news! I found a flaw in the gcc code generation on the ARM port. There was a subtle but significant interaction between the instruction and data cache references and the on-chip buffers of the SDRAM. The ARM code has now sped up to the point that compiling code on the Netwinders is faster than on my K6-350.
