11 May 2004 slamb   » (Journeyer)

I wrote more sigsafe assembly and released 0.1.3 today. (Nine supported systems now. Only six more to go before I run out of machines I have access to.) I'm having fun pretending to be a low-level coder. Some musings...

Delay slots

sparc and PA-RISC have delay slots. They're a consequence of pipelining: the processor is fetching the next instruction while executing the current one. When jumping, some processors throw this instruction out, since the jump changes their idea of what the next instruction is. sparc doesn't. The position right after the jump is called a delay slot. The instruction there is already in the pipeline, so it's just executed anyway. For unconditional jumps (or calls or returns), that's no real problem. For conditional ones, you've got this weird instruction after the jump that's executed on both paths.

There seem to be three approaches to dealing with this:

  • use a NOP - easy but inefficient. You're doing nothing useful in either case. The processor might as well have dumped the pipeline.
  • move an instruction from earlier to immediately after the branch. This is the most efficient approach, but it only works if that instruction wasn't necessary to decide whether the branch should be taken.
  • do something useful for one branch (preferably the common one) that isn't harmful to the other. (Like loading a value into a register that one path needs and the other doesn't care about. Or putting a trap-if-equal after a branch-if-not-equal: the trap does nothing on the path where the branch is taken, but fires on the other.)

In my code, I have a mix of all three. I'm perversely proud of the one where I have a delay slot that I also sometimes jump into. The assembler thought that was a mistake; I wrote in a .empty directive to tell it that I know what I'm doing and have analyzed all the different ways to arrive at that instruction. I felt smart when it actually worked.

It seems like delay slots are most efficient when both branches are equally likely. If one is much more likely, you can more often do something useful if you know that instruction will only happen in that branch. I think they also don't help when you've got a ridiculous 20-stage pipeline (Pentium IV). Clearly having a 20-instruction delay slot would end in tears.

sparc also has this weird "BN" - branch never - instruction. It's a NOP with a delay slot, I guess. I don't see the point; they might as well have a "COME FROM" instruction.

I guess every instruction set has its oddities. In PPC code, you can enforce memory-access ordering (mostly for I/O) by sprinkling "EIEIO"s throughout the code. (Old MacDonald had a LWZ...EIEIO.) The official expansion is "Enforce In-order Execution of I/O," which sure sounds reverse-engineered to me.

ia64

Writing my ia64 code was a mind-bending experience. In the end, it wasn't as different from the other platforms as I'd hoped, though. My code is probably less than perfect, but I think ia64's instruction set just isn't optimized for this sort of thing. It's got some weird things I couldn't really take advantage of:

  • They have a lot of registers: 128 general registers (64 bits each, plus a NaT bit I'll describe below). Nice if you need them, but my code doesn't.
  • They can issue several instructions at once; the current chips take two bundles of three per cycle. Instructions are written in bundles of three (each bundle's template says which type of execution unit each slot goes to), and bundles are strung together into instruction groups, each ended by a stop. All the instructions in a group have to be independent of each other. If you've got several things going on, you can keep all the execution units busy. They've got examples in the documentation where they're executing several independent iterations of a loop at once. But if you've just got one thing to do, where each step depends on the previous one, it doesn't help. I end up feeling guilty about groups that do nothing but load a single value from memory, even though I think it's unavoidable.
  • They have "speculative loads" to avoid making the processor wait for the (comparatively slow) memory system. You tell it "I might need this value in a few cycles" with an ld.s, and it attempts the load. If the page is paged out or the machine's not in a good mood, it won't be loaded. Instead, it sets the NaT ("Not a Thing") bit on the destination register. Later, you do a chk.s to say "I really need that value now," and it will jump to "recovery code" you specify if the NaT bit is set. (Typically, the recovery code does a non-speculative load and jumps back.) But I think at most I could do a load one cycle early with this, at the expense of code size and complexity. Pass.
  • They also have predicated instructions: almost every instruction names one of 64 one-bit predicate registers, which says whether it should actually take effect. So you can follow two different paths in the code without jumping. I'm at least taking advantage of that, sort of.

There are some other weird things, too. It seems like ia64 is optimized for doing heavy-duty computation directly within loops. The computation can't be in a separate function, or you can't take advantage of the EPIC features well. (Unless it gets inlined, of course.) And you either need a really smart person hand-writing the assembly or a very smart optimizing compiler. (And gcc doesn't seem to qualify.) So it seems like those of us writing mundane code with everyday compilers get left behind. Also, things like the optimal number of simultaneous loop iterations depend on the memory latency and the number of execution units, so you would need multiple compiles for different Itanium machines. I could see a really smart JIT having a field day, but otherwise optimal code will never happen. Maybe this is why people call it the Itanic: lots of complex features no one will ever really use fully.
