Been, amongst other stuff, optimising my DES implementation today. The key setup time has tripled at the moment (though I have an idea how to get that down), but the number of cycles taken to process a 64 bit block is down to 12% of the original. This was achieved by a mixture of static preprocessing, preprocessing during key setup, replacing most uint64_t values with pairs of uint32_t values where possible (my target platform is the ARM, so 32 bit operations are a Good Thing(tm)), and applying some human optimisation to the bit permutations. So far this has all been plain C - I've not needed any asm to express what I want.
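To make the "static preprocessing" and "two uint32_t values" ideas a bit more concrete, here is a minimal sketch of one way to do the DES initial permutation on a hi/lo pair of 32 bit words using a precomputed per-byte table. This is not the code described above, just an illustration: the names ip_slow, ip_init, ip_fast and ip_tab are made up, and the per-byte table is only one of several possible preprocessing schemes. The IP table itself is the standard one from the DES specification.

```c
#include <stdint.h>

/* Standard DES initial permutation. Output bit i (1 = most significant)
 * is taken from input bit IP[i-1]. */
static const uint8_t IP[64] = {
    58, 50, 42, 34, 26, 18, 10, 2,  60, 52, 44, 36, 28, 20, 12, 4,
    62, 54, 46, 38, 30, 22, 14, 6,  64, 56, 48, 40, 32, 24, 16, 8,
    57, 49, 41, 33, 25, 17,  9, 1,  59, 51, 43, 35, 27, 19, 11, 3,
    61, 53, 45, 37, 29, 21, 13, 5,  63, 55, 47, 39, 31, 23, 15, 7
};

/* Reference version: one bit at a time, block held as two 32 bit halves
 * (hi = bits 1..32, lo = bits 33..64). */
static void ip_slow(uint32_t hi, uint32_t lo, uint32_t *ohi, uint32_t *olo)
{
    uint32_t h = 0, l = 0;
    for (int i = 0; i < 64; i++) {
        int src = IP[i] - 1;                    /* 0 = MSB of the block */
        uint32_t word = (src < 32) ? hi : lo;
        uint32_t bit = (word >> (31 - (src & 31))) & 1u;
        if (i < 32)
            h |= bit << (31 - i);
        else
            l |= bit << (63 - i);
    }
    *ohi = h;
    *olo = l;
}

/* Hypothetical table: for each of the 8 input bytes and each byte value,
 * the contribution of that byte to the permuted hi and lo words. */
static uint32_t ip_tab[8][256][2];

/* Fill the table once, using the slow version on single-byte inputs. */
static void ip_init(void)
{
    for (int byte = 0; byte < 8; byte++) {
        for (int v = 0; v < 256; v++) {
            uint32_t hi = 0, lo = 0;
            if (byte < 4)
                hi = (uint32_t)v << (24 - 8 * byte);
            else
                lo = (uint32_t)v << (24 - 8 * (byte - 4));
            ip_slow(hi, lo, &ip_tab[byte][v][0], &ip_tab[byte][v][1]);
        }
    }
}

/* Fast version: 8 table lookups ORed together, 32 bit operations only. */
static void ip_fast(uint32_t hi, uint32_t lo, uint32_t *ohi, uint32_t *olo)
{
    uint32_t h = 0, l = 0;
    for (int b = 0; b < 4; b++) {
        const uint32_t *e = ip_tab[b][(hi >> (24 - 8 * b)) & 0xff];
        h |= e[0];
        l |= e[1];
        e = ip_tab[b + 4][(lo >> (24 - 8 * b)) & 0xff];
        h |= e[0];
        l |= e[1];
    }
    *ohi = h;
    *olo = l;
}
```

Call ip_init() once at start-up, then ip_fast() per block. The table in this sketch is 16 Kbytes, which trades memory (and possibly cache misses) for speed; a hand-optimised sequence of shifts and masks would avoid the table entirely.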
Next, a little more optimisation of the C version, before I try to apply hardware to some parts of the algorithm (mainly the bit swizzling). I have two hardware versions in mind - one a plain partial hardware/software implementation, and one which uses partial evaluation to put the key into the round calculations.
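One possible reading of "partial evaluation" here, sketched in C rather than hardware: once the key is fixed, the XOR with the subkey can be folded into the S-box tables, so the round no longer needs the key as an input. The sketch below does this for S-box 1 only. The function names sbox1_generic and sbox1_specialise are made up for illustration, and the real design may well specialise things quite differently; only the S1 table is taken from the DES specification.

```c
#include <stdint.h>

/* DES S-box 1, indexed as S1[row][column]. */
static const uint8_t S1[4][16] = {
    { 14,  4, 13,  1,  2, 15, 11,  8,  3, 10,  6, 12,  5,  9,  0,  7 },
    {  0, 15,  7,  4, 14,  2, 13,  1, 10,  6, 12, 11,  9,  5,  3,  8 },
    {  4,  1, 14,  8, 13,  6,  2, 11, 15, 12,  9,  7,  3, 10,  5,  0 },
    { 15, 12,  8,  2,  4,  9,  1,  7,  5, 11,  3, 14, 10,  0,  6, 13 }
};

/* Generic step: XOR 6 bits of expanded data with 6 bits of subkey,
 * then look the result up in S1. The outer two bits select the row,
 * the middle four bits select the column. */
static uint8_t sbox1_generic(uint8_t x, uint8_t k)
{
    uint8_t v = (uint8_t)((x ^ k) & 0x3f);
    unsigned row = ((v >> 4) & 2u) | (v & 1u);
    unsigned col = (v >> 1) & 0xfu;
    return S1[row][col];
}

/* Specialised step (hypothetical): for a fixed 6 bit subkey chunk k,
 * build a 64 entry table with the XOR already applied, so that
 * table[x] == sbox1_generic(x, k) for every 6 bit x. */
static void sbox1_specialise(uint8_t k, uint8_t table[64])
{
    for (unsigned x = 0; x < 64; x++)
        table[x] = sbox1_generic((uint8_t)x, k);
}
```

With the table built, each round does a plain lookup table[x] instead of an XOR followed by a lookup. The cost moves into key setup, which has to build one table per S-box per round.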
Managed to halve the key setup time. I've completely removed 64 bit values except as the inputs to the algorithm. By a back-of-envelope (BOE) calculation, it can now process about 0.8Mbytes/sec on a 1GHz XScale/StrongARM, but that's not including cache misses. Not particularly impressive.
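For what it's worth, the arithmetic behind that kind of estimate is simple enough to write down. Working backwards from the two figures above, and assuming 1 Mbyte = 10^6 bytes and that only block processing is counted, 0.8Mbytes/sec at 1GHz is 100,000 blocks/sec, or roughly 10,000 cycles per 64 bit block. That cycle count is inferred from the quoted numbers, not a measured figure, and the helper name below is made up.

```c
#include <stdint.h>

/* Back-of-envelope throughput for a cipher with 8 byte blocks.
 * Ignores key setup, cache misses and memory stalls. */
static uint64_t des_bytes_per_sec(uint64_t clock_hz, uint64_t cycles_per_block)
{
    return clock_hz * 8u / cycles_per_block;
}
```

For example, des_bytes_per_sec(1000000000, 10000) gives 800000, i.e. about 0.8Mbytes/sec.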
Had to reboot a keyboard today. I think the microcontroller inside had got confused, and the PC it was attached to refused to acknowledge it until I powered off the machine (rebooting it didn't help).
Installed GNAT (the GNU Ada compiler) today - scary. I got flashbacks to when I was being taught Real Time Systems in Ada.
Learned how to use a diablo.
Oh, and I forgot to say congrats to adamd who submitted his PhD dissertation last week. He now seems to have a perpetual smile on his face :-)