Older blog entries for randombit (starting at number 22)

Hey Kid, Need a Crypto Card?

I am currently in possession of a large number of things I really don't need to have around, including, because I'm that kind of weirdo, a couple of PCI crypto cards - an AEP2000 (donated to me by AEP so I could write drivers for botan for it) and a Hifn 7811 (an eBay impulse buy).

The AEP2000 card is basically a modular exponentiation engine - the 2000 in the name refers to the number of 1024-bit private key RSA operations it can perform per second (so 4000 512-bit exponentiations per second, since they were counting CRT optimizations when they made the model numbers), and you can use moduli up to 2048 bits. In testing I found that it could indeed reach 2000 ops per second in practice. There are open source Linux drivers available for this card, but nobody has ever updated them for anything past a 2.4 kernel, and it doesn't seem like they are SMP safe either. Since I don't have the time (or inclination) to update and fix them, I would rather give the card away to an open source developer who can make use of it somehow.

The Hifn 7811 offers symmetric encryption (DES, RC4, possibly AES?), MD5 and SHA-1 hashing, and a hardware PRNG. It is similar, but not identical, to the Soekris vpn1401. There are drivers for this card included in the Linux kernel, but only 32-bit kernels are supported (I asked Evgeniy Polyakov, the author of the driver, about this, and he indicated it is a hardware limitation). The only 32-bit machines I have left are laptops and netbooks, which obviously can't really take a PCI card, leaving me with a card with no place to go.

If you would like to play with either (or both) of these cards, drop me an email and let me know. I would likely give preference to someone who will be using them to support an open source project, but feel free to contact me even if this is not the case; mostly I'd like to give them a good home where they will be doing something useful.

Syndicated 2010-01-20 02:18:26 from Jack Lloyd

Reality and Politics Do Not Mix

For obvious reasons, politicians and other policy makers generally avoid discussing what ought to be considered an "acceptable" number of traffic deaths, or murders, or suicides, let alone what constitutes an acceptable level of terrorism. Even alluding to such concepts would require treating voters as adults-something which at present seems to be considered little short of political suicide.
- from Undressing the Terror Threat, Paul Campos

Syndicated 2010-01-18 17:23:31 from Jack Lloyd

Using std::async for easy parallel computations

C++0x, the next major revision of C++, includes a number of new language and library facilities that I am greatly looking forward to, including a standard thread interface. Initially the agenda for C++0x had included facilities built on threads, such as a thread pool, but as part of the so-called 'Kona compromise' (named after the location of the committee meeting where the compromise was made) all but the most basic facilities were deferred for a later revision.

However, there were many requests for a simple facility for creating an asynchronous function call, and a function for this, named std::async, was voted in at the last meeting. std::async is a rather blunt tool; it spawns a new thread (though wording is included which would allow an implementation to spawn threads in a fixed-size thread pool to reduce thread creation overhead and reduce hardware oversubscription) and returns a "future" representing the return value of the function. A future is a placeholder for a value which can be passed around the program, and if and when the value is actually needed, it can be retrieved from the future; the get operation might block if the value has not yet been computed. In C++0x the future/promise system is primarily intended for use with threads, but there doesn't seem to be any reason a system for distributed RPC (à la E's Pluribus protocol) could not provide an interface using the same classes.
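As a rough illustration of the future mechanics, here is a minimal sketch using the interface as it was eventually standardized in C++11 (std::future rather than the draft's unique_future). The parallel_sum helper is hypothetical and exists only for this example; it hands half of a summation to std::async and collects the result with get.

```cpp
#include <future>
#include <numeric>
#include <vector>

// Hypothetical helper (not from the original post): sums a vector by
// handing the first half to std::async and summing the second half on
// the calling thread.
long parallel_sum(const std::vector<long>& values)
{
   const auto middle = values.begin() + values.size() / 2;

   // std::launch::async requests a separate thread; the returned future
   // is the placeholder for the eventual result.
   std::future<long> first_half = std::async(std::launch::async,
      [&values, middle] { return std::accumulate(values.begin(), middle, 0L); });

   const long second_half = std::accumulate(middle, values.end(), 0L);

   // get() blocks until the asynchronous computation has finished.
   return first_half.get() + second_half;
}
```

Calling parallel_sum on a vector holding 1 through 5 sums 1 and 2 on the spawned thread and 3, 4 and 5 on the caller's thread. The reference capture is safe here only because get is called before values goes out of scope.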

An operation which felt like easy low-hanging fruit for parallel invocation is RSA's decrypt/sign operation. Mathematically, when one signs a message using RSA, the message representation (usually a hash function output plus some specialized padding) is converted to an integer, and then raised to the power of d, the RSA private key, modulo another number. Both of these numbers are relatively large, typically 300 to 600 digits long. A well known trick which takes advantage of the underlying structure allows one to instead compute two modular exponentiations, both using numbers about half the size of d, and combine them using the Chinese Remainder Theorem (thus this optimization is often called RSA-CRT). The two computations are both still quite intensive, and since they are independent it seemed reasonable to try computing them in parallel. Running one of the two exponentiations in a different thread showed an immediate doubling in speed for RSA signing on a multicore! Other mathematically intensive algorithms that offer some amount of parallel computation, including DSA and ElGamal, also showed nice improvements.
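To make the RSA-CRT split concrete, here is a toy sketch with a deliberately tiny, hypothetical key (p = 61, q = 53), assuming a compiler that ships the finished C++11 std::async. The names power_mod, rsa_private_plain, rsa_private_crt and rsa_verify are invented for this example and are not botan's API; real keys need a big integer type rather than 64-bit words.

```cpp
#include <cstdint>
#include <future>

// Square-and-multiply modular exponentiation; only suitable for toy
// sizes where the intermediate products fit in 64 bits.
std::uint64_t power_mod(std::uint64_t base, std::uint64_t exp, std::uint64_t mod)
{
   std::uint64_t result = 1 % mod;
   base %= mod;
   while(exp > 0)
   {
      if(exp & 1)
         result = (result * base) % mod;
      base = (base * base) % mod;
      exp >>= 1;
   }
   return result;
}

// Toy key, far too small for real use: n = p*q, e*d = 1 mod (p-1)(q-1).
const std::uint64_t P = 61, Q = 53, N = P * Q, E = 17, D = 2753;

// Plain private key operation: one exponentiation with the full d.
std::uint64_t rsa_private_plain(std::uint64_t m)
{
   return power_mod(m, D, N);
}

// RSA-CRT: two half-size exponentiations, one of them on another thread,
// recombined with the Chinese Remainder Theorem (Garner's formula).
std::uint64_t rsa_private_crt(std::uint64_t m)
{
   const std::uint64_t d1 = D % (P - 1);
   const std::uint64_t d2 = D % (Q - 1);
   // q^-1 mod p, computed via Fermat's little theorem since p is prime.
   const std::uint64_t q_inv = power_mod(Q, P - 2, P);

   auto future_j1 = std::async(std::launch::async,
                               [=] { return power_mod(m, d1, P); });
   const std::uint64_t j2 = power_mod(m, d2, Q);
   const std::uint64_t j1 = future_j1.get();

   // h = q_inv * (j1 - j2) mod p, then result = j2 + h*q.
   const std::uint64_t h = (q_inv * ((j1 + P - j2 % P) % P)) % P;
   return j2 + h * Q;
}

// Public key check: raising the signature to e recovers the message.
bool rsa_verify(std::uint64_t m, std::uint64_t signature)
{
   return power_mod(signature, E, N) == m;
}
```

For any m below n, rsa_private_crt(m) should agree with rsa_private_plain(m), and rsa_verify should accept the result. At this size the thread creation cost dwarfs the arithmetic, so the sketch shows the structure of the computation rather than the speedup.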

As std::async is not included in GCC 4.5, I wrote a simple clone of it. This version does not offer thread pooling or the option of telling the runtime to run the function on the same thread; it is mostly a 'proof of concept' version I'm using until GCC includes the real deal in libstdc++. Here is the code:

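#include <future>
#include <thread>
#include <utility>

template<typename F>
auto std_async(F f) -> std::unique_future<decltype(f())>
{
   typedef decltype(f()) result_type;
   std::packaged_task<result_type ()> task(std::move(f));
   std::unique_future<result_type> future = task.get_future();
   std::thread thread(std::move(task));
   // Detach so that destroying the thread object on return does not
   // call std::terminate; the future still delivers the result.
   thread.detach();
   return future;
}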

The highly curious auto return type of std_async uses C++0x's new function declaration syntax; ordinarily there is no reason to use it, but here we want to specify that the function returns a unique_future parameterized by whatever it is that f returns. Since f can't be referred to until it has been mentioned as the name of an argument, the return type has to come after the parameter list.
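The same syntax can be seen in isolation in a smaller case. This is a minimal sketch with a hypothetical function template named add, whose return type is whatever a + b happens to produce:

```cpp
#include <type_traits>

// Hypothetical example: the return type depends on the parameters a and
// b, so it can only be written after the parameter list has named them.
template<typename A, typename B>
auto add(A a, B b) -> decltype(a + b)
{
   return a + b;
}

// int + double yields double, and decltype picks that up automatically.
static_assert(std::is_same<decltype(add(1, 2.5)), double>::value,
              "add(int, double) returns double");
```

Writing decltype(a + b) in the usual position before the function name would fail, because a and b are not yet in scope at that point.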

Unlike the version of std::async that was finally voted in, std_async assumes its argument takes no arguments (one of the original proposals for std::async used a similar interface). This would be highly inconvenient except for the assistance of C++0x's lambdas, which allow us to pack everything together. For instance here is the code for RSA signing, which packages up one half of the computation in a 0-ary lambda function:

   auto future_j1 = std_async([&]() { return powermod_d1_p(i); });
   BigInt j2 = powermod_d2_q(i);
   BigInt j1 = future_j1.get();
   // Now combine j1 and j2 using CRT

Using C++0x's std::bind instead of a lambda here should work as well, but I ran into problems with that in the 4.5 snapshot I'm using; the current implementation follows the TR1 style of requiring result_type typedefs, which will not be necessary in C++0x thanks to decltype. Since the actual std::async can take an arbitrary number of arguments, the declaration of future_j1 will eventually change to simply:

   auto future_j1 = std::async(powermod_d1_p, i);

The implementation of std_async may strike you as excessively C++0x-ish, for instance by using decltype instead of TR1's result_of metaprogramming function. Part of this is due to current limitations of GCC and/or libstdc++; the version of result_of in 4.5's libstdc++ does not understand lambda functions (C++0x's result_of is guaranteed to get this right, because it itself uses decltype, but apparently libstdc++ hasn't changed to use this yet).

Overall I'm pretty happy with C++0x as an evolution of C++98 for systems programming tasks. Though I am certainly interested to see how Thompson and Pike's Go works out; now that BitC is more or less dead after the departure of its designers to Microsoft, Go seems to be the only game in town in terms of new systems programming languages.

Syndicated 2009-11-24 15:09:30 from Jack Lloyd

I recently packaged botan for Windows using InnoSetup, an open source installation creator. Overall I was pretty pleased with it - it seems to do everything I need it to do without much of a hassle, and I'll probably use it in the future if I need to package other programs or tools for Windows.

After I got the basic package working, a nit I wanted to deal with was converting the line endings of all the header files and plain-text documentation (readme, license file, etc) to use Windows line endings. While many Windows programs, including Wordpad and Visual Studio, can deal with files with Unix line endings, not all do, and it seemed like it would be a nice touch if the files were not completely unreadable if opened in Notepad.

There is no built-in support for this, but InnoSetup includes a scripting facility (using Pascal!) with hooks that can be called at various points in the installation process, such as immediately after a file is installed, which handles this sort of problem perfectly. So all that was required was to learn enough Pascal to write the function. I've included it below to help anyone who might be searching for a similar facility:

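const
   LF = #10;
   CR = #13;
   CRLF = CR + LF;

// Rewrites the file that was just installed to use CRLF line endings.
// Assumes the file contains only Unix (LF) line endings to begin with.
procedure ConvertLineEndings();
var
   FilePath : String;
   FileContents : String;
begin
   FilePath := ExpandConstant(CurrentFileName);
   LoadStringFromFile(FilePath, FileContents);
   StringChangeEx(FileContents, LF, CRLF, False);
   SaveStringToFile(FilePath, FileContents, False);
end;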

Adding the hook with AfterInstall: ConvertLineEndings caused this function to run on each of my text and include files.

Syndicated 2009-11-23 23:51:23 from Jack Lloyd

SSE2 Serpent on Atom N270: twice as fast as AES-128

On the Intel Atom N270 processor, OpenSSL 0.9.8g's implementation of AES-128 runs at 25 MiB per second (CBC mode, using openssl speed). In contrast, the Serpent implementation using SSE2 I described last month runs at over 60 MiB per second in ECB mode (2.4x faster) and 48 MiB per second in CTR mode (1.9x faster).

Syndicated 2009-10-21 06:11:40 from Jack Lloyd

Programming trivia: 4x4 integer matrix transpose in SSE2

The Intel SSE intrinsics include a macro, _MM_TRANSPOSE4_PS, which performs a matrix transposition on a 4x4 array represented by elements in 4 SSE registers. However, it doesn't work with integer registers because Intel intrinsics make a distinction between integer and floating point SSE registers. Theoretically one could cast and use the floating point operations, but it seems quite plausible that this will not round-trip properly; for instance, if one of your integer values happens to have the same value as a 32-bit IEEE denormal.

However it is easy to do with the punpckldq, punpckhdq, punpcklqdq, and punpckhqdq instructions; code and diagrams ahoy.
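A minimal sketch of one way to do it, using the intrinsics that map to those four instructions (_mm_unpacklo_epi32, _mm_unpackhi_epi32, _mm_unpacklo_epi64 and _mm_unpackhi_epi64). The function name transpose4x4 and the row-major array layout are assumptions made for this example, and it builds only for x86 targets with SSE2:

```cpp
#include <emmintrin.h> // SSE2 intrinsics
#include <cstdint>

// Transposes, in place, a row-major 4x4 matrix of 32-bit integers.
// Rows are called a, b, c, d in the comments below.
void transpose4x4(std::uint32_t m[16])
{
   __m128i r0 = _mm_loadu_si128(reinterpret_cast<const __m128i*>(m));
   __m128i r1 = _mm_loadu_si128(reinterpret_cast<const __m128i*>(m + 4));
   __m128i r2 = _mm_loadu_si128(reinterpret_cast<const __m128i*>(m + 8));
   __m128i r3 = _mm_loadu_si128(reinterpret_cast<const __m128i*>(m + 12));

   // punpckldq / punpckhdq: interleave the 32-bit words of two rows.
   const __m128i t0 = _mm_unpacklo_epi32(r0, r1); // a0 b0 a1 b1
   const __m128i t1 = _mm_unpacklo_epi32(r2, r3); // c0 d0 c1 d1
   const __m128i t2 = _mm_unpackhi_epi32(r0, r1); // a2 b2 a3 b3
   const __m128i t3 = _mm_unpackhi_epi32(r2, r3); // c2 d2 c3 d3

   // punpcklqdq / punpckhqdq: interleave the 64-bit halves.
   r0 = _mm_unpacklo_epi64(t0, t1); // a0 b0 c0 d0
   r1 = _mm_unpackhi_epi64(t0, t1); // a1 b1 c1 d1
   r2 = _mm_unpacklo_epi64(t2, t3); // a2 b2 c2 d2
   r3 = _mm_unpackhi_epi64(t2, t3); // a3 b3 c3 d3

   _mm_storeu_si128(reinterpret_cast<__m128i*>(m), r0);
   _mm_storeu_si128(reinterpret_cast<__m128i*>(m + 4), r1);
   _mm_storeu_si128(reinterpret_cast<__m128i*>(m + 8), r2);
   _mm_storeu_si128(reinterpret_cast<__m128i*>(m + 12), r3);
}
```

Because every step is a pure integer shuffle, no value is ever interpreted as a float, which sidesteps the denormal concern entirely.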

Syndicated 2009-10-08 21:53:12 from Jack Lloyd

The Case For Skein

After the initial set of attacks on MD5 and SHA-1, NIST organized a series of conferences on hash function design. I was lucky enough to be able to attend the first one, and had a great time. This was where a competition in the style of the AES process, to replace SHA-1 and SHA-2, was first proposed (to wide approval). This has resulted in over 60 submissions to the SHA-3 contest, of which 14 have been brought into the second round.

Of the second round contenders, I think Skein is the best choice for becoming SHA-3, and want to explain why I think so.

Syndicated 2009-10-08 19:22:12 from Jack Lloyd

Speeding up Serpent: SIMD Edition

The Serpent block cipher was one of the 5 finalists in the AES competition, and is widely thought to be the most secure of them due to its conservative design. It was also considered the slowest candidate, which is one major reason it did not win the AES contest. However, it turns out that on modern machines one can use SIMD operations to implement Serpent at speeds quite close to AES.

Syndicated 2009-09-09 19:02:08 from Jack Lloyd

Inverting Mersenne Twister's final transform

The Mersenne twister RNG 'tempers' its output using an invertible transformation:

unsigned int temper(unsigned int x)
{
   x ^= (x >> 11);
   x ^= (x << 7) & 0x9D2C5680;
   x ^= (x << 15) & 0xEFC60000;
   x ^= (x >> 18);
   return x;
}

The inversion function is:

unsigned int detemper(unsigned int x)
{
   x ^= (x >> 18);
   x ^= (x << 15) & 0xEFC60000;
   x ^= (x << 7) & 0x1680;
   x ^= (x << 7) & 0xC4000;
   x ^= (x << 7) & 0xD200000;
   x ^= (x << 7) & 0x90000000;
   x ^= (x >> 11) & 0xFFC00000;
   x ^= (x >> 11) & 0x3FF800;
   x ^= (x >> 11) & 0x7FF;

   return x;
}

This inversion has been confirmed correct with exhaustive search.
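For anyone who wants to repeat the check, a search of this kind can be sketched as follows. This is an illustrative reconstruction rather than the program originally used; the function name count_round_trip_failures and its stride parameter are invented for the example, and the two transforms are repeated (with fixed-width types) so the sketch compiles on its own.

```cpp
#include <cstdint>

std::uint32_t temper(std::uint32_t x)
{
   x ^= (x >> 11);
   x ^= (x << 7) & 0x9D2C5680;
   x ^= (x << 15) & 0xEFC60000;
   x ^= (x >> 18);
   return x;
}

std::uint32_t detemper(std::uint32_t x)
{
   x ^= (x >> 18);
   x ^= (x << 15) & 0xEFC60000;
   x ^= (x << 7) & 0x1680;
   x ^= (x << 7) & 0xC4000;
   x ^= (x << 7) & 0xD200000;
   x ^= (x << 7) & 0x90000000;
   x ^= (x >> 11) & 0xFFC00000;
   x ^= (x >> 11) & 0x3FF800;
   x ^= (x >> 11) & 0x7FF;
   return x;
}

// Counts the inputs x, visiting every stride-th 32-bit value, for which
// detemper(temper(x)) != x. A stride of 1 checks all 2^32 inputs; a
// stride of 0 is treated as 1 to avoid looping forever.
std::uint64_t count_round_trip_failures(std::uint64_t stride)
{
   if(stride == 0)
      stride = 1;

   std::uint64_t failures = 0;
   // A 64-bit counter lets the loop end cleanly after 0xFFFFFFFF.
   for(std::uint64_t x = 0; x <= 0xFFFFFFFFu; x += stride)
   {
      const std::uint32_t v = static_cast<std::uint32_t>(x);
      if(detemper(temper(v)) != v)
         ++failures;
   }
   return failures;
}
```

A stride of 1 gives the exhaustive check and takes a little while to run; a larger stride gives a quick spot check.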

This is more a note to my future self than anything else; I'm cleaning out my ~/projects directory, and I can either publish this somewhere or check it into revision control (well, actually the contents of this blog are also in revision control, but no matter).

Syndicated 2009-07-21 20:40:01 from Jack Lloyd

Isn't Autoconf Supposed To Be, Well, Automatic?

Wherein I complain about things that annoy me.

While attempting to build Carrier:

$ ./configure
[many tests run, taking several minutes]
GtkSpell development headers not found.
Use --disable-gtkspell if you do not need it.
$ ./configure --disable-gtkspell
[many tests run, taking several minutes]
GStreamer development headers not found.
Use --disable-gstreamer if you do not need GStreamer (sound) support.

And iterate that six times. Would it be so hard to just carry on and inform the user that some things won't be built because the headers were not found? Without an explicit --enable-blah request, these should not be hard errors (well, IMO).

Syndicated 2009-06-22 19:04:00 from Jack Lloyd
