And here it is, your Moment of Zen:
Those who assume the worst in other people, are by their very nature among the worst people.
Those who assume the best in other people, are by their very nature among the best people.
-- Jeffrey Stedfast
5 Mar 2007 (updated 6 Mar 2007 at 04:56 UTC) »
For historical reasons, a Binary Insertion Sort was probably also used to eliminate recursive function call overhead as well as stack overflows.
Note that while the binary search portion of the Binary Insertion Sort algorithm is recursive, since it is tail recursive, it can be implemented in such a way as to even avoid having to use an internal stack.
See the last implementation discussed here for an example on how to do that.
Thanks for the link too, btw, I will be sure to check that out later when I get home from work!
4 Mar 2007 (updated 4 Mar 2007 at 22:26 UTC) »
haruspex: Hence my comment above that line,
If your program makes heavy use of a sorting routine, you may want to consider implementing a tailored sort function rather than just using qsort().
By no means do I suggest writing your own low-level routines for no reason... but if your program is spending a lot of time sorting, then you might consider writing a tailored sort routine. Obviously this wouldn't make sense if sorting is barely even on the radar as far as time spent in your program.
Hopefully this clarifies my position and the intent of my wording (which I admit may have been unclear).
Note, also, that all of my entries in my blog series on sorting start off with an unoptimized version of the sort routine, implementing them as closely to the description of the algorithm as I can. From there, in each entry, I went on to describe my mode of thinking on how to achieve better performance (largely as an educational challenge to myself) - e.g. I never suggest pre-optimizing.
Update: more on sorting here.
4 Mar 2007 (updated 4 Mar 2007 at 15:08 UTC) »
Like Merge Sort, Quick Sort promises O(n lg n) time in the average case, and, like Merge Sort, is also a recursive algorithm. However, it is important to be aware that Quick Sort is O(n * n) worst case, where, ironically, the input is already mostly sorted in either direction.
The way Quick Sort works is that it first chooses a pivot value with which to partition the working sequence into two segments. Next, it puts all items with values smaller than the pivot value into the first segment and all items with values greater than the pivot value into the second segment. Recursively repeat the process on each segment.
Lets put this into code (keeping our same call signature from previous sorting posts):
static int
QuickSortPartiton (int a[], int low, int high)
{
int pivot, lo, hi;
int tmp;
hi = high - 1;
lo = low;
pivot = a[lo];
while (1) {
while (lo < high && a[lo] < pivot)
lo++;
while (hi > low && a[hi] >= pivot)
hi--;
if (lo < hi) {
/* swap */
tmp = a[lo];
a[lo] = a[hi];
a[hi] = tmp;
hi--;
} else {
return hi + 1;
}
}
}
static void
QuickSortRecurse (int a[], int low, int high)
{
int mid;
mid = QuickSortPartition (a, low, high);
if (mid < high) {
QuickSortRecurse (a, low, mid);
QuickSortRecurse (a, mid, high);
}
}
void
QuickSort (int a[], int n)
{
if (n < 2)
return;
QuickSortRecurse (a, 0, n);
}
Before we go any further, lets see how this compares to our Merge Sort implementation from earlier.
Since I no longer have the same machine that I timed Merge Sort with, I had to re-time it on my laptop (which is where I've been doing this since I can hack while laying back on my comfortable couch). My laptop is an IBM T40 ThinkPad which sports an Intel Pentium M Processor rated at 1600 MHz.
Since I left off sorting 10,000,000 items in the Merge Sort post, I figured I'd use the same array size here.
For Merge Sort, I'm consistently getting around 6.96s. For our Quick Sort implementation, I'm getting about 5.40s.
Not bad... but lets see if we can speed it up a bit more. The easiest change to make is to combine QuickSortPartition() into QuickSortRecurse() in order to try and reduce extra function-call overhead. The resulting code is as follows:
static void
QuickSortRecurse (int a[], int low, int high)
{
int pivot, lo, hi;
int tmp;
hi = high - 1;
lo = low;
pivot = a[lo];
while (1) {
while (lo < high && a[lo] < pivot)
lo++;
while (hi > low && a[hi] >= pivot)
hi--;
if (lo < hi) {
/* swap */
tmp = a[lo];
a[lo] = a[hi];
a[hi] = tmp;
hi--;
} else {
hi++;
break;
}
}
if (hi < high) {
QuickSortRecurse (a, low, hi);
QuickSortRecurse (a, hi, high);
}
}
void
QuickSort (int a[], int n)
{
if (n < 2)
return;
QuickSortRecurse (a, 0, n);
}
This new version gets times closer to 5.20s on average.
It would be nice if we could get rid of the function call overhead of recursion like we did with Binary Insertion Sort, as we've seen that getting rid of the need to call out to QuickSortPartition() improved the performance a bit (not much, but every little bit counts, right?).
We're going to have to get crafty in order to work around the need to make our function recursive. What we can do is keep our own stack, but growing it dynamically could potentially kill any performance gains we could hope for... so, lets see what Donald Knuth has to say about the mathematical properties of this sort algorithm.
An auxiliary stack with at most [lg N] entries is needed for temporary storage.
As it happens, log base 2 of the max unsigned value of any integer type is the number of bits in said integer type. This means that on my system, where integers are 32bit, I'll need a stack size of 32.
What do we store on our stack? Well, what variables do we need for QuickSortRecurse()? We need a[], high, and low. If we're going to make QuickSort() non-recursive, then that means we'll always have access to a[] which means the only variables we need to hold on our stack are high and low.
So here it is, your Moment of Zen:
typedef struct qstack {
size_t lo;
size_t hi;
} qstack_t;
void
QuickSort (int a[], size_t n)
{
qstack_t stack[32], *sp;
register size_t lo, hi;
size_t low, high;
int pivot, tmp;
if (n < 2)
return;
/* push our initial values onto the stack */
sp = stack;
sp->lo = 0;
sp->hi = n;
sp++;
while (sp > stack) {
/* pop lo and hi off the stack */
sp--;
high = sp->hi;
low = sp->lo;
hi = high - 1;
lo = low;
pivot = a[lo];
while (1) {
while (lo < high && a[lo] < pivot)
lo++;
while (hi > low && a[hi] >= pivot)
hi--;
if (lo < hi) {
/* swap */
tmp = a[lo];
a[lo] = a[hi];
a[hi] = tmp;
hi--;
} else {
hi++;
if (hi == high) {
/* done with this segment */
break;
}
/* push the larger segment onto the
* stack and continue sorting the
* smaller segment. */
if ((hi - low) > (high - hi)) {
sp->lo = low;
sp->hi = hi;
sp++;
hi = high;
low = lo;
} else {
sp->hi = high;
sp->lo = hi;
sp++;
high = hi;
lo = low;
}
pivot = a[lo];
hi--;
}
}
}
}
Once again, we see a slight improvement in execution time: 4.59s, just a hair under 5.00s, down from ~5.20s.
Just a bit faster, eh?
Out of my own interest, I've been reimplementing all of these sorting algorithms to target glibc's qsort() API, so as to make any one of them a drop-in replacement (not that it's likely I'd do that, since until now, it's unlikely any of them could have even approached the speed of qsort()... ).
Without further ado, here it is, another Moment of Zen:
static void
memswap (void *a, void *b, size_t n)
{
register unsigned int *ai, *bi;
unsigned char *ac, *bc, *an;
register unsigned int i;
unsigned char c;
an = (unsigned char *) a + n;
ai = (unsigned int *) a;
bi = (unsigned int *) b;
while (((unsigned char *) (ai + 1)) <= an) {
i = *ai;
*ai++ = *bi;
*bi++ = i;
}
ac = (unsigned char *) ai;
bc = (unsigned char *) bi;
while (ac < an) {
c = *ac;
*ac++ = *bc;
*bc++ = c;
}
}
typedef struct qstack {
unsigned char *lo;
unsigned char *hi;
} qstack_t;
void
QuickSort (void *base, size_t nmemb, size_t size,
int (* compare) (const void *, const void *))
{
register unsigned char *lo, *hi, *pivot;
qstack_t stack[32], *sp = stack;
unsigned char *high, *low;
if (nmemb < 2)
return;
/* push our initial values onto the stack */
sp->lo = (unsigned char *) base;
sp->hi = sp->lo + (nmemb * size);
sp++;
while (sp > stack) {
/* pop lo and hi off the stack */
sp--;
high = sp->hi;
low = sp->lo;
hi = high - size;
lo = low;
pivot = lo;
while (1) {
while (lo < high && compare (lo,
pivot) < 0)
lo += size;
while (hi > low && compare (hi,
pivot) >= 0)
hi -= size;
if (lo < hi) {
/* swap */
memswap (lo, hi, size);
if (lo == pivot)
pivot = hi;
else if (hi == pivot)
pivot = lo;
hi -= size;
} else {
hi += size;
if (hi == high) {
/* done with this segment */
break;
}
/* push the larger segment onto the
* stack and continue sorting the
* smaller segment. */
if ((hi - low) > (high - hi)) {
sp->lo = low;
sp->hi = hi;
sp++;
hi = high;
low = lo;
} else {
sp->hi = high;
sp->lo = hi;
sp++;
high = hi;
lo = low;
}
pivot = lo;
hi -= size;
}
}
}
}
It turns out that my implementation is neck-and-neck with glibc's qsort() compiled at -O0, but if I compile with -Os, my implementation consistently takes the lead:
QuickSort() takes on average 12.82s while qsort() takes about 12.98s. However, the integer tailored version of our QuickSort() only takes 6.46s on average.
If your program makes heavy use of a sorting routine, you may want to consider implementing a tailored sort function rather than just using qsort().
To prove the point further, if we restrict the input array to be made up of pointers (rather than arbitrarily sized elements), we can replace the memswap() routine with a simple pointer swap. This simple change shaves a whole second off the execution time!
Not only can you tailor your sort implementation to be able to do faster swaps/comparisons, but you can also tailor it to be a "stable sort" (as I have done in these implementations), which is something that the libc qsort() does not guarantee.
As Michael Abrash mentions in one of his books, there are a number of fallacies that programmers use to justify not hand-writing some of their low-level routines, one of which is the assumption that the C library is written in optimized assembly and therefore it's unlikely that you could possibly write a replacement routine faster yourself.
As I've just proven, this is not the case. And I haven't even bothered to re-implement my Quick Sort routine in highly optimized assembly yet!
4 Mar 2007 (updated 4 Mar 2007 at 14:50 UTC) »
Here it is:
;--------------------------------------------------------------- ; Sorts an array of ints. C-callable (small model). 25 bytes. ; void sort (int n, int a[]); ; ; Courtesy of David Stafford. ;---------------------------------------------------------------.model small .code public _sort
top: mov dx,[bx] ; swap two adjacent ints xchg dx,[bx+2] xchg dx,[bx]
cmp dx,[bx] ; in the right order? jl top ; no, swap them back
inc bx ; go to the next integer inc bx loop top
_sort: pop dx ; get return address (entry point) pop cx ; get count pop bx ; get pointer push bx ; restore pointer dec cx ; decrement count push cx ; save count push dx ; save return address jg top ; if cx > 0
ret
end
fzort: Ah, thank you for the explanation :)
I was just about to note that expanding on my previous estimation, a slightly better one would be:
r = (x >> 1) - (x >> 2) - (x >> 3) + (x >> 6)
And then of course I could perhaps expand on the (x >> 6) in a similar fashion to get an even more accurate result.
Anyways... kind of an interesting problem.
1 Feb 2007 (updated 1 Feb 2007 at 10:50 UTC) »
Interesting interview questions:
Q: What's a fast way to divide an integer by 7
using the bit shift operator? (apparently asked by an EA Sports interviewer)
A: I thought about this for a bit and came up
with the following estimation:
r = (x >> 2) - (x >> 3) + (x >> 6);
I mostly mention this because I had my own interview today where I felt... well, less than adequate. Suffice it to say, my ability to figure out the Big-O notation for algorithms was less than stellar. I was also unable to come up with a solution for his webcrawler scenario, which was along the lines of "if you've got some huge number of pages to crawl, how could you prevent the crawler from scanning the same page multiple times?" to which I had to admit to him, I hadn't the slightest idea how to go about it (the prelude to this question had been "the simplest way to do such a thing if memory was not an issue" to which I replied I'd use a hash table, using the page urls as the key).
Been listening to Drivin' Too Fast lately, really has a nice groove to it.
Amazing what a break from looking at code can do to help you fix a bug.
Back in 2004, I had been writing articles explaining different sorting algorithms, how they worked, and how to implement them in C (and in most cases, how to implement them more efficiently than the "standard textbook way"). Well, I had gotten distracted around the time I had been working on an article for Quick Sort and never got around to finishing it. I remember having a bug in my QuickSort routine somewhere that I wasn't able to find after a few nights of looking at it and having had more pressing things to attend to, set it aside for later (but not before documenting a few test cases that failed and how they failed).
Well, yesterday, after coming across that code, I opened it up in my trusty Emacs editor and in just a few minutes had a solution... it was basically a simple one-off bug.
Not long after, I got a call from someone at Google who had seen my resume and repeatedly told me they found it "interesting". I have no idea what that means, exactly, but I take it that it's a Good Thing(tm) :)
23 Jan 2007 (updated 24 Jan 2007 at 15:16 UTC) »
Thanks to Ross's blog for informing me about Dave Cridland's Push-IMAP projects (Polymer, Telomer) and the Lemonaide specs. This was quite an interesting read... I've been wanting something like this for years, it's really exciting stuff.
Despite what pvanhoof claims in his own response to this news, offline functionality is not the hardest part of implementing an IMAP client.
FOAF updates: Trust rankings are now exported, making the data available to other users and websites. An external FOAF URI has been added, allowing users to link to an additional FOAF file.
Keep up with the latest Advogato features by reading the Advogato status blog.
If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!