# Older blog entries for ssp (starting at number 18)

15 Oct 2012 (updated 19 Oct 2012 at 09:08 UTC) »

Big-O Misconceptions

In computer science and sometimes mathematics, big-O notation is used
to talk about how quickly a function grows while disregarding multiplicative and additive constants. When classifying algorithms, big-O notation is useful because it lets us abstract away the differences between real computers as just multiplicative and additive constants.

Big-O is not a difficult concept at all, but it seems to be common even for people who should know better to misunderstand some aspects of it. The following is a list of misconceptions that I have seen in the wild.

But first a definition: We write $$f(n) = O(g(n))$$ when $$f(n) \le M g(n)$$ for sufficiently large $$n$$, for some positive constant $$M$$.

Misconception 1: “The Equals Sign Means Equality”

The equals sign in $$f = O(g(n))$$ is a widespread travestry. If you take it at face value, you can deduce that since $$5 n$$ and $$3 n$$ are both equal to $$O(n)$$, then $$3 n$$ must be equal to $$5 n$$ and so $$3 = 5$$.

The expression $$f = O(g(n))$$ doesn’t type check. The left-hand-side is a function, the right-hand-side is a … what, exactly? There is no help to be found in the definition. It just says “we write” without concerning itself with the fact that what “we write” is total nonsense.

The way to interpret the right-hand side is as a set of functions: $$O(f) = \{ g \mid g(n) \le M f(n) \text{ for some $$M > 0$$ for large $$n$$}\}.$$ With this definition, the world makes sense again: If $$f(n) = 3 n$$ and $$g(n) = 5 n$$, then $$f \in O(n)$$ and $$g \in O(n)$$, but there is no equality involved so we can’t make bogus deductions like $$3=5$$. We can however make the correct observation that $$O(n) \subseteq O(n \log n)\subseteq O(n^2) \subseteq O(n^3)$$, something that would be difficult to express with the equals sign.

Misconception 2: “Informally, Big-O Means ‘Approximately Equal’”

If an algorithm takes $$5 n^2$$ seconds to complete, that algorithm is $$O(n^2)$$ because for the constant $$M=7$$ and sufficiently large $$n$$, $$5 n^2 \le 7 n^2$$. But an algorithm that runs in constant time, say 3 seconds, is also $$O(n^2)$$ because for sufficiently large $$n$$, $$3 \le n^2$$.

So informally, big-O means approximately less than or equal, not approximately equal.

If someone says “Topological Sort, like other sorting algorithms, is O(n log n)”, then that is technically correct, but severely misleading, because Toplogical Sort is also $$O(n)$$ which is a subset of $$O(n \log n)$$. Chances are whoever said it meant something false.

If someone says “In the worst case, any comparison based sorting algorithm must make $$O(n \log n)$$ comparisons” that is not a correct statement. Translated into English it becomes:

“In the worst case, any comparison based sorting algorithm must make fewer than or equal to $$M n \log (n)$$ comparisons”

which is not true: You can easily come up with a comparison based sorting algorithm that makes more comparisons in the worst case.

To be precise about these things we have other types of notation at our disposal. Informally:

 $$O()$$: Less than or equal, disregarding constants $$\Omega()$$: Greater than or equal, disregarding constants $$o()$$: Stricly less than, disregarding constants $$\Theta()$$: Equal to, disregarding constants

and some more. The correct statement about lower bounds is this: “In the worst case, any comparison based sorting algorithm must make $$\Omega(n \log n)$$ comparisons. In English that becomes:

“In the worst case, any comparison based sorting algorithm must make at least $$M n \log (n)$$ comparisons”

which is true. And a correct, non-misleading statement about Topological Sort is that it is $$\Theta(n)$$, because it has a lower bound of $$\Omega(n)$$ and an upper bound of $$O(n)$$.

Misconception 3: “Big-O is a Statement About Time”

Big-O is used for making statements about functions. The functions can measure time or space or cache misses or rabbits on an island or anything or nothing. Big-O notation doesn’t care.

In fact, when used for algorithms, big-O is almost never about time. It is about primitive operations.

When someone says that the time complexity of MergeSort is $$O(n \log n)$$, they usually mean that the number of comparisons that MergeSort makes is $$O(n \log n)$$. That in itself doesn’t tell us what the time complexity of any particular MergeSort might be because that would depend how much time it takes to make a comparison. In other words, the $$O(n \log n)$$ refers to comparisons as the primitive operation.

The important point here is that when big-O is applied to algorithms, there is always an underlying model of computation. The claim that the time complexity of MergeSort is $$O(n \log n)$$, is implicitly referencing an model of computation where a comparison takes constant time and everything else is free.

Which is fine as far as it goes. It lets us compare MergeSort to other comparison based sorts, such as QuickSort or ShellSort or BubbleSort, and in many real situations, comparing two sort keys really does take constant time.

However, it doesn’t allow us to compare MergeSort to RadixSort because RadixSort is not comparison based. It simply doesn’t ever make a comparison between two keys, so its time complexity in the comparison model is 0. The statement that RadixSort is $$O(n)$$ implicitly references a model in which the keys can be lexicographically picked apart in constant time. Which is also fine, because in many real situations, you actually can do that.

To compare RadixSort to MergeSort, we must first define a shared model of computation. If we are sorting strings that are $$k$$ bytes long, we might take “read a byte” as a primitive operation that takes constant time with everything else being free.

In this model, MergeSort makes $$O(n \log n)$$ string comparisons each of which makes $$O(k)$$ byte comparisons, so the time complexity is $$O(k n \log n)$$. One common implementation of RadixSort will make $$k$$ passes over the $$n$$ strings with each pass reading one byte, and so has time complexity $$O(n k)$$.

Misconception 4: Big-O Is About Worst Case

Big-O is often used to make statements about functions that measure the worst case behavior of an algorithm, but big-O notation doesn’t imply anything of the sort.

If someone is talking about the randomized QuickSort and says that it is $$O(n \log n)$$, they presumably mean that its expected running time is $$O(n \log n)$$. If they say that QuickSort is $$O(n^2)$$ they are probably
talking about its worst case complexity. Both statements can be considered true depending on what type of running time the functions involved are measuring.

Syndicated 2012-10-15 09:16:39 (Updated 2012-10-19 08:11:32) from Søren Sandmann Pedersen

26 Sep 2011 (updated 9 Oct 2011 at 21:07 UTC) »

Over is not Translucency

The Porter/Duff Over operator, also known as the “Normal” blend mode in Photoshop, computes the amount of light that is reflected when a pixel partially covers another:

The fraction of bg that is covered is denoted alpha. This operator is the correct one to use when the foreground image is an opaque mask that partially covers the background:

A photon that hits this image will be reflected back to your eyes by either the foreground or the background, but not both. For each foreground pixel, the alpha value tells us the probability of each:

This is the definition of the Porter/Duff Over operator for non-premultiplied pixels.

But if alpha is interpreted as translucency, then the Over operator is not the correct one to use. The Over operator will act as if each pixel is partially covering the background:

Which is not how translucency works. A translucent material reflects some light and lets other light through. The light that is let through is reflected by the background and interacts with the foreground again.

Let’s look at this in more detail. Please follow along in the diagram to the right. First with probability a, the photon is reflected back towards the viewer:

With probability (1 – a), it passes through the foreground, hits the background, and is reflected back out. The photon now hits the backside of the foreground pixel. With probability (1 – a), the foreground pixel lets the photon back out to the viewer. The result so far:

But we are not done yet, because with probability a the foreground pixel reflects the photon once again back towards the background pixel. There it will be reflected, hit the backside of the foreground pixel again, which lets it through to our eyes with probability (1 – a). We get another term where the final (1 – a) is replaced with a * fg * bg * (1 – a):

And so on. In each round, we gain another term which is identical to the previous one, except that it has an additional a * fg * bg factor:

Or more compactly:

Because we are dealing with pixels, both a, fg, and bg are less than 1, so the sum is a geometric series:

Putting them together, we get:

I have sidestepped the issue of premultiplication by assuming that background alpha is 1. The calculations with premultipled colors are similar, and for the color components, the result is simply:

The issue of destination alpha is more complicated. With the Over operator, both foreground and background are opaque masks, so the light that survives both has the same color as the input light. With translucency, the transmitted light has a different color, which means the resulting alpha value must in principle be different for each color component. But that’s not possible for ARGB pixels. A similar argument to the above shows that the resulting alpha value would be:

where b is the background alpha. The problem is the dependency on fg and bg. If we simply assume for the purposes of the alpha computation that fg and bg are equal to a and b, we get this:

which is equal to

Ie., exactly the same computation as the one for the color channels. So we can define the Translucency Operator as this:

for all four channels.

Here is an example of what the operator looks like. The image below is what you will get if you use the Over operator to implement a selection rectangle. Mouse over to see what it would look like if you used the Translucency operator.

Both were computed in linear RGB. Typical implementations will often compute the Over operator in sRGB, so that’s what see if you actually select some icons in Nautilus. If you want to compare all three, open these in tabs:

And for good measure, even though it makes zero sense to do this,

Syndicated 2011-09-26 17:28:20 (Updated 2011-10-09 20:12:55) from Søren Sandmann Pedersen

10 Aug 2011 (updated 4 Sep 2011 at 02:06 UTC) »

Gamma Correction vs Premultiplied Pixels

Pixels with 8 bits per channel are normally sRGB encoded because that allocates more bits to darker colors where human vision is the most sensitive. (Actually, it’s really more of a historical accident, but sRGB nevertheless remains useful for this reason). The relationship between sRGB and linear RGB is that you get an sRGB pixel by raising each component of a linear pixel to the power of 1/2.2.

A lot of graphics software does alpha blending directly on these sRGB pixels using alpha values that are linearly coded (ie., an alpha value of 0 means no coverage, 0.5 means half coverage, and 1 means full coverage). Because alpha blending is best done with premultiplied pixels, such systems store pixels in this format:

[ alpha,  alpha * red_s,  alpha * green_s,  alpha * blue_s ]

where alpha is linearly coded, and (red_s, green_s, blue_s) are sRGB coded. As long as you are happy with blending in sRGB, this works well. Also, if you simply discard the alpha channel of such pixels and display them directly on a monitor, it will look as if the pixels were alpha blended (in the sRGB space) on top of a black background, which is the desired result.

But what if you want to blend in linear RGB? If you use the format above, some expensive conversions will be required. To convert to premultiplied linear, you have to first divide by alpha, then raise each color to 2.2, then multiply by alpha. To convert back, you must divide by alpha, raise to 1/2.2, then multiply with alpha.

The conversions can be avoided if you store the pixels linearly, ie., keeping the premultiplication, but coding red, green, and blue linearly instead of as sRGB. This makes blending fast, but the downside is that you need deeper pixels. With only 8 bits per pixel, the linear coding loses too much precision in darker tones. Another problems is that to display these pixels, you will either have to convert them to sRGB, or if the video card can scan them out directly, you have to make sure that the gamma ramp is set to compensate for the fact that the monitor expects sRGB pixels.

Can we get the best of both worlds? Yes. The format to use is this:

[ alpha,  alpha_s * red_s,  alpha_s * green_s,  alpha_s * blue_s ]

That is, the alpha channel is stored linearly, and the color channels are stored in sRGB, premultiplied with the alpha value raised to 1/2.2. Ie., the red component is now

(red * alpha)^(1/2.2),

where before it was

alpha * red^(1/2.2).

It is sufficient to use 8 bits per channel with this format because of the sRGB encoding. Discarding the alpha channel and displaying the pixels on a monitor will produce pixels that are alpha blended (in linear space) against black, as desired.

You can convert to linear RGB simply by raising the R, G, and B components to 2.2, and back by raising to 1/2.2. Or, if you feel like cheating, use an exponent of 2 so that the conversions become a multiplication and a square root respectively.

This is also the pixel format to use with texture samplers that implement the sRGB OpenGL extensions (textures and framebuffers). These extensions say precisely that the R, G, and B components are raised to 2.2 before texture filtering, and raised to 1/2.2 after the final raster operation.

Syndicated 2011-08-10 14:25:55 (Updated 2011-09-04 01:42:00) from Søren Sandmann Pedersen

15 Jul 2011 (updated 4 Sep 2011 at 02:06 UTC) »

Sysprof 1.1.8

A new version 1.1.8 of Sysprof is out.

This is a release candidate for 1.2.0 and contains mainly bug fixes.

Syndicated 2011-07-15 17:48:39 (Updated 2011-09-04 01:42:00) from Søren Sandmann Pedersen

4 Jul 2011 (updated 4 Sep 2011 at 02:06 UTC) »

Fast Multiplication of Normalized 16 bit Numbers with SSE2

If you are compositing pixels with 16 bits per component, you often need this computation:

uint16_t a, b, r;

r = (a * b + 0x7fff) / 65535;


There is a well-known way to do this quickly without a division:

uint32_t t;

t = a * b + 0x8000;

r = (t + (t >> 16)) >> 16;


Since we are compositing pixels we want to do this with SSE2 instructions, but because the code above uses 32 bit arithmetic, we can only do four operations at a time, even though SSE registers have room for eight 16 bit values. Here is a direct translation into SSE2:

a = punpcklwd (a, 0);
b = punpcklwd (b, 0);
a = pmulld (a, b);
a = paddd (a, 0x8000);
b = psrld (a, 16);
a = paddd (a, b);
a = psrld (a, 16);
a = packusdw (a, 0);


But there is another way that better matches SSE2:

uint16_t lo, hi, t, r;

hi = (a * b) >> 16;
lo = (a * b) & 0xffff;

t = lo >> 15;
hi += t;
t = hi ^ 0x7fff;

if ((int16_t)lo > (int16_t)t)
lo = 0xffff;
else
lo = 0x0000;

r = hi - lo;


This version is better because it avoids the unpacking to 32 bits. Here is the translation into SSE2:

t = pmulhuw (a, b);
a = pmullw (a, b);
b = psrlw (a, 15);
t = paddw (t, b);
b = pxor (t, 0x7fff);
a = pcmpgtw (a, b);
a = psubw (t, a);


This is not only shorter, it also makes use of the full width of the SSE registers, computing eight results at a time.

Unfortunately SSE2 doesn’t have 8 bit variants of pmulhuw, pmullw, and psrlw, so we can’t use this trick for the more common case where pixels have 8 bits per component.

Exercise: Why does the second version work?

Syndicated 2011-07-03 12:11:20 (Updated 2011-09-04 01:42:00) from Søren Sandmann Pedersen

30 Jun 2011 (updated 30 Jun 2011 at 11:39 UTC) »
6 Aug 2008 (updated 4 Jul 2011 at 18:28 UTC) »
17 Jun 2008 (updated 4 Jul 2011 at 18:29 UTC) »
10 Oct 2007 (updated 4 Jul 2011 at 18:29 UTC) »

9 older entries...

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!