wingo is currently certified at Master level.

Name: Andy Wingo
Member since: 2001-05-29 05:20:46
Last Login: 2009-12-14 09:39:54


Homepage: http://wingolog.org/

Notes:


Interests: Currently hacking at Fluendo in Barcelona, making a platform for streaming live video, with on-demand as a bit of an afterthought.

Prior to that, I spent two years teaching math and science in rural northern Namibia for the Peace Corps.

My advo diary is mirrored from my web log over at wingolog.org. There are a few other things hosted there as well.


Recent blog entries by wingo

Syndication: RSS 2.0

state of js implementations, 2014 edition

I gave a short talk about the state of JavaScript implementations this year at the Web Engines Hackfest.


29 minutes, vorbis or mp3; slides (PDF)

The talk goes over a bit of the history of JS implementations, with a focus on performance and architecture. It then moves on to talk about what happened in 2014 and some ideas about where 2015 might be going. Have a look if that's a thing you are into. Thanks to Adobe, Collabora, and Igalia for sponsoring the event.

Syndicated 2014-12-09 10:29:20 from wingolog

there are no good constant-time data structures

Imagine you have a web site that people can access via a password. No user name, just a password. There are a number of valid passwords for your service. Determining whether a password is in that set is security-sensitive: if a user has a valid password then they get access to some secret information; otherwise the site emits a 404. How do you determine whether a password is valid?

The go-to solution for this kind of problem for most programmers is a hash table. A hash table is a set of key-value associations, and its nice property is that looking up a value for a key is quick, because it doesn't have to check against each mapping in the set.

Hash tables are commonly implemented as an array of buckets, where each bucket holds a chain. If the bucket array is 32 elements long, for example, then keys whose hash is H are looked for in bucket H mod 32. The chain contains the key-value pairs in a linked list. Looking up a key traverses the list to find the first pair whose key equals the given key; if no pair matches, then the lookup fails.
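For concreteness, here is a minimal sketch of that bucket-and-chain lookup in Guile Scheme, using the stock hash and assoc procedures over a vector of association lists; the layout and the procedure name are mine, not anything standard:

(define (chain-lookup buckets key)
  ;; Pick the bucket H mod N, then walk its chain of (key . value) pairs.
  (let* ((n (vector-length buckets))
         (chain (vector-ref buckets (hash key n))))
    (assoc key chain)))   ; returns the matching pair, or #f if absent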

Unfortunately, storing passwords in a normal hash table is not a great idea. The problem isn't so much in the hash function (the hash in H = hash(K)) as in the equality function; usually the equality function doesn't run in constant time. Attackers can detect differences in response times according to when the "not-equal" decision is made, and use that to break your passwords.
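One way to get a constant-time equality function is to always touch every byte, as in this minimal Guile sketch over same-length bytevectors (the procedure name is mine; the length comparison can still short-cut, but lengths are rarely secret):

(use-modules (rnrs bytevectors))

(define (constant-time-equal? a b)
  ;; XOR each pair of bytes and OR the results together, so the running
  ;; time does not depend on where the first mismatch occurs.
  (and (= (bytevector-length a) (bytevector-length b))
       (let loop ((i 0) (acc 0))
         (if (= i (bytevector-length a))
             (zero? acc)
             (loop (+ i 1)
                   (logior acc (logxor (bytevector-u8-ref a i)
                                       (bytevector-u8-ref b i))))))))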

So let's say you ensure that your hash table uses a constant-time string comparator, to protect against the hackers. You're safe! Or not! Because not all chains have the same length, "interested parties" can use lookup timings to distinguish chain lookups that take 2 comparisons compared to 1, for example. In general they will be able to determine the percentage of buckets for each chain length, and given the granularity will probably be able to determine the number of buckets as well (if that's not a secret).

Well, as we all know, small timing differences still leak sensitive information and can lead to complete compromise. So we look for a data structure that takes the same number of algorithmic steps to look up a value. For example, bisection over a sorted array of size SIZE will take ceil(log2(SIZE)) steps to find the value, independent of what the key is and also independent of what is in the set. At each step, we compare the key and a "mid-point" value to see which is bigger, and recurse on one of the halves.
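As a sketch, here is that bisection in Guile over a sorted vector; the key<? comparator is hypothetical, and as the next paragraph notes, a genuinely constant-time less-than is itself a problem:

(define (bisect-member? sorted key key<?)
  ;; Narrow [lo, hi) by halving for roughly ceil(log2 n) steps, then do a
  ;; single equality check expressed as two less-than comparisons.
  (let loop ((lo 0) (hi (vector-length sorted)))
    (if (<= (- hi lo) 1)
        (and (< lo hi)
             (not (key<? key (vector-ref sorted lo)))
             (not (key<? (vector-ref sorted lo) key)))
        (let ((mid (quotient (+ lo hi) 2)))
          (if (key<? key (vector-ref sorted mid))
              (loop lo mid)
              (loop mid hi))))))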

One problem is, I don't know of a nice constant-time comparison algorithm for (say) 160-bit values. (The "passwords" I am thinking of are randomly generated by the server, and can be as long as I want them to be.) I would appreciate any pointers to such a constant-time less-than algorithm. However a bigger problem is that the time it takes to access memory is not constant; accessing element 0 of the sorted array might take more or less time than accessing element 10. In algorithms we typically model access on a more abstract level, but in hardware there's a complicated parallel and concurrent protocol of low-level memory that takes a non-deterministic time for any given access. "Hot" (more recently accessed) memory is faster to read than "cold" memory.

Non-deterministic memory access leaks timing information, and in the case of binary search the result is disaster: the attacker can literally bisect the actual values of all of the passwords in your set, by observing timing differences. The worst!

You could get around this by ordering "passwords" not by their actual values but by their cryptographic hashes (e.g. by their SHA256 values). This would force the attacker to bisect not over the space of password values but of the space of hash values, which would protect actual password values from the attacker. You still leak some timing information about which paths are "hot" and which are "cold", but you don't expose actual passwords.

It turns out that, as far as I am aware, it is impossible to design a key-value map on common hardware that runs in constant time and is sublinear in the number of entries in the map. As Zooko put it, running in constant time means that the best case and the worst case run in the same amount of time. Of course this is false for bucket-and-chain hash tables, but it's false for binary search as well, as "hot" memory access is faster than "cold" access. The only plausible constant-time operation on a data structure would visit each element of the set in the same order each time. All constant-time operations on data structures are linear in the size of the data structure. Thems the breaks! All you can do is account for the leak in your models, as we did above when ordering values by their hash and not their normal sort order.

Once you have resigned yourself to leaking some bits of the password via timing, you would be fine using normal hash tables as well -- just use a cryptographic hashing function and a constant-time equality function and you're good. No constant-time less-than operator need be invented. You leak something on the order of log2(COUNT) bits via timing, where COUNT is the number of passwords, but since that's behind a hash you can't use it to bisect on actual key values. Of course, you have to ensure that the hash table isn't storing values in sorted order and short-cutting early. This sort of detail isn't usually part of the contract of stock hash table implementations, so you probably still need to build your own.

An alternative is to encode your data structure differently, for example by having the "key" itself contain the value, signed by some private key known only to the server. But this approach is limited by network capacity and by whether copying the data in question is appropriate at all. It's not appropriate for photos, for example, as they are just too big.

Corrections appreciated from my knowledgeable readers :) I was quite disappointed when I realized that there were no good constant-time data structures and would be happy to be proven wrong. Thanks to Darius Bacon, Zooko Wilcox-O'Hearn, Jan Lehnardt, and Paul Khuong on Twitter for their insights; all mistakes are mine.

Syndicated 2014-12-02 22:01:38 from wingolog

scheme workshop 2014

I just got back from the US, and after sleeping for 14 hours straight I'm in a position to type about stuff again. So welcome back to the solipsism, France and internet! It is good to see you on a properly-sized monitor again.

I had the enormously pleasurable and flattering experience of being invited to keynote this year's Scheme Workshop last week in DC. Thanks to John Clements, Jason Hemann, and the rest of the committee for making it a lovely experience.

My talk was on what Scheme can learn from JavaScript, informed by my work in JS implementations over the past few years; you can download the slides as a PDF. I managed to record audio, so here goes nothing:


55 minutes, vorbis or mp3

It helps to follow along with the slides. Some day I'll augment my slide-rendering stuff to synchronize a sequence of SVGs with audio, but not today :)

The invitation to speak meant a lot to me, for complicated reasons. See, Scheme was born out of academic research labs, and to a large extent that's been its spiritual core for the last 40 years. My way to the temple was as a money-changer, though. While working as a teacher in northern Namibia in the early 2000s, fleeing my degree in nuclear engineering, trying to figure out some future life for myself, for some reason I was recording all of my expenses in Gnucash. Like, all of them, petty cash and all. 50 cents for a fat-cake, that kind of thing.

I got to thinking "you know, I bet I don't spend any money on Tuesdays." See, there was nothing really to spend money on in the village besides fat cakes and boiled eggs, and I didn't go into town to buy things except on weekends or later in the week. So I thought that it would be neat to represent that as a chart. Gnucash didn't have such a chart but I knew that they were implemented in Guile, as part of this wave of Scheme consciousness that swept the GNU project in the nineties, and that I should in theory be able to write it myself.

Problem was, I also didn't have internet in the village, at least then, and I didn't know Scheme and I didn't really know Gnucash. I think what I ended up doing was just monkey-typing out something that looked like the rest of the code, getting terrible errors but hey, it eventually worked. I submitted the code, many years ago now, some of the worst code you'll read today, but they did end up incorporating it into Gnucash and to my knowledge that report is still there.

I got more into programming, but still through the back door, so to speak. I had done some free software work before going to Namibia, on GStreamer, and wanted to build a programmable modular synthesizer with it. I read about Supercollider, and decided I wanted to do something like that but with the "unit generators" defined in GStreamer and orchestrated with Scheme. If I knew then that Scheme could be fast, I probably would have started on an entirely different course of things, but that did at least result in gainful employment doing unrelated GStreamer things, if not a synthesizer.

Scheme became my dominant language for writing programs. It was fun, and the need to re-implement a bunch of things wasn't a barrier at all -- rather a fun challenge. After a while, though, speed was becoming a problem. It became apparent that the only way to speed up Guile would be to replace its AST interpreter with a compiler. Thing is, I didn't know how to write one! Fortunately there was previous work by Keisuke Nishida, jetsam from the nineties wave of Scheme consciousness. I read and read that code, mechanically monkey-typed it into compilation, and slowly reworked it into Guile itself. In the end, around 2009, Guile was faster and I found myself its co-maintainer to boot.

Scheme has been a back door for me for work, too. I randomly met Kwindla Hultman-Kramer in Namibia, and we found Scheme to be a common interest. Some four or five years later I ended up working for him with the great folks at Oblong. As my interest in compilers grew, and it grew as I learned more about Scheme, I wanted something closer there, and that's what I've been doing in Igalia for the last few years. My first contact there was a former Common Lisp person, and since then many contacts I've had in the JS implementation world have been former Schemers.

So it was a delight when the invitation came to speak (keynote, no less!) at the Scheme Workshop, behind the altar instead of in the foyer.

I think it's clear by now that Scheme as a language and a community isn't moving as fast now as it was in 2000 or even 2005. That's good because it reflects a certain maturity, and makes the lore of the tribe easier to digest, but bad in that people tend to ossify and focus on past achievements rather than future possibility. Ehud Lamm quoted Nietzsche earlier today on Twitter:

By searching out origins, one becomes a crab. The historian looks backward; eventually he also believes backward.

So it is with Scheme and Schemers, to an extent. I hope my talk at the conference inspires some young Schemer to make an adaptively optimized Scheme, or to solve the self-hosted adaptive optimization problem. Anyway, as users I think we should end the era of contorting our code to please compilers. Of course some discretion in this area is always necessary but there's little excuse for actively bad code.

Happy hacking with Scheme, and au revoir!

Syndicated 2014-11-27 17:48:44 from wingolog

on yakshave, on color, on cosines, on glitchen

Hold on to your butts, kids, because this is epic.

on yaks

As in all great epics, our prideful, stubborn hero starts in a perfectly acceptable state of things, decides on a lark to make a small excursion, and comes back much much later to inflict upon you pictures from his journey.

So. I have a web photo gallery but I don't take many pictures these days. Dealing with photos is a bit of a drag, and the ways that are easier like Instagram or what-not give me the (peer, corporate, government: choose 3) surveillance hives. So, I had vague thoughts that I should update my web gallery. Yakpoint 1.

At the same time, my web gallery was written for mod_python on the server, and I don't like hacking in Python any more and kinda wanted to switch away from Apache. Yakpoint 2.

So I rewrote the server-side part in Scheme. (Yakpoint 3.) It worked fine but I found I needed the ability to get the dimensions of files on the server, so I wrote a quick-and-dirty JPEG parser. Yakpoint 4.

I needed EXIF data as well, as the original version displayed EXIF data, and for that I used a binding to libexif that I had written a few years ago when I thought about starting this project (Yakpoint -1). However I found some crashers in the library, because it had never really been tested in production, and instead of fixing them I said "what the hell, I'll just write an EXIF parser". (Yakpoint 5.) So I did and adapted the web gallery to use it (Yakpoint 6, for the adaptation.)

At this point, I looked back, and looked forward, and looked all around, and all was good, but what was with this uneasiness I was feeling? And indeed, I hadn't actually made anything better, and I wasn't taking more photos, and the workflow was the same.

I was also concerned about the client side of things, which was still in Python and using some breakage-prone legacy libraries to do the photo scaling and transformations and what-not, and relied on a desktop application (f-spot) of dubious future. So I started to look at what it would take to port that script to Scheme (Yakpoint 7). Well it used some legacy libraries to copy files over SSH (gnome-vfs; switching away from that would be Yakpoint 8) and I didn't want to make a Scheme GIO binding (Yakpoint 9, narrowly avoided), and I then -- and then, dear reader -- so then I said "well WTF my caching story on the server is crap anyway, I never know when the sqlite database has changed or not so I never know what responses I can cache, what I really want is a functional datastore" (Yakpoint 10), which is what I have with Git and Tekuti (Yakpoint of yore), and so why not just store my photos in Git like I do in Tekuti for blog posts and serve them from there, indexing as needed? Of course I'd need some other server software (Yakpoint of fore, by which I meantersay the future), but then I could just git push to update my photo gallery, and I wouldn't have to endure the horror that is GVFS shelling out to ssh in a FUSE daemon (Yakpoint of ne'er).

So. After mulling over these thoughts for a while I decided, during an autumnal walk on the Salève in which we had the greatest views of Mont Blanc everrrrr and yet where are the photos?, that really what I needed was new photo management software, not just a web gallery. I should be able to share photos from my phone or from my desktop, fix them up either place, tag and such, and OK woo hoo! Such is the future! And the present for many people? Thing is, I also needed good permissions management (Yakpoint what, 10 I guess?), because you know a dude just out of college is not the same as that dude many years later. Which means serving things over HTTPS (Yakpoints 11-47) in such a way that the app has some good control over who gets what.

Well. Anyway. My mind ran ahead, and runs ahead, and yet we haven't actually tasted the awesome sauce yet. So! The photo management software, wherever it lives, needs to rotate photos at least, and scale them down to a few resolutions. I smell a yak! I looked at jpegtran which can do some lossless rotations but it's not available as a library, which is odd; and really I don't like shelling out for core program functionality, because every time I deal with the file system it's the wild west of concurrent mutation. If naming things is one of the two hardest problems in computer science, the file system is the worst because you have to give a global name to every intermediate value.

At the same time to scale images, what was I to do? Make a binding to libjpeg? Well I started (Yakpoint 48) but for reals kids, libjpeg is not fun. It works great and is really clever but

  1. it's approximately impossible to use from a dynamic ffi; you want a compiler to verify that you are using the right structure definitions

  2. there has been an inane ABI and format break imposed by the official IJG libjpeg, one that other implementations have not followed; how could you know which one you are using?

  3. the error handling facility encourages longjmp in C programs; somewhat terrifying

  4. off-heap image manipulation libraries always interact poorly with GC, because the GC only sees the small pointer to the off-heap image, and so doesn't GC often enough

  5. I have zero guarantee that libjpeg won't change ABI in weird ways, and I don't want to touch this software for the next 10 years

  6. I want to do jpegtran-like lossless transformations, but that's not available as a library, and it's totes ridics that binding libjpeg does not help you out here

  7. it's still an unsafe C library, battle-tested yes, but terrifyingly unsafe, and I'd be putting it on my server and who knows?

Friends, I arrived at the pasture, and I, I chose the yak less shaven. I took my lame JPEG parser and turned it into a full decoder (Yakpoint 49), realized it wasn't much more work to do an encoder (Yakpoint 50), and implemented the lossless transformations (Yakpoint 51).

on haters

Before we go on, I know some people would think "what is this kid about". I mean, custom gallery software, a custom JPEG library of all things, all bespoke, why don't you just use off-the-shelf solutions? Why aren't you normal and use a normal language and what about the best practices and where's your business case and I can't go on about this because there's a technical term for people that say this kind of thing and it's "hater".

Thing is, when did a hater ever make anything cool? Come to think of it, when did a hater make anything at all? In my experience the most vocal haters have nothing behind their names except a long series of pseudonymous rants in other people's comment boxes. So friends, in the joyful spirit of earning-anew, let's talk about JPEG!

on color

JPEG is a funny thing. Photos are our lives and our memories, our first steps and our friends, and yet I for one didn't know very much about them. My mental model that "a JPEG is a rectangle of pixels" doesn't turn out to be quite right.

If you actually look in a normal JPEG, you see three planes of information. If I take this image, for example:

If I decode it, actually I get three images. Here's the first one:

This is just the greyscale version of the image. So, storytime! Remember black and white television? We had an old one that got moved around the house sometimes, like if Mom was working at something in the kitchen. We also had a color one in the living room, and you could watch one or the other and they showed the same stuff. Strange when you think about it though -- one being in color and the other not. Well it turns out that color was literally just added on, both historically and technically. The main broadcast was still in black and white, and then in one part of the frequency band there were separate color signals, which color TVs would pick up, mix with the black and white signal, and come out with color. Wikipedia notes that "color TV" was really just "colored TV", which is a phrase whose cleverness I respect. Big ups to the W P.

In the context of JPEG, this black-and-white signal is sometimes called "luma", but is more precisely called Y', where the "prime" (the apostrophe) indicates that the signal has gamma correction applied.

In the image above, I replaced the color planes (sometimes collectively called the "chroma") with zeroes, while losslessly keeping the luma. Below is the first color plane, with the Y' plane replaced with a uniform 50% luma, and the other color plane replaced with zeros.

This color signal is technically known as CB, which may be very imperfectly understood as the bluish component of the color. Well the original image wasn't very blue, so we don't see very much here.

Indeed, our eyes have a harder time seeing differences in color than differences in intensity. Apparently this goes all the way down to biology -- we have more receptors in our eyes for "black and white" and fewer for color.

Early broadcasters took advantage of this difference in perception by actually devoting more bandwidth in their broadcasts to luma than to chroma; if you check the Wikipedia page you will see that the area in the spectrum allocation devoted to color is much smaller than the area devoted to intensity. So it is in JPEG: the above image being half-width indicates that actually we're just encoding one CB sample for every two Y' samples.

Finally, here we have the CR color plane, which can loosely be thought of as the "redness" of the image.

These test images and crops preserve the actual encoding of this photo as it came from my camera, without re-encoding. That's partly why there's not much interesting going on; with the megapixels these days, it's hard to fit much of anything in a few hundred pixels square. This particular camera is sub-sampling in the horizontal direction, but it's also common to subsample vertically as well, producing color planes that are half-width and half-height. In my limited investigations I have found that cameras tend to sub-sample just in the X direction, producing what they call 4:2:2 images, and that standard software encoders subsample in both, producing 4:2:0.

Incidentally, properly scaling up the color planes is quite an irritating endeavor -- the standard indicates that the color is sampled between the locations of the Y' samples ("centered" chroma), but these images originally have EXIF data that indicates that the color samples are taken at the position of the first Y' sample ("co-sited" chroma). I'm pretty sure libjpeg doesn't delve into the EXIF to check this though, so it would seem that all renderings I have seen of these photos are subtly off.

But how do you get proper color out of these strange luma and chroma things? Well, the Y'CBCR colorspace is really just the same color cube as RGB, except rotated: the Y' axis traverses the diagonal from (0, 0, 0) (black) to (255, 255, 255) (white). CB and CR are perpendicular to that diagonal, pointing towards blue or red respectively. So to go back to RGB, you multiply by a matrix to rotate the cube.
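For the curious, here is a rough sketch of that rotation back to RGB for 8-bit JFIF-style samples (chroma centered on 128), using the common BT.601-derived constants; treat the exact coefficients as an assumption, since they vary between specifications:

(define (ycbcr->rgb y cb cr)
  (define (clamp x) (max 0 (min 255 (inexact->exact (round x)))))
  (let ((cb* (- cb 128)) (cr* (- cr 128)))
    (values (clamp (+ y (* 1.402 cr*)))                      ; R
            (clamp (- y (* 0.344136 cb*) (* 0.714136 cr*)))  ; G
            (clamp (+ y (* 1.772 cb*))))))                   ; B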

It's not a very intuitive color system, as you can see from the images above. For one thing, at zero or full luma, the chroma axes have no meaning; black and white can have no hue. Indeed if you imagine trying to fit a cube corner-down into a similar-sized box, you end up either having empty space in the box, or you have to cut off corners from the cube, or both. Cut corners means that bits of the Y'CBCR signal are wasted; empty space means there are RGB colors that are not representable in Y'CBCR. I'm not sure, but I think both are true for the particular formulation of Y'CBCR used in JPEG.

There's more to say about color here but frankly I don't know enough to do so, even though I worked in digital video for many years. If this is something you are mildly interested in, I highly, highly recommend watching Wim Taymans' presentation at this year's GStreamer conference. He takes a look at color in video that is constructive, building up from biology through math to engineering. His is a principled approach rather than a list of rules. It really clarified a number of things for me (and opened doors to unknown unknowns beyond).

on cosines

Where were we? Right, JPEG. So the proper way to understand what JPEG is is to understand the encoding process. We've covered colorspace conversion from RGB to Y'CBCR and sub-sampling. Next, the image canvas is divided into equal-sized "macroblocks". (These are called "minimum coded units" (MCUs) in the JPEG context, but in video they are usually called macroblocks, and it's a better name.) Without sub-sampling, each macroblock will contain one 8-sample-by-8-sample block for each component (Y', CB, CR) of the image. In my images above, the canvas space corresponding to one chroma block is the space of two luma blocks, so the macroblocks will be 16 samples wide and 8 samples tall, and contain two Y' blocks and one each of CB and CR. If the image canvas can't be evenly divided into macroblocks, it is padded to fit, usually by duplicating the last column or row of samples.
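The padding arithmetic itself is tiny; here is a small Guile sketch that computes the padded dimensions for the 16-by-8 macroblocks described above (the procedure names are mine):

(define (pad-to-macroblocks width height mcu-width mcu-height)
  ;; Round each dimension up to a whole number of macroblocks.
  (define (round-up n m) (* m (quotient (+ n m -1) m)))
  (values (round-up width mcu-width)
          (round-up height mcu-height)))

;; e.g. (pad-to-macroblocks 1021 765 16 8) => 1024 and 768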

Then to make a JPEG, each block is encoded separately, then the whole thing is just written out to a file, and you're done!

This description glosses over a couple of important points, but it's a good big-picture view to have in mind. The pipeline goes from RGB pixels, to a padded RGB canvas, to separate Y'CBCR planes, to a possibly subsampled set of those planes, to macroblocks, to encoded macroblocks, to the file. Decoding is the reverse. It's a totally doable, comprehensible thing, and that was one of the big takeaways for me from this project. I took photography classes in high school and it was really cool to see how to shoot, develop, and print film, and this is similar in many ways. The real "film" is raw-format data, which some cameras produce, but understanding JPEG is like understanding enlargers and prints and fixer baths and such things. It's smelly and dark but pretty cool stuff.

So, how do you encode a block? Well peoples, this is a kinda cool thing. Maybe you remember from some math class that, given n uniformly spaced samples, you can always represent that series as a sum of n cosine functions of equally spaced frequencies. In each little 8-by-8 block, that's what we do: a "forward discrete cosine transformation" (FDCT), which is just multiplying together some matrices for every point in the block. The FDCT is completely separable in the X and Y directions, so the space of 8 horizontal coefficients multiplied by the space of 8 vertical coefficients yields 64 total coefficients, which is not coincidentally the number of samples in a block.
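Here is a minimal Guile sketch of the length-8 1-D DCT that gets applied separably along rows and then columns; this is the textbook DCT-II formulation, not code lifted from any particular encoder:

(use-modules (srfi srfi-1))   ; iota

(define pi (* 4 (atan 1)))

(define (fdct-1d samples)     ; samples: a vector of 8 numbers
  (define (c u) (if (zero? u) (/ 1 (sqrt 2)) 1))
  (list->vector
   (map (lambda (u)
          ;; Coefficient u is a weighted sum of the samples against a
          ;; cosine of frequency u.
          (* 0.5 (c u)
             (apply +
                    (map (lambda (x)
                           (* (vector-ref samples x)
                              (cos (/ (* (+ (* 2 x) 1) u pi) 16))))
                         (iota 8)))))
        (iota 8))))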

Funny thing about those coefficients: each one corresponds to a particular horizontal and vertical frequency. We can map these out as a space of functions; for example giving a non-zero coefficient to (0, 0) in the upper-left block of an 8-block-by-8-block grid, and so on, yielding a 64-by-64 pixel representation of the meanings of the individual coefficients. That's what I did in the test strip above. Here is the luma example, scaled up without smoothing:

The upper-left corner corresponds to a frequency of 0 in both X and Y. The lower-right is a frequency of 4 "hertz", oscillating from highest to lowest value in both directions four times over the 8-by-8 block. I'm actually not sure why there are some greyish pixels around the right and bottom borders; it's not a compression artifact, as I constructed these DCT arrays programmatically. Anyway. Point is, your lover's smile, your sunny days, your raw urban graffiti, your child's first steps, all of these are reified in your photos as a sum of cosine coefficients.

The odd thing is that what is reified into your pictures isn't actually all of the coefficients there are! Firstly, because the coefficients are rounded to integers. Mathematically, the FDCT is a lossless operation, but in the context of JPEG it is not because the resulting coefficients are rounded. And they're not just rounded to the nearest integer; they are probably quantized further, for example to the nearest multiple of 17 or even 50. (These numbers seem exaggerated, but keep in mind that the range of coefficients is about 8 times the range of the original samples.)

The choice of what quantization factors to use is a key part of JPEG, and it's subjective: low quantization results in near-indistinguishable images, but in middle compression levels you want to choose factors that trade off subjective perception with file size. A higher quantization factor leads to coefficients with fewer bits of information that can be encoded into less space, but results in a worse image in general.
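In code, quantization is just a division and a rounding, with a multiplication undoing it (lossily) at decode time; a tiny Guile sketch, with names of my own choosing:

(define (quantize coeff q) (inexact->exact (round (/ coeff q))))
(define (dequantize qcoeff q) (* qcoeff q))

;; e.g. (dequantize (quantize 123 17) 17) => 119 -- the detail finer than
;; the quantization factor is gone.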

JPEG proposes a standard quantization matrix, with one number for each frequency (coefficient). Here it is for luma:

(define *standard-luma-q-table*
  #(16 11 10 16 24 40 51 61
    12 12 14 19 26 58 60 55
    14 13 16 24 40 57 69 56
    14 17 22 29 51 87 80 62
    18 22 37 56 68 109 103 77
    24 35 55 64 81 104 113 92
    49 64 78 87 103 121 120 101
    72 92 95 98 112 100 103 99))

This matrix is used for "quality 50" when you encode an 8-bit-per-sample JPEG. You can see that lower frequencies (the upper-left part) are quantized less harshly, and vice versa for higher frequencies (the bottom right).

(define *standard-chroma-q-table*
  #(17 18 24 47 99 99 99 99
    18 21 26 66 99 99 99 99
    24 26 56 99 99 99 99 99
    47 66 99 99 99 99 99 99
    99 99 99 99 99 99 99 99
    99 99 99 99 99 99 99 99
    99 99 99 99 99 99 99 99
    99 99 99 99 99 99 99 99))

For chroma (CB and CR) we see that quantization is much more harsh in general. So not only will we sub-sample color, we will also throw away more high-frequency color variation. It's interesting to think about, but also makes sense in some way; again in photography class we did an exercise where we shaded our prints with colored pencils, and the results were remarkable. My poor, lazy coloring skills somehow rendered leaves lifelike in different hues of green; really though, they were shades of grey, colored in imprecisely. "Colored TV" indeed.

With this knowledge under our chapeaux, we can now say what the "JPEG quality" setting actually is: it's simply that pair of standard quantization matrices scaled up or down. Towards "quality 100", the matrix approaches all-ones, for no quantization, and thus minimal loss (though you still have some rounding, often subsampling as well, and RGB-to-Y'CBCR gamut loss). Towards "quality 0" they scale to a matrix full of large values, for harsh quantization.
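As a rough sketch, here is how the widely used IJG convention maps a quality setting onto the standard table (a scale of 5000/Q below 50 and 200 - 2Q at 50 and above, with entries clamped to [1, 255]); other encoders may map quality to tables differently:

(define (scale-q-table base quality)
  (let* ((q (max 1 (min 100 quality)))
         (scale (if (< q 50) (quotient 5000 q) (- 200 (* 2 q)))))
    (list->vector
     (map (lambda (v) (max 1 (min 255 (quotient (+ (* v scale) 50) 100))))
          (vector->list base)))))

;; (scale-q-table *standard-luma-q-table* 50) returns the table unchanged;
;; quality 100 yields all ones, and low qualities yield large values.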

This understanding also explains those wavy JPEG artifacts you get on low-quality images. Those artifacts look like waves because they are waves. They usually occur at sharp intensity transitions, which like a cymbal crash cause lots of high frequencies that then get harshly quantized. Incidentally I suspect (but don't know) that this is the same reason that cymbals often sound bad in poorly-encoded MP3s, because of harsh quantization in the frequency domain.

Finally, the coefficients are written out to a file as a stream of bits. Each file gets a huffman code allocated to it, which ideally is built from the distribution of quantized coefficient sizes seen in all of the blocks of an image. There are usually different encodings for luma and chroma, to reflect their different quantizations. Reading and writing this bitstream is a bit of a headache but the algorithm is specified in the JPEG standard, and all you have to do is implement it. Notably, though, there is special support for encoding a run of zero-valued coefficients, which happens often after quantization. There are rarely wavy bits in a blue blue sky.

on transforms

It's terribly common for photos to be wrongly oriented. Unfortunately, the way that many editors fix photo rotation is by setting a bit in the EXIF information of the JPEG. This is ineffectual, as web browsers don't look in the EXIF information, and silly, because it turns out you can losslessly rotate most JPEG images anyway.

Consider that the body of a JPEG is an array of macroblocks. To rotate an image, you just have to rearrange those macroblocks, then rearrange the blocks inside the macroblocks (e.g. swap the two Y' blocks in my above example), then transform the blocks themselves.

The lossless transformations that you can do on a block are transposition, vertical flipping, and horizontal flipping.

Transposition flips a block along its downward-sloping diagonal. To do so, you just swap the coefficients at (u, v) with the coefficients at (v, u). Easy peasey.

Flipping is trickier. Consider the enlarged DCT image from above. What would it take to horizontally flip the function at (0, 1)? Instead of going from light to dark, you want it to go from dark to light. Simple: you just negate the coefficients! But you only want to negate those coefficients that are "odd" in the X direction, which are those coefficients whose column is odd. And actually that's all there is to it. Flipping vertically is the same, but for coefficients whose row is odd.

I said "most images" above because those whose size is not evenly divided by the macroblock size can't be losslessly rotated -- you will end up seeing some of the hidden data that falls off the edge of the canvas. Oh well. Most raw images are properly dimensioned, and if you're downscaling, you already have to re-encode anyway.

But that's just flipping and transposition, you say! What about rotation? Well it turns out that you can express rotation in terms of these operations: rotating 90 degrees clockwise is just a transpose and a horizontal flip (in that order). Together, flipping horizontally, flipping vertically, and transposing form a group, in the same way that flipping and flopping form a group for mattresses. Yeah!
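To make that concrete, here is a minimal Guile sketch of those per-block operations, over a block stored as a 64-element vector in row-major order (row v, column u); the helper names are mine, not my library's actual API:

(define (block-ref block v u) (vector-ref block (+ (* v 8) u)))

(define (block-build f)                  ; new block whose entry (v, u) is (f v u)
  (let ((out (make-vector 64)))
    (do ((v 0 (+ v 1))) ((= v 8) out)
      (do ((u 0 (+ u 1))) ((= u 8))
        (vector-set! out (+ (* v 8) u) (f v u))))))

(define (transpose-block block)          ; swap (u, v) with (v, u)
  (block-build (lambda (v u) (block-ref block u v))))

(define (hflip-block block)              ; negate coefficients in odd columns
  (block-build (lambda (v u)
                 (let ((x (block-ref block v u)))
                   (if (odd? u) (- x) x)))))

(define (vflip-block block)              ; negate coefficients in odd rows
  (block-build (lambda (v u)
                 (let ((x (block-ref block v u)))
                   (if (odd? v) (- x) x)))))

(define (rotate-block-90cw block)        ; transpose, then flip horizontally
  (hflip-block (transpose-block block)))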

on scheme

I wrote this library in Scheme because that's my language of choice these days. I didn't run into any serious impedance mismatches; Guile has a generic multi-dimensional array facility that made it possible to express many of these operations as generic folds, unfolds, or maps over arrays. The huffman coding part was a bit irritating, but all in all things were pretty good. The speed is pretty bad, but I haven't optimized it at all, and it gives me a nice test case for the compiler. Anyway, it's been fun and it suits my needs. Check out the project page if you're interested. Yes, to shave a yak you have to get a bit bovine and smelly, but yaks live in awesome places!

Finally I will leave you with a glitch, one of many that I have produced over the last couple weeks. Comments and corrections welcome below. Happy hacking!

Syndicated 2014-11-14 16:49:13 from wingolog

generators in firefox now twenty-two times faster

It's with great pleasure that I can announce that, thanks to Mozilla's Jan de Mooij, the new ES6 generator functions are twenty-two times faster in Firefox!

Some back-story, for the unawares. There's a new version of JavaScript coming, ECMAScript 6 (ES6). Among the new features that ES6 brings are generator functions: functions that can suspend. Firefox's JavaScript engine, SpiderMonkey, has had support for generators for many years, long before other engines. This support was upgraded to the new ES6 standard last year, thanks to sponsorship from Bloomberg, and was shipped out to users in Firefox 26.

The generators implementation in Firefox 26 was quite basic. As you probably know, modern JavaScript implementations have a number of tiered engines. In the case of SpiderMonkey there are three tiers: the interpreter, the baseline compiler, and the optimizing compiler. Code begins execution in the interpreter, which is the quickest engine to start. If a piece of code is hot -- meaning that lots of time is being spent there -- then it will "tier up" to the next level, where it is analyzed, possibly optimized, and then compiled to machine code.

Unfortunately, generators in SpiderMonkey have always been stuck at the lowest tier, the interpreter. This is because of SpiderMonkey's choice of implementation strategy for generators. Generators were implemented as "floating interpreter stack frames": heap-allocated objects whose shape was exactly the same as a stack frame in the interpreter. This had the advantage of being fairly cheap to implement in the beginning, but ultimately it made them unable to take advantage of JIT compilation, as JIT code runs on its own stack which has a different layout. The previous strategy also relied on trampolining through a helper written in C++ to resume generators, which killed optimization opportunities.

The solution was to represent suspended generator objects as snapshots of the state of a stack frame, instead of as stack frames themselves. In order for this to be efficient, last year we did a number of block scope optimizations to try and reduce the amount of state that a generator frame would have to restore. Finally, around March of this year we were at the point where we could refactor the interpreter to implement generators on the normal interpreter stack, with normal interpreter bytecodes, with the vision of being able to JIT-compile those bytecodes.

I ran out of time before I could land that patchset; although the patches were where we wanted to go, they actually caused generators to be even slower and so they languished in Bugzilla for a few more months. Sad monkey. It was with delight, then, that a month or so ago I saw that SpiderMonkey JIT maintainer Jan de Mooij was interested in picking up the patches. Since then he has been hacking off and on at getting my old patches into shape, and ended up applying them all.

He went further, optimizing stack frames to not reserve space for "aliased" locals (locals allocated on the scope chain), speeding up object literal creation in the baseline compiler, and finally implementing baseline JIT compilation for generators.

So, after all of that perf nargery, what's the upshot? Twenty-two times faster! In this microbenchmark:

function *g(n) {
    for (var i=0; i<n; i++)
        yield i;
}
function f() {
    var t = new Date();
    var it = g(1000000);
    for (var i=0; i<1000000; i++)
        it.next();
    print(new Date() - t);
}
f();

Before, it took SpiderMonkey 980 milliseconds to complete on Jan's machine. After? Only 43! It's actually marginally faster than V8 at this point, which has (temporarily, I think) regressed to 45 milliseconds on this test. Anyway. Competition is great and as a committer to both projects I find it very satisfactory to have good implementations on both sides.

As in V8, in SpiderMonkey generators cannot yet reach the highest tier of optimization. I'm somewhat skeptical that it's necessary, too, as you expect generators to suspend fairly frequently. That said, a yield point in a generator is, from the perspective of the optimizing compiler, not much different from a call site, in that it causes all locals to be saved. The difference is that locals may have unboxed representations, so we would have to box those values when saving the generator state, and unbox on restore.

Thanks to Bloomberg for funding the initial work, and big, big thanks to Mozilla's Jan de Mooij for picking up where we left off. Happy hacking with generators!

Syndicated 2014-11-14 08:41:08 from wingolog


 

wingo certified others as follows:

  • wingo certified thomasvs as Journeyer
  • wingo certified Uraeus as Journeyer
  • wingo certified hadess as Master
  • wingo certified dobey as Journeyer
  • wingo certified omega as Master
  • wingo certified stevebaker as Journeyer
  • wingo certified ncm as Master
  • wingo certified habes as Journeyer
  • wingo certified dlehn as Journeyer
  • wingo certified lmjohns3 as Journeyer
  • wingo certified dolphy as Journeyer
  • wingo certified company as Journeyer
  • wingo certified rotty as Journeyer
  • wingo certified jamesh as Master
  • wingo certified fweiden as Journeyer
  • wingo certified titus as Journeyer
  • wingo certified karlberry as Master
  • wingo certified Stevey as Master
  • wingo certified leio as Apprentice
  • wingo certified minorityreport as Apprentice
  • wingo certified pabs3 as Apprentice
  • wingo certified clarkbw as Master
  • wingo certified tan as Journeyer
  • wingo certified olecom as Apprentice
  • wingo certified ingvar as Master

Others have certified wingo as follows:

  • thomasvs certified wingo as Journeyer
  • Uraeus certified wingo as Journeyer
  • wardv certified wingo as Journeyer
  • tnt certified wingo as Journeyer
  • hadess certified wingo as Journeyer
  • async certified wingo as Journeyer
  • dobey certified wingo as Journeyer
  • stevebaker certified wingo as Journeyer
  • habes certified wingo as Journeyer
  • DarthEvangelusII certified wingo as Journeyer
  • dlehn certified wingo as Journeyer
  • ishamael certified wingo as Journeyer
  • lmjohns3 certified wingo as Journeyer
  • ncm certified wingo as Journeyer
  • linn certified wingo as Journeyer
  • dolphy certified wingo as Journeyer
  • mpr certified wingo as Journeyer
  • watete certified wingo as Journeyer
  • company certified wingo as Journeyer
  • polak certified wingo as Journeyer
  • berthu certified wingo as Journeyer
  • rotty certified wingo as Journeyer
  • jamesh certified wingo as Journeyer
  • lerdsuwa certified wingo as Journeyer
  • zeenix certified wingo as Master
  • pasky certified wingo as Journeyer
  • fxn certified wingo as Journeyer
  • kai certified wingo as Journeyer
  • mathrick certified wingo as Journeyer
  • Stevey certified wingo as Journeyer
  • jdahlin certified wingo as Master
  • oubiwann certified wingo as Journeyer
  • lucasr certified wingo as Master
  • mchirico certified wingo as Journeyer
  • nixnut certified wingo as Master
  • chalst certified wingo as Journeyer
  • murajov certified wingo as Master
  • janneke certified wingo as Journeyer
  • jemarch certified wingo as Master
  • werner certified wingo as Master
  • dangermaus certified wingo as Master


