Recent blog entries

26 Nov 2014 lucasr   » (Master)

Leaving Mozilla

I joined Mozilla 3 years, 4 months, and 6 days ago. Time flies!

I was very lucky to join the company a few months before the Firefox Mobile team decided to rewrite the product from scratch to make it more competitive on Android. And we made it: Firefox for Android is now one of the highest rated mobile browsers on Android!

This has been the best team I’ve ever worked with. The talent, energy, and trust within the Firefox for Android group is simply amazing.

I’ve thoroughly enjoyed my time here, but an exciting opportunity outside Mozilla came up and I decided to take it.

What’s next? That’s a topic for another post ;-)

Syndicated 2014-11-26 16:57:50 from Lucas Rocha

26 Nov 2014 benad   » (Apprentice)

Modern MVC in the Browser

A few months ago, I started learning the AngularJS framework for web development. It is a quite clever framework that allows you to "augment" your HTML with markup to map placeholders to elements in your object model. It reminded me of JSP and the markup-based Swing frameworks I used a decade ago. Since then, though, I realized that this template-based approach has quite a few limitations that bother me. First, as it is based on templates, it doesn't handle well highly dynamic elements that generate markup based on context. Second, complex, many-to-many mappings between the view and the model are difficult to express. For example, it is difficult to express a value placeholder that is the sum of multiple elements in the model without resorting to calling a custom function, or to have a change in the view trigger the corresponding changes in multiple locations in the model without, again, resorting to a custom function. Third, it is a framework that needs to keep track of the model/view mapping, so it is all-encompassing and heavy in configuration.

So, let's break down the problem and look individually at the view, model and controller as separate modules, and see if there could be a better, more "modern" approach.

The biggest problem with the "model" in JavaScript is that the language lacks a way to safely hide properties behind functions, as done natively in C# for example, or manually through "getter" and "setter" functions in JavaBeans. Because of that, it becomes difficult to take any normal object and add a layer on top of it that would implement an observer pattern automatically. Instead, you can use something like the model implementation of Backbone.js, which fully hides the model behind a get/set interface that can automatically trigger change events to listener objects. It may not be elegant, but it works well.
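
For instance, a minimal sketch of that get/set interface with Backbone.Model (the attribute names here are made up for illustration):

var item = new Backbone.Model({ price: 10, quantity: 2 });

// Listeners subscribe to changes on specific attributes.
item.on("change:price", function (model, price) {
  console.log("price is now " + price);
});

// Going through set() is what lets Backbone fire the change event;
// assigning item.price = 15 directly would bypass it entirely.
item.set("price", 15);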

For the "view", HTML development doesn't support well the concept of a "custom control" or "custom widget" found in other GUIs. The view is mostly static; JavaScript can change the DOM to some extent, but at a high cost. In contrast, other GUI systems are based around rendering controls ("GUI controls", not to be confused with the controller in MVC) on a canvas, and the compositing engine takes care of rendering on screen only the visible and changed elements. A major advantage of that approach is that each control can render in any way it pleases without having to be aware of the lower-level rendering engine, be it pixels, vectors or HTML. You could simulate a full refresh of the DOM each time something (model or controller) changes the GUI in HTML, but the performance would be abysmal. Hence the React library from Facebook and Instagram, which supports render-based custom controls by using a "virtual DOM", akin to "bitmasks" in traditional GUIs, so that only effective DOM changes are applied.
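
As a rough sketch of the render-based approach (using React's 2014-era createClass API; the component and data here are invented):

var Sum = React.createClass({
  // render() describes the desired output for the current props;
  // React diffs against its virtual DOM and applies only the
  // actual changes to the real DOM.
  render: function () {
    var total = this.props.values.reduce(function (a, b) { return a + b; }, 0);
    return React.createElement("span", null, "Total: " + total);
  }
});

React.render(React.createElement(Sum, { values: [1, 2, 3] }),
             document.getElementById("out"));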

Finally, for the "controller", the biggest issue is how to have both the model and the view update each other automatically, on top of the usual business logic in the controller, without creating cyclic loops. React's approach, named Flux, is a design pattern where you always let events flow from the model to the view, and never (directly) in the other direction. My gripe with that is that it is merely a design pattern that cannot be enforced. This reminds me of the dangers of using the WPF threading model as a means to avoid concurrency issues: forget to use the event dispatcher a single time, and you will create highly difficult-to-debug crashes. A newer approach, called functional reactive programming (FRP), does for events something like what dependency injection did for module integration. Essentially it is a functional way of manipulating channels of events between producers and consumers, outside of the code of each producer and consumer. This may sound like a lot of overhead compared to the inline and prescriptive approach of Flux, but as soon as you build a web page heavy on asynchronous callbacks and events coming from outside the page, having all of that "glue" in a single location is a great benefit. An implementation of FRP for JavaScript, Bacon.js, has a great example of how FRP greatly reduces those countless nested callbacks that are commonplace in web and Node development.
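
As a sketch of what that glue looks like with Bacon.js (assuming its jQuery bindings; the field names are invented), the "sum of multiple model elements" case from above becomes a declaration in one place:

var price = $("#price").asEventStream("input")
  .map(function (e) { return +e.target.value; })
  .toProperty(0);
var qty = $("#qty").asEventStream("input")
  .map(function (e) { return +e.target.value; })
  .toProperty(1);

// A derived value, declared outside both producers.
var total = Bacon.combineWith(function (p, q) { return p * q; }, price, qty);
total.onValue(function (t) { $("#total").text(t); });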

Combined, Backbone.js, React and Bacon.js offer a compelling alternative to control- and template-heavy browser MVC frameworks, and at minimum prevent you from being "locked in" to a complex and difficult-to-replace framework.

Syndicated 2014-11-26 01:44:49 from Benad's Blog

26 Nov 2014 vicious   » (Master)

Law of large numbers: idiots, monkeys, CEOs, politicians, and nuclear war

Something that seems to be ignored by many people is the law of large numbers.  Suppose you take an action that has a 99% rate of success.  That’s a 1% chance of failure.  Tiny!  Well, do it enough times, and failures will happen.  Given enough candidates with a 1% chance of winning, one of them will win.  Then everybody is surprised (but shouldn’t be).  Suppose that in each of the 435 congressional races, the leading candidate had, according to polls, a 99% chance to win, and the underdog a 1% chance.  I would expect 4 or 5 of the underdogs to win (435 × 0.01 = 4.35).  If they didn’t, we were wrong about the 99%.

Or how about entrepreneurs.  Suppose you take 100 idiots.  They each get a totally whacky idea for a new business that has a 1% chance of success.  One of them will likely succeed.  Was it because he was smart?  No, there were enough idiots.  We should not overvalue success if we do not know how many other similar people failed, and how likely success was.  What if you have a person who started two businesses that each had a 1% chance of success?  Was that person a genius?  Or did you just have 10000 idiots?

You have surely heard that giving typewriters to monkeys will eventually (if you have enough monkeys and time) produce the works of Shakespeare.  Does this mean that Shakespeare was a monkey?  No.  There weren’t enough idiots (or monkeys) trying.  Plus the odds of typing random sentences, even grammatically correct ones, and ending up with something as good as Shakespeare are astronomically low.  Shakespeare was, with a very very very high degree of confidence, not a monkey.  I can’t say the same for Steve Jobs.  The chance of Jobs having been a monkey is still somewhat smaller than for your general successful CEO.

Think of the really important decisions that a CEO has to make; there aren’t that many.  If we simplified the situation and went simply with yes/no decisions on strategic things, there are a few in the lifetime of a company.  Most decisions are irrelevant to the success, and they even out: make a lot of decisions that each cause a small relative change and you will likely end up where you started (again the law of large numbers).  But there are a few that can make or break a company.  Given how many companies go bust, clearly there are many many CEOs making the wrong make-or-break decisions.  So just because you hired a CEO who decided to focus on a certain product and drop everything else, and you made it big, does that mean your CEO was a genius?  Flipping a coin gives you a 50% chance of success too.

Same with stock traders.  Look and you will find traders whose last 10 bets were huge successes.  Does it mean that they are geniuses?  Or does it simply mean there are lots of stock traders making fairly random decisions, so some of them must be successful?  If there are enough of them, there will be some whose last 10 bets were good.  If it was 10 yes/no decisions, then you just need 1024 idiots for one of them to get all of them right.  They don’t have to know anything.  Let’s take a different example: suppose you find someone who, out of a pool of 100 stocks, has picked the most successful one each year for the last 3 years.  This person can be a total and complete idiot as long as there were a million people making those choices.  The chance of that person picking the right stock this year is most likely 1 in 100.  Don’t believe people trying to sell you their surefire way to predict the stock market, even if they are not lying about their past success.
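
You don’t have to take the arithmetic on faith; a throwaway simulation (plain JavaScript, numbers from the example above) shows it:

// 1024 "traders" each make 10 random yes/no bets; count how many
// get all 10 right by pure luck.  On average, one will.
function luckyTraders(traders, bets) {
  var perfect = 0;
  for (var t = 0; t < traders; t++) {
    var wins = 0;
    for (var b = 0; b < bets; b++)
      if (Math.random() < 0.5) wins++;
    if (wins === bets) perfect++;
  }
  return perfect;
}
console.log(luckyTraders(1024, 10));  // usually prints 0, 1, or 2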

OK.  A more serious example of the law of large numbers: Suppose your country does a military operation that has a 99% chance of success and a 1% chance of doom for your country.  Suppose your country keeps doing this.  Each time, it seems completely safe.  Yet, eventually, your country will lose; after 100 such gambles the chance of having been doomed at least once is 1 - 0.99^100, about 63%.  Start enough wars, even with overwhelming odds, and your luck will run out.  Statistically that’s a sure thing.  If you want your country to be around in 100 years, do not do things that have even an ever so tiny chance of backfiring and dooming that country to failure.  You can probably guess which (rather longish) list of countries I am thinking of, which with good odds won’t be here in 100, 200, or 500 years.

Let’s end on a positive note:  With essentially 100% probability, humankind will eventually destroy itself with nuclear (or other similarly destructive) weapons.  There seem to be conflicts arising every few decades that have a chance of deteriorating into nuclear war.  A small chance, but positive.  Since that seems likely to repeat itself over and over, eventually one of those conflicts will turn nuclear.  It can’t quite be 100%, since there is a chance that we will all die in some apocalyptic natural disaster (possibly of our own making) rather than nuclear war.  There is, after all, also a small chance that everybody on earth gets a heart attack at the same exact time.  Even if we make sure we don’t do anything else dangerous (such as nuclear weapons), civilization will end one day with a massive concurrent heart attack.


Syndicated 2014-11-26 01:26:23 from The Spectre of Math

26 Nov 2014 mikal   » (Journeyer)

The Human Division




ISBN: 9780765369550
LibraryThing
I originally read this as a series of short stories released on the Kindle, but the paperback collation of those has been out for a while and deserved a read. These stories are classic Scalzi, and read well. If you like the Old Man's War universe you will like this book. The chapters of the book are free-standing because of how they were originally written, and that makes the book a bit disjointed. The cliffhanger at the end is also pretty annoying given that the next book hasn't been released.

So, an interesting experiment that perhaps isn't perfect, but is well worth the read.

Tags for this post: book john_scalzi combat aliens engineered_human old_mans_war age colonization human_backup cranial_computer personal_ai
Related posts: The Last Colony ; Old Man's War ; The Ghost Brigades ; Old Man's War (2); The Ghost Brigades (2); Zoe's Tale



Syndicated 2014-11-25 17:14:00 from stillhq.com : Mikal, a geek from Canberra living in Silicon Valley (no blather posts)

25 Nov 2014 mattl   » (Journeyer)

Thanks for subscribing!

We've sent you an email with a confirmation link. Please follow this link to complete your subscription to the Free Software Supporter.

If you don't get this email, please try subscribing again and if that doesn't work, please email us at info@fsf.org.

Syndicated 2013-12-31 00:00:00 from fsf

25 Nov 2014 pixelbeat   » (Journeyer)

Fedora 21 review

A review of Fedora 21 compared to previous versions

Syndicated 2014-11-25 19:15:51 from www.pixelbeat.org

24 Nov 2014 pixelbeat   » (Journeyer)

Noppoo Lolita keyboard review

Review of my Noppoo Lolita 87 key keyboard

Syndicated 2014-11-24 17:52:44 from www.pixelbeat.org

24 Nov 2014 amits   » (Journeyer)

Upgrading to Fedora 21

The Fedora Project will soon put out its 21st release.  I’ve been running the pre-release bits for a while now, here are a few observations:

  • Upgrade from Fedora 20 to Fedora 21 via ‘fedup’ was fast on my SSD disk, and there were no blockers after the reboot – minimal downtime!  (The exact command is sketched after this list.)
  • Bug 740607 – evince no longer can switch to prev / next pages using the buttons or the ctrl+up/down keyboard shortcuts
  • Bug 740608 – gnome-shell’s calendar display overflows from the box if the number of calendar entries is more than some number; the box is always fixed in size.
  • Bug 739991 / Bug 730128 – gnome-terminal doesn’t pass alt+<n> to applications running inside the terminal if there isn’t a tab with that <n>.  This is the most serious regression for me; it breaks several workflows: my irssi session as well as the non-irssi terminals I use for work.  A surprising thing I found out after filing this report is that there’s no way to reopen a closed bug report on GNOME Bugzilla, which means that if someone decides the bug isn’t going to be fixed, there’s no option to get new information back on the developers’ radar.
  • Bug 1163747 – memleak in upowerd
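
For reference, the upgrade itself was the usual fedup network invocation (from memory, so double-check against the Fedora wiki for your setup):

sudo fedup --network 21
sudo reboot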

The workarounds[1] listed[2] earlier[3] are still in effect for things to work to my liking.

Everything else seems to be working reasonably fine so far, no further regressions.  I am tempted to give KDE a try again, though!

Syndicated 2014-11-24 05:39:33 (Updated 2014-11-24 10:32:55) from Think. Debate. Innovate.

23 Nov 2014 amits   » (Journeyer)

My talk at the CentOS Dojo Pune 2014

I spoke at the CentOS Dojo in Pune yesterday on new features available in CentOS release 7.0 since the 6 release.  Slides are available here: What’s New in Virtualization.  The event was organized by the Pune GNU/Linux Users Group (PLUG) for the CentOS project.

My talk was scheduled as the last talk of the day.  I was already quite tired by the time the talk started, and was totally exhausted when it finished.

There were about 30 people attending, some of whom had already used KVM.  There were quite a few questions about how KVM compares to other hypervisors, and about the features KVM supports.  I was happy with the interaction, as well as the questions I received.  It showed a healthy interest in virtualization and KVM.

Also nice to see that some were using virt-manager, oVirt, etc., already.  I couldn’t always answer everything related to the higher levels, but pointed people at bugzilla for bugs and the mailing lists for questions.

Syndicated 2014-11-23 09:02:49 from Think. Debate. Innovate.

22 Nov 2014 Stevey   » (Master)

Lumail 2.x ?

I've continued to ponder the idea of reimplementing the console mail-client I wrote, lumail, using a more object-based codebase.

For one thing, having loosely coupled code would allow testing things in isolation, which is clearly a good thing.

I've written some proof of concept code which will allow the following Lua to be evaluated:

-- Open the maildir.
users = Maildir.new( "/home/skx/Maildir/.debian.user" )

-- Count the messages.
print( "There are " .. users:count() .. " messages in the maildir " .. users:path() )

--
-- Now we want to get all the messages and output their paths.
--
for k,v in ipairs( users:messages()) do
    --
    -- Here we could do something like:
    --
    --   if ( string.find( v:headers()["subject"], "troll", 1, true ) ) then v:delete()  end
    --
    -- Instead play-nice and just show the path.
    print( k .. " -> " .. v:path() )
end

This is all a bit ugly, but I've butchered some code together that works, and tried to solicit feedback from lumail users.

I'd write more but I'm tired, and intending to drink whisky and have an early night. Today I mostly replaced pipes in my attic. (Is it "attic", or is it "loft"? I keep alternating!) Fingers crossed this will mean a dry kitchen in the morning.

Syndicated 2014-11-22 21:39:28 from Steve Kemp's Blog

22 Nov 2014 dmarti   » (Master)

Why I'm not signing up for Google Contributor (or giving up on web advertising)

Making the rounds: Google’s New Service Kills Ads on Your Favorite Sites for a Monthly Fee. Basically, turn the ads into the thing that annoys the free users, wasting their bandwidth and screen space, until some of them go paid. You know, the way the crappy ads on Android apps work.

But the problem isn't advertising. The web is not the first medium where the audience gets stuff for free, or at an artificially low price. Cultural works and journalism have been ad-supported for a long time. Sure, people like to complain about annoying ads, and they're uncomfortable about database marketing. But magazine readers don't tear ads out, and even TiVo-equipped TV viewers have low skip rates.

The problem is figuring out why today's web ads are so different, why ad blocking is on the way up, and how a web ad can work more like a magazine ad. From the article:

If people are going to gripe constantly about ads and having their personal data sold to advertisers, why not ask them to put a nominal amount of money where their mouths are?

Because that's not how people work. We don't pay other people to come into compliance with social norms. "Hey, you took my place in line...here's a dollar to switch back" doesn't happen. More:

It could save publishers who are struggling to stay afloat as ad dollars dwindle, while also giving consumers what they say they want.

You lost me at giving consumers what they say they want. When has that ever worked? People say all kinds of stuff. You have to watch what they do. What they do, offline, is enjoy high-value ad-supported content, with the ads. Why is the web so different? Why do people treat web ads more like email spam and less like offline ads? The faster we can figure out the ad blocking paradox, the faster we can move from annoying, low-value web ads to ads that pull their weight economically.

(More: Targeted Advertising Considered Harmful)

Syndicated 2014-11-22 16:34:02 from Don Marti

22 Nov 2014 Pizza   » (Master)

The current state of the Mitsubishi CP-D70DW, CP-D707DW, CP-K60DW-S, and CP-D80DW printers under Linux

The problem:

Over the last month or so, I've received on average two questions a week about these printers, mostly along the same lines of "I can't print with them, help!"

The short answer

They don't work with Linux, and this isn't likely to change anytime soon.

The longer answer

With the Mitsubishi CP-D70 and D707, if you use the Gutenprint 5.2.11-cvs code later than August 14, and the backend code from at least October 4, you will be able to successfully generate prints. The CP-K60 still won't print at all due to incomplete knowledge of its backend protocol, and I have not seen what changes the CP-D80 incorporates.

Unfortunately, while the CP-D70 and D707 are able to successfully print, the output is all screwed up. The Windows drivers perform non-linear color scaling that requires gamma correction; this is annoying but would be straightforward to figure out, except the drivers are also doing some sort of dithering.

How bad is this dithering? A test job with six nominal colors results in a printjob that contains over 18,000 (16-bit) color values. Even a simple "print a page with a single, pure color" job results in dozens (if not hundreds) of colors as the driver adjusts the intensity according to some unknown algorithm.

The pithy answer

Mitsubishi actually wrote Linux drivers for all of these (and other!) printers, but only provides them as part of their kiosk solutions, not for normal end-users. So don't reward manufacturers that snub Linux users; support the ones that don't.

The alternative answer

There are many competitive alternatives (both price-wise and performance-wise) which have solid support under Linux. In particular, here's what I'd recommend if you want a kiosk-class, workhorse photo printer:

  • DNP DS40, RX1, or DS80
  • Citizen CX, CX-W, or CY
  • Shinko S2145 / Sinfonia S2
  • Kodak 6800, 6850, or 605
  • Sony UP-DR150 or UP-DR200

Several other models from these manufacturers should (in theory) work okay, but the above represents a known-good list. Note the utter lack of any Mitsubishi models; as of this writing, none of their printers play well with Linux.

The pleading answer

In case anyone over at Mitsubishi is reading this, how about tossing me some documentation and a printer or three to play with? Proper Linux support will only help you sell more printers!

Syndicated 2014-11-22 13:35:03 from Solomon Peachy

21 Nov 2014 oubiwann   » (Journeyer)

ErlPort: Using Python from Erlang/LFE

This is a short little blog post I've been wanting to get out there ever since I ran across the erlport project a few years ago. Erlang was built for fault-tolerance. It had a goal of unprecedented uptimes, and these have been achieved. It powers 40% of our world's telecommunications traffic. It's capable of supporting amazing levels of concurrency (remember the 2007 announcement about the performance of YAWS vs. Apache?).

With this knowledge in mind, a common mistake by folks new to Erlang is to think these performance characteristics will be applicable to their own particular domain. This has often resulted in failure, disappointment, and the unjust blaming of Erlang. If you want to process huge files, do lots of string manipulation, or crunch tons of numbers, Erlang's not your bag, baby. Try Python or Julia.

But then, you may be thinking: I like supervision trees. I have long-running processes that I want to be managed per the rules I establish. I want to run lots of jobs in parallel on my 64-core box. I want to run jobs in parallel over the network on 64 of my 64-core boxes. Python's the right tool for the jobs, but I wish I could manage them with Erlang.

(There are sooo many other options for the use cases above, many of them really excellent. But this post is about Erlang/LFE :-)).

Traditionally, if you want to run other languages with Erlang in a reliable way that doesn't bring your Erlang nodes down with badly behaved code, you use Ports. (more info is available in the Interoperability Guide). This is what JInterface builds upon (and, incidentally, allows for some pretty cool integration with Clojure). However, this still leaves a pretty significant burden for the Python or Ruby developer for any serious application needs (quick one-offs that only use one or two data types are not that big a deal).

erlport was created by Dmitry Vasiliev in 2009 in an effort to solve just this problem, making it easier to use and integrate Erlang with more common languages like Python and Ruby. The project is maintained, and in fact has just received a few updates. Below, we'll demonstrate some usage in LFE with Python 3.

If you want to follow along, there's a demo repo you can check out:

16 Nov 2014 mikal   » (Journeyer)

Fast Food Nation

I don't read a lot of non-fiction, but I decided to finally read this book having had it sit on the shelf for a few years. I'm glad I read it, but as someone who regularly eats in the US I am not sure if I should be glad or freaked out. The book is an interesting study in how industrialization without proper quality controls can have some pretty terrible side effects. I'm glad to live in a jurisdiction where we actively test for food quality and safety.

The book is a good read, and I'd recommend it to people without weak stomachs.

Tags for this post: book eric_schlosser food quality meat fast industrialized
Related posts: Dinner; Dishwasher Trout; Yum; 14 November 2003; Food recommendation; Generally poor audio quality on pod casts?


Syndicated 2014-11-16 01:43:00 from stillhq.com : Mikal, a geek from Canberra living in Silicon Valley (no blather posts)

14 Nov 2014 badvogato   » (Master)

I'm having a bad headache, thinking too hard about things long past.


To ease my restless mind, I decided to publish "Out of the house of Burgess" - a book review written by ME!

14 Nov 2014 wingo   » (Master)

on yakshave, on color, on cosines, on glitchen

Hold on to your butts, kids, because this is epic.

on yaks

As in all great epics, our prideful, stubborn hero starts in a perfectly acceptable state of things, decides on a lark to make a small excursion, and comes back much much later to inflict upon you pictures from his journey.

So. I have a web photo gallery but I don't take many pictures these days. Dealing with photos is a bit of a drag, and the ways that are easier like Instagram or what-not give me the (peer, corporate, government: choose 3) surveillance hives. So, I had vague thoughts that I should update my web gallery. Yakpoint 1.

At the same time, my web gallery was written for mod_python on the server, and I don't like hacking in Python any more and kinda wanted to switch away from Apache. Yakpoint 2.

So I rewrote the server-side part in Scheme. (Yakpoint 3.) It worked fine but I found I needed the ability to get the dimensions of files on the server, so I wrote a quick-and-dirty JPEG parser. Yakpoint 4.

I needed EXIF data as well, as the original version displayed EXIF data, and for that I used a binding to libexif that I had written a few years ago when I thought about starting this project (Yakpoint -1). However I found some crashers in the library, because it had never really been tested in production, and instead of fixing them I said "what the hell, I'll just write an EXIF parser". (Yakpoint 5.) So I did and adapted the web gallery to use it (Yakpoint 6, for the adaptation.)

At this point, I looked back, and looked forward, and looked all around, and all was good, but what was with this uneasiness I was feeling? And indeed, I hadn't actually made anything better, and I wasn't taking more photos, and the workflow was the same.

I was also concerned about the client side of things, which was still in Python and using some breakage-prone legacy libraries to do the photo scaling and transformations and what-not, and relied on a desktop application (f-spot) of dubious future. So I started to look at what it would take to port that script to Scheme (Yakpoint 7). Well it used some legacy libraries to copy files over SSH (gnome-vfs; switching away from that would be Yakpoint 8) and I didn't want to make a Scheme GIO binding (Yakpoint 9, narrowly avoided), and I then -- and then, dear reader -- so then I said "well WTF my caching story on the server is crap anyway, I never know when the sqlite database has changed or not so I never know what responses I can cache, what I really want is a functional datastore" (Yakpoint 10), which is what I have with Git and Tekuti (Yakpoint of yore), and so why not just store my photos in Git like I do in Tekuti for blog posts and serve them from there, indexing as needed? Of course I'd need some other server software (Yakpoint of fore, by which I meantersay the future), but then I could just git push to update my photo gallery, and I wouldn't have to endure the horror that is GVFS shelling out to ssh in a FUSE daemon (Yakpoint of ne'er).

So. After mulling over these thoughts for a while I decided, during an autumnal walk on the Salève in which we had the greatest views of Mont Blanc everrrrr and yet where are the photos?, that really what I needed was new photo management software, not just a web gallery. I should be able to share photos from my phone or from my desktop, fix them up either place, tag and such, and OK woo hoo! Such is the future! And the present for many people? Thing is, I also needed good permissions management (Yakpoint what, 10 I guess?), because you know a dude just out of college is not the same as that dude many years later. Which means serving things over HTTPS (Yakpoints 11-47) in such a way that the app has some good control over who gets what.

Well. Anyway. My mind ran ahead, and runs ahead, and yet we haven't actually tasted the awesome sauce yet. So! The photo management software, wherever it lives, needs to rotate photos at least, and scale them down to a few resolutions. I smell a yak! I looked at jpegtran which can do some lossless rotations but it's not available as a library, which is odd; and really I don't like shelling out for core program functionality, because every time I deal with the file system it's the wild west of concurrent mutation. If naming things is one of the two hardest problems in computer science, the file system is the worst because you have to give a global name to every intermediate value.

At the same time to scale images, what was I to do? Make a binding to libjpeg? Well I started (Yakpoint 48) but for reals kids, libjpeg is not fun. It works great and is really clever but

  1. it's approximately impossible to use from a dynamic ffi; you want a compiler to verify that you are using the right structure definitions

  2. there has been an inane ABI and format break imposed by the official IJG libjpeg, which other implementations have not followed; but how could you know which one you are using?

  3. the error handling facility encourages longjmp in C programs; somewhat terrifying

  4. off-heap image manipulation libraries always interact poorly with GC, because the GC only sees the small pointer to the off-heap image, and so doesn't GC often enough

  5. I have zero guarantee that libjpeg won't change ABI in weird ways, and I don't want to touch this software for the next 10 years

  6. I want to do jpegtran-like lossless transformations, but that's not available as a library, and it's totes ridics that binding libjpeg does not help you out here

  7. it's still an unsafe C library, battle-tested yes, but terrifyingly unsafe, and I'd be putting it on my server and who knows?

Friends, I arrived at the pasture, and I, I chose the yak less shaven. I took my lame JPEG parser and turned it into a full decoder (Yakpoint 49), realized it wasn't much more work to do an encoder (Yakpoint 50), and implemented the lossless transformations (Yakpoint 51).

on haters

Before we go on, I know some people would think "what is this kid about". I mean, custom gallery software, a custom JPEG library of all things, all bespoke, why don't you just use off-the-shelf solutions? Why aren't you normal and use a normal language and what about the best practices and where's your business case and I can't go on about this because there's a technical term for people that say this kind of thing and it's "hater".

Thing is, when did a hater ever make anything cool? Come to think of it, when did a hater make anything at all? In my experience the most vocal haters have nothing behind their names except a long series of pseudonymous rants in other people's comment boxes. So friends, in the joyful spirit of earning-anew, let's talk about JPEG!

on color

JPEG is a funny thing. Photos are our lives and our memories, our first steps and our friends, and yet I for one didn't know very much about them. My mental model that "a JPEG is a rectangle of pixels" doesn't turn out to be quite right.

If you actually look in a normal JPEG, you see three planes of information. If I take this image, for example:

If I decode it, actually I get three images. Here's the first one:

This is just the greyscale version of the image. So, storytime! Remember black and white television? We had an old one that got moved around the house sometimes, like if Mom was working at something in the kitchen. We also had a color one in the living room, and you could watch one or the other and they showed the same stuff. Strange when you think about it though -- one being in color and the other not. Well it turns out that color was literally just added on, both historically and technically. The main broadcast was still in black and white, and then in one part of the frequency band there were separate color signals, which color TVs would pick up, mix with the black and white signal, and come out with color. Wikipedia notes that "color TV" was really just "colored TV", which is a phrase whose cleverness I respect. Big ups to the W P.

In the context of JPEG, this black-and-white signal is sometimes called "luma", but is more precisely called Y', where the "prime" (the apostrophe) indicates that the signal has gamma correction applied.

In the image above, I replaced the color planes (sometimes collectively called the "chroma") with zeroes, while losslessly keeping the luma. Below is the first color plane, with the Y' plane replaced with a uniform 50% luma, and the other color plane replaced with zeros.

This color signal is technically known as CB, which may be very imperfectly understood as the bluish component of the color. Well the original image wasn't very blue, so we don't see very much here.

Indeed, our eyes have a harder time seeing differences in color than differences in intensity. Apparently this goes all the way down to biology -- we have more receptors in our eyes for "black and white" and fewer for color.

Early broadcasters took advantage of this difference in perception by actually devoting more bandwidth in their broadcasts to luma than to chroma; if you check the Wikipedia page you will see that the area in the spectrum allocation devoted to color is much smaller than the area devoted to intensity. So it is in JPEG: the above image being half-width indicates that actually we're just encoding one CB sample for every two Y' samples.

Finally, here we have the CR color plane, which can loosely be thought of as the "redness" of the image.

These test images and crops preserve the actual encoding of this photo as it came from my camera, without re-encoding. That's partly why there's not much interesting going on; with the megapixels these days, it's hard to fit much of anything in a few hundred pixels square. This particular camera is sub-sampling in the horizontal direction, but it's also common to subsample vertically as well, producing color planes that are half-width and half-height. In my limited investigations I have found that cameras tend to sub-sample just in the X direction, producing what they call 4:2:2 images, and that standard software encoders subsample in both, producing 4:2:0.

Incidentally, properly scaling up the color planes is quite an irritating endeavor -- the standard indicates that the color is sampled between the locations of the Y' samples ("centered" chroma), but these images originally have EXIF data that indicates that the color samples are taken at the position of the first Y' sample ("co-sited" chroma). I'm pretty sure libjpeg doesn't delve into the EXIF to check this though, so it would seem that all renderings I have seen of these photos are subtly off.

But how do you get proper color out of these strange luma and chroma things? Well, the Y'CBCR colorspace is really just the same color cube as RGB, except rotated: the Y' axis traverses the diagonal from (0, 0, 0) (black) to (255, 255, 255) (white). CB and CR are perpendicular to that diagonal, pointing towards blue or red respectively. So to go back to RGB, you multiply by a matrix to rotate the cube.
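
Concretely, for the flavor of Y'CBCR used in JPEG (8-bit samples, chroma centered at 128), the rotation back to RGB works out to roughly the following. This is a sketch in JavaScript with the standard JFIF constants; the library described here is in Scheme, but the arithmetic is the same:

function ycbcrToRgb(y, cb, cr) {
  function clamp(x) { return Math.max(0, Math.min(255, Math.round(x))); }
  var r = y + 1.402 * (cr - 128);
  var g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128);
  var b = y + 1.772 * (cb - 128);
  // Clamping is where the cut-off cube corners show up: some
  // Y'CBCR triples land outside the RGB cube entirely.
  return [clamp(r), clamp(g), clamp(b)];
}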

It's not a very intuitive color system, as you can see from the images above. For one thing, at zero or full luma, the chroma axes have no meaning; black and white can have no hue. Indeed if you imagine trying to fit a cube corner-down into a similar-sized box, you end up either having empty space in the box, or you have to cut off corners from the cube, or both. Cut corners means that bits of the Y'CBCR signal are wasted; empty space means there are RGB colors that are not representable in Y'CBCR. I'm not sure, but I think both are true for the particular formulation of Y'CBCR used in JPEG.

There's more to say about color here but frankly I don't know enough to do so, even though I worked in digital video for many years. If this is something you are mildly interested in, I highly, highly recommend watching Wim Taymans' presentation at this year's GStreamer conference. He takes a look at color in video that is constructive, building up from biology through math to engineering. His is a principled approach rather than a list of rules. It really clarified a number of things for me (and opened doors to unknown unknowns beyond).

on cosines

Where were we? Right, JPEG. So the proper way to understand what JPEG is is to understand the encoding process. We've covered colorspace conversion from RGB to Y'CBCR and sub-sampling. Next, the image canvas is divided into equal-sized "macroblocks". (These are called "minimum coded units" (MCUs) in the JPEG context, but in video they are usually called macroblocks, and it's a better name.) Without sub-sampling, each macroblock will contain one 8-sample-by-8-sample block for each component (Y', CB, CR) of the image. In my images above, the canvas space corresponding to one chroma block is the space of two luma blocks, so the macroblocks will be 16 samples wide and 8 samples tall, and contain two Y' blocks and one each of CB and CR. If the image canvas can't be evenly divided into macroblocks, it is padded to fit, usually by duplicating the last column or row of samples.

Then to make a JPEG, each block is encoded separately, then the whole thing is just written out to a file, and you're done!

This description glosses over a couple of important points, but it's a good big-picture view to have in mind. The pipeline goes from RGB pixels, to a padded RGB canvas, to separate Y'CBCR planes, to a possibly subsampled set of those planes, to macroblocks, to encoded macroblocks, to the file. Decoding is the reverse. It's a totally doable, comprehensible thing, and that was one of the big takeaways for me from this project. I took photography classes in high school and it was really cool to see how to shoot, develop, and print film, and this is similar in many ways. The real "film" is raw-format data, which some cameras produce, but understanding JPEG is like understanding enlargers and prints and fixer baths and such things. It's smelly and dark but pretty cool stuff.

So, how do you encode a block? Well peoples, this is a kinda cool thing. Maybe you remember from some math class that, given n uniformly spaced samples, you can always represent that series as a sum of n cosine functions of equally spaced frequencies. In each little 8-by-8 block, that's what we do: a "forward discrete cosine transformation" (FDCT), which is just multiplying together some matrices for every point in the block. The FDCT is completely separable in the X and Y directions, so the space of 8 horizontal coefficients multiplies by the space of 8 vertical coefficients at each column to yield 64 total coefficients, which is not coincidentally the number of samples in a block.
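
For reference, the forward transform JPEG specifies for an 8-by-8 block of samples f(x, y) is:

F(u,v) = \frac{1}{4} C(u)\,C(v) \sum_{x=0}^{7}\sum_{y=0}^{7} f(x,y)\,
         \cos\frac{(2x+1)u\pi}{16}\,\cos\frac{(2y+1)v\pi}{16},
\qquad C(0) = \frac{1}{\sqrt{2}},\quad C(k) = 1 \text{ for } k > 0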

Funny thing about those coefficients: each one corresponds to a particular horizontal and vertical frequency. We can map these out as a space of functions; for example giving a non-zero coefficient to (0, 0) in the upper-left block of a 8-block-by-8-block grid, and so on, yielding a 64-by-64 pixel representation of the meanings of the individual coefficients. That's what I did in the test strip above. Here is the luma example, scaled up without smoothing:

The upper-left corner corresponds to a frequency of 0 in both X and Y. The lower-right is a frequency of 4 "hertz", oscillating from highest to lowest value in both directions four times over the 8-by-8 block. I'm actually not sure why there are some greyish pixels around the right and bottom borders; it's not a compression artifact, as I constructed these DCT arrays programmatically. Anyway. Point is, your lover's smile, your sunny days, your raw urban graffiti, your child's first steps, all of these are reified in your photos as a sum of cosine coefficients.

The odd thing is that what is reified into your pictures isn't actually all of the coefficients there are! Firstly, because the coefficients are rounded to integers. Mathematically, the FDCT is a lossless operation, but in the context of JPEG it is not because the resulting coefficients are rounded. And they're not just rounded to the nearest integer; they are probably quantized further, for example to the nearest multiple of 17 or even 50. (These numbers seem exaggerated, but keep in mind that the range of coefficients is about 8 times the range of the original samples.)

The choice of what quantization factors to use is a key part of JPEG, and it's subjective: low quantization results in near-indistinguishable images, but in middle compression levels you want to choose factors that trade off subjective perception with file size. A higher quantization factor leads to coefficients with fewer bits of information that can be encoded into less space, but results in a worse image in general.

JPEG proposes a standard quantization matrix, with one number for each frequency (coefficient). Here it is for luma:

(define *standard-luma-q-table*
  #(16 11 10 16 24 40 51 61
    12 12 14 19 26 58 60 55
    14 13 16 24 40 57 69 56
    14 17 22 29 51 87 80 62
    18 22 37 56 68 109 103 77
    24 35 55 64 81 104 113 92
    49 64 78 87 103 121 120 101
    72 92 95 98 112 100 103 99))

This matrix is used for "quality 50" when you encode an 8-bit-per-sample JPEG. You can see that lower frequencies (the upper-left part) are quantized less harshly, and vice versa for higher frequencies (the bottom right).

(define *standard-chroma-q-table*
  #(17 18 24 47 99 99 99 99
    18 21 26 66 99 99 99 99
    24 26 56 99 99 99 99 99
    47 66 99 99 99 99 99 99
    99 99 99 99 99 99 99 99
    99 99 99 99 99 99 99 99
    99 99 99 99 99 99 99 99
    99 99 99 99 99 99 99 99))

For chroma (CB and CR) we see that quantization is much more harsh in general. So not only will we sub-sample color, we will also throw away more high-frequency color variation. It's interesting to think about, but also makes sense in some way; again in photography class we did an exercise where we shaded our prints with colored pencils, and the results were remarkable. My poor, lazy coloring skills somehow rendered leaves lifelike in different hues of green; really though, they were shades of grey, colored in imprecisely. "Colored TV" indeed.

With this knowledge under our chapeaux, we can now say what the "JPEG quality" setting actually is: it's simply that pair of standard quantization matrices scaled up or down. Towards "quality 100", the matrix approaches all-ones, for no quantization, and thus minimal loss (though you still have some rounding, often subsampling as well, and RGB-to-Y'CBCR gamut loss). Towards "quality 0" they scale to a matrix full of large values, for harsh quantization.
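
A sketch of that scaling (this matches, as far as I know, the scheme the IJG reference implementation uses; quality runs from 1 to 100):

function scaleQTable(table, quality) {
  // Quality 50 uses the standard table as-is; lower qualities
  // scale it up, higher qualities scale it down.
  var scale = quality < 50 ? 5000 / quality : 200 - 2 * quality;
  return table.map(function (q) {
    var scaled = Math.floor((q * scale + 50) / 100);
    return Math.min(255, Math.max(1, scaled));  // 8-bit baseline limit
  });
}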

This understanding also explains those wavy JPEG artifacts you get on low-quality images. Those artifacts look like waves because they are waves. They usually occur at sharp intensity transitions, which like a cymbal crash cause lots of high frequencies that then get harshly quantized. Incidentally I suspect (but don't know) that this is the same reason that cymbals often sound bad in poorly-encoded MP3s: harsh quantization in the frequency domain.

Finally, the coefficients are written out to a file as a stream of bits. Each file gets a Huffman code allocated to it, which ideally is built from the distribution of quantized coefficient sizes seen in all of the blocks of an image. There are usually different encodings for luma and chroma, to reflect their different quantizations. Reading and writing this bitstream is a bit of a headache but the algorithm is specified in the JPEG standard, and all you have to do is implement it. Notably, though, there is special support for encoding a run of zero-valued coefficients, which happens often after quantization. There are rarely wavy bits in a blue blue sky.

on transforms

It's terribly common for photos to be wrongly oriented. Unfortunately, the way that many editors fix photo rotation is by setting a bit in the EXIF information of the JPEG. This is ineffectual, as web browsers don't look in the EXIF information, and silly, because it turns out you can losslessly rotate most JPEG images anyway.

Consider that the body of a JPEG is an array of macroblocks. To rotate an image, you just have to rearrange those macroblocks, then rearrange the blocks inside the macroblocks (e.g. swap the two Y' blocks in my above example), then transform the blocks themselves.

The lossless transformations that you can do on a block are transposition, vertical flipping, and horizontal flipping.

Transposition flips a block along its downward-sloping diagonal. To do so, you just swap the coefficients at (u, v) with the coefficients at (v, u). Easy peasy.

Flipping is trickier. Consider the enlarged DCT image from above. What would it take to horizontally flip the function at (0, 1)? Instead of going from light to dark, you want it to go from dark to light. Simple: you just negate the coefficients! But you only want to negate those coefficients that are "odd" in the X direction, which are those coefficients whose column is odd. And actually that's all there is to it. Flipping vertically is the same, but for coefficients whose row is odd.

I said "most images" above because those whose size is not evenly divided by the macroblock size can't be losslessly rotated -- you will end up seeing some of the hidden data that falls off the edge of the canvas. Oh well. Most raw images are properly dimensioned, and if you're downscaling, you already have to re-encode anyway.

But that's just flipping and transposition, you say! What about rotation? Well it turns out that you can express rotation in terms of these operations: rotating 90 degrees clockwise is just a transpose and a horizontal flip (in that order). Together, flipping horizontally, flipping vertically, and transposing form a group, in the same way that flipping and flopping form a group for mattresses. Yeah!
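
Putting the last few paragraphs together as code, here is a sketch (in JavaScript for illustration; blocks are flat 64-element arrays in row-major order, like the quantization tables above):

function transposeBlock(c) {
  // Swap the coefficients at (u, v) and (v, u).
  for (var v = 0; v < 8; v++)
    for (var u = v + 1; u < 8; u++) {
      var t = c[v * 8 + u]; c[v * 8 + u] = c[u * 8 + v]; c[u * 8 + v] = t;
    }
  return c;
}

function hflipBlock(c) {
  // Negate every coefficient in an odd column.
  for (var v = 0; v < 8; v++)
    for (var u = 1; u < 8; u += 2)
      c[v * 8 + u] = -c[v * 8 + u];
  return c;
}

function rotateBlock90cw(c) {
  // Rotate 90 degrees clockwise: transpose, then flip horizontally.
  return hflipBlock(transposeBlock(c));
}

The macroblock and block rearranging described above still has to happen around this, of course; these are just the per-block transforms.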

on scheme

I wrote this library in Scheme because that's my language of choice these days. I didn't run into any serious impedance mismatches; Guile has a generic multi-dimensional array facility that made it possible to express many of these operations as generic folds, unfolds, or maps over arrays. The huffman coding part was a bit irritating, but all in all things were pretty good. The speed is pretty bad, but I haven't optimized it at all, and it gives me a nice test case for the compiler. Anyway, it's been fun and it suits my needs. Check out the project page if you're interested. Yes, to shave a yak you have to get a bit bovine and smelly, but yaks live in awesome places!

Finally I will leave you with a glitch, one of many that I have produced over the last couple weeks. Comments and corrections welcome below. Happy hacking!

Syndicated 2014-11-14 16:49:13 from wingolog

14 Nov 2014 wingo   » (Master)

generators in firefox now twenty-two times faster

It's with great pleasure that I can announce that, thanks to Mozilla's Jan de Mooij, the new ES6 generator functions are twenty-two times faster in Firefox!

Some back-story, for the unawares. There's a new version of JavaScript coming, ECMAScript 6 (ES6). Among the new features that ES6 brings are generator functions: functions that can suspend. Firefox's JavaScript engine, SpiderMonkey, has had support for generators for many years, long before other engines. This support was upgraded to the new ES6 standard last year, thanks to sponsorship from Bloomberg, and was shipped out to users in Firefox 26.

The generators implementation in Firefox 26 was quite basic. As you probably know, modern JavaScript implementations have a number of tiered engines. In the case of SpiderMonkey there are three tiers: the interpreter, the baseline compiler, and the optimizing compiler. Code begins execution in the interpreter, which is the quickest engine to start. If a piece of code is hot -- meaning that lots of time is being spent there -- then it will "tier up" to the next level, where it is analyzed, possibly optimized, and then compiled to machine code.

Unfortunately, generators in SpiderMonkey have always been stuck at the lowest tier, the interpreter. This is because of SpiderMonkey's choice of implementation strategy for generators. Generators were implemented as "floating interpreter stack frames": heap-allocated objects whose shape was exactly the same as a stack frame in the interpreter. This had the advantage of being fairly cheap to implement in the beginning, but ultimately it made them unable to take advantage of JIT compilation, as JIT code runs on its own stack which has a different layout. The previous strategy also relied on trampolining through a helper written in C++ to resume generators, which killed optimization opportunities.

The solution was to represent suspended generator objects as snapshots of the state of a stack frame, instead of as stack frames themselves. In order for this to be efficient, last year we did a number of block scope optimizations to try and reduce the amount of state that a generator frame would have to restore. Finally, around March of this year we were at the point where we could refactor the interpreter to implement generators on the normal interpreter stack, with normal interpreter bytecodes, with the vision of being able to JIT-compile those bytecodes.

I ran out of time before I could land that patchset; although the patches were where we wanted to go, they actually caused generators to be even slower and so they languished in Bugzilla for a few more months. Sad monkey. It was with delight, then, that a month or so ago I saw that SpiderMonkey JIT maintainer Jan de Mooij was interested in picking up the patches. Since then he has been hacking off and on at getting my old patches into shape, and ended up applying them all.

He went further, optimizing stack frames to not reserve space for "aliased" locals (locals allocated on the scope chain), speeding up object literal creation in the baseline compiler and finally has implemented baseline JIT compilation for generators.

So, after all of that perf nargery, what's the upshot? Twenty-two times faster! In this microbenchmark:

function *g(n) {
    for (var i=0; i<n; i++)
        yield i;
}
function f() {
    var t = new Date();
    var it = g(1000000);
    for (var i=0; i<1000000; i++)
        it.next();
    print(new Date() - t);
}
f();

Before, it took SpiderMonkey 980 milliseconds to complete on Jan's machine. After? Only 43! It's actually marginally faster than V8 at this point, which has (temporarily, I think) regressed to 45 milliseconds on this test. Anyway. Competition is great and as a committer to both projects I find it very satisfactory to have good implementations on both sides.

As in V8, in SpiderMonkey generators cannot yet reach the highest tier of optimization. I'm somewhat skeptical that it's necessary, too, as you expect generators to suspend fairly frequently. That said, a yield point in a generator is, from the perspective of the optimizing compiler, not much different from a call site, in that it causes all locals to be saved. The difference is that locals may have unboxed representations, so we would have to box those values when saving the generator state, and unbox on restore.

Thanks to Bloomberg for funding the initial work, and big, big thanks to Mozilla's Jan de Mooij for picking up where we left off. Happy hacking with generators!

Syndicated 2014-11-14 08:41:08 from wingolog

13 Nov 2014 joey   » (Master)

on leaving

I left Debian. I don't really have a lot to say about why, but I do want to clear one thing up right away. It's not about systemd.

As far as systemd goes, I agree with my friend John Goerzen:

I promise you – 18 years from now, it will not matter what init Debian chose in 2014. It will probably barely matter in 3 years.

read the rest

And with Jonathan Corbet:

However things turn out, if it becomes clear that there is a better solution than systemd available, we will be able to move to it.

read the rest

I have no problem with trying out a piece of Free Software, that might have abrasive authors, all kinds of technical warts, a debatable design, scope creep etc. None of that stopped me from giving Linux a try in 1995, and I'm glad I jumped in with both feet.

It's important to be unafraid to make a decision, try it out, and if it doesn't work, be unafraid to iterate, rethink, or throw a bad choice out. That's how progress happens. Free Software empowers us to do this.

Debian used to be a lot better at that than it is now. This seems to have less to do with the size of the project, and more to do with the project having aged, ossified, and become comfortable with increasing layers of complexity around how it makes decisions. To the point that I no longer feel I can understand the decision-making process at all ... or at least, that I'd rather be spending those scarce brain cycles on understanding something equally hard but more useful, like category theory.

It's been a long time since Debian was my main focus; I feel much more useful when I'm working in a small nimble project, making fast and loose decisions and iterating on them. Recent events brought it to a head, but this is not a new feeling. I've been less and less involved in Debian since 2007, when I dropped maintaining any packages I wasn't the upstream author of, and took a year of mostly ignoring the larger project.

Now I've made the shift from being a Debian developer to being an upstream author of stuff in Debian (and other distros). It seems best to make a clean break rather than hang around and risk being sucked back in.

My mailbox has been amazing over the past week by the way. I've heard from so many friends, and it's been very sad but also beautiful.

Syndicated 2014-11-13 18:59:39 from see shy jo

12 Nov 2014 joey   » (Master)

continuing to be pleasantly surprised

Free software has been my career for a long time -- nothing else since 1999 -- and it continues to be a happy surprise each time I find a way to continue that streak.

The latest is that I'm being funded for a couple of years to work part-time on git-annex. The funding comes from the DataLad project, which recently received a grant from the National Science Foundation. DataLad folks (at Dartmouth College and at Magdeburg University in Germany) are working on providing easy access to scientific data (particularly neuroimaging). So git-annex will actually be used for science!

I'm being funded for around 30 hours of work each month, to do general work on the git-annex core (not on the webapp or assistant). That includes bugfixes and some improvements that are wanted for DataLad, but are all themselves generally useful. (see issue list)

This is enough to get by on, at least in my current living situation. It would be great if I could find some funding for my other work time -- but it's also wonderful to have the flexibility to spend time on whatever other interesting projects I might want to.

Syndicated 2014-11-12 20:33:47 from see shy jo

12 Nov 2014 dmarti   » (Master)

Interest dashboard?

Mozilla's Interest Dashboard is out. Darren Herman writes,

The Content Services team is working to reframe how users are understood on the Internet: how content is presented to them, how they can signal what they are interested in, how they can take control of the kinds of adverts they are exposed to.

This is vintage Internet woo-woo that will fly with real advertisers about as well as, I don't know, CueCats.

Seriously? "take control of the kinds of adverts they are exposed to?"

I want to see ads for scientific apparatus, precision machine tools, spacecraft, and genetically engineered crops, because those ads tend to be interesting. But I don't actually buy any of that stuff. So advertisers rightly don't care what I'm interested in; they care what I'll actually buy (burritos, coffee, and adapters for connecting one computer thingy to some other computer thingy).

The "users will take control of advertising" fantasies are just as silly as the "Big Data will enable ads that don't have to pay for decent content" fantasies from the other side.

We had an advertising model that worked, based on signaling by supporting content. Mozilla can still help us fix Internet advertising by fixing their privacy bugs, left over from Netscape days. The brokenness of Internet advertising is because of flaws in things that the browser needs to get right, not because of missing features.

More on all this: Targeted Advertising Considered Harmful

Syndicated 2014-11-12 14:14:15 from Don Marti

12 Nov 2014 Hobart   » (Journeyer)

Rosetta / Philae comet landing livestream

http://new.livestream.com/esa/cometlanding

Fingers crossed for the #forth powered lander :)

Syndicated 2014-11-12 15:03:34 from jon's blog

12 Nov 2014 bagder   » (Master)

Keyboard key frequency

A while ago I wrote about my hunt for a new keyboard, and in my follow-up conversations with friends around that subject I quickly came to the conclusion that I should get myself better analysis and data on how I actually use a keyboard and the individual keys on it. And if you know me, you know I like (useless) statistics.

(Photo: Func KB-460 keyboard.) So, I tried out the popular and widely used Linux key-logger software ‘logkeys’ and immediately figured out that it doesn’t really support the precision and detail level I wanted, so I forked the project and modified the code to work the way I want: keyfreq was born. Code on github. (I forked it because I couldn’t find any way to send my modifications back to the upstream project; I don’t really feel a need for another project.)

Then I fired up the logging process and it has been running in the background for a while now, logging every key stroke with a time stamp.
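
The core counting step is tiny, by the way. Here is a minimal Python sketch of the idea (my own illustration, not the actual keyfreq code; the one-record-per-line "timestamp key" log format is an assumption):

    import collections, datetime

    key_counts = collections.Counter()
    hour_counts = collections.Counter()
    with open("keylog.txt") as log:            # assumed format: "<unix-time> <key>"
        for line in log:
            stamp, key = line.split(None, 1)
            key_counts[key.strip()] += 1
            hour_counts[datetime.datetime.fromtimestamp(float(stamp)).hour] += 1

    total = sum(key_counts.values())
    for key, n in key_counts.most_common(10):  # ten most-pressed keys, with share
        print("%-12s %8d %5.2f%%" % (key, n, 100.0 * n / total))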

Counting key frequency and how it is distributed very quickly turns into basically seeing when I’m active in front of the computer, and it also got me thinking about what a high key frequency actually means in terms of activity and productivity. Does a really high key frequency mean that I was working intensely, or is it more a sign of mail-writing time? When I debug problems or research details, won’t those periods result in slower key activity?

In the end I guess that over time, the key frequency chart basically says that if I have pressed a lot of keys during a period, I was working on something then. Hours or days with a very low average key frequency are probably times when I don’t work as much.

The weekend key frequency is bound to be slightly wrong, since I sometimes do weekend hacking on other computers where I don’t log the keys; my results are recorded from a single specific keyboard only.

Conclusions

So what did I learn? Here are some conclusions and results from 1276614 keystrokes done over a period of the most recent 52 calendar days.

I have a 105-key keyboard, but during this period I only pressed 90 unique keys. Out of the 90 keys I pressed, 3 were each pressed more than 5% of the time. In fact, those 3 keys account for more than 20% of all keystrokes. Those keys are: <Space>, <Backspace> and the letter ‘e’.

<Space> stands out from all the rest, as it alone was used for more than 10% of all presses.

Only 29 keys were used in more than 1% of the presses, giving the distribution a really long tail of keys that are hardly ever used.

Over this logged time, I have registered key strokes during 46% of all hours. Counting only the hours in which I actually used the keyboard, the average number of key strokes was 2185/hour, or 36 keys/minute.

On an average week day (excluding weekend days), I registered 32486 key presses. During the most active single minute of this logging period, I hit 405 keys. In the most active single hour I managed to do 7937 key presses. During weekends my activity is much lower, averaging 5778 keys/day (7.2% of all activity happened on weekends).

When counting the most active hours over the day, there are 14 hours with more than 1% of the activity and 5 with less than 1%, leaving 5 hours with no keyboard activity at all (02:00-06:59). Interestingly, the hour between 23:00 and 24:00 at night is the single busiest hour for me, with 12.5% of all keypresses during the period.

Random “anecdotes”

Longest contiguous time without keys: 26.4 hours

Longest key sequence without backspace: 946

There are 7 keys I only pressed once during this period; 4 of them are on the numerical keypad and the other three are F10, F3 and <Pause>.

More

I’ll try to keep the logging going and see if things change over time, or whether patterns emerge that can only be seen in the data over a longer period.

Syndicated 2014-11-12 08:19:13 from daniel.haxx.se

12 Nov 2014 jas   » (Master)

Dice Random Numbers

Generating data with entropy, or random number generation (RNG), is a well-known difficult problem. Many crypto algorithms and protocols assume random data is available. There are many implementations out there, including /dev/random in the BSD and Linux kernels and API calls in crypto libraries such as GnuTLS or OpenSSL. How they work can be understood by reading the source code. The quality of the data depends on actual hardware and what entropy sources were available — the RNG implementation itself is deterministic; it merely converts data with supposed entropy from a set of data sources and then generates an output stream.

In some situations, like on virtualized environments or on small embedded systems, it is hard to find entropy sources of sufficient quantity. Rarely are there any lower-bound estimates on how much entropy there is in the data you get. You can improve the RNG issue by using a separate hardware RNG, but there is deployment complexity in that, and from a theoretical point of view, the problem of trusting that you get good random data has merely moved from one system to another. (There is more to say about hardware RNGs; I’ll save that for another day.)

For some purposes, the available solutions do not inspire enough confidence in me because of their high complexity. Complexity is often the enemy of security. In crypto discussions I have said, only half-jokingly, that about the only RNG process that I would trust is one that I can explain in simple words and implement myself with the help of pen and paper. Normally I use the example of rolling a normal six-sided dice (a D6) several times. I have been thinking about this process in more detail lately, and felt it was time to write it down, regardless of how silly it may seem.

A dice with six sides produces a random number between 1 and 6. It is relatively straightforward to convince yourself intuitively that it is not clearly biased: inspect that it looks symmetric and do some trial rolls. By repeatedly rolling the dice, you can generate as much data as you need, time permitting.

I do not understand enough thermodynamics to know how to estimate the amount of entropy of a physical process, so I need to resort to intuitive arguments. It would be easy to just assume that a dice produces 3 bits of entropy, because its six possible outcomes fit within 3 bits (2^3=8). At least I find it easy to convince myself that 3 bits is the upper bound (for a perfectly fair D6 the entropy is log2(6), about 2.58 bits per roll). I suspect that most dice have some form of defect, though, which leads to a very small bias that could be found with a large number of rolls. Thus I would propose that the amount of entropy of most D6’s is slightly below 3 bits on average. Further, to establish a lower bound, it seems intuitively easy to believe that if the entropy of a particular D6 were closer to 2 bits than to 3 bits, this would be noticeable fairly quickly by trial rolls. That assumes the dice does not have complex logic and machinery in it that would hide the patterns. With the tinfoil hat on, consider a dice with a power source and mechanics in it that allowed it to decide which number it would land on: it could generate a seemingly random pattern that still contained 0 bits of entropy. For example, suppose a D6 is built to produce the pattern 4, 1, 4, 2, 1, 3, 5, 6, 2, 3, 1, 3, 6, 3, 5, 6, 4, … this would mean it produces 0 bits of entropy (compare the numbers with the decimals of sqrt(2)). Other factors may also influence the amount of entropy in the output; consider what happens if you roll the dice by just dropping it straight down from 1cm/1inch above the table. With this discussion as background, and for simplicity, going forward I will assume that my D6 produces 3 bits of entropy on every roll.

We need to figure out how many times we need to roll it. I usually find myself needing a 128-bit random number (16 bytes). Crypto algorithms and protocols typically use power-of-2 data sizes. 64 bits of entropy results in brute-force attacks requiring about 2^64 tests, and for many operations, this is feasible with today’s computing power. Performing 2^128 operations does not seem possible with today’s technology. To produce 128 bits of entropy using a D6 that produces 3 bits of entropy per roll, you need to perform ceil(128/3)=43 rolls.

We also need to design an algorithm to convert the D6 output into the resulting 128-bit random number. While it would be nice from a theoretical point of view to let each and every bit of the D6 output influence each and every bit of the 128-bit random number, this becomes difficult to do with pen and paper. For simplicity, my process will be to write the binary representation of the D6 output on paper in 3-bit chunks and then read it off as 8-bit chunks. After 8 rolls, there are 24 bits available, which can be read off as 3 distinct 8-bit numbers. So let’s do this for the D6 outputs of 3, 6, 1, 1, 2, 5, 4, 1:

3   6   1   1   2   5   4   1
011 110 001 001 010 101 100 001
01111000 10010101 01100001
120 0x78 149 0x95 97 0x61

After 8 rolls, we have generated the 3-byte hex string “789561”. I repeat the process 4 more times, concatenating the strings, resulting in a hex string with 15 bytes of data. To get the last byte, I only need to roll the D6 three more times; the two high bits of the last roll are used and the lowest bit is discarded. Let’s say the last D6 outputs were 4, 2, 3, which would result in:

4   2   3
100 010 011
10001001
137 0x89

So the 16 bytes of random data are “789561..89”, with “..” replaced by the 4 remaining 3-byte chunks of data.
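
This procedure is simple enough to double-check with a few lines of code. Here is a minimal Python sketch for verification (my own illustration; the process itself stays pen-and-paper, and the function name is made up):

    def rolls_to_bytes(rolls):
        # Write each roll as 3 bits, concatenate, then read off whole bytes;
        # leftover bits (fewer than 8) are discarded, as with the final roll.
        bits = "".join(format(r, "03b") for r in rolls)    # e.g. 3 -> "011"
        return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits) - 7, 8))

    print(rolls_to_bytes([3, 6, 1, 1, 2, 5, 4, 1]).hex())  # -> "789561"
    print(rolls_to_bytes([4, 2, 3]).hex())                 # -> "89"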

So what’s the next step? Depends on what you want to use the random data for. For some purposes, such as generating a high-quality 128-bit AES key, I would be done. The key is right there. To generate a high-quality ECC private key, you need to generate somewhat more randomness (matching the ECC curve size) and do a couple of EC operations. To generate a high-quality RSA private key, unfortunately you will need much more randomness, to the point where it makes more sense to implement a PRNG seeded with a strong 128-bit seed generated using this process. The latter approach is the general solution: generate 128 bits of data using the dice approach, and then seed a CSPRNG of your choice to get a large amount of data quickly. These steps are somewhat technical, and you lose the pen-and-paper properties, but the code to implement these parts is easier to verify than verifying that you get good-quality entropy out of your RNG implementation.
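
As one merely illustrative way to do that final expansion step in code (my choice of primitive; the process above does not prescribe any particular CSPRNG), an extendable-output hash such as SHAKE-256 can stretch a 128-bit seed into as much derived data as needed:

    import hashlib

    seed = bytes.fromhex("78956189" * 4)           # stand-in for your 16 dice-generated bytes
    stream = hashlib.shake_256(seed).digest(64)    # derive 64 bytes of output
    print(stream.hex())

The expansion code stays short enough to audit, which is the whole point of the exercise.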

Syndicated 2014-11-11 23:36:02 from Simon Josefsson's blog

11 Nov 2014 titus   » (Journeyer)

Some thoughts on Journal Impact Factor

A colleague just e-mailed me to ask me how I felt about journal impact factor being such a big part of the Academic Ranking of World Universities - they say that 20% of the ranking weight comes from # of papers published in Nature and Science. So what do I think?

On evaluations

I'm really not a big fan of rankings and evaluations in the first place. This is largely because I feel that evaluations are rarely objective. For one very specific example, last year at MSU I got formal evaluations from both of my departments. Starting with the same underlying data (papers published, students graduated, grants submitted/awarded, money spent, classes taught, body mass/height ratio, gender weighting, eye color, miles flown, talks given, liquid volume of students' tears generated, committees served on, etc.), one department gave me a "satisfactory/satisfactory/satisfactory" while the other department gave me an "excellent/excellent/excellent." What fun! (I don't think the difference was the caliber of the departments.)

Did I mention that these rankings helped determine my raise for the year?

Anyhoo, I find the rating and ranking scheme within departments at MSU to be largely silly. It's done in an ad hoc manner by untrained people, and as far as I can tell, is biased against people who are not willing to sing their own praises. (I brought up the Dunning-Kruger effect in my last evaluation meeting. Heh.) That's not to say there's not serious intent -- they are factored into raises, and at least one other purpose of evaluating assistant professors is so that once you fire their ass (aka "don't give them tenure") there's a paper trail of critical evaluations where you explained that they were in trouble.

Metrics are part of the job, though; departments evaluate their faculty so they can see who, if anyone, needs help or support or mentoring, and to do this, they rely at least in part on metrics. Basically, if someone has lots of funding and lots of papers, they're probably not failing miserably at being a research professor; if they're grant-poor and paper-poor, they're targets for further investigation. There are lots of ways to evaluate, but metrics seem like an inextricable part of it.

Back to the impact factor

Like faculty evaluations, ranking by the impact factor of the journals that university faculty publish in is an attempt to predict future performance using current data.

But impact factor is extremely problematic for many reasons. It's based on citations, which (over the long run) may be an OK measure of impact, but are subject to many confounding factors, including field-specific citation patterns. It's an attempt to predict the success of individual papers on a whole-journal basis, which falls apart in the face of variable editorial decisions. High-impact journals are also often read more widely by people than low-impact journals, which yields a troubling circularity in terms of citation numbers (you're more likely to cite a paper you've read!). Worse, the whole system is prone to being gamed in various ways, which is leading to high rates of retractions for high-impact journals, as well as outright fraud.

Impact factor is probably a piss-poor proxy for paper impact, in other words.

If impact factor was just a thing that didn't matter, I wouldn't worry. The real trouble is that impact factors have real-world effects - many countries use the impact factor of publications as a very strong weight in funding and promotion decisions. Interestingly, the US is not terribly heavy-handed here - most universities seem pretty enlightened about considering the whole portfolio of a scientist, at least anecdotally. But I can name a dozen countries that care deeply about impact factor for promotions and raises.

And apparently impact factor affects university rankings, too!

Taking a step back, it's not clear that any good ranking scheme can exist, and if it does, we're certainly not using it. All of this is a big problem if you care about fostering good science.

The conundrum is that many people like rankings, and it seems futile to argue against measuring and ranking people and institutions. However, any formalized ranking system can be gamed and perverted, which ends up sometimes rewarding the wrong kind of people, and shutting out some of the right kind of people. (The Reed College position on the US News & World Report ranking system is worth reading here.) More generally, in any ecosystem, the competitive landscape is evolving, and a sensible measure today may become a lousy measure tomorrow as the players evolve their strategies; the stricter the rules of evaluation, and the more entrenched the evaluation system, the less likely it is to adapt, and the more misranking will result. So ranking systems need to evolve continuously.

At its heart, this is a scientific management challenge. Rankings and metrics do pretty explicitly set the landscape of incentives and competition. If our goal in science is to increase knowledge for the betterment of mankind, then the challenge for scientific management is to figure out how to incentivize behaviors that trend in that direction in the long term. If you use bad or outdated metrics, then you incentivize the wrong kind of behavior, and you waste precious time, energy, and resources. Complicating this is the management structure of academic science, which is driven by many things that include rankings and reputation - concepts that range from precise to fuzzy.

My position on all of this is always changing, but it's pretty clear that the journal system is kinda dumb and rewards the wrong behavior. (For the record, I'm actually a big fan of publications, and I think citations are probably not a terribly bad measure of impact when measured on papers and individuals, although I'm always happy to engage in discussions on why I'm wrong.) But the impact factor is especially horrible. The disproportionate effect that high-IF glamour mags like Cell, Nature and Science have on our scientific culture is clearly a bad thing - for example, I'm hearing more and more stories about editors at these journals warping scientific stories directly or indirectly to be more press-worthy - and when combined with the reproducibility crisis I'm really worried about the short-term future of science. Journal Impact Factor and other simple metrics are fundamentally problematic and are contributing to the problem, along with the current peer review culture and a whole host of other things. (Mike Eisen has written about this a lot; see e.g. this post.)

In the long term I think a much more experimental culture of peer review and alternative metrics will emerge. But what do we do for now?

More importantly:

How can we change?

I think my main advice to faculty is "lead, follow, or get out of the way."

Unless you're a recognized big shot, or willing to take somewhat insane gambles with your career, "leading" may not be productive - following or getting out of the way might be best. But there are a lot of things you can do here that don't put you at much risk, including:

  • be open to a broader career picture when hiring and evaluating junior faculty;
  • argue on behalf of alternative metrics in meetings on promotion and tenure;
  • use sites like Google Scholar to pick out some recent papers to read in depth when hiring faculty and evaluating grants;
  • avoid making (or push back at) cheap shots at people who don't have a lot of high-impact-factor papers;
  • invest in career mentoring that is more nuanced than "try for lots of C-N-S papers or else" - you'd be surprised how often this is the main advice assistant professors take away...
  • believe in and help junior faculty that seem to have a plan, even if you don't agree with the plan (or at least leave them alone ;)

What if you are a recognized big shot? Well, there are lots of things you can do. You are the people who set the tone in the community and in your department, and it behooves you to think scientifically about the culture and reward system of science. The most important thing you can do is think and investigate. What evidence is there behind the value of peer review? Are you happy with C-N-S editorial policies, and have you talked to colleagues who get rejected at the editorial review stage more than you do? Have you thought about per-article metrics? Do you have any better thoughts on how to improve the system than 'fund more people', and how would you effect changes in this direction by recognizing alternate metrics during tenure and grant review?

The bottom line is that the current evaluation systems are the creation of scientists, for scientists. It's our responsibility to critically evaluate them, and perhaps evolve them when they're inadequate; we shouldn't just complain about how the current system is broken and wait for someone else to fix it.

Addendum: what would I like to see?

Precisely predicting the future importance of papers is obviously kind of silly - see this great 1994 paper by Gans and Shepherd on rejected classics papers, for example -- and is subject to all sorts of confounding effects. But this is nonetheless what journals are accustomed to doing: editors at most journals, especially the high impact factor ones, select papers based on projected impact before sending them out for review, and/or ask the reviewers to review impact as well.

So I think we should do away with impact review and review for correctness instead. This is why I'm such a big fan of PLOS One and PeerJ, who purport to do exactly that.

But then, I get asked, what do we do about selecting out papers to read? Some (many?) scientists claim that they need the filtering effect of these selective journals to figure out what they should be reading.

There are a few responses to this.

First, it's fundamentally problematic to outsource your attention to editors at journals, for reasons mentioned above. There's some evidence that you're being drawn into a manipulated and high-retraction environment by doing that, and that should worry you.

But let's say you feel you need something to tell you what to read.

Well, second, this is technologically solvable - that's what search engines already do. There's a whole industry of search engines that give great results based on integrating free text search, automatic content classification, and citation patterns. Google Scholar does a great job here, for example.

Third, social media (aka "people you know") provides some great recommendation systems! People who haven't paid much attention to Twitter or blogging may not have noticed, but in addition to person-to-person recommendations, there are increasingly good recommendation systems coming on line. I personally get most of my paper recs from online outlets (mostly people I follow, but I've found some really smart people to follow on Twitter!). It's a great solution!

Fourth, if one of the problems is that many journals review for correctness AND impact together, why not separate them? For example, couldn't journals like Science or Nature evolve into literature overlays that highlight papers published in impact-blind journals like PLOS One or PeerJ? I can imagine a number of ways that this could work, but if we're so invested in having editors pick papers for us, why not have them pick papers that have been reviewed for scientific correctness first, and then elevate them to our attention with their magic editorial pen?

I don't see too many drawbacks to this vs the current approach, and many improvements. (Frankly this is where I see most of scientific literature going, once preprint archives become omnipresent.)

So that's where I want and expect to see things going. I don't see ranking based on predicted impact going away, but I'd like to see it more reflective of actual impact (and be measured in more diverse ways).

--titus

p.s. People looking for citations of high retraction rate, problematic peer review, and the rest could look at one of my earlier blog posts on problems with peer review. I'd be interested in more citations, though!

Syndicated 2014-11-10 23:00:00 from Living in an Ivory Basement

10 Nov 2014 crhodes   » (Master)

reproducible builds - a month ahead of schedule

I think this might be my last blog entry on the subject of building SBCL for a while.

One of the premises behind SBCL as a separate entity from CMUCL, its parent, was to make the result of its build be independent of the compiler used to build it. To a world where separate compilation is the norm, the very idea that building some software should persistently modify the state of the compiler probably seems bizarre, but the Lisp world evolved in that way and Lisp environments (at least those written in themselves) developed build recipes where the steps to construct a new Lisp system from an old one and the source code would depend critically on internal details of both the old and the new one: substantial amounts of introspection on the build host were used to bootstrap the target, so if the details revealed by introspection were no longer valid for the new system, there would need to be some patching in the middle of the build process. (How would you know whether that was necessary? Typically, because the build would fail with a more-or-less – usually more – cryptic error.)

Enter SBCL, whose strategy is essentially to use the source files first to build an SBCL!Compiler running in a host Common Lisp implementation, and then to use that SBCL!Compiler to compile the source files again to produce the target system. This requires some contortions in the source files: we must write enough of the system in portable Common Lisp so that an arbitrary host can execute SBCL!Compiler to compile SBCL-flavoured sources (including the standard headache-inducing (defun car (list) (car list)) and similar, which works because SBCL!Compiler knows how to compile calls to car).

How much is “enough” of the system? Well, one answer might be when the build output actually works, at least to the point of running and executing some Lisp code. We got there about twelve years ago, when OpenMCL (as it was then called) compiled SBCL. And yet... how do we know there aren't odd differences that depend on the host compiler lurking, which will not obviously affect normal operation but will cause hard-to-debug trouble later? (In fact there were plenty of those, popping up at inopportune moments).

I’ve been working intermittently on dealing with this, by attempting to make the Common Lisp code that SBCL!Compiler is written in sufficiently portable that executing it on different implementations generates bitwise-identical output. Because then, and only then, can we be confident that we are not depending in some unforeseen way on a particular implementation-specific detail; if output files are different, it might be a harmless divergence, for example a difference in ordering of steps where neither depends on the other, or it might in fact indicate a leak from the host environment into the target. Before this latest attack on the problem, I last worked on it seriously in 2009, getting most of the way there but with some problems remaining, as measured by the number of output files (out of some 330 or so) whose contents differed depending on which host Common Lisp implementation SBCL!Compiler was running on.

Over the last month, then, I have been slowly solving these problems, one by one. This has involved refining what is probably my second most useless skill, working out what SBCL fasl files are doing by looking at their contents in a text editor, and from that intuiting the differences in the implementations that give rise to the differences in the output files. The final pieces of the puzzle fell into place earlier this week, and the triumphant commit announces that as of Wednesday all 335 target source files get compiled identically by SBCL!Compiler, whether that is running under Clozure Common Lisp (32- or 64-bit versions), CLISP, or a different version of SBCL itself.

Oh but wait. There is another component to the build: as well as SBCL!Compiler, we have SBCL!Loader, which is responsible for taking those 335 output files and constructing from them a Lisp image file which the platform executable can use to start a Lisp session. (SBCL!Loader is maybe better known as “genesis”; but it is to load what SBCL!Compiler is to compile-file). And it was slightly disheartening to find that despite having 335 identical output files, the resulting cold-sbcl.core file differed between builds on different host compilers, even after I had remembered to discount the build fingerprint constructed to be different for every build.

Fortunately, the actual problem that needed fixing was relatively small: a call to maphash, which (understandably) makes no guarantees about ordering, was used to affect the Lisp image data directly. I then spent a certain amount of time being thoroughly confused, having managed to construct for myself a Lisp image where the following forms executed with ... odd results:

  (loop for x being the external-symbols of "CL" count 1)
; => 1032
(length (delete-duplicates (loop for x being the external-symbols of "CL" collect x)))
; => 978

(shades of times gone by). Eventually I realised that

  (unless (member (package-name package) '("COMMON-LISP" "KEYWORD" :test #'string=))
  ...)

was not the same as

  (unless (member (package-name package) '("COMMON-LISP" "KEYWORD") :test #'string=)
  ...)

and all was well again, and as of this commit the cold-sbcl.core output file is identical no matter the build host.

It might be interesting to survey the various implementation-specific behaviours that we have run into during the process of making this build completely repeatable. The following is a probably non-exhaustive list – it has been twelve years, after all – but maybe is some food for thought, or (if you have a particularly demonic turn of mind) an ingredients list for a maximally-irritating CL implementation...

  • Perhaps most obviously, various constants are implementation-defined. The ones which caused the most trouble were undoubtedly most-positive-fixnum and most-negative-fixnum – particularly since they could end up being used in ways where their presence wasn’t obvious. For example, (deftype fd () `(integer 0 ,most-positive-fixnum)) has, in the SBCL build process, a subtly different meaning from (deftype fd () '(and fixnum unsigned-byte)) – in the second case, the fd type will have the intended meaning in the target system, using the target’s fixnum range, while in the first case we have no way of intercepting or translating the host’s value of most-positive-fixnum. Special mentions go to array-dimension-limit, which caused Bill Newman to be cross on the Internet, and to internal-time-units-per-second; I ended up tracking down one difference in output machine code from a leak of the host’s value of that constant into target code.
  • Similarly, random and sxhash quite justifiably differ between implementations. The practical upshot of that is that these functions can’t be used to implement a cache in SBCL!Compiler, because the access patterns and hence the patterns of cache hits and misses will be different depending on the host implementation.
  • As I’ve already mentioned, maphash does not iterate over hash-table contents in a specified order, and in fact that order need not be deterministic; similarly, with-package-iterator can generate symbols in arbitrary orders, and set operations (intersection, set-difference and friends) will return the set as a list whose elements are in an arbitrary order. Incautious use of these functions tended to give rise to harmless but sometimes hard-to-diagnose differences in output; the solution was typically to sort the iteration output before operating on any of it, to introduce determinism...
  • ... but it was possible to get that wrong in a harder-to-detect way, because sort is not specified to be stable. In some implementations, it actually is a stable sort in some conditions, but for cases where it’s important to preserve an already-existing partial order, stable-sort is the tool for the job.
  • The language specification explicitly says that the initial contents of uninitialized arrays are undefined. In most implementations, at most times, executing (make-array 8 :element-type '(unsigned-byte 8)) will give a zero-filled array, but there are circumstances in some implementations where the returned array will have arbitrary data.
  • Not only are some constants implementation-defined, but so also are the effects of normal operation on some variables. *gensym-counter* is affected by macroexpansion if the macro function calls gensym, and implementations are permitted to macroexpand macros an arbitrary number of times. That means that our use of gensym needs to be immune to whatever the host implementation’s macroexpansion and evaluation strategy is.
  • The object returned by byte to represent a bitfield with size and position is implementation-defined. Implementations (variously) return bitmasks, conses, structures, vectors; host return values of byte must not be used during the execution of SBCL!Compiler. More subtly, the various boole-related constants (boole-and and friends) also need special treatment; at one point, their host values were used when SBCL!Compiler compiled the boole function itself, and it so happens that CLISP and SBCL both represent the constants as integers between 0 and 15... but with a different mapping between operation and integer.
  • my last blog entry talked about constant coalescing, and about printing of (quote foo). In fact printing in general has been a pain, and there are still significant differences in interpretation or at least in implementation of pretty-printing: to the extent that at one point we had to minimize printing at all in order for the build to complete under some implementations.
  • there are a number of things which are implementation-defined but have caused a certain amount of difficulty. Floating point in general is a problem, not completely solved (SBCL will not build correctly if its host doesn’t have distinct single- and double-float types that are at least approximately IEEE754-compliant). Some implementations lack denormalized numbers; some do not expose signed zeros to the user; and some implementations compute (log 2d0 10d0) more accurately than others, including SBCL itself, do. The behaviour of the host implementation on legal but dubious code is also potentially tricky: SBCL’s build treats full warnings as worthy of stopping, but some hosts emit full warnings for constructs that are tricky to write in other ways: for example, to write portable code to handle multiple kinds of string, one might write (typecase string (simple-base-string ...) ((simple-array character (*)) ...) (string ...)) but some implementations emit full warnings if a clause in a typecase is completely shadowed by other clauses, and if base-char and character are identical in that implementation the typecase above will signal such a warning.
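
The hash-iteration-order trap, in particular, is not Lisp-specific. As a quick analogue in another language (my own illustration, nothing from the SBCL build itself): Python enables hash randomization by default, so iteration order over a set of strings changes from one interpreter run to the next, and any output derived from it is unreproducible until sorted.

    # Run this twice: the first line's order will usually differ between runs.
    symbols = {"CAR", "CDR", "CONS", "LIST", "APPEND"}
    print(list(symbols))    # arbitrary, per-process order (like maphash)
    print(sorted(symbols))  # deterministic: sort before emitting output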

There were probably other, more minor differences between implementations, but the above list gives a flavour of the things that needed doing in order to get to this point, where we have some assurance that our code is behaving as intended. And all of this is a month ahead of my self-imposed deadline of SBCL’s 15th birthday: SBCL was announced to the world on December 14th, 1999. (I’m hoping to be able to put on an sbcl15 workshop in conjunction with the European Lisp Symposium around April 20th/21st/22nd – if that sounds interesting, please pencil the dates in the diary and let me know...)

Syndicated 2014-11-08 22:38:14 (Updated 2014-11-10 17:00:04) from notes

9 Nov 2014 wingo   » (Master)

ffconf 2014

Last week I had the great privilege of speaking at ffconf in Brighton, UK. It was lovely. The town put on a full demonstration of its range of November weather patterns, from blue skies to driving rain to hail (!) to sea-spray to drizzle and back again. Good times.

The conference itself was quite pleasant as well, and from the speaker perspective it was amazing. A million thanks to Remy and Julie for making it such a pleasure. ffconf is mostly a front-end development conference, so it's not directly related to the practice of my work on JS implementations -- perhaps you're unaware, but there aren't so many browser implementors that actually develop content for their browsers, and indeed fewer JS implementors that actually write JS. Me, I sling C++ all day long and the most JavaScript I write is for tests. When in the weeds, sometimes we forget we're building an amazing runtime and that people do inspiring things with it, so it's nice to check in with front-end folks at their conferences to see what people are excited about.

My talk was about the part of JavaScript implementations that are written in JavaScript itself. This is an area that isn't so well known, and it has its amusing quirks. I think it can be interesting to a curious JS hacker who wants to spelunk down a bit to see what's going on in their browsers. Intrepid explorers might even find opportunities to contribute patches. Anyway, nerdy stuff, but that's basically how I roll.

The slides are here: without images (350kB PDF) or with images (3MB PDF).

I haven't been to the UK in years, and being in a foreign country where everyone speaks my native language was quite refreshing. At the same time there was an awkward incident in which I was reminded that though close, American and English just aren't the same. I made this silly joke that if you get a polyfill into a JS implementation, then shucks, you have a "gollyfill", 'cause golly it made it in! In the US I think "golly" is just one of those milquetoast profanities, "golly" instead of "god" like saying "shucks" instead of "shit". Well in the UK that's a thing too I think, but there is also another less fortunate connotation, in which "golly" as a noun can be a racial slur. Check the Wikipedia if you're as ignorant as I was. I think everyone present understood that wasn't my intention, but if that is not the case I apologize. With slides though it's less clear, so I've gone ahead and removed the joke from the slides. It's probably not a ball to take and run with.

However, I do have a ball that you can run with! And actually, this was another terrible joke that wasn't bad enough to inflict upon my audience, but that chance now gives me the opportunity to use. So never fear, there are still bad puns in the slides. But you'll have to click through to the PDF for optimal groaning.

Happy hacking, JavaScripters, and until next time.

Syndicated 2014-11-09 17:36:45 from wingolog

9 Nov 2014 Stevey   » (Master)

How could you rationally fork Debian?

The topic of Debian forks has come up a lot recently, and as time goes on I've actually started considering the matter seriously: How would you fork Debian?

The biggest stumbling block is that the Debian distribution contains thousands of packages, which are maintained by thousands of developers. A small team has virtually no hope of keeping up to date, importing changes, dealing with bug-reports, etc. Instead you have to pick your battles and decide what you care about.

This is why Ubuntu split things into "main" and "universe". Because this way they didn't have to deal with bug reports - instead they could just say "Try again in six months. Stuff from that repository isn't supported. Sorry!"

So if you were going to split the Debian project into "supported" and "unsupported", what would you use as the dividing line? I think the only sensible approach would be:

  • Base + Server stuff.
  • The rest.

On that basis you'd immediately drop the support burden of GNOME, KDE, Firefox, Xine, etc. All the big, complex, and user-friendly stuff would just get thrown away. What you'd end up with would be a Debian-Server fork, or derivative.

Things you'd package and care about would include:

  • The base system.
  • The kernel.
  • SSHD.
  • Apache / Nginx / thttpd / lighttpd / etc
  • PHP / Perl / Ruby / Python / etc
  • Jabberd / ircd / rsync / etc
  • MySQL / PostgreSQL / Redis / MariaDB / etc.

Would that be a useful split? I suspect it would. It would also be manageable by a reasonably small team.

That split would also mean if you were keen on dropping any particular init-system you'd not have an unduly difficult job - your server wouldn't be running GNOME, for example.

Of course if you're thinking of integrating a kernel and server-only stuff then you might instead prefer a BSD-based distribution. But if you did that you'd miss out on Docker. Hrm.

Syndicated 2014-11-09 12:12:50 (Updated 2014-11-09 13:14:19) from Steve Kemp's Blog

8 Nov 2014 sye   » (Journeyer)

Whats-that-poem-on-Slit-of-Cloud

八雲立つ 出雲八重垣 妻籠みに 八重垣作る その八重垣を
(Eight clouds arise: the eightfold fence of Izumo, where I raise an eightfold fence to enclose my wife; ah, that eightfold fence!)
WAKA0001

Izumo as the 'Other Japan'

万葉集 (Man'yōshū, the Collection of Ten Thousand Leaves)

見渡せば 花も紅葉も なかりけり 浦のとまやの 秋の夕ぐれ
(As I gaze out, neither blossoms nor crimson leaves are to be seen: only a thatched hut by the bay in the autumn dusk.)



8 Nov 2014 dmarti   » (Master)

Audience snatching example

This is why we can't have nice things, service journalism edition.

Chris O'Hara writes: In the open market, a 30-year-old male car intender costs the same whether you find him on Cars.com or Hotmail.

How does that work for the publisher? Someone builds a car site, does tests and reviews, pays people who can write (a lot of car writers are really good, and IT writers could learn from them). And then the ad network, as soon as it figures out who's a "car intender," follows him around the Internet to show him ads somewhere else.

Adtech is heroic? Seems like it's just the high-tech version of sneaking out of a restaurant before the bill comes.

Think of the online service journalism we could have if the whole advertising industry wasn't fixated on the idea of getting out of paying for it. The traditional print ad model was for the agency to take 15 percent. Today, intermediaries between the advertiser and publisher take 55-75%, and they justify it by being able to do audience snatching. Fix the audience snatching problem, and the web could have less desperate clickbait, and more trips to Lower Saxony for test drives. (The other effects of audience snatching are signal loss, fraud, and support for illegal sites, but this is already too long.)

Publishers get sucked into the tracking game, but there are ways out. Site management can't really help, because they have to work within the existing system, even pretend to be concerned about dark traffic. Individual writers, though? If you can make your loyal readers harder to track online, you win.

Syndicated 2014-11-08 14:24:33 from Don Marti

8 Nov 2014 Stevey   » (Master)

Some brief notes on Docker

Docker is the well-known tool for building, distributing, and launching containers.

I use it personally to run a chat-server, a graphite instance, and I distribute some of my applications with Dockerfiles too, to ease deployment.

Here are some brief notes on things that might not be obvious.

For a start, when you create a container it is identified by a 64-character hex ID. This ID is truncated and used as the hostname of the new guest - but if you ever care you can discover the full ID from within the guest:

~# awk -F/ '{print $NF}' /proc/self/cgroup
9d16624a313bf5bb9eb36f4490b5c2b7dff4f442c055e99b8c302edd1bf26036

Compare that with the hostname:

~# hostname
9d16624a313b

Assigning names to containers is useful, for example:

$ docker run -d -p 2222:22 --name=sshd skxskx/sshd

However note that names must be removed before they can be reused:

#!/bin/sh
# launch my ssh-container - removing the name first
docker rm  sshd || true
docker run --name=sshd -d -p 2222:22 skxskx/sshd

The obvious next step is to get the IP of the new container and set up a hostname for it, sshd.docker. Getting the IP is easy, via either the name or the ID:

~$ docker inspect --format '{{ .NetworkSettings.IPAddress }}' sshd
172.17.0.2

The only missing step is the ability to do that magically. You'd hope there would be a hook that you could run when a container has started - unfortunately there is no such thing. Instead you have two choices:

  • Write a script which parses the output of "docker events" and fires appropriately when a guest is created/destroyed.
  • Write a wrapper script for launching containers, and use that to handle the creation.

I wrote a simple watcher to fire when events are created, which lets me do the job.

But running a daemon just to watch for events seems like the wrong way to go. Instead I've switched to running via a wrapper dock-run:

$ dock-run --name=sshd -d -p 2222:22 skxskx/sshd

This invokes run-parts on the creation directory, if present, and that allows me to update DNS. So "sshd.docker.local" will point to the IP of the new container.

The wrapper was two minutes work, but it does work, and if you like you can find it here.
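
For flavour, here is a rough sketch of what such a wrapper could look like, in Python rather than shell (the hook directory and environment-variable names are my own inventions, not those of the real dock-run):

    #!/usr/bin/env python3
    import os, subprocess, sys

    # "docker run -d ..." prints the new container's ID on stdout.
    cid = subprocess.check_output(["docker", "run"] + sys.argv[1:],
                                  text=True).strip()

    ip = subprocess.check_output(
        ["docker", "inspect", "--format",
         "{{ .NetworkSettings.IPAddress }}", cid], text=True).strip()

    # Hand the details to any hook scripts (e.g. DNS updates) via the environment.
    hooks = "/etc/dock-run/create.d"
    if os.path.isdir(hooks):
        env = dict(os.environ, CONTAINER_ID=cid, CONTAINER_IP=ip)
        subprocess.call(["run-parts", hooks], env=env)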

That concludes my notes on docker - although you can read articles I wrote on docker elsewhere.

Syndicated 2014-11-08 13:33:25 from Steve Kemp's Blog

7 Nov 2014 caolan   » (Master)

LibreOffice Coverity Defect Density 0.02

Coverity Defect Density: LibreOffice vs Average

We run LibreOffice through Coverity approximately once a week. According to Coverity's overview dashboard our current status is:

LibreOffice: 7,271,857 lines of code and 0.02 defect density

Open Source Defect Density By Project Size

Lines of Code (LOC)      Defect Density
Less than 100,000        0.35
100,000 to 499,999       0.5
500,000 to 1 million     0.7
More than 1 million      0.65
Note: Defect density is measured by the number of defects per 1,000 lines of code, identified by the Coverity platform. The numbers shown above are from our 2013 Coverity Scan Report, which analyzed 250 million lines of open source code.

The "lines of code" here is 7,271,857 vs 9,500,825 in older reports because I'm now building against system-libraries instead of building those in-tree in order to speed up the process. Those "external" libraries have always been marked as "ignore in analysis" in coverity so that change has no effect on the defect density of our own code.

If anyone knows how we could rework our code, or otherwise automatically silence https://communities.coverity.com/thread/2993, that would be great. This false positive keeps cropping up in uses of uno::Sequence, so new instances of it appear with every run.

We're now at that happy place where each run produces a very small and manageable number of genuinely new warnings in recently modified code, rather than flagging the same old ones again and again as general refactoring perturbs the code enough for them to be newly detected.

Syndicated 2014-11-07 21:08:00 (Updated 2014-11-07 21:08:05) from Caolán McNamara

7 Nov 2014 dmarti   » (Master)

Journalists

Just picture this for a minute.

Your city puts in a brand-new monorail system.

A Journalist rides it to work.

On the platform, someone steals his wallet.

The Journalist gets in to work and writes a long think piece about "how can Journalism survive in the age of monorail technology?"

Please, knock it off.

Yes, Journalism is in trouble. But it's not the monorail, or the Internet, that's taking your money. Journalists, the Internet is just the playing field for a game that other companies are beating your employers at. (Just like Amazon is better at ebooks than publishers are, but that's another story.)

Why are news organizations failing?

Alexis C. Madrigal said it best. The ad market, on which we all depend, started going haywire. Advertisers didn’t have to buy The Atlantic. They could buy ads on networks that had dropped a cookie on people visiting The Atlantic. They could snatch our audience right out from underneath us.

The problem isn't Advertising as such. Advertising has always been part of the revenue mix for Journalism. Back in the day, we used to say that the ads paid for the content, and the subscription revenue paid for printing and postage. Good journalism can be a high-signal ad venue, so the potential rewards to advertisers are great. Supporting Journalism on the Internet isn't about going from ad-supported to some untried subscription-only model. We can fix the bugs in what we have.

Where is the money going?

The problem is that advertisers are caught up in the Big Data trend. Instead of putting ad budgets to work where they can send a signal, and supporting Journalism along the way, advertisers are choosing third-party ads, which follow users across multiple sites.

While Journalism depends on advertising, half of the online ad money is going to adtech intermediaries and another half is getting stolen. This doesn't add up to 100% because the adtech companies get their cut from the fraud perpetrators too, but, come on, Journalists, of course you have no money.

The advertising business has been sold on the proposition that it's possible to reach an audience without paying for some kind of quality product to put the ads on. Instead of attaching an ad to an attention-getting article, the current fad is to spend on intermediaries. The result has been crappier advertising, less money for content, and more money for fraud.

Eric Picard writes,

It will be hard for publishers—even the large ones—to resist the momentum that will build to plug into these walled gardens, forcing publishers to effectively commoditize themselves in exchange for access to identity, targeting and analytics data.

How does working for a commodity publisher sound?

Not so good? Tired of watching pay, benefits, and expense accounts shrivel up, while somehow the online ad business is all Aeron chairs and free tacos?

It's a game. Make a move already.

This is where Journalists can stop lamenting the Future of Journalism, and, ready? Act in your own economic interests for once. Look, the Internet is not hard-wired against you. It's just that people who understand it better than you—online ad business and their frenemies, the ad fraud gangs—are using it to rip you off.

I'm going to give you four suggested moves. But it's up to you. At least get in the game.

  • You have to learn privacy tech anyway, so take a few extra minutes to learn the economics of how user tracking affects your own business.

  • Try an anti-tracking tool. Not just a general-purpose ad blocker, but something that specifically deals with the targeted ad problem. Disconnect and Privacy Badger are two that work for me.

  • Have a look at online ads from the advertiser's side. Spend a little money on ads to promote your own blog. See how it works. (Or try something a little "edgier".)

  • Address people's primal fears about Internet privacy. Write about the privacy tech that works for you. Get your audience started using it.

The more you make your audience aware of tracking problems and motivate them to be harder to track, the more motivation the advertisers have to work in a constructive way.

For Journalists, a future high-signal/low-tracking online ad business isn't just a positive externality. As far as I can see, it's the best shot at a respectable living.

Bonus links

Accidental equality: Three publisher take-aways from Facebook's results

Your Ad Ran Here (Not Really)

Big Data and me

Syndicated 2014-11-07 17:13:46 from Don Marti

7 Nov 2014 dkg   » (Master)

GnuPG 2.1.0 in debian experimental

Today, i uploaded GnuPG 2.1.0 into debian's experimental suite. It's built for amd64 and i386 and powerpc already. You can monitor its progress on the buildds to see when it's available for your architecture.

Changes

GnuPG 2.1 offers many new and interesting features, but one of the most important changes is the introduction of elliptic curve crypto (ECC). While GnuPG 2.1 discourages the creation of ECC keys by default, it's important that we have the ability to verify ECC signatures and to encrypt to ECC keys if other people are using this tech. It seems likely, for example, that Google's End-To-End Chrome OpenPGP extension will use ECC. Users who don't have this capability available won't be able to communicate with End-To-End users.

There are many other architectural changes, including a move to more daemonized interactions with the outside world, such as using dirmngr to talk to the keyservers and relying more heavily on gpg-agent for secret key access. The gpg-agent change is a welcome one -- the agent now holds the secret key material entirely and never releases it -- as of 2.1, gpg2 never has any asymmetric secret key material in its process space at all.

One other nice change for those of us with large keyrings is the new keybox format for public key material. This provides much faster indexed access to the public keyring.

I've been using GnuPG 2.1.0 betas regularly for the last month, and i think that for the most part, they're ready for regular use.

Timing for debian

The timing between the debian freeze and the GnuPG upstream release is unfortunate, but i don't think i'm prepared to push for this as a jessie transition yet, without more backup. I'm talking to other members of the GnuPG packaging team to see if they think this is worth even bringing to the attention of the release team, but i'm not pursuing it at the moment.

If you really want to see this in debian jessie, please install the experimental package and let me know how it works for you.

Long term migration concerns

GnuPG upstream is now maintaining three branches concurrently: modern (2.1.x), stable (2.0.x), and classic (1.4.x). I think this stretches the GnuPG upstream development team too thin, and we should do what we can to help them transition to supporting fewer releases concurrently.

In the long term, I'd ultimately like to see gnupg 2.1.x replace all use of gpg 1.4.x and gpg 2.0.x in debian, but that's unlikely to happen right now.

In particular, the following two bugs make it impossible to use my current, common monkeysphere workflow:

And GnuPG 2.1.0 drops support for the older, known-weak OpenPGPv3 key formats. This is an important step for simplification, but there are a few people who probably still need to use v3 keys for obscure/janky reasons, or have data encrypted to a v3 key that they need to be able to decrypt. Those people will want to have GnuPG 1.4 around.

Call for testing

Anyway, if you use debian testing or unstable, and you are interested in these features, i invite you to install `gnupg2` and its friends from experimental. If you want to be sensibly conservative, i recommend backing up `~/.gnupg` before trying to use it:

cp -aT .gnupg .gnupg.bak
sudo apt install -t experimental gnupg2 gnupg-agent dirmngr gpgsm gpgv2 scdaemon

If you find issues, please file them via the debian BTS as usual. I (or other members of the pkg-gnupg team) will help you triage them to upstream as needed.

Tags: ecc, experimental, gnupg

Syndicated 2014-11-06 23:27:00 from Weblogs for dkg

6 Nov 2014 Stevey   » (Master)

Planning how to configure my next desktop

I recently setup a bunch of IPv6-only accessible hosts, which I mentioned in my previous blog post.

In the end I got them talking to the IPv4/legacy world via the installation of an OpenVPN server - they connect over IPv6, get a private 10.0.0.0/24 IP address, and that traffic is masqueraded via the OpenVPN gateway.

But the other thing I've been planning recently is how to configure my next desktop system. I generally do all development, surfing, etc, on one desktop system. I use virtual desktops to organize things, and I have a simple scripting utility to juggle windows around into the correct virtual-desktop as they're launched.

Planning a replacement desktop means installing a fresh desktop, then getting all the software working again. These days I'd probably use docker images to do development within, along with a few virtual machines (such as the pbuilder host I used to release all my Debian packages).

But there are still niggles. I'd like to keep the base system lean, with few packages, but you can't run xine remotely; similarly, I need mpd/sonata for listening to music, emacs for local stuff, etc, etc.

In short there is always the tendency to install yet-another package, service, or application on the desktop, which makes migration a pain.

I'm not sure I could easily avoid that, but it is worth thinking about. I guess I could configure a puppet/slaughter/cfengine host and use that to install the desktop - but I've always done desktops "manually" and servers "magically" so it's a bit of a change in thinking.

Syndicated 2014-11-06 21:26:36 from Steve Kemp's Blog

6 Nov 2014 danstowell   » (Journeyer)

PhD opportunity! Study machine learning and bird sounds with me

I have a fully-funded PhD position available to study machine learning and bird sounds with me!

For full details please see the PhD advertisement on jobs.ac.uk. Application deadline Monday 12th January 2015.

Please do email me if you're interested or if you have any questions.

– and is there anyone you know who might be interested? Send them the link!

Syndicated 2014-11-06 11:00:42 from Dan Stowell

5 Nov 2014 mbrubeck   » (Journeyer)

Let's build a browser engine! Part 7: Painting 101

I’m returning at last to my series on building a simple HTML rendering engine.

In this article, I will add very basic painting code. This code takes the tree of boxes from the layout module and turns them into an array of pixels. This process is also known as “rasterization.”

Browsers usually implement rasterization with the help of graphics APIs and libraries like Skia, Cairo, Direct2D, and so on. These APIs provide functions for painting polygons, lines, curves, gradients, and text. For now, I’m going to write my own rasterizer that can only paint one thing: rectangles.

Eventually I want to add support for text rendering. At that point, I may throw away this painting code and switch to a “real” 2D graphics library. But for now, rectangles are sufficient to turn the output of my block layout algorithm into pictures.

Catching Up

Since my last post, I’ve made some small changes to the code from previous articles. These include some minor refactoring, and some updates to keep the code compatible with the latest Rust nightly builds. None of these changes are vital to understanding the code, but if you’re curious, check the commit history.

Building the Display List

Before painting, we will walk through the layout tree and build a display list. This is a list of graphics operations like “draw a circle” or “draw a string of text.” Or in our case, just “draw a rectangle.”

Why put commands into a display list, rather than execute them immediately? The display list is useful for several reasons. You can search it for items that will be completely covered up by later operations, and remove them to eliminate wasted painting. You can modify and re-use the display list in cases where you know only certain items have changed. And you can use the same display list to generate different types of output: for example, pixels for displaying on a screen, or vector graphics for sending to a printer.

Robinson’s display list is a vector of DisplayCommands. For now there is only one type of DisplayCommand, a solid-color rectangle:

type DisplayList = Vec<DisplayCommand>;

enum DisplayCommand {
    SolidColor(Color, Rect),
    // insert more commands here
}

To build the display list, we walk through the layout tree and generate a series of commands for each box. First we draw the box’s background, then we draw its borders and content on top of the background.

fn build_display_list(layout_root: &LayoutBox) -> DisplayList {
    let mut list = Vec::new();
    render_layout_box(&mut list, layout_root);
    return list;
}

fn render_layout_box(list: &mut DisplayList, layout_box: &LayoutBox) {
    render_background(list, layout_box);
    render_borders(list, layout_box);
    // TODO: render text

    for child in layout_box.children.iter() {
        render_layout_box(list, child);
    }
}

By default, HTML elements are stacked in the order they appear: If two elements overlap, the later one is drawn on top of the earlier one. This is reflected in our display list, which will draw the elements in the same order they appear in the DOM tree. If this code supported the z-index property, then individual elements would be able to override this stacking order, and we’d need to sort the display list accordingly.
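
Purely as a hypothetical sketch (robinson doesn’t implement this, and I’m using current Rust syntax rather than the nightly dialect in the rest of this post): z-index support could tag each command with the z-index of the box that produced it, then stable-sort the list, since a stable sort preserves DOM order for commands with equal z-index.

fn sort_by_z_index(items: &mut Vec<(i32, DisplayCommand)>) {
    // Vec::sort_by_key is a stable sort, so commands with equal z-index
    // keep the DOM order in which they were generated.
    items.sort_by_key(|item| item.0);
}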

The background is easy. It’s just a solid rectangle. If no background color is specified, then the background is transparent and we don’t need to generate a display command.

fn render_background(list: &mut DisplayList, layout_box: &LayoutBox) {
    get_color(layout_box, "background").map(|color|
        list.push(SolidColor(color, layout_box.dimensions.border_box())));
}

/// Return the specified color for CSS property `name`, or None if no color was specified.
fn get_color(layout_box: &LayoutBox, name: &str) -> Option<Color> {
    match layout_box.box_type {
        BlockNode(style) | InlineNode(style) => match style.value(name) {
            Some(ColorValue(color)) => Some(color),
            _ => None
        },
        AnonymousBlock => None
    }
}

The borders are similar, but instead of a single rectangle we draw four—one for each edge of the box.

fn render_borders(list: &mut DisplayList, layout_box: &LayoutBox) {
    let color = match get_color(layout_box, "border-color") {
        Some(color) => color,
        _ => return // bail out if no border-color is specified
    };

    let d = &layout_box.dimensions;
    let border_box = d.border_box();

    // Left border
    list.push(SolidColor(color, Rect {
        x: border_box.x,
        y: border_box.y,
        width: d.border.left,
        height: border_box.height,
    }));

    // Right border
    list.push(SolidColor(color, Rect {
        x: border_box.x + border_box.width - d.border.right,
        y: border_box.y,
        width: d.border.right,
        height: border_box.height,
    }));

    // Top border
    list.push(SolidColor(color, Rect {
        x: border_box.x,
        y: border_box.y,
        width: border_box.width,
        height: d.border.top,
    }));

    // Bottom border
    list.push(SolidColor(color, Rect {
        x: border_box.x,
        y: border_box.y + border_box.height - d.border.bottom,
        width: border_box.width,
        height: d.border.bottom,
    }));
}

Next the rendering function will draw each of the box’s children, until the entire layout tree has been translated into display commands.

Rasterization

Now that we’ve built the display list, we need to turn it into pixels by executing each DisplayCommand. We’ll store the pixels in a Canvas:

struct Canvas {
    pixels: Vec<Color>,
    width: uint,
    height: uint,
}

impl Canvas {
    /// Create a blank canvas
    fn new(width: uint, height: uint) -> Canvas {
        let white = Color { r: 255, g: 255, b: 255, a: 255 };
        return Canvas {
            pixels: Vec::from_elem(width * height, white),
            width: width,
            height: height,
        }
    }
    // ...
}

To paint a rectangle on the canvas, we just loop through its rows and columns, using a helper method to make sure we don’t go outside the bounds of our canvas.

fn paint_item(&mut self, item: &DisplayCommand) {
    match item {
        &SolidColor(color, rect) => {
            // Clip the rectangle to the canvas boundaries.
            let x0 = rect.x.clamp(0.0, self.width as f32) as uint;
            let y0 = rect.y.clamp(0.0, self.height as f32) as uint;
            let x1 = (rect.x + rect.width).clamp(0.0, self.width as f32) as uint;
            let y1 = (rect.y + rect.height).clamp(0.0, self.height as f32) as uint;

            for y in range(y0, y1) {
                for x in range(x0, x1) {
                    // TODO: alpha compositing with existing pixel
                    self.pixels[x + y * self.width] = color;
                }
            }
        }
    }
}

Note that this code only works with opaque colors. If we added transparency (by reading the opacity property, or adding support for rgba() values in the CSS parser) then it would need to blend each new pixel with whatever it’s drawn on top of.
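
For reference, a minimal “source-over” blend could look like the sketch below. This is my illustration, not code from robinson: it assumes Color holds 8-bit channels and is Copy, and it treats the destination pixel as already opaque (true for our white canvas).

fn blend(src: Color, dst: Color) -> Color {
    // Weight each source channel by the source alpha, and each destination
    // channel by what is left over, then round down into 8 bits.
    let a = src.a as u32;
    let inv = 255 - a;
    let mix = |s: u8, d: u8| ((s as u32 * a + d as u32 * inv) / 255) as u8;
    Color { r: mix(src.r, dst.r), g: mix(src.g, dst.g), b: mix(src.b, dst.b), a: 255 }
}

The inner loop of paint_item would then assign blend(color, self.pixels[x + y * self.width]) instead of overwriting the pixel.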

Now we can put everything together in the paint function, which builds a display list and then rasterizes it to a canvas:

/// Paint a tree of LayoutBoxes to an array of pixels.
fn paint(layout_root: &LayoutBox, bounds: Rect) -> Canvas {
    let display_list = build_display_list(layout_root);
    let mut canvas = Canvas::new(bounds.width as uint, bounds.height as uint);
    for item in display_list.iter() {
        canvas.paint_item(item);
    }
    return canvas;
}

Lastly, we can write a few lines of code using the Rust Image library to save the array of pixels as a PNG file.
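
As a sketch of what that could look like with today’s image crate (the post used the 2014-era Rust Image API, which differs; save_png is a hypothetical helper, not part of robinson):

fn save_png(canvas: &Canvas, path: &str) {
    // Flatten the Vec<Color> into the packed RGBA byte layout the encoder expects.
    let mut buffer = Vec::with_capacity(canvas.width * canvas.height * 4);
    for color in &canvas.pixels {
        buffer.extend_from_slice(&[color.r, color.g, color.b, color.a]);
    }
    image::save_buffer(path, &buffer, canvas.width as u32, canvas.height as u32,
                       image::ColorType::Rgba8).unwrap();
}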

Pretty Pictures

At last, we’ve reached the end of our rendering pipeline. In under 1000 lines of code, robinson can now parse this HTML file:

<div class="a">
  <div class="b">
    <div class="c">
      <div class="d">
        <div class="e">
          <div class="f">
            <div class="g">
            </div>
          </div>
        </div>
      </div>
    </div>
  </div>
</div>

…and this CSS file:

* { display: block; padding: 12px; }
.a { background: #ff0000; }
.b { background: #ffa500; }
.c { background: #ffff00; }
.d { background: #008000; }
.e { background: #0000ff; }
.f { background: #4b0082; }
.g { background: #800080; }

…to produce this:

[Rendered output: a nest of rectangles, one per div, in rainbow colors from red through purple.]

Yay!

Exercises

If you’re playing along at home, here are some things you might want to try:

  1. Write an alternate painting function that takes a display list and produces vector output (for example, an SVG file) instead of a raster image.

  2. Add support for opacity and alpha blending.

  3. Write a function to optimize the display list by culling items that are outside of the canvas or obscured by later items. (A rough starting point for the culling half is sketched just after this list.)

  4. If you’re familiar with OpenGL, write a hardware-accelerated painting function that uses GL shaders to draw the rectangles.
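
For exercise 3, here is one possible starting point for the culling half, assuming the DisplayCommand and Rect types from earlier. It is again a sketch in current Rust syntax, and it ignores occlusion entirely:

fn cull_offscreen(list: DisplayList, bounds: Rect) -> DisplayList {
    list.into_iter().filter(|item| match item {
        // Keep a command only if its rectangle overlaps the canvas at all.
        DisplayCommand::SolidColor(_, rect) =>
            rect.x < bounds.x + bounds.width
                && rect.x + rect.width > bounds.x
                && rect.y < bounds.y + bounds.height
                && rect.y + rect.height > bounds.y,
    }).collect()
}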

To Be Continued…

Now that we’ve got basic functionality for each stage in our rendering pipeline, it’s time to go back and fill in some of the missing features—in particular, inline layout and text rendering. Future articles may also add additional stages, like networking and scripting.

I’m going to give a short “Let’s build a browser engine!” talk at this month’s Bay Area Rust Meetup. The meetup is at 7pm tomorrow (Thursday, November 6) at Mozilla’s San Francisco office, and it will also feature talks on Servo by my fellow Servo developers. Video of the talks will be streamed live on Air Mozilla, and recordings will be published afterward. (I’ll add a direct link to the video as soon as it’s available.)

Syndicated 2014-11-05 17:55:00 from Matt Brubeck

3 Nov 2014 amits   » (Journeyer)

Fedora Activity Day: Security I

Last Saturday a few of us gathered to work on Fedora Security.  This FAD (Fedora Activity Day) was the second in recent times held in Pune, after the testing FAD held in August.

Security FAD

The goal of the FAD was to get introduced to the newly-formed Fedora Security Team, pick up a few bug reports that were tagged as security-relevant, and triage them.  Fixing the bugs wasn’t part of the agenda, as actually pushing package updates needs one to be a provenpackager or the maintainer of the package.

We were assembled at the Red Hat Pune office.  I took a shot at transcribing PJP’s intro talk on the #fedora-india IRC channel, and a couple of people joined remotely in the triaging activity, which was quite nice to see.

The FAD wiki page had all the relevant information on how to go about triaging the bugs, so it was all quite straightforward from there.

I got a bit bored just going through bug reports, without much “action” happening. Depending on the bug we selected, we either just needed to set needinfo? on its assignee, or actually check the progress of the package upstream, whether a patch was available, and so on.  I looked through the bugs that seemed relevant to virtualization, and then wanted to look at different ways to contribute.

PJP suggested looking at some fuzzers, and actually running them.  He pointed me to Radamsa as an example.  That does look like a good tool to generate some random input to programs, and see how they behave under unexpected input.  I didn’t actually get to run it, but now have an idea on what to do when I feel bored again.
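
To make that concrete: the simplest trick in a radamsa-style fuzzer is just flipping random bits in otherwise-valid input. Here’s a toy, self-contained sketch of that single trick (my illustration, not radamsa’s actual algorithm, which does far more):

fn mutate(input: &[u8], seed: &mut u64) -> Vec<u8> {
    let mut out = input.to_vec();
    for _ in 0..4 {
        // xorshift64: a tiny PRNG (seed must start non-zero), fine for a demo.
        *seed ^= *seed << 13;
        *seed ^= *seed >> 7;
        *seed ^= *seed << 17;
        if !out.is_empty() {
            let idx = (*seed as usize) % out.len();
            out[idx] ^= 1u8 << ((*seed >> 8) % 8) as u32; // flip one random bit
        }
    }
    out
}

Feeding such mutated inputs to a parser or a device model, and watching for crashes, is the whole game.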

While reading about Radamsa, I also thought a bit on how to fuzz qemu.  Nothing concrete came up, but one thought is to send weird stuff from guests to the host, by way of weirdly-formatted network packets (to test virtio-net or other net device emulations), or block device requests (to test virtio-blk / virtio-scsi / ide / ahci).  That’s an idea for a side project.

There also was a Docker meetup running at the same time at the office, so I dropped in there a couple of times to see what they were up to.  The organizers had split the session into talks + hackathon; and both were very well-attended.  In my lurking there, I overheard what Kubernetes is about, and a few terms it introduces into the tech world: minions and pods.  I’m sure we’re going to run out of words in the English language to re-purpose to technical usage very soon.

The FAD was originally supposed to happen in September, but got delayed to November.  For the next installment of Fedora-related activities, we may do an F21 release party along with a few user talks.  Regular FADs should resume in January, I suppose.

Syndicated 2014-11-03 18:28:12 (Updated 2014-11-03 18:29:12) from Think. Debate. Innovate.

3 Nov 2014 amits   » (Journeyer)

KVM Forum 2014

It’s been a couple of weeks that I’ve returned from Düsseldorf, Germany, after attending the seventh KVM Forum; an event where developers and users of the Linux virtualization technology gather to discuss the state of the hypervisor and tools around it, and brainstorm on future plans. As with the previous few years, the event was co-located with LinuxCon Europe.


A few observations from the event, in random order:

  • Linux Foundation did a great job of hosting and planning the event.
  • This was the first time when the food was great!  There were even options for vegetarians, vegans, and kosher food.
  • The venue, Congress Centre Düsseldorf, was huge, and located perfectly along the picturesque Rhine river.
  • It was the first KVM Forum which Avi did not attend.
  • The schedule was nicely-paced, with not too many parallel talks, and plenty of opportunities for hallway discussions and meeting people.
  • Co-locating with the Linux Plumbers Conference, LinuxCon, CloudOpen, and other conferences ensured there were a lot of people interested in Linux in general; and since almost everyone is at least a user of virt technologies, a discussion with almost anyone is fruitful around how KVM/QEMU/libvirt get used, and what users expect from us.
  • All the talks were recorded on video, and are available in this youtube playlist.
  • Photos from the event are here
  • All the slides from talks are at the KVM Forum wiki page
  • The QEMU Summit was also held along with the Forum; notes from the Summit are posted on the qemu devel mail list.
  • Jeff Cody’s talk on an intro to writing and submitting patches to qemu, and working with the community, got very positive feedback.  At least two people told me it would’ve been good to have that talk a year back, when they were getting started.  Well, it’s now available on the ‘net, and archived for people just starting out!
  • The OVA (Open Virtualization Alliance) session on connecting users and developers of KVM by hosting a panel of KVM users (from cloud providers / builders) had one interesting insight: everyone wants more performance from KVM networking (well, the session was focussed on NFV, so that isn’t surprising).  No matter how fast you go, you want stuff to go faster still.  No one talked of stability, reliability, manageability, etc., so I suppose they’re just happy with those aspects.
  • A discussion with Chris Wright on KVM and OpenStack brought to me a surprise: KVM “just works” on OpenStack, and KVM is not the layer where there are problems.  No matter how many features we add or how much more performance we can eke out of the hypervisor, the most user-visible changes are now going to happen in the upper layers, most specially within the OpenStack project.  Obviously, there’s a need for us to collaborate with the OpenStack teams, but for most purposes, KVM is hardly the bottleneck or blocker for taking stuff to the clouds.  (We do have a huge list of things to do, but we’re ahead of the curve — what we are planning to do is needed and anticipated, but we need a better way to expose what we already have, and the OpenStack teams are going full-throttle at it.)

I suppose these are the highlights; I may have forgotten a few things due to the intervening holiday season.

Syndicated 2014-11-03 17:59:47 from Think. Debate. Innovate.

2 Nov 2014 Stevey   » (Master)

IPv6 only server

I enjoy the tilde.club site/community, and since I've just set up an IPv6-only host I was looking to do something similar.

Unfortunately my (working) code to clone github repositories into per-user directories fails - because github isn't accessible over IPv6.

That's a shame.

Oddly enough chromium, the browser packaged for wheezy, doesn't want to display IPv6-only websites either. For example, this site fails to load: http://ipv6.steve.org.uk/.

In the meantime I've got a server set up which is only accessible over IPv6, and I'm a little smug. (http://ipv6.website/).

(Yes it is true that I've used all the IPv4 addresses allocated to my VLAN. That's just a coincidence. Ssh!)

Syndicated 2014-11-02 10:54:56 from Steve Kemp's Blog

31 Oct 2014 marnanel   » (Journeyer)

obsolete offences

"Section 13 [of the Criminal Law Act 1967] abolished the common law offences of champerty and barratry, challenging to fight, eavesdropping or being "a common scold or a common night walker." It also repealed the offence of praemunire, which had survived on the statute books since 1392. It preserved the common law offence of embracery (which was later abolished by the Bribery Act 2010)." --
https://en.wikipedia.org/wiki/Criminal_Law_Act_1967#Part_II_-_Obsolete_crimes

If you're interested, I think these are:
champerty = paying costs of a civil action you have nothing to do with as an investment in order to get some of the money if you win
barratry = stirring up quarrels in court
common scold = disturber of the peace (apparently only for women)
praemunire = sending tax money to the Pope, or submitting to his jurisdiction in civil matters (yes, this was made illegal in 1392)
embracery = bribing jurors.

This entry was originally posted at http://marnanel.dreamwidth.org/316110.html. Please comment there using OpenID.

Syndicated 2014-10-31 22:11:59 from Monument

31 Oct 2014 etbe   » (Master)

Links October 2014

The Verge has an interesting article about Tim Cook (Apple CEO) coming out [1]. Tim says “if hearing that the CEO of Apple is gay can help someone struggling to come to terms with who he or she is, or bring comfort to anyone who feels alone, or inspire people to insist on their equality, then it’s worth the trade-off with my own privacy”.

Graydon2 wrote an insightful article about the right-wing libertarian sock-puppets of silicon valley [2].

George Monbiot wrote an insightful article for The Guardian about the way that double-speak facilitates killing people [3]. He is correct that the media should hold government accountable for such use of language instead of perpetuating it.

Anne Thériault wrote an insightful article for Vice about the presumption of innocence and sex crimes [4].

Dr Nerdlove wrote an interesting article about Gamergate as the “extinction burst” of “gamer culture” [5], we can only hope.

Shweta Narayan wrote an insightful article about Category Structure and Oppression [6]. I can’t summarise it because it’s a complex concept, read the article.

Some Debian users who don’t like Systemd have started a “Debian Fork” project [7], which so far just has a web site and nothing else. I expect that they will never write any code. But it would be good if they did, they would learn about how an OS works and maybe they wouldn’t disagree so much with the people who have experience in developing system software.

A GamerGate terrorist in Utah forces Anita Sarkeesian to cancel a lecture [8]. I expect that the reaction will be different when (not if) an Islamic group tries to get a lecture cancelled in a similar manner.

Model View Culture has an insightful article by Erika Lynn Abigail about Autistics in Silicon Valley [9].

Katie McDonough wrote an interesting article for Salon about Ed Champion and what to do about men who abuse women [10]. It’s worth reading that while thinking about the FOSS community…

Related posts:

  1. Links September 2014 Matt Palmer wrote a short but informative post about enabling...
  2. Links July 2014 Dave Johnson wrote an interesting article for Salon about companies...
  3. Links August 2014 Matt Palmer wrote a good overview of DNSSEC [1]. Sociological...

Syndicated 2014-10-31 13:55:52 from etbe - Russell Coker

31 Oct 2014 etbe   » (Master)

Samsung Galaxy Note 3

In June last year I bought a Samsung Galaxy Note 2 [1]. Generally I was very happy with that phone, one problem I had is that less than a year after purchasing it the Ingress menus burned into the screen [2].

2 weeks ago I bought a new Galaxy Note 3. One of the reasons for getting it is the higher resolution screen, I never realised the benefits of a 1920*1080 screen on a phone until my wife got a Nexus 5 [3]. I had been idly considering a Galaxy Note 4, but $1000 is a lot of money to pay for a phone and I’m not sure that a 2560*1440 screen will offer much benefit in that size. Also the Note 3 and Note 4 both have 3G of RAM, as some applications use more RAM when you have a higher resolution screen the Note 4 will effectively have less usable RAM than the Note 3.

My first laptop cost me $3,800 in 1998, that’s probably more than $6,000 in today’s money. The benefits that I receive now from an Android phone are in many ways greater than I received from that laptop and that laptop was definitely good value for money for me. If the cheapest Android phone cost $6,000 then I’d pay that, but given that the Note 3 is only $550 (including postage) there’s no reason for me to buy something more expensive.

Another reason for getting a new phone is the limited storage space in the Note 2. 16G of internal storage is a limit when you have some big games installed. Also the recent Android update which prevented apps from writing to the SD card meant that it was no longer convenient to put TV shows on my SD card. 32G of internal storage in the Note 3 allows me to fit everything I want (including the music video collection I downloaded with youtube-dl). The Note 2 has 16G of internal storage and an 8G SD card (that I couldn’t fully use due to Android limitations) while the Note 3 has 32G (the 64G version wasn’t on sale at any of the cheap online stores). Also the Note 3 supports an SD card which will be good for my music video collection at some future time, this is a significant benefit over the Nexus 5.

In the past I’ve written about Android service life and concluded that storage is the main issue [4]. So it is a bit unfortunate that I couldn’t get a phone with 64G of storage at a reasonable price. But the upside is that getting a cheaper phone allows me to buy another one sooner and give the old phone to a relative who has less demanding requirements.

In the past I wrote about the warranty support for my wife’s Nexus 5 [5]. I should have followed up on that before, 3 days after that post we received a replacement phone. One good thing that Google does is to reserve money on a credit card to buy the new phone and then send you the new phone before you send the old one back. So if the customer doesn’t end up sending the broken phone they just get billed for the new phone, that avoids excessive delays in getting a replacement phone. So overall the process of Google warranty support is really good, if 2 products are equal in other ways then it would be best to buy from Google to get that level of support.

I considered getting a Nexus 5 as the hardware is reasonably good (not the greatest but quite good enough) and the price is also reasonably good. But one thing I really hate is the way they do the buttons. Having the home button appear on the main part of the display is really annoying. I much prefer the Samsung approach of having a hardware button for home and touch-screen buttons outside the viewable area for settings and back. Also the stylus on the Note devices is convenient on occasion.

The Note 3 has a fake-leather back. The concept of making fake leather is tacky, I believe that it’s much better to make honest plastic that doesn’t pretend to be something that it isn’t. However the texture of the back improves the grip. Also the fake stitches around the edge help with the grip too. It’s tacky but utilitarian.

The Note 3 is slightly smaller and lighter than the Note 2. This is a good technical achievement, but I’d rather they just gave it a bigger battery.

Update: USB 3

One thing I initially forgot to mention is that the Note 3 has USB 3. This means that it has a larger socket which is less convenient when you try and plug it in at night. USB 3 seems unlikely to provide any benefit for me as I’ve never had any of my other phones transfer data at rates more than about 5MB/s. If the Note 3 happens to have storage that can handle speeds greater than the 32MB/s a typical USB 2 storage device can handle then I’m still not going to gain much benefit. USB 2 speeds would allow me to transfer the entire contents of a Note 3 in less than 20 minutes (if I needed to copy the entire storage contents). I can’t imagine myself having a real-world benefit from that.

The larger socket means more fumbling when charging my phone at night and it also means that the Note 3 cable can’t be used in any other phone I own. In a year or two my wife will have a phone with USB 3 support and that cable can be used for charging 2 phones. But at the moment the USB 3 cable isn’t useful as I don’t need to have a phone charger that can only charge one phone.

Conclusion

The Note 3 basically does everything I expected of it. It’s just like the Note 2 but a bit faster and with more storage. I’m happy with it.

Related posts:

  1. Samsung Galaxy Note 2 A few weeks ago I bought a new Samsung Galaxy...
  2. Samsung Galaxy S3 First Review with Power Case My new Samsung Galaxy S3 arrived a couple of days...
  3. Samsung Galaxy Camera – a Quick Review I recently had a chance to briefly play with the...

Syndicated 2014-10-31 13:40:25 from etbe - Russell Coker

30 Oct 2014 bagder   » (Master)

Changing networks on Mac with Firefox

Not too long ago I blogged about my work to better deal with changing networks while Firefox is running. That job was basically two parts.

A) generic code to handle receiving such a network-changed event and then

B) a platform-specific part, initially written for Windows, that detected such a network change and sent the event

Today I’ve landed yet another fix for part B, bug 1079385, which detects network changes for Firefox on Mac OS X.

I’ve never programmed anything on the Mac before, so this was sort of my christening in this environment. I mean, I’ve written countless POSIX-compliant programs, including curl and friends, that certainly build and run on Mac OS just fine, but I had never before used the Mac-specific APIs to do things.

I got a mac mini just two weeks ago to work on this. Getting it up, prepared, and my first Firefox built from source took all-in-all less than three hours. Learning the details of the Mac API world was much more trouble, and I can’t say that I’m mastering it now either, but I did at least figure out how to detect when IP addresses on the interfaces change, and a changed address is a pretty good signal that the network changed somehow.

Syndicated 2014-10-30 21:46:20 from daniel.haxx.se

30 Oct 2014 mjg59   » (Master)

Hacker News metrics (first rough approach)

I'm not a huge fan of Hacker News[1]. My impression continues to be that it ends up promoting stories that align with the Silicon Valley narrative of meritocracy, technology will fix everything, regulation is the cancer killing agile startups, and discouraging stories that suggest that the world of technology is, broadly speaking, awful and we should all be ashamed of ourselves.

But as a good data-driven person[2], wouldn't it be nice to have numbers rather than just handwaving? In the absence of a good public dataset, I scraped Hacker Slide to get just over two months of data in the form of hourly snapshots of stories, their age, their score and their position. I then applied a trivial test:

  1. If the story is younger than any other story
  2. and the story has a higher score than that other story
  3. and the story has a worse ranking than that other story
  4. and at least one of these two stories is on the front page
then the story is considered to have been penalised.

(note: "penalised" can have several meanings. It may be due to explicit flagging, or it may be due to an automated system deciding that the story is controversial or appears to be supported by a voting ring. There may be other reasons. I haven't attempted to separate them, because for my purposes it doesn't matter. The algorithm is discussed here.)

Now, ideally I'd classify my dataset based on manual analysis and classification of stories, but I'm lazy (see [2]) and so just tried some keyword analysis:

Keyword    Penalised  Unpenalised
Women          13           4
Harass          2           0
Female          5           1
Intel           2           3
x86             3           4
ARM             3           4
Airplane        1           2
Startup        46          26


A few things to note:
  1. Lots of stories are penalised. Of the front page stories in my dataset, I count 3240 stories that have some kind of penalty applied, against 2848 that don't. The default seems to be that some kind of detection will kick in.
  2. Stories containing keywords that suggest they refer to issues around social justice appear more likely to be penalised than stories that refer to technical matters
  3. There are other topics that are also disproportionately likely to be penalised. That's interesting, but not really relevant - I'm not necessarily arguing that social issues are penalised out of an active desire to make them go away, merely that the existing ranking system tends to result in it happening anyway.

This clearly isn't an especially rigorous analysis, and in future I hope to do a better job. But for now the evidence appears consistent with my innate prejudice - the Hacker News ranking algorithm tends to penalise stories that address social issues. An interesting next step would be to attempt to infer whether the reasons for the penalties are similar between different categories of penalised stories[3], but I'm not sure how practical that is with the publicly available data.

(Raw data is here, penalised stories are here, unpenalised stories are here)


[1] Moving to San Francisco has resulted in it making more sense, but really that just makes me even more depressed.
[2] Ha ha like fuck my PhD's in biology
[3] Perhaps stories about startups tend to get penalised because of voter ring detection from people trying to promote their startup, while stories about social issues tend to get penalised because of controversy detection?

comment count unavailable comments

Syndicated 2014-10-30 15:19:57 from Matthew Garrett

30 Oct 2014 Pizza   » (Master)

Further printer work

Okay, so I guess I was wrong about additional printer hacking. Despite the 12-hour days at the office over the past few weeks (we got our first silicon back, and software is the ring that binds everything together in the darkness), I'm still spending time writing code when I get home.

First, I added support for the Sony UP-CR10L and its rebadged brethren, the DNP SL10. I've had these on my to-do list for a while; I'd already decoded everything and updated the existing UP-DR150/200 backend to handle the new bits, but never got around to adding proper support into Gutenprint. That's now done, and once I get the USB PIDs, it should JustWork(tm).

Beyond that, I've knocked out a few things on the bug list. One I just fixed affected pipelined printing on the DNP/Citizen printers, and it was most easily triggered by multi-page print jobs. With Gutenprint 5.2.10's backend, the printer would just abort the job after the first page, but if you were using a development snapshot after 2014-06-04, it would automatically retry the job, resulting in an endless printing of page 1 over and over again.

The bug was due to the backend mistakenly treating the "Printing, with one available buffer for a 300dpi or small 600dpi job" status as an error.

...Oops.

At least folks won't have to wait for the next Gutenprint release to pick up the latest backend code.

I have a rather large photo backlog from the past month to sort through. That will be my weekend project.

Syndicated 2014-10-30 02:38:37 (Updated 2014-11-22 14:13:15) from Solomon Peachy

30 Oct 2014 mjg59   » (Master)

On joining the FSF board

I joined the board of directors of the Free Software Foundation a couple of weeks ago. I've been travelling a bunch since then, so haven't really had time to write about it. But since I'm currently waiting for a test job to finish, why not?

It's impossible to overstate how important free software is. A movement that began with a quest to work around a faulty printer is now our greatest defence against a world full of hostile actors. Without the ability to examine software, we can have no real faith that we haven't been put at risk by backdoors introduced through incompetence or malice. Without the freedom to modify software, we have no chance of updating it to deal with the new challenges that we face on a daily basis. Without the freedom to pass that modified software on to others, we are unable to help people who don't have the technical skills to protect themselves.

Free software isn't sufficient for building a trustworthy computing environment, one that not merely protects the user but respects the user. But it is necessary for that, and that's why I continue to evangelise on its behalf at every opportunity.

However.

Free software has a problem. It's natural to write software to satisfy our own needs, but in doing so we write software that doesn't provide as much benefit to people who have different needs. We need to listen to others, improve our knowledge of their requirements and ensure that they are in a position to benefit from the freedoms we espouse. And that means building diverse communities, communities that are inclusive regardless of people's race, gender, sexuality or economic background. Free software that ends up designed primarily to meet the needs of well-off white men is a failure. We do not improve the world by ignoring the majority of people in it. To do that, we need to listen to others. And to do that, we need to ensure that our community is accessible to everybody.

That's not the case right now. We are a community that is disproportionately male, disproportionately white, disproportionately rich. This is made strikingly obvious by looking at the composition of the FSF board, a body made up entirely of white men. In joining the board, I have perpetuated this. I do not bring new experiences. I do not bring an understanding of an entirely different set of problems. I do not serve as an inspiration to groups currently under-represented in our communities. I am, in short, a hypocrite.

So why did I do it? Why have I joined an organisation whose founder I publicly criticised for making sexist jokes in a conference presentation? I'm afraid that my answer may not seem convincing, but in the end it boils down to feeling that I can make more of a difference from within than from outside. I am now in a position to ensure that the board never forgets to consider diversity when making decisions. I am in a position to advocate for programs that build us stronger, more representative communities. I am in a position to take responsibility for our failings and try to do better in future.

People can justifiably conclude that I'm making excuses, and I can make no argument against that other than to be asked to be judged by my actions. I hope to be able to look back at my time with the FSF and believe that I helped make a positive difference. But maybe this is hubris. Maybe I am just perpetuating the status quo. If so, I absolutely deserve criticism for my choices. We'll find out in a few years.

comment count unavailable comments

Syndicated 2014-10-30 00:45:32 from Matthew Garrett

29 Oct 2014 Hobart   » (Journeyer)

GMail locked down IMAP access at some point.

Wheee.

  • If you don't have IMAP + OAuth 2 you're locked out. Unless:
  • You change a Big Scary Setting "Allow less secure apps". The activation of which also generates a Big Scary Email to let you know you've done it. But then:
  • Your failed attempts triggered another lock on your account, which you need to inspect the IMAP negotiation to see. The first claims "Web login required! go to http://blahblah/100char-long-url", but, surprise! visiting the URL doesn't unlock you.
  • The second directs you to https://support.google.com/mail/answer/78754 where you learn about https://www.google.com/accounts/DisplayUnlockCaptcha which, when visited, does NOT display a CAPTCHA, but does unlock your account.
  • OAuth 2 is so ridiculously overdesigned the main editor of the spec loudly quit.
  • All of this could have been handled using client side certificates, without requiring any changes to the @#$% mail clients.
-----BEGIN PGP MESSAGE-----
Version: GnuPG v1
owGbwMvMwCSYyMCz7/ket1DG0+JJDCGBPvHuqSUK+WlpCrmVCjmJ5Xl6XB1uLAyC
TAxsrEwgaQYuTgGYHmsehgV/ekp/rlgx7dujh3trJkrvClXyWv+AYcGRNO89b/xt
xVd7q/u4lR2pjozkUwMA
=rXE4
-----END PGP MESSAGE-----
Some relevant URLs:

Syndicated 2014-10-29 20:36:11 from jon's blog
