Recent blog entries for glyph

Letters To The Editor: Re: Email

Since I removed comments from this blog, I’ve been asking y’all to email me when you have feedback, with the promise that I’d publish the good bits. Today I’m making good on that for the first time, with this lovely missive from Adam Doherty:


I just wanted to say thank you. As someone who is never able to say no, your article on email struck a chord with me. I have had Gmail since the beginning, since the days of hoping for an invitation. And the day I received my invitation was the last day my inbox was ever empty.

Prior to reading your article I had over 40,000 unread messages. It used to be a sort of running joke; I never delete anything. Realistically though was I ever going to do anything with them?

With 40,000 unread messages in your inbox, you start to miss messages that are actually important. Messages that must become tasks, tasks that must be completed.

Last night I took your advice; and that is saying something - most of the things I read via HN are just noise. This however spoke to me directly.

I archived everything older than two weeks, was down to 477 messages and kept pruning. So much of the email we get on a daily basis is also noise. Those messages took me half a second to hit archive and move on.

I went to bed with zero messages in my inbox, woke up with 21, archived 19, actioned 2 and then archived those.

Seriously, thank you so very much. I am unburdened.


First, I’d like to thank Adam for writing in. I really do appreciate the feedback.

Second, I wanted to post this here not in service of showcasing my awesomeness[1], but rather to demonstrate that getting to the bottom of your email can have a profound effect on your state of mind. Even if it’s a running joke, even if you don’t think it’s stressing you out, there’s a good chance that, somewhere in the back of your mind, it is. After all, if you really don’t care, what’s stopping you from hitting select all / archive right now?

At the very least, if you did that, your mail app would load faster.


  1. although, let there be no doubt, I am awesome 

Syndicated 2016-05-03 06:06:00 from Deciphering Glyph

Email Isn’t The Thing You’re Bad At

I’ve been using the Internet for a good 25 years now, and I’ve been lucky enough to have some perspective dating back farther than that. The common refrain for my entire tenure here:

We all get too much email.

A New, New, New, New Hope

Luckily, something is always on the cusp of replacing email. AOL Instant Messenger will totally replace it. Then it was blogging. RSS. MySpace. Then it was FriendFeed. Then Twitter. Then Facebook.

Today, it’s in vogue to talk about how Slack is going to replace email. As someone who has seen this play out a dozen times now, let me give you a little spoiler:

Slack is not going to replace email.

But Slack isn’t the problem here, either. It’s just another communication tool.

The problem of email overload is both ancient and persistent. If the problem were really with “email”, then, presumably, one of the nine million email apps that dot the app-stores like mushrooms sprouting from a globe-spanning mycelium would have just solved it by now, and we could all move on with our lives. Instead, it is permanently in vogue[1] to talk about how overloaded we all are.

If not email, then what?

If you have twenty-four thousand unread emails in your Inbox, like some kind of goddamn animal, what you’re bad at is not email, it’s transactional interactions.

Different communication media have different characteristics, but the defining characteristic of email is that it is the primary mode of communication that we use, both professionally and personally, when we are asking someone else to perform a task.

Of course you might use any form of communication to communicate tasks to another person. But other forms - especially the currently popular real-time methods - appear as a bi-directional communication, and are largely immutable. Email’s distinguishing characteristic is that it is discrete; each message is its own entity with its own ID. Emails may also be annotated, whether with flags, replied-to markers, labels, placement in folders, archiving, or deleting. Contrast this with a group chat in IRC, iMessage, or Slack, where the log is mostly[2] unchangeable, and the only available annotation is “did your scrollbar ever move down past this point”; each individual message has only one bit of associated information. Unless you have catlike reflexes and an unbelievably obsessive-compulsive personality, it is highly unlikely that you will carefully set the “read” flag on each and every message in an extended conversation.

All this makes email much more suitable for communicating a task, because the recipient can file it according to their system for tracking tasks, come back to it later, and generally treat the message itself as an artifact. By contrast, if I were to just walk up to you on the street and say “hey can you do this for me”, you would almost certainly just forget.

The word “task” might seem heavy-weight for some of the things that email is used for, but tasks come in all sizes. One task might be “click this link to confirm your sign-up on this website”. Another might be “choose a time to get together for coffee”. Or “please pass along my resume to your hiring department”. Yet another might be “send me the final draft of the Henderson report”.

Email is also used for conveying information: here are the minutes from that meeting we were just in. Here is a transcription of the whiteboard from that design session. Here are some photos from our family vacation. But even in these cases, a task is implied: read these minutes and see if they’re accurate; inspect this diagram and use it to inform your design; look at these photos and just enjoy them.

So here’s the thing that you’re bad at, which is why none of the fifty different email apps you’ve bought for your phone have fixed the problem: when you get these messages, you aren’t making a conscious decision about:

  1. how important the message is to you
  2. whether you want to act on it at all
  3. when you want to act on it
  4. what exact action you want to take
  5. what the consequences of taking or not taking that action will be

This means that when someone asks you to do a thing, you probably aren’t going to do it. You’re going to pretend to commit to it, and then you’re going to flake out when push comes to shove. You’re going to keep context-switching until all the deadlines have passed.

In other words:

The thing you are bad at is saying ‘no’ to people.

Sometimes it’s not obvious that what you’re doing is saying ‘no’. For many of us — and I certainly fall into this category — a lot of the messages we get are vaguely informational. They’re from random project mailing lists, perhaps they’re discussions between other people, and it’s unclear what we should do about them (or if we should do anything at all). We hang on to them (piling up in our Inboxes) because they might be relevant in the future. I am not advocating that you have to reply to every dumb mailing list email with a 5-part action plan and a Scrum meeting invite: that would be a disaster. You don’t have time for that. You really shouldn’t have time for that.

The trick about getting to Inbox Zero[3] is not in somehow becoming an email-reading machine, but in realizing that most email is worthless, and that’s OK. If you’re not going to do anything with it, just archive it and forget about it. If you’re subscribed to a mailing list where only 1 out of 1000 messages actually represents something you should act on, archive all the rest after only answering the question “is this the one I should do something about?”. You can answer that question after just glancing at the subject; there are times when checking my email I will be hitting “archive” with a 1-second frequency. If you are on a list where zero messages are ever interesting enough to read in their entirety or do anything about, then of course you should unsubscribe.

Once you’ve dug yourself into a hole with thousands of “I don’t know what I should do with this” messages, it’s time to declare email bankruptcy. If you have 24,000 messages in your Inbox, let me be real with you: you are never, ever going to answer all those messages. You do not need a smartwatch to tell you exactly how many messages you are never going to reply to.

We’re In This Together, Me Especially

A lot of guidance about what to do with your email addresses email overload as a personal problem. Over the years of developing my tips and tricks for dealing with it, I certainly saw it that way. But lately, I’m starting to see that it has pernicious social effects.

If you have 24,000 messages in your Inbox, that means you aren’t keeping track or setting priorities on which tasks you want to complete. But just because you’re not setting those priorities, that doesn’t mean nobody is. It means you are letting the availability heuristic - whatever is “latest and loudest” - govern access to your attention, and therefore your time. By doing this, you are rewarding people (or #brands) who contact you repeatedly, over inappropriate channels, and generally try to flood your attention with their priorities instead of your own. This, in turn, creates a culture where it is considered reasonable and appropriate to assume that you need to do that in order to get someone’s attention.

Since we live in the era of subtext and implication, I should explicitly say that I’m not describing any specific work environment or community. I used to have an email startup, and so I thought about this stuff very heavily for almost a decade. I have seen email habits at dozens of companies, and I help people in the open source community with their email on a regular basis. So I’m not throwing shade: almost everybody is terrible at this.

And that is the one way that email, in the sense of the tools and programs we use to process it, is at fault: technology has made it easier and easier to ask people to do more and more things, without giving us better tools or training to deal with the increasingly huge array of demands on our time. It’s easier than ever to say “hey could you do this for me” and harder than ever to just say “no, too busy”.

Mostly, though, I want you to know that this isn’t just about you any more. It’s about someone much more important than you: me. I’m tired of sending reply after reply to people asking to “just circle back” or asking if I’ve seen their email. Yes, I’ve seen your email. I have a long backlog of tasks, and, like anyone, I have trouble managing them and getting them all done[4], and I frequently have to decide that certain things are just not important enough to do. Sometimes it takes me a couple of weeks to get to a message. Sometimes I never do. But, it’s impossible to be mad at somebody for “just checking in” for the fourth time when this is probably the only possible way they ever manage to get anyone else to do anything.

I don’t want to end on a downer here, though. And I don’t have a book to sell you which will solve all your productivity problems. I know that if I lay out some incredibly elaborate system all at once, it’ll seem overwhelming. I know that if I point you at some amazing gadget that helps you keep track of what you want to do, you’ll either balk at the price or get lost fiddling with all its knobs and buttons and not getting a lot of benefit out of it. So if I’m describing a problem that you have here, here’s what I want you to do.

Step zero is setting aside some time. This will probably take you a few hours, but trust me; they will be well-spent.

Email Bankruptcy

First, you need to declare email bankruptcy. Select every message in your Inbox older than 2 weeks. Archive them all, right now. In the past, you might have had to worry about deleting those messages, but modern email systems pretty much universally have more storage than you’ll ever need. So rest assured that if you actually need to do anything with these messages, they’ll all be in your archive. But anything in your Inbox right now older than a couple of weeks is just never going to get dealt with, and it’s time to accept that fact. Again, this part of the process is not about making a decision yet, it’s just about accepting a reality.
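If your mail client makes "select everything older than two weeks" awkward, the same move can be sketched over IMAP. This is only a rough illustration, not a recommendation of any particular tool: it assumes a logged-in `imaplib` connection, a folder called "Archive" that already exists (a hypothetical name; servers differ), and a server that treats "copy, then delete from the Inbox" as archiving.

```python
import datetime

# IMAP date criteria use English month abbreviations regardless of locale.
_MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def imap_date(day):
    """Format a date the way IMAP SEARCH expects it, e.g. 10-Apr-2016."""
    return "{:02d}-{}-{}".format(day.day, _MONTHS[day.month - 1], day.year)


def declare_bankruptcy(conn, today, days=14, archive="Archive"):
    """Move everything in INBOX older than `days` days into `archive`.

    `conn` is assumed to be a logged-in imaplib.IMAP4_SSL connection and
    `archive` an existing folder (hypothetical name). Returns the number
    of messages moved.
    """
    cutoff = imap_date(today - datetime.timedelta(days=days))
    conn.select("INBOX")
    # BEFORE matches messages whose internal date is earlier than the cutoff.
    status, data = conn.search(None, "BEFORE", cutoff)
    numbers = data[0].split()
    if not numbers:
        return 0
    message_set = b",".join(numbers).decode("ascii")
    # Copy to the archive first; nothing leaves the Inbox unless that works.
    status, _ = conn.copy(message_set, archive)
    if status != "OK":
        raise RuntimeError("copy to archive failed; Inbox left untouched")
    conn.store(message_set, "+FLAGS", "\\Deleted")
    conn.expunge()
    return len(numbers)
```

With tens of thousands of messages you would probably want to work in batches rather than send one enormous message set, and you should check how your own provider handles deletion from the Inbox before trusting a sketch like this with real mail.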

Mailbox Three

One extra tweak I would suggest here is to get rid of all of your email folders and filters. It seems like many folks with big email problems have tried to address this by ever-finer-grained classification of messages, ever more byzantine email rules. At least, when I look over someone’s shoulder and see 24,000 messages, it’s common to also see 50 folders. Probably these aren’t helping you very much.

In older email systems, it was necessary to construct elaborate header-based filtering systems so that you could later identify those messages in certain specific ways, like “message X went to this mailing list”. However, this was an incomplete hack, a workaround for a missing feature. Almost all modern email clients (and if yours doesn’t do this, switch) allow you to locate messages like this via search.

Your mail system ought to have 3 folders:

  1. Inbox, which you process to discover tasks,
  2. Drafts, which you use to save progress on replies, and
  3. Archive, the folder which you access only by searching for information you need when performing a task.

Getting rid of unnecessary folders and queries and filter rules will remove things that you can fiddle with.

Moving individual units of trash between different heaps of trash is not being productive, and by removing all the different folders you can shuffle your messages into before actually acting upon them you will make better use of your time spent looking at your email client.

There’s one exception to this rule, which is filters that do nothing but cause a message to skip your Inbox and go straight to the archive. The reason that this type of filter is different is that there are certain sources or patterns of message which are not actionable, but rather, a useful source of reference material that is only available as a stream of emails. Messages like that should, indeed, not show up in your Inbox. But, there’s no reason to file them into a specific folder or set of folders; you can always find them with a search.

Make A Place For Tasks

Next, you need to get a task list. Your email is not a task list; tasks are things that you decided you’re going to do, not things that other people have asked you to do[5]. Critically, you are going to need to parse emails into tasks. To explain why, let’s have a little arithmetic aside.

Let’s say it only takes you 45 seconds to go from reading a message to deciding what it really means you should do; and let’s say that, once you’ve decided, it only takes 20 seconds to go from looking at the message again to remembering what you need to do about it. This means that by the time you get to 180 un-processed messages that you need to do something about in your Inbox, you’ll be spending an hour a day (180 × 20 seconds) doing nothing but remembering what those messages mean, before you do anything related to actually living your life, even including checking for new messages.
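If you'd like to plug in your own numbers, the aside works out as below. The function name is mine, and the 20-second figure is just the guess from the paragraph above, not a measurement.

```python
SECONDS_PER_HOUR = 3600.0


def daily_rereading_hours(unprocessed, seconds_each=20):
    """Hours spent per day just re-reading messages left in the Inbox."""
    return unprocessed * seconds_each / SECONDS_PER_HOUR


print(daily_rereading_hours(180))      # the example above: one hour
print(daily_rereading_hours(24000))    # a 24,000-message Inbox
```

The second figure comes out at well over a hundred hours, which is a different way of saying that nobody with 24,000 messages is actually re-reading them.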

What should you use for the task list? On some level, this doesn’t really matter. It only needs one really important property: you need to trust that if you put something onto it, you’ll see it at the appropriate time. How exactly that works depends heavily on your own personal relationship with your computers and devices; it might just be a physical piece of paper. But for most of us living in a multi-device world, something that synchronizes to some kind of cloud service is important, so Wunderlist or Remember the Milk are good places to start, with free accounts.

Turn Messages Into Tasks

The next step - and this is really the first day of the rest of your life - start at the oldest message in your Inbox, and work forward in time. Look at only one message at a time. Decide whether this message is a meaningful task that you should accomplish.

If you decide a message represents a task, then make a new task on your task list. Decide what the task actually is, and describe it in words; don’t create tasks like “answer this message”. Why do you need to answer it? Do you need to gather any information first?

If you need to access information from the message in order to accomplish the task, then be sure to note in your task how to get back to the email. Depending on what your mail client is, it may be easier or harder to do this[6], but in the worst case, following the guidelines above about eliminating unnecessary folders and filing in your email client, just put a hint into your task list about how to search for the message in question unambiguously.

Once you’ve done that:

Archive the message immediately.

The record that you need to do something about the message now lives in your task list, not your email client. You’ve processed it, and so it should no longer remain in your inbox.

If you decide a message doesn’t represent a task, then:

Archive the message immediately.

Do not move on to the next message until you have archived this message. Do not look ahead[7]. The presence of a message in your Inbox means you need to make a decision about it. Follow the touch-move rule with your email. If you skip over messages habitually and decide you’ll “just get back to it in a minute”, that minute will turn into 4 months and you’ll be right back where you were before.
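For the programmers in the audience, the whole loop above fits in a few lines. This is a toy model rather than a real mail client: `Message`, `Task`, `Mailbox`, and the `decide` callback are all made-up names, and `decide` stands in for your own judgment about each message.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Message:
    received: int      # any sortable timestamp
    subject: str


@dataclass
class Task:
    description: str   # what you will actually do, in words
    search_hint: str   # how to find the email again in the archive


@dataclass
class Mailbox:
    inbox: List[Message] = field(default_factory=list)
    archive: List[Message] = field(default_factory=list)


def triage(mailbox, decide):
    """Process the Inbox oldest-first, one message at a time.

    `decide(message)` returns a description of the task, or None if the
    message is not worth acting on. Either way, the message is archived
    before the next one is looked at.
    """
    tasks = []
    for message in sorted(mailbox.inbox, key=lambda m: m.received):
        description = decide(message)
        if description is not None:
            hint = 'subject:"%s"' % message.subject
            tasks.append(Task(description, hint))
        # Touch-move: task or no task, the message leaves the Inbox now.
        mailbox.inbox.remove(message)
        mailbox.archive.append(message)
    return tasks
```

Note that there is no branch in which a message stays in the Inbox. The only output is the task list, and the Inbox is empty when the loop ends.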

Circling back to the subject of this post; once again, this isn’t really specific to email. You should follow roughly the same workflow when someone asks you to do a task in a meeting, or in Slack, or on your Discourse board, or wherever, if you think that the task is actually important enough to do. If there is a relevant attachment, note the Slack timestamp and a snippet of the message so you can search for it again. The thing that makes email different is really just the presence of an email box.

Banish The Blue Dot

Almost all email clients have a way of tracking “unread” messages; they cheerfully display counters of them. Ignore this information; it is useless. Messages have two states: in your inbox (unprocessed) and in your archive (processed). “Read” vs. “Unread” can be, at best, of minimal utility when resuming an interrupted scanning session. But, you are always only ever looking at the oldest message first, right? So none of the messages below it should be unread anyway...

Be Ruthless

As you try to start translating your flood of inbound communications into an actionable set of tasks you can actually accomplish, you are going to notice that your task list is going to grow and grow just as your Inbox was before. This is the hardest step:

Decide you are not going to do those tasks, and simply delete them. Sometimes, a task’s entire life-cycle is to be created from an email, exist for ten minutes, and then have you come back to look at it and then delete it. This might feel pointless, but in going through that process, you are learning something extremely valuable: you are learning what sorts of things are not actually important enough for you to do.

If every single message you get from some automated system provokes this kind of reaction, that will give you a clue that said system is wasting your time, and just making you feel anxious about work you’re never really going to get to, which can then lead to you un-subscribing or filtering messages from that system.

Tasks Before Messages

To thine own self, not thy Inbox, be true.

Try to start your day by looking at the things you’ve consciously decided to do. Don’t look at your email, don’t look at Slack; look at your calendar, and look at your task list.

One of those tasks, probably, is a daily reminder to “check your email”, but that reminder is there more to remind you to only do it once than to prevent you from forgetting.

I say “try” because this part is always going to be a challenge; while I mentioned earlier that you don’t want to unthinkingly give in to the availability heuristic, you also have to acknowledge that the reason it’s called a “cognitive bias” is because it’s part of human cognition. There will always be a constant anxious temptation to just check for new stuff; those of us who have a predisposition towards excessive scanning behavior have it more than others.

Why Email?

We all need to make commitments in our daily lives. We need to do things for other people. And when we make a commitment, we want to be telling the truth. I want you to try to do all these things so you can be better at that. It’s impossible to truthfully make a commitment to spend some time to perform some task in the future if, realistically, you know that all your time in the future will be consumed by whatever the top 3 highest-priority angry voicemails you have on that day are.

Email is a challenging social problem, but I am tired of email, especially the user interface of email applications, getting the blame for what is, at its heart, a problem of interpersonal relations. It’s like noticing that you get a lot of bills through the mail, and then blaming the state of your finances on the colors of the paint in your apartment building’s mail room. Of course, the UI of an email app can encourage good or bad habits, but Gmail gave us a prominent “Archive” button a decade ago, and we still have all the same terrible habits that were plaguing Outlook users in the 90s.

Of course, there’s a lot more to “productivity” than just making a list of the things you’re going to do. Some tools can really help you manage that list a lot better. But all they can help you to do is to stop working on the wrong things, and start working on the right ones. Actually being more productive, in the sense of getting more units of work out of a day, is something you get from keeping yourself healthy, happy, and well-rested, not from an email filing system.

You can’t violate causality to put more hours into the day, and as a frail and finite human being, there’s only so much work you can reasonably squeeze in before you die.

The reason I care a lot about salvaging email specifically is that it remains the best medium for communication that allows you to be in control of your own time, and by extension, the best medium for allowing people to do creative work.

Asking someone to do something via SMS doesn’t scale; if you have hundreds of unread texts there’s no way to put them in order, no way to classify them as “finished” and “not finished”, so you need to keep it to the number of things you can fit in short term memory. Not to mention the fact that text messaging is almost by definition an interruption - by default, it causes a device in someone’s pocket to buzz. Asking someone to do something in group chat, such as IRC or Slack, is similarly time-dependent; if they are around, it becomes an interruption, and if they’re not around, you have to keep asking and asking over and over again, which makes it really inefficient for the asker (or the asker can use a @highlight, and assume that Slack will send the recipient, guess what, an email).

Social media often comes up as another possible replacement for email, but its sort order is even worse than “only the most recent and most frequently repeated”. Messages are instead sorted by value to advertisers or likelihood of increasing “engagement”, i.e. most likely to keep you looking at this social media site rather than doing any real work.

For those of us who require long stretches of uninterrupted time to produce something good – “creatives”, or whatever today’s awkward buzzword for the intersection of writers, programmers, graphic designers, illustrators, and so on, is – we need an inbound task queue that we can have some level of control over. Something that we can check at a time of our choosing, something that we can apply filtering to in order to protect access to our attention, something that maintains the chain of request/reply for reference when we have to pick up a thread we’ve had to let go of for a while. Some way to be in touch with our customers, our users, and our fans, without being constantly interrupted. Because if we don’t give those who need to communicate with us such a tool, they’ll just blast @everyone messages into our Slack channels and @mentions onto Twitter and text us “Hey, got a minute?” until we have to quit everything to try and get some work done.

Questions about this post?

Go ahead and send me an email.


Acknowledgements

As always, any errors or bad ideas are certainly my own.

First of all, thanks to Merlin Mann, whose writing and podcasting were the inspiration, direct or indirect, for many of my thoughts on this subject; and who sets a good example because he won’t answer your email.

Thanks also to David Reid for introducing me to Merlin's work, as well as Alex Gaynor, Tristan Seligmann, Donald Stufft, Cory Benfield, Piët Delport, Amber Brown, and Ashwini Oruganti for feedback on drafts.


  1. Email is so culturally pervasive that it is literally in Vogue, although in fairness this is not a reference to the overflowing-Inbox problem that I’m discussing here. 

  2. I find the “edit” function in Slack maddening; although I appreciate why it was added, it’s easy to retroactively completely change the meaning of an entire conversation in ways that make it very confusing for those reading later. You don’t even have to do this intentionally; sometimes you make a legitimate mistake, like forgetting the word “not”, and the next 5 or 6 messages are about resolving that confusion; then, you go back and edit, and it looks like your colleagues correcting you are a pedantic version of Mr. Magoo, unable to see that you were correct the first time. 

  3. There, I said it. Are you happy now? 

  4. Just to clarify: nothing in this post should be construed as me berating you for not getting more work done, or for ever failing to meet any commitment no matter how casual. Quite the opposite: what I’m saying you need to do is acknowledge that you’re going to screw up and rather than hold a thousand emails in your inbox in the vain hope that you won’t, just send a quick apology and move on. 

  5. Maybe you decided to do the thing because your boss asked you to do it and failing to do it would cost you your job, but nevertheless, that is a conscious decision that you are making; not everybody gets to have “boss” priority, and unless your job is a true Orwellian nightmare, not everything your boss says in email is an instant career-ending catastrophe. 

  6. In Gmail, you can usually just copy a link to the message itself. If you’re using OS X’s Mail.app, you can use this Python script to generate links that, when clicked, will open the Mail app:

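    # Print a "message:" link for each message currently selected in Mail.app.
    # Requires PyObjC's ScriptingBridge module on OS X.
    from __future__ import (print_function, unicode_literals,
                            absolute_import, division)

    from ScriptingBridge import SBApplication

    try:
        from urllib.parse import quote  # Python 3
    except ImportError:
        from urllib import quote        # Python 2

    mail = SBApplication.applicationWithBundleIdentifier_("com.apple.mail")

    for viewer in mail.messageViewers():
        for message in viewer.selectedMessages():
            for header in message.headers():
                # Header names are case-insensitive, so normalise first.
                if header.name().lower() == "message-id":
                    print("message:" + quote(header.content()))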
    

    You can then paste these links into just about any task tracker; if they don’t become clickable, you can paste them into Safari’s URL bar or pass them to the open command-line tool. 
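     If you only have the raw text of a message rather than a running Mail app, the same kind of link can be built from its Message-ID header with nothing but the standard library. This is a sketch; `message_url` is a made-up name, and whether a given mail client will open the resulting link is an assumption you should check for yourself.

```python
from email import message_from_string
from urllib.parse import quote


def message_url(raw_message):
    """Return a message: link for an RFC 822 message, or None if it has no ID."""
    message_id = message_from_string(raw_message)["Message-ID"]
    if message_id is None:
        return None
    # Percent-encode the angle brackets and "@" so the ID survives in a URL.
    return "message:" + quote(message_id.strip())
```

     Because the Message-ID identifies one message rather than a thread or a folder, it also makes an unambiguous search hint in clients that don't support links at all.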

  7. The one exception here is that you can look ahead in the same thread to see if someone has already replied. 

Syndicated 2016-04-24 23:54:00 from Deciphering Glyph

Far too many things can stop the BLOB

It occurs to me that the lack of a standard, well-supported, memory-efficient interface for BLOBs in multiple programming languages is one of the primary driving factors of poor scalability characteristics of open source SaaS applications.

Applications like Gitlab, Redmine, Trac, Wordpress, and so on, all need to store potentially large files (“attachments”). Frequently, they elect to store these attachments (at least by default) in a dedicated filesystem directory. This leads to a number of tricky concurrency issues, as the filesystem has different (and divorced) concurrency semantics from the backend database, and resides only on the individual API nodes, rather than in the shared namespace of the attached database.

Some databases do support writing to BLOBs like files. Postgres, SQLite, and Oracle do, although it seems MySQL lags behind in this area (although I’d love to be corrected on this front). But many higher-level API bindings for these databases don’t expose support for BLOBs in an efficient way.
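As an illustration of what "writing to BLOBs like files" can look like from a high-level binding, here is a sketch using the incremental BLOB interface in Python's `sqlite3` module. It assumes Python 3.11 or later (where `Connection.blobopen` was added) and a hypothetical `attachments` table; it says nothing about how Postgres or Oracle bindings expose the equivalent feature.

```python
import sqlite3


def store_attachment(con, chunks, total_size):
    """Stream `chunks` (an iterable of bytes) into a new BLOB row.

    Assumes a table like
        CREATE TABLE attachments (id INTEGER PRIMARY KEY, data BLOB)
    and returns the id of the new row.
    """
    # A BLOB cannot grow through the incremental I/O interface, so reserve
    # the full size up front with zeroblob().
    cur = con.execute(
        "INSERT INTO attachments (data) VALUES (zeroblob(?))", (total_size,))
    rowid = cur.lastrowid
    with con.blobopen("attachments", "data", rowid) as blob:
        for chunk in chunks:
            blob.write(chunk)  # writes at the current offset, then advances
    con.commit()
    return rowid


def read_attachment(con, rowid, chunk_size=64 * 1024):
    """Yield the BLOB back in pieces instead of loading it all at once."""
    with con.blobopen("attachments", "data", rowid, readonly=True) as blob:
        while True:
            chunk = blob.read(chunk_size)
            if not chunk:
                break
            yield chunk
```

The upload never needs to be held in memory all at once, and the attachment lives in the same transactional namespace as the rest of the application's data.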

Directly using the filesystem, as opposed to a backing service, breaks the “expected” scaling behavior of the front-end portion of a web application. Using an object store, like Cloud Files or S3, is a good option to achieve high scalability for public-facing applications, but it creates additional deployment complexity.

So, as both a plea to others and a note to myself: if you’re writing a database-backed application that needs to store some data, please consider making “store it in the database as BLOBs” an option. And if your particular database client library doesn’t support it, consider filing a bug.

Syndicated 2016-04-20 01:01:00 from Deciphering Glyph

I think I’m using GitHub wrong.

I use a hodgepodge of https: and : (i.e. “ssh”) URL schemes for my local clones; sometimes I have a remote called “github” and sometimes I have one called “origin”. Sometimes I clone from a fork I made and sometimes I clone from the upstream.

I think the right way to use GitHub would instead be to always fork first, make my remote always be “origin”, and consistently name the upstream remote “upstream”. The problem with this, though, is that forks rapidly fall out of date, and I often want to automatically synchronize all the upstream branches.

Is there a script or a github option or something to synchronize a fork with upstream automatically, including all its branches and tags? I know there’s no comment field, but you can email me or reply on twitter.

Syndicated 2016-04-13 21:11:00 from Deciphering Glyph

Monads are simple to understand.

You can just think of them like a fleet of mysterious inverted pyramids ominously hovering over a landscape dotted with the tombs of ancient and terrible gods. Tombs from which they may awake at any moment if they are “evaluated”.

The IO loop is then the malevolent personification of the force of entropy, causing every action we take to push the universe further into the depths of uncontrolled chaos.

Simple!

Syndicated 2016-02-16 00:36:00 from Deciphering Glyph

This is an experiment with a subtly different format.

Right now when I want to say something quickly, I pop open the Twitter app and just type it. But I realized that the only reason I'm doing this rather than publishing on my own site is a UI affordance: Twitter lets me hit two keys to start composing, ⌘N, and then two keys to finish, ⌘Return. Also, to tweet something, I don't need to come up with a title.

So I added an Emacs minor-mode that lets me hit a comparable number of keys; translated into the Emacs keyboard shortcut idiom it is of course Meta-Shift-Control-Hyper-C Backflip-LeftPedal-N to create a post and Bucky-Reverse-Erase-x Omega-Shift-Epsilon-j to publish it. Such posts will be distinguished by the presence of the "microblog" tag and the empty title.

(Also, the sky's the limit in terms of character-count.)

Feel free to let me know if you think the format works or not.

Syndicated 2016-02-10 04:29:00 from Deciphering Glyph

Stop Working So Hard

Recently, I saw this tweet where John Carmack posted to a thread on Hacker News about working hours. As this post propagated a good many bad ideas about working hours, particularly in the software industry, I of course had to reply. After some further back-and-forth on Twitter, Carmack followed up.

First off, thanks to Mr. Carmack for writing such a thorough reply in good faith. I suppose internet arguments have made me a bit cynical in that I didn't expect that. I still definitely don't agree, but I think there's a legitimate analysis of the available evidence there now, at least.

When trying to post this reply to HN, I was told that the comment was too long, and I suppose it is a bit long for a comment. So, without further ado, here are my further thoughts on working hours.

... if only the workers in Greece would ease up a bit, they would get the productivity of Germany. Would you make that statement?

Not as such, no. This is a hugely complex situation mixing together finance, culture, management, international politics, monetary policy, and a bunch of other things. That study, and most of the others I linked to, is interesting in that it confirms the general model of ability-to-work (i.e. "concentration" or "willpower") as a finite resource that you exhaust throughout the day; not in that "reduction in working hours" is a panacea. Average productivity-per-hour-worked would definitely go up.

However, I do believe (and now we are firmly off into interpretation-of-results territory; I have nothing empirical to offer you here) that if the average Greek worker were only as stressed as the average German one, counting both overwork and the presence of a constant catastrophic financial crisis in the news, then yes, they'd achieve equivalent productivity.

Total net productivity per worker, discounting for any increases in errors and negative side effects, continues increasing well past 40 hours per week. ... Only when you are so broken down that even when you come back the following day your productivity per hour is significantly impaired, do you open up the possibility of actually reducing your net output.

The trouble here is that you really cannot discount for errors and negative side effects, especially in the long term.

First of all, the effects of overwork (and attendant problems, like sleep deprivation) are cumulative. While productivity on a given day increases past 40 hours per week, if you continue to work more, your productivity will continue to degrade. So, the case where "you come back the following day ... impaired" is pretty common... eventually.

Since none of this epidemiological work tracks individual performance longitudinally there are few conclusive demonstrations of this fact, but lots of compelling indications; in the past, I've collected quantitative data on myself (and my reports, back when I used to be a manager) that strongly corroborates this hypothesis. So encouraging someone to work one sixty-hour week might be a completely reasonable trade-off to address a deadline; but building a culture where someone is asked to work nights and weekends as a matter of course is inherently destructive. Once you get into the area where people are losing sleep (and for people with other responsibilities, it's not hard to get to that point) overwork starts impacting stuff like the ability to form long-term memories, which means that not only do you do less work, you also consistently improve less.

Furthermore, errors and negative side effects can have a disproportionate impact.

Let me narrow the field here to two professions that I know a bit about and that are germane to this discussion: one, health care, which the original article here starts off by referencing, and two, software development, with which we are both familiar (since you already raised the Mythical Man Month).

In medicine, you can do a lot of valuable life-saving work in a continuous 100-hour shift. And in fact residents are often required to do so as a sort of professional hazing ritual. However, you can also make catastrophic mistakes that would cost a person their life; this happens routinely. Study after study confirms this, and minor reforms happen, but residents are still routinely abused and made to work in inhumane conditions that have catastrophic outcomes for their patients.

In software, defects can be extremely expensive to fix. Not only are they hard to fix, they can also be hard to detect. The phenomenon of the Net Negative Producing Programmer also indicates that not only can productivity drop to zero, it can drop below zero. On the anecdotal side, anyone who has had the unfortunate experience of cleaning up after a burnt-out co-worker can attest to this.

There are a great many tasks where inefficiency grows significantly with additional workers involved; the Mythical Man Month problem is real. In cases like these, you are better off with a smaller team of harder working people, even if their productivity-per-hour is somewhat lower.

The specific observation from the Mythical Man Month was that the number of communication links on a fully connected graph of employees increases quadratically whereas additional productivity (in the form of additional workers) increases linearly. If you have a well-designed organization, you can add people without requiring that your communication graph be fully connected.
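To put rough numbers on that observation, here is a minimal sketch. It assumes the worst case, where every person on the team has to talk to every other person, so a team of n people has n(n-1)/2 pairwise links. The function name is my own invention for illustration, not anything from the book.

```python
def communication_links(people: int) -> int:
    """Pairwise links in a fully connected group of `people`: n * (n - 1) / 2."""
    return people * (people - 1) // 2


# Headcount (and, optimistically, raw capacity) grows linearly,
# while the number of links to maintain grows much faster.
for people in (2, 5, 10, 20, 40):
    print(f"{people:>3} people -> {communication_links(people):>4} links")
```

Doubling a team from 20 to 40 people doubles the hands available but roughly quadruples the links, which is why a fully connected organization stops scaling so quickly.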

But of course, you can't always do that. And specifically you can't do that when a project is already late: you already figured out how the work is going to be divided. Brooks' Law is formulated as: "Adding manpower to a late software project makes it later." This is indubitable. But one of the other famous quotes from this book is "The bearing of a child takes nine months, no matter how many women are assigned."

The bearing of a child also takes nine months no matter how many hours a day the woman is assigned to work on it. So "in cases like these" my contention is that you are not "better off with ... harder working people": you're just screwed. Some projects are impossible and you are better off acknowledging the fact that you made unrealistic estimates and you are going to fail.

You called my post “so wrong, and so potentially destructive”, which leads me to believe that you hold an ideological position that the world would be better if people didn’t work as long. I don’t actually have a particularly strong position there; my point is purely about the effective output of an individual.

I do, in fact, hold such an ideological position, but I'd like to think that said position is strongly justified by the data available to me.

But, I suppose calling it "so potentially destructive" might have seemed glib, if you are really just looking at the microcosm of what one individual might do on one given week at work, and not at the broader cultural implications of this commentary. After all, as this discussion shows, if you are really restricting your commentary to a single person on a single work-week, the case is substantially more ambiguous. So let me explain why I believe it's harmful, as opposed to merely being incorrect.

First of all, the problem is that you can't actually ignore the broader cultural implications. This is Hacker News, and you are John Carmack; you are practically a cultural institution yourself, and by using this site you are posting directly into the broader cultural implications of the software industry.

Software development culture, especially in the USA, suffers from a long-standing culture of chronic overwork. Startup developers in their metaphorical (and sometimes literal) garages are lionized and then eventually mythologized for spending so many hours on their programs. Anywhere that it is celebrated, this mythology rapidly metastasizes into a severe problem; the Death March.

Note that although the term "death march" is technically general to any project management, it applies "originally and especially in software development", because this problem is worse in the software industry (although it has been improving in recent years) than almost anywhere else.

So when John Carmack says on Hacker News that "the effective output of an individual" will tend to increase with hours worked, that sends a message to many young and impressionable software developers. This is the exact same phenomenon that makes pop-sci writing terrible: your statement may be, in some limited context, and under some tight constraints, empirically correct, but it doesn't matter because when you expand the parameters to the full spectrum of these people's careers, it's both totally false and also a reinforcement of an existing cognitive bias and cultural trope.

I can't remember the name of this cognitive bias (and my Google-fu is failing me), but I know it exists. Let me call it the "I'm fine" bias. I know it exists because I have a friend who had the opportunity to go on a flight with NASA (on the Vomit Comet), and one of the more memorable parts of this experience that he related to me was the hypoxia test. The test involved basic math and spatial reasoning skills, but that test wasn't the point: the real test was that they had to notice when the oxygen levels were dropping and indicate that to the proctor. Concentrating on the test, many people failed the first few times, because the "I'm fine" bias makes it very hard to notice that you are impaired.

This is true of people who are drunk, or people who are sleep deprived, too. Their abilities are quantifiably impaired, but they have to reach a pretty severe level of impairment before they notice.

So people who are overworked might feel generally bad but they don't notice their productivity dropping until they're way over the red line.

Combine this with the fact that most people, especially those already employed as developers, are actually quite hard-working and earnest (laziness is much more common as a rhetorical device than as an actual personality flaw) and you end up in a scenario where a good software development manager is responsible much more for telling people to slow down, to take breaks, and to be more realistic in their estimates, than to speed up, work harder, and put in more hours.

The trouble is this goes against the manager's instincts as well. When you're a manager you tend to think of things in terms of resources: hours worked, money to hire people, and so on. So there's a constant nagging sensation for a manager to encourage people to work more hours in a day, so you can get more output (hours worked) out of your input (hiring budget). The problem here is that while all hours are equal, some hours are more equal than others. Managers have to fight against their own sense that a few more worked hours will be fine, and their employees' tendency to overwork because they're not noticing their own burnout, and upper management's tendency to demand more.

It is into this roiling stew of the relentless impulse to "work, work, work" that we are throwing our commentary about whether it's a good idea or not to work more hours in the week. The scales are weighted very heavily on one side already - which happens to be the wrong side in the first place - and while we've come back from the unethical and illegal brink we were at as an industry in the days of ea_spouse, software developers still generally work far too much.

If we were fighting an existential threat, say an asteroid that would hit the earth in a year, would you really tell everyone involved in the project that they should go home after 35 hours a week, because they are harming the project if they work longer?

Going back to my earlier explanation in this post about the cumulative impact of stress and sleep deprivation - if we were really fighting an existential threat, the equation changes somewhat. Specifically, the part of the equation where people can have meaningful downtime.

In such a situation, I would still want to make sure that people are as well-rested and as reasonably able to focus as they possibly can be. As you've already acknowledged, there are "increases in errors" when people are working too much, and we REALLY don't want the asteroid-targeting program that is going to blow apart the asteroid that will wipe out all life on earth to have "increased errors".

But there's also the problem that, faced with such an existential crisis, nobody is really going to be able to go home and enjoy a fine craft beer and spend some time playing with their kids and come back refreshed at 100% the next morning. They're going to be freaking out constantly about the asteroid, they're going to be losing sleep over that whether they're working or not. So, in such a situation, people should have the option to go home and relax if they're psychologically capable of doing so, but if the option for spending their time that makes them feel the most sane is working constantly and sleeping under their desk, well, that's the best one can do in that situation.

This metaphor is itself also misleading and out of place, though. There is also a strong cultural trend in software, especially in the startup ecosystem, to over-inflate the importance of what the company is doing - it is not "changing the world" to create a website for people to order room-service for their dogs - and thereby to catastrophize any threat to that goal. The vast majority of the time, it is inappropriate either to sacrifice -- or to ask someone else to sacrifice -- health and well-being for short-term gains. Remember, given the cumulative effects of overwork, that's all you even can get: short-term gains. This sacrifice often has a huge opportunity cost in other areas, as you can't focus on more important things that might come along later.

In other words, while the overtime situation is complex and delicate in the case of an impending asteroid impact, there's also the question of whether, at the beginning of Project Blow Up The Asteroid, I want everyone to be burnt out and overworked from their pet-hotel startup website. And in that case, I can say, unequivocally, no. I want them bright-eyed and bushy-tailed for what is sure to be a grueling project, no matter what the overtime policy is, that absolutely needs to happen. I want to make sure they didn't waste their youth and health on somebody else's stock valuation.

Syndicated 2016-01-17 04:06:00 from Deciphering Glyph

Taking Issue With Paul Graham’s Premises

Paul Graham has recently penned an essay on income inequality. Holly Wood wrote a pretty good critique of this piece, but it is addressing a huge amount of pre-existing context, as well as ignoring large chunks of the essay that nominally agree.

Eevee has already addressed how the “simplified version” didn’t substantively change anything that the longer one was saying, so I’m not going to touch on that much. However, it’s worth noting that the reason Paul Graham says he wrote that is that he thinks that “adventurous interpretations” are to blame for criticism of his “controversial” writing; in other words, that people are misinterpreting his argument because the conclusion is politically unacceptable.

Personally, I am deeply ambivalent about the political implications of his writing. I believe, strongly, in the power of markets to solve social problems that planning cannot. I don’t think “capitalist” is a slur. But, neither do I believe that markets are inherently good; capitalist economic theory assumes an environment of equal initial opportunity which demonstrably does not exist. I am, personally, very open to ideas like the counter-intuitive suggestion that economic inequality might not be such a bad thing, if the case were well-made. I say this because I want to be clear that what bothers me about Paul Graham’s writing is not its “controversial” content.

What bothers me about Paul Graham’s writing is that the reasoning is desperately sloppy. I sometimes mentor students on their writing, and if this mess were handed to me by one of my mentees, I would tell them to rewrite it from scratch. Although the “thanks” section at the end of each post on his blog implies that he gets editing feedback, it must be so uncritical of his assumptions as to be useless.

Initially, my entirely subjective impression is that Paul Graham is not a credible authority on the topic of income inequality. He doesn’t demonstrate any grasp of its causes, or indeed the substance of any proposed remedy. I would say that he is attacking a straw-man, but he doesn’t even bother to assemble the straw-man first, saying only:

... the thing that strikes me most about the conversations I overhear ...

What are these “conversations” he “overhears”? What remedies are they proposing which would target income inequality by eliminating any possible reward for entrepreneurship? Nothing he’s arguing against sounds like anything I’ve ever heard someone propose, and I spend a lot of time in the sort of conversation that he imagines overhearing.

His claim to credentials in this area doesn’t logically follow, either:

I’ve become an expert on how to increase economic inequality, and I’ve spent the past decade working hard to do it. ... In the real world you can create wealth as well as taking it from others.

This whole passage is intended to read logically as: “I increase economic inequality, which you might assume is bad, but it’s not so bad, because it creates wealth in the process!”.

Hopefully what PG has actually been trying to become an expert in is creating wealth (a net positive), not in increasing economic inequality (a negative, or, at best, neutral by-product of that process). If he is focused on creating wealth, as so much of the essay purports he is, then it does not necessarily follow that the startup founders will be getting richer than their customers.

Of course, many goods and services provide purely subjective utility to their consumers. But in a properly functioning market, the whole point of engaging in transactions is to improve efficiency.

To borrow from PG’s woodworker metaphor:

A woodworker creates wealth. He makes a chair, and you willingly give him money in return for it.

I might be buying that chair to simply appreciate its chair-ness and bask in the sublime beauty of its potential for being sat-in. But equally likely, I’m buying that chair for my office, where I will sit in it, and produce some value of my own while thusly seated. If the woodworker hadn’t created that chair for me, I’d have to do it myself, and it (presumably) would have been more expensive in terms of time and resources. Therefore, by producing the chair more efficiently, the woodworker would have increased my wealth as well as his own, by increasing the delta between my expenses (which include the chair) and my revenue (generated by tripping the light pythonic or whatever).

Note that “more efficient” doesn’t necessarily mean “lower cost”. Sitting in a chair is a substantial portion of my professional activity. A higher-quality chair that costs the same amount might improve the quality of my sitting experience, which might improve my own productivity at writing code, allowing me to make more income myself.

Even if the value of the chair is purely subjective, it is still an expense, and making it more efficient to make chairs would still increase my net worth.

Therefore, if startups really generated wealth so reliably, rather than simply providing a vehicle for transferring it, we would expect to see decreases in economic inequality, as everyone was able to make the most efficient use of their own time and resources, and was able to make commensurately more money.

... variation in productivity is accelerating ...

Counterpoint: no it isn’t. It’s not even clear that it’s increasing, let alone that its derivative is increasing. This doesn’t appear to be something that much data is collected on, and in the absence of any citation, I have to assume that it is a restatement of the not only false, but harmful, frequently debunked 10x programmer myth.

Most people who get rich tend to be fairly driven.

This sounds obvious: of course, if you “get” rich, you have to be doing something to “get” that way. First of all, this ignores many people who simply are rich, who get their wealth from inheritance or rent-seeking, which I think is discounting a pretty substantial number of rich people.

But it is implicitly making a bolder claim: that people who get rich are more driven than other people; i.e. those who don’t get rich.

In my personal experience, the opposite is true. People who get rich do work hard, and are determined, but really poor people work a lot harder and are a lot more determined. A startup founder who is eating rice and beans to try to keep their burn rate low and their runway long may indeed be making sacrifices and working hard. They may be experiencing emotional turmoil. But implicitly, such a person always has the safety net of high-value skills they can use to go find another job if their attempt doesn’t work out.

But don’t take my word for it; think about it for yourself. Consider a single mother working three minimum-wage jobs and eating rice and beans because that’s the only way she can feed her children. Would you imagine she is less determined and will work less hard to keep her children alive than our earlier hypothetical startup founder would work to keep their valuation high?

One of the most important principles in Silicon Valley is that “you make what you measure.” It means that if you pick some number to focus on, it will tend to improve, but that you have to choose the right number, because only the one you choose will improve

A closely-related principle from outside of Silicon Valley is Goodhart’s Law. It states, “When a measure becomes a target, it ceases to be a good measure”. If you pick some number to focus on, the number as measured will improve, but since it’s often cheaper to subvert the mechanisms for measuring than to actually make progress, the improvement will often be meaningless. It is a dire mistake to assume that as long as you select the right metric in a political process that you can really improve it.

The Silicon Valley version - assuming the number will genuinely increase, and all you have to do is choose the right one - really only works when the things producing the numbers are computers, and the people collecting them have clearly circumscribed reasons not to want to cheat. This is why people tend to select numbers like income inequality to optimize: it gives people a reason to want to avoid cheating.

It’s still possible to get rich by buying politicians (though even that is harder than it was in 1880)

The Sunlight Foundation published a report in 2014, indicating that the return on investment of political spending is approximately 76,000%. While the Sunlight Foundation didn't exist in 1880, a similar report in 2009 put this number at 22,000%, suggesting that it is going up, not down; i.e. over time, it is getting easier, not harder, to get rich by buying politicians.

Meanwhile, the ROI of venture capital, while highly variable, is, on average, at least two orders of magnitude lower than that. While outright “buying” a politician is a silly straw-man, manipulating government remains a far more reliable and lucrative source of income than doing anything productive, with technology or otherwise.

The rate at which individuals can create wealth depends on the technology available to them, and that grows exponentially.

In what sense does technology grow “exponentially”? Let’s look at a concrete example of increasing economic output that’s easy to quantify: wheat yield per acre. What does the report have to say about it?

Winter wheat yields have trended higher since 1960. We find that a linear trend is the best fit to actual average yields over that period and that yields have increased at a rate of 0.4 bushel per acre per year...

(emphasis mine)

In other words, when I go looking for actual, quantifiable evidence of the benefit of improving technology, it is solidly linear, not exponential.
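To make the linear-versus-exponential distinction concrete, here is a rough sketch. Only the 0.4 bushel per acre per year slope comes from the quoted report; the starting yield of 25 bushels per acre is a made-up figure purely for illustration, and the function names are mine. The exponential curve is set up to match the linear one in its first year and then compounds.

```python
# Assumption: 25 bushels/acre is a hypothetical starting yield for illustration;
# only the 0.4 bushel/acre/year slope comes from the quoted report.
START_YIELD = 25.0
SLOPE = 0.4


def linear_yield(years: int) -> float:
    """Yield after `years` of a constant absolute gain per year."""
    return START_YIELD + SLOPE * years


def exponential_yield(years: int, rate: float) -> float:
    """Yield after `years` of a constant percentage gain per year."""
    return START_YIELD * (1 + rate) ** years


# Pick the percentage rate that matches the linear trend in its first year.
first_year_rate = SLOPE / START_YIELD

for years in (0, 25, 50):
    print(years,
          round(linear_yield(years), 1),
          round(exponential_yield(years, first_year_rate), 1))
```

Under these assumptions the two curves are nearly indistinguishable early on and pull apart over decades, so a straight line being the best fit across more than fifty years of data is a meaningful observation rather than a technicality.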

What have we learned?

Paul Graham frequently writes essays in which he makes quantifiable, falsifiable claims (technology growth is “exponential”, “an exponential curve that has been operating for thousands of years”, “there are also a significant number who get rich by creating wealth”) but rarely, if ever, provides any data to back up those claims. When I look for specific examples to test his claims, as with the crop yield examples above, it often seems to me that his claims are exaggerated, entirely imagined, or, worse yet, completely backwards from the truth of the matter.

Graham frequently uses the language of rationality, data, science, empiricism, and mathematics. This is a bad habit shared by many others immersed in Silicon Valley culture. However, simply adopting an unemotional tone and co-opting words like “exponential” and “factor”, or almost-quantifiable weasel words like “most” and “significant”, is no substitute for actually doing the research, assembling the numbers, fitting the curves, and trying to understand if the claims are valid.

This continues to strike me as a real shame, because PG’s CV clearly shows he is an intelligent and determined fellow, and he certainly has a fair amount of money, status, and power at this point. More importantly, his other writings clearly indicate he cares a lot about things like “niceness” and fairness. If he took the trouble to more humbly approach socioeconomic problems like income inequality and poverty, really familiarize himself with existing work in the field, he could put his mind to a solution. He might be able to make some real change. Instead, he continues to use misleading language and rhetorical flourishes to justify decisions he’s already made. In doing so, he remains, regrettably, a blowhard.

Syndicated 2016-01-05 16:29:00 from Deciphering Glyph

Your Text Editor Is Malware

Are you a programmer? Do you use a text editor? Do you install any 3rd-party functionality into that text editor?

If you use Vim, you’ve probably installed a few vimballs from vim.org, a website only available over HTTP. Vimballs are fairly opaque; if you’ve installed one, chances are you didn’t audit the code.

If you use Emacs, you’ve probably installed some packages from ELPA or MELPA using package.el; in Emacs’s default configuration, ELPA is accessed over HTTP, and until recently MELPA’s documentation recommended HTTP as well.

When you install un-signed code into your editor that you downloaded over an unencrypted, unauthenticated transport like HTTP, you might as well be installing malware. This is not a joke or exaggeration: you really might be.1 You have no assurance that you’re not being exploited by someone on your local network, by someone on your ISP’s network, the NSA, the CIA, or whoever else.

The solution for Vim is relatively simple: use vim-plug, which fetches stuff from GitHub exclusively via HTTPS. I haven’t audited it conclusively but its relatively small codebase includes lots of https:// and no http:// or git://2 that I could see.

I’m relatively proud of my track record of being a staunch advocate for improved security in text editor package installation. I’d like to think I contributed a little to the fact that MELPA is now available over HTTPS and instructs you to use HTTPS URLs.

But the situation still isn’t very good in Emacs-land. Even if you manage to get your package sources from an authenticated source over HTTPS, it doesn’t matter, because Emacs won’t verify TLS.

Although package signing is implemented, practically speaking, none of the packages are signed.3 Therefore, you absolutely cannot trust package signing to save you. Plus, even if the packages were signed, why is it the NSA’s business which packages you’re installing, anyway? TLS is shorthand for The Least Security (that is acceptable); whatever other security mechanisms, like package signing, are employed, you should always at least have HTTPS.

With that, here’s my unfortunately surprise-filled step-by-step guide to actually securing Emacs downloads, on Windows, Mac, and Linux.

Step 1: Make Sure Your Package Sources Are HTTPS Only

By default, Emacs ships with its package-archives list as '(("gnu" . "http://elpa.gnu.org/packages/")), which is obviously no good. You will want to both add MELPA (which you surely have done anyway, since it’s where all the actually useful packages are) and change the ELPA URL itself to be HTTPS. Use M-x customize-variable to change package-archives to:

'(("gnu" . "https://elpa.gnu.org/packages/")
  ("melpa" . "https://melpa.org/packages/"))

Step 2: Turn On TLS Trust Checking

There’s another custom variable in Emacs, tls-checktrust, which checks trust on TLS connections. Go ahead and turn that on, again, via M-x customize-variable tls-checktrust.

Step 3: Set Your Trust Roots

Now that you’ve told Emacs to check that the peer’s certificate is valid, Emacs can’t successfully fetch HTTPS URLs any more, because Emacs does not distribute trust root certificates. Although the set of cabforum certificates is probably already on your computer in various forms, you still have to acquire them in a format usable by Emacs somehow. There are a variety of ways, but in the interests of brevity and cross-platform compatibility, my preferred mechanism is to get the certifi package from PyPI, with python -m pip install --user certifi or similar. (A tutorial on installing Python packages is a little out of scope for this post, but hopefully my little website about this will help you get started.)

At this point, M-x customize-variable fails us, and we need to start just writing elisp code; we need to set tls-program to a string computed from the output of running a program, and if we want this to work on Windows we can’t use Bourne shell escapes. Instead, do something like this in your .emacs or wherever you like to put your start-up elisp:4

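;; Ask certifi where its CA bundle lives, strip the trailing newline, and
;; turn Windows backslashes into forward slashes.
(let ((trustfile
       (replace-regexp-in-string
        "\\\\" "/"
        (replace-regexp-in-string
         "\n" ""
         (shell-command-to-string "python -m certifi")))))
  ;; Emacs substitutes %p and %h with the port and host at connection time.
  (setq tls-program
        (list
         (format "gnutls-cli%s --x509cafile %s -p %%p %%h"
                 (if (eq window-system 'w32) ".exe" "") trustfile))))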

This will run gnutls-cli on UNIX, and gnutls-cli.exe on Windows.

You’ll need to install the gnutls-cli command line tool, which of course varies per platform:

  • On OS X, of course, Homebrew is the best way to go about this: brew install gnutls will install it.
  • On Windows, the only way I know of to get GnuTLS itself over TLS is to go directly to this mirror. Download one of these binaries and unzip it next to Emacs in its bin directory.
  • On Debian (or derivatives), apt-get install gnutls-bin
  • On Fedora (or derivatives), yum install gnutls-utils

Great! Now we’ve got all the pieces we need: a tool to make TLS connections, certificates to verify against, and Emacs configuration to make it do those things. We’re done, right?

Wrong!

Step 4: TRUST NO ONE

It turns out there are two ways to tell Emacs to really actually really secure the connection (really), but before I tell you the second one or why you need it, let’s first construct a little test to see if the connection is being properly secured. If we make a bad connection, we want it to fail. Let’s make sure it does.

This little snippet of elisp will use the helpful BadSSL.com site to give you some known-bad and known-good certificates (assuming nobody’s snooping on your connection):

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
(if (condition-case e
        (progn
          (url-retrieve "https://wrong-host.badssl.com/"
                        (lambda (retrieved) t))
          (url-retrieve "https://self-signed.badssl.com/"
                        (lambda (retrieved) t))
          t)
      (error nil))
    (error "tls misconfigured")
  (url-retrieve "https://badssl.com"
                (lambda (retrieved) t)))

If you evaluate it and you get an error, either your trust roots aren’t set up right and you can’t connect to a valid site, or Emacs is still blithely trusting bad certificates. Why might it do that?

Step 5: Configure the Other TLS Verifier

One of Emacs’s compile-time options is whether to link in GnuTLS or not. If GnuTLS is not linked in, it will use whatever TLS program you give it (which might be gnutls-cli or openssl s_client, but since only the most recent version of openssl s_client can even attempt to verify certificates, I’d recommend against it). That is what’s configured via tls-checktrust and tls-program above.

However, if GnuTLS is compiled in, it will totally ignore those custom variables, and honor a different set: gnutls-verify-error and gnutls-trustfiles. To make matters worse, installing the packages which supply the gnutls-cli program also installs the packages which might satisfy Emacs’s dynamic linking against the GnuTLS library, which means this code path could get silently turned on because you tried to activate the other one.

To give these variables the correct values as well, we can re-visit the previous trust setup:

(let ((trustfile
       (replace-regexp-in-string
        "\\\\" "/"
        (replace-regexp-in-string
         "\n" ""
         (shell-command-to-string "python -m certifi")))))
  (setq tls-program
        (list
         (format "gnutls-cli%s --x509cafile %s -p %%p %%h"
                 (if (eq window-system 'w32) ".exe" "") trustfile)))
  (setq gnutls-verify-error t)
  (setq gnutls-trustfiles (list trustfile)))

Now it ought to be set up properly. Try the example again from Step 4 and it ought to work. It probably will. Except, um...

Appendix A: Windows is Weird

Presently, the official Windows builds of Emacs seem to be linked against version 3.3 of GnuTLS rather than the latest 3.4. You might need to download the latest micro-version of 3.3 instead. As far as I can tell, it’s supposed to work with the command-line tools (and maybe it will for you) but for me, for some reason, Emacs could not parse gnutls-cli.exe’s output no matter what I did. This does not appear to be a universal experience; others have reported success, so your mileage may vary.

Conclusion

We nerds sometimes mock the “normals” for not being as security-savvy as we are. Even if we’re considerate enough not to voice these reactions, when we hear someone got malware on their Windows machine, we think “should have used a UNIX, not Windows”. Or “should have been up to date on your patches”, or something along those lines.

Yet, nerdy tools that download and execute code - Emacs in particular - are shockingly careless about running arbitrary unverified code from the Internet. And we are often equally shockingly careless to use them, when we should know better.

If you’re an Emacs user and you didn’t fully understand this post, or you couldn’t get parts of it to work, stop using package.el until you can get the hang of it. Get a friend to help you get your environment configured properly. Since a disproportionate number of Emacs users are programmers or sysadmins, you are a high-value target, and you are risking not only your own safety but that of your users if you don’t double-check that your editor packages are coming from at least cursorily authenticated sources.

If you use another programmer’s text editor or nerdy development tool that is routinely installing software onto your system, make sure that it’s at least securing those installations with properly verified TLS.


  1. Technically speaking of course you might always be installing malware; no defense is perfect. And HTTPS is a fairly weak one at that. But it is significantly stronger than “no defense at all”. 

  2. Never, ever, clone a repository using git:// URLs. As explained in the documentation: “The native transport (i.e. git:// URL) does no authentication and should be used with caution on unsecured networks.” You might have heard that git uses a “cryptographic hash function” and thought that had something to do with security: it doesn’t. If you want security you need signed commits, and even then you can never really be sure.

  3. Plus, MELPA accepts packages on the (plain-text-only) Wiki, which may be edited by anyone, and from CVS servers, although they’d like to stop that. You should probably be less worried about this, because that’s a link between two datacenters, than about the link between you and MELPA, which is residential or business internet at best, and coffee-shop WiFi at worst. But still maybe be a bit worried about it and go comment on that bug. 

  4. Yes, that let is a hint that this is about to get more interesting... 

Syndicated 2015-11-12 08:51:00 from Deciphering Glyph

Python Option Types

NULL has, rightly, been called a “billion dollar mistake”. If that is so, then None is a hundred million dollar mistake, at least.

Forgetting, for the moment, about the numerous pitfalls of a C-style NULL, Python’s None has a very significant problem of its own. Of course, the problem is not None itself; the fact that the default return value of a function is None is (in my humble opinion, at least) fine; it’s just a marker that means “nothing to see here, move along”. The problem arises from values which might be None, or might be some other, useful thing.

APIs present values in a number of ways. A value might be exposed as the return value of a method, an attribute of an object, or an entry in a collection data structure such as a list or dictionary. If a value presented in an API might be None or it might be something else, every single client of that API needs to check the type of the value that it’s calling before doing anything.

Since it is rude to use a “simple suite” (a line of code like if x: y() with no newline after the colon), that means the minimum number of lines of code for interacting with your API is now 4: one for the if statement, one for the then clause, one for the else statement, and one for the else clause.

Worse than the code-bloat required here, the default behavior, if you forget to do this checking, is that it works sometimes (like when you’re testing it), and that other times (like when you put it into production), you get an unhelpful exception like this:

Traceback (most recent call last):
  File "<your code>", line 1, in <module>
    value.method()
AttributeError: 'NoneType' object has no attribute 'method'

Of course NoneType doesn’t have an attribute called method, but why is value a NoneType? Science may never know.
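
To make that concrete, here’s a minimal sketch of how a maybe-None sneaks into an API. The names User, greet and find_user are invented for this example; the point is only that the None is produced in one place and blows up somewhere else entirely.

```python
class User(object):
    def __init__(self, name):
        self.name = name

    def greet(self):
        return "hello, " + self.name


def find_user(users, name):
    """Return the matching User, or (implicitly) None if there isn't one."""
    for user in users:
        if user.name == name:
            return user
    # Falling off the end of the function returns None without saying so.


users = [User("alice")]

# Works while testing, where the name you look up always exists.
print(find_user(users, "alice").greet())

# Fails in production, far away from the code that produced the None.
try:
    find_user(users, "bob").greet()
except AttributeError as e:
    print("failed:", e)
```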

In languages with static type declarations, there’s a concept of an Option type. Simply put, in a language with option types, the API declares its result value as “maybe something, maybe null”, and then if the caller fails to account for the “null” case, it is a compile-time error.

Python doesn’t have this kind of ahead-of-time checking though, so what are we to do? In order of my own personal preference, here are three strategies for getting rid of maybe-None-maybe-not data types in your Python code.

1: Just Say No

Some APIs - especially those that require building deeply complex nested trees of data structures - use None as a way to provide a convenient mechanism for leaving a space for a future value to be filled out. Using such an API sometimes looks like this:

value = MyValue()
value.foo = 1
value.bar = 2
value.baz = 3
value.do_something()

In this case, the way to get rid of None is simple: just stop doing that. Instead of .foo having an implicit type of “int or None”, just make it always be int, like this:

value = MyValue(
    foo=1, bar=2, baz=3
)
value.do_something()

Or, if do_something is the only method you’re going to call with this data structure, opt for the even simpler:

do_something_with_value(foo=1, bar=2, baz=3)

If MyValue has dozens of fields that need to be initialized with different subsystems, so you actually want to pass around a partially-initialized value object, consider the Builder pattern, which would make this code look like the following:

builder = MyValueBuilder()
foo = builder.with_foo(1)
bar = foo.with_bar(2)
baz = bar.with_baz(3)
value = baz.build()
value.do_something()

This acknowledges that the partially-constructed MyValueBuilder is a different type than MyValue, and, crucially, if you look at its API documentation, it does not misleadingly appear to support the do_something operation which in fact requires foo, bar, and baz all be initialized.
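
For the curious, here is one way such a builder might be written. This is a sketch under my own assumptions rather than a canonical implementation: the fields are plain integers, do_something just adds them up, and each with_ method returns a fresh builder instead of mutating the old one.

```python
class MyValue(object):
    """Always fully initialized: every field is required up front."""

    def __init__(self, foo, bar, baz):
        self.foo = foo
        self.bar = bar
        self.baz = baz

    def do_something(self):
        # Placeholder behavior for the example.
        return self.foo + self.bar + self.baz


class MyValueBuilder(object):
    """Collects fields one at a time; deliberately has no do_something."""

    _REQUIRED = ("foo", "bar", "baz")

    def __init__(self, **fields):
        self._fields = fields

    def _with(self, name, value):
        # Return a new builder rather than mutating this one.
        updated = dict(self._fields)
        updated[name] = value
        return MyValueBuilder(**updated)

    def with_foo(self, foo):
        return self._with("foo", foo)

    def with_bar(self, bar):
        return self._with("bar", bar)

    def with_baz(self, baz):
        return self._with("baz", baz)

    def build(self):
        # Complain loudly, in one place, if anything was left out.
        missing = [n for n in self._REQUIRED if n not in self._fields]
        if missing:
            raise ValueError("missing fields: " + ", ".join(missing))
        return MyValue(**self._fields)


value = MyValueBuilder().with_foo(1).with_bar(2).with_baz(3).build()
print(value.do_something())  # 6
```

A half-finished builder can be passed around between subsystems as much as you like; the only thing it can’t do is pretend to be a MyValue.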

Wherever possible, just require values of the appropriate type be passed in in the first place, and don’t ever default to None.

2: Make The Library Handle The Different States, Not The Caller

Sometimes, None is a placeholder indicating an implicit state machine, where the states are “initialized” and “not initialized”.

For example, imagine an RPC Client which may or may not be connected. You might have an API you have to use like this:

def ask_question(self):
    message = {"question":
               "what is the air speed velocity of an unladen swallow?"}
    if self.rpc_client.connection is None:
        self.outbound_messages.append(message)
        self.rpc_client.when_connected(self.flush_outbound_messages)
    else:
        self.rpc_client.send_message(message)

By leaking through the connection attribute, rpc_client is providing an incomplete abstraction and foisting off too much work to its callers. Instead, callers should just have to do this:

def ask_question(self):
    message = {"question":
               "what is the air speed velocity of an unladen swallow?"}
    self.rpc_client.send_message(message)

Internally, rpc_client still has to maintain a private _connection attribute which may or may not be present, but by hiding this implementation detail, we centralize the complexity associated with managing that state in one place, rather than polluting every caller with it, which makes for much better API design.

Hopefully you agree that this is a good idea, but this is more what to do rather than how to do it, so here are two strategies for achieving “make the library do it”:

2a: Use Placeholder Implementations

Rather than using None as the _connection attribute, rpc_client’s internal implementation could instead use a placeholder which provides the same interface. Let’s say the expected interface of _connection in this case is just a send method that takes some bytes. We could initialize it with this:

class NotConnectedConnection(object):
    def __init__(self):
        self._buffer = b""
    def send(self, data):
        self._buffer += data

This allows the code within rpc_client itself to blindly call self._connection.send whether it’s actually connected already or not; upon connection, it could un-buffer that data onto the ready connection.
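
Here’s a sketch of how the pieces might fit together. The flush_to method, the connection_established hook, and the FakeConnection stand-in are all assumptions of mine for the sake of a runnable example; a real client would plug in its actual transport.

```python
class NotConnectedConnection(object):
    """Placeholder that buffers data until a real connection exists."""

    def __init__(self):
        self._buffer = b""

    def send(self, data):
        self._buffer += data

    def flush_to(self, connection):
        # Hand everything buffered so far to the real connection.
        if self._buffer:
            connection.send(self._buffer)
        self._buffer = b""


class RPCClient(object):
    def __init__(self):
        # Never None: always something with a send() method.
        self._connection = NotConnectedConnection()

    def send_message(self, data):
        # No "is None" check needed, connected or not.
        self._connection.send(data)

    def connection_established(self, connection):
        # Un-buffer onto the ready connection, then swap it in.
        self._connection.flush_to(connection)
        self._connection = connection


class FakeConnection(object):
    """Stand-in for a real transport; records what was sent."""

    def __init__(self):
        self.sent = []

    def send(self, data):
        self.sent.append(data)


client = RPCClient()
client.send_message(b"one")
client.send_message(b"two")
real = FakeConnection()
client.connection_established(real)
client.send_message(b"three")
print(real.sent)  # [b'onetwo', b'three']
```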

2b: Use an Explicit State Machine

Sometimes, you actually have quite a few states you need to manage, and this starts looking like an ugly proliferation of lots of weird little flags; various values which may be True or False, or None or not-None.

In those cases it’s best to be clear about the fact that there are multiple states, and to enumerate the valid transitions between them. Then, expose a method which always has the same signature and return type.
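
As a rough, standard-library-only illustration of “enumerate the states and the valid transitions”, consider the sketch below. The State enum, the transition table, and the FakeConnection class are my own inventions for this example, and the private _connection attribute is exactly the sort of internal detail callers should never see.

```python
from enum import Enum


class State(Enum):
    NOT_CONNECTED = "not connected"
    CONNECTED = "connected"


class RPCClient(object):
    # (current state, input name) -> (output method name, next state)
    _TRANSITIONS = {
        (State.NOT_CONNECTED, "send_message"):
            ("_send_later", State.NOT_CONNECTED),
        (State.CONNECTED, "send_message"):
            ("_send_now", State.CONNECTED),
        (State.NOT_CONNECTED, "connection_established"):
            ("_flush_queue", State.CONNECTED),
    }

    def __init__(self):
        self._state = State.NOT_CONNECTED
        self._queue = []
        self._connection = None  # private; only touched by outputs

    def _input(self, name, argument):
        key = (self._state, name)
        if key not in self._TRANSITIONS:
            raise RuntimeError(
                "%s is not valid while %s" % (name, self._state.value))
        output, next_state = self._TRANSITIONS[key]
        getattr(self, output)(argument)
        self._state = next_state

    # Inputs: same signature and return type no matter what state we're in.
    def send_message(self, message):
        self._input("send_message", message)

    def connection_established(self, connection):
        self._input("connection_established", connection)

    # Outputs: each one only ever runs in a state where it makes sense.
    def _send_later(self, message):
        self._queue.append(message)

    def _send_now(self, message):
        self._connection.send(message)

    def _flush_queue(self, connection):
        self._connection = connection
        for message in self._queue:
            connection.send(message)
        self._queue = []


class FakeConnection(object):
    """Stand-in for a real transport; records what was sent."""

    def __init__(self):
        self.sent = []

    def send(self, message):
        self.sent.append(message)


client = RPCClient()
client.send_message("first")
connection = FakeConnection()
client.connection_established(connection)
client.send_message("second")
print(connection.sent)  # ['first', 'second']
```

Writing the table by hand gets tedious and error-prone quickly, which is where a library earns its keep.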

Using a state machine library like ClusterHQ’s “machinist” or my Automat can allow you to automate the process of checking all the states.

Automat, in particular, goes to great lengths to make your objects look like plain old Python objects. Providing an input is just calling a method on your object: my_state_machine.provide_an_input() and receiving an output is just examining its return value. So it’s possible to refactor your code away from having to check for None by using this library.

For example, the connection-handling example above could be dealt with in the RPC client using Automat like so:

from automat import MethodicalMachine

class RPCClient(object):
    _machine = MethodicalMachine()
    @_machine.state()
    def _connected(self):
        "We have a connection."
    # Clients start out unconnected, so this is the initial state.
    @_machine.state(initial=True)
    def _not_connected(self):
        "We have no connection."
    @_machine.input()
    def send_message(self, message):
        "Send a message now if we're connected, or later if not."
    @_machine.output()
    def _send_message_now(self, message):
        "Send a message immediately."
        # ...
    @_machine.output()
    def _send_message_later(self, message):
        "Enqueue a message for when we are connected."
        # ...
    @_machine.output()
    def _send_queued_messages(self, connection):
        "Send all messages enqueued by _send_message_later."
        # ...
    @_machine.input()
    def connection_established(self, connection):
        "A connection was established."
    _connected.upon(send_message, enter=_connected, output=[_send_message_now])
    # Queueing a message doesn't connect us, so stay in _not_connected.
    _not_connected.upon(send_message, enter=_not_connected,
                        output=[_send_message_later])
    _not_connected.upon(connection_established, enter=_connected,
                        output=[_send_queued_messages])

3: Make The Caller Account For All Cases With Callbacks

The absolute lowest-level way to deal with multiple possible states is to, instead of exposing an attribute that the caller has to retrieve and test, expose a function which takes multiple callbacks, one for each case. This way you can provide clear and immediate error feedback if the caller forgets to handle a case - meaning that they forgot to pass a callback. This is only really suitable if you can’t think of any other way to handle it, but it does at least provide a very clear expectation of the interface.

To re-use our connection-handling logic above, you might do something like this:

def try_to_send(self):
    def connection_present(connection):
        connection.send_message(my_message)
    def connection_not_present():
        self.enqueue_message_for_later(my_message)
    self.rpc_client.with_connection(connection_present,
                                    connection_not_present)

Notice that while this is slightly awkward, it has the nice property that the connection_present callback receives the value that it needs, whereas the connection_not_present callback doesn’t receive anything, because there’s nothing for it to receive.
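
On the library side, with_connection might be implemented something like this. It’s a sketch with assumed names (connection_established is my guess at how the client learns about its connection), but it shows where the “immediate error feedback” comes from: both callbacks are required arguments.

```python
class RPCClient(object):
    def __init__(self):
        self._connection = None  # private; never handed out as a maybe-None

    def connection_established(self, connection):
        self._connection = connection

    def with_connection(self, connection_present, connection_not_present):
        # Both callbacks are required, so forgetting a case is a TypeError
        # at the call site rather than a surprise in production.
        if self._connection is None:
            return connection_not_present()
        return connection_present(self._connection)


client = RPCClient()
print(client.with_connection(lambda connection: "connected",
                             lambda: "not connected"))

client.connection_established(object())
print(client.with_connection(lambda connection: "connected",
                             lambda: "not connected"))

try:
    client.with_connection(lambda connection: "connected")
except TypeError as e:
    print("forgot a case:", e)
```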

The Zeroth Strategy

Of course, the best strategy, if you can get away with it, may be the non-strategy: refuse the temptation to provide a maybe-None, and just raise an exception when you are in a state that you can’t handle. If you intentionally raise a specific, meaningful exception type with a good error message, it will be a lot more pleasant to use your API than if return codes that the caller has to check for pop up all over the place, None or otherwise.
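
A minimal sketch of that, assuming a hypothetical NotConnectedError type and the same made-up connection_established hook as before:

```python
class NotConnectedError(Exception):
    """Raised when an operation needs a connection that doesn't exist yet."""


class RPCClient(object):
    def __init__(self):
        self._connection = None

    def connection_established(self, connection):
        self._connection = connection

    def send_message(self, message):
        if self._connection is None:
            # Specific type, specific message, raised at the point of failure.
            raise NotConnectedError(
                "cannot send %r: no connection has been established yet"
                % (message,))
        self._connection.send(message)


try:
    RPCClient().send_message("hello")
except NotConnectedError as e:
    print("refused:", e)
```

The caller who forgets about the unconnected case still gets an exception, but this one says what went wrong and where.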

The Principle Of The Thing

The underlying principle here is the same: when designing an API, always provide a consistent interface to your callers. An API is 1 to N: you have 1 API implementation to N callers. As N→∞, it becomes more important that any task that needs performing frequently is performed on the “1” side of the equation, and that you don’t force callers to repeat the same error checking over and over again. None is just one form of this, but it is a particularly egregious form.

Syndicated 2015-09-17 22:04:00 from Deciphering Glyph
