Older blog entries for glyph (starting at number 32)

The One Python Library Everyone Needs

Do you write programs in Python? You should be using attrs.

Why, you ask? Don’t ask. Just use it.

Okay, fine. Let me back up.

I love Python; it’s been my primary programming language for 10+ years and despite a number of interesting developments in the interim I have no plans to switch to anything else.

But Python is not without its problems. In some cases it encourages you to do the wrong thing. Particularly, there is a deeply unfortunate proliferation of class inheritance and the God-object anti-pattern in many libraries.

One cause for this might be that Python is a highly accessible language, so less experienced programmers make mistakes that they then have to live with forever.

But I think that perhaps a more significant reason is the fact that Python sometimes punishes you for trying to do the right thing.

The “right thing” in the context of object design is to make lots of small, self-contained classes that do one thing and do it well. For example, if you notice your object is starting to accrue a lot of private methods, perhaps you should be making those “public”1 methods of a private attribute. But if it’s tedious to do that, you probably won’t bother.

Another place you probably should be defining an object is when you have a bag of related data that needs its relationships, invariants, and behavior explained. Python makes it soooo easy to just define a tuple or a list. The first couple of times you type host, port = ... instead of address = ... it doesn’t seem like a big deal, but then soon enough you’re typing [(family, socktype, proto, canonname, sockaddr)] = ... everywhere and your life is filled with regret. That is, if you’re lucky. If you’re not lucky, you’re just maintaining code that does something like values[0][7][4][HOSTNAME][“canonical”] and your life is filled with garden-variety pain rather than the more complex and nuanced emotion of regret.


This raises the question: is it tedious to make a class in Python? Let’s look at a simple data structure: a 3-dimensional cartesian coordinate. It starts off simply enough:

1
class Point3D(object):

So far so good. We’ve got a 3 dimensional point. What next?

1
2
class Point3D(object):
    def __init__(self, x, y, z):

Well, that’s a bit unfortunate. I just want a holder for a little bit of data, and I’ve already had to override a special method from the Python runtime with an internal naming convention? Not too bad, I suppose; all programming is weird symbols after a fashion.

At least I see my attribute names in there, that makes sense.

1
2
3
class Point3D(object):
    def __init__(self, x, y, z):
        self.x

I already said I wanted an x, but now I have to assign it as an attribute...

1
2
3
class Point3D(object):
    def __init__(self, x, y, z):
        self.x = x

... to x? Uh, obviously ...

1
2
3
4
5
class Point3D(object):
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

... and now I have to do that once for every attribute, so this actually scales poorly? I have to type every attribute name 3 times?!?

Oh well. At least I’m done now.

1
2
3
4
5
6
class Point3D(object):
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z
    def __repr__(self):

Wait what do you mean I’m not done.

1
2
3
4
5
6
7
8
class Point3D(object):
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z
    def __repr__(self):
        return (self.__class__.__name__ +
                ("(x={}, y={}, z={})".format(self.x, self.y, self.z)))

Oh come on. So I have to type every attribute name 5 times, if I want to be able to see what the heck this thing is when I’m debugging, which a tuple would have given me for free?!?!?

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
class Point3D(object):
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z
    def __repr__(self):
        return (self.__class__.__name__ +
                ("(x={}, y={}, z={})".format(self.x, self.y, self.z)))
    def __eq__(self, other):
        if not isinstance(other, self.__class__):
            return NotImplemented
        return (self.x, self.y, self.z) == (other.x, other.y, other.z)

7 times?!?!?!?

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
class Point3D(object):
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z
    def __repr__(self):
        return (self.__class__.__name__ +
                ("(x={}, y={}, z={})".format(self.x, self.y, self.z)))
    def __eq__(self, other):
        if not isinstance(other, self.__class__):
            return NotImplemented
        return (self.x, self.y, self.z) == (other.x, other.y, other.z)
    def __lt__(self, other):
        if not isinstance(other, self.__class__):
            return NotImplemented
        return (self.x, self.y, self.z) < (other.x, other.y, other.z)

9 times?!?!?!?!?

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
from functools import total_ordering
@total_ordering
class Point3D(object):
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z
    def __repr__(self):
        return (self.__class__.__name__ +
                ("(x={}, y={}, z={})".format(self.x, self.y, self.z)))
    def __eq__(self, other):
        if not isinstance(other, self.__class__):
            return NotImplemented
        return (self.x, self.y, self.z) == (other.x, other.y, other.z)
    def __lt__(self, other):
        if not isinstance(other, self.__class__):
            return NotImplemented
        return (self.x, self.y, self.z) < (other.x, other.y, other.z)

Okay, whew - 2 more lines of code isn’t great, but now at least we don’t have to define all the other comparison methods. But now we’re done, right?

1
2
from unittest import TestCase
class Point3DTests(TestCase):

You know what? I’m done. 20 lines of code so far and we don’t even have a class that does anything; the hard part of this problem was supposed to be the quaternion solver, not “make a data structure which can be printed and compared”. I’m all in on piles of undocumented garbage tuples, lists, and dictionaries it is; defining proper data structures well is way too hard in Python.2


namedtuple to the (not really) rescue

The standard library’s answer to this conundrum is namedtuple. While a valiant first draft (it bears many similarities to my own somewhat embarrassing and antiquated entry in this genre) namedtuple is unfortunately unsalvageable. It exports a huge amount of undesirable public functionality which would be a huge compatibility nightmare to maintain, and it doesn’t address half the problems that one runs into. A full enumeration of its shortcomings would be tedious, but a few of the highlights:

  • Its fields are accessable as numbered indexes whether you want them to be or not. Among other things, this means you can’t have private attributes, because they’re expoed via the apparently public __getitem__ interface.
  • It compares equal to a raw tuple of the same values, so it’s easy to get into bizarre type confusion, especially if you’re trying to use it to migrate away from using tuples and lists.
  • It’s a tuple, so it’s always immutable. Sort of.

As to that last point, either you can use it like this:

1
Point3D = namedtuple('Point3D', ['x', 'y', 'z'])

in which case it doesn’t look like a type in your code; simple syntax-analysis tools without special cases won’t recognize it as one. You can’t give it any other behaviors this way, since there’s nowhere to put a method. Not to mention the fact that you had to type the class’s name twice.

Alternately you can use inheritance and do this:

1
2
class Point3D(namedtuple('_Point3DBase', 'x y z'.split()])):
    pass

This gives you a place you can put methods, and a docstring, and generally have it look like a class, which it is... but in return you now have a weird internal name (which, by the way, is what shows up in the repr, not the class’s actual name). However, you’ve also silently made the attributes not listed here mutable, a strange side-effect of adding the class declaration; that is, unless you add __slots__ = 'x y z'.split() to the class body, and then we’re just back to typing every attribute name twice.

And this doesn’t even mention the fact that science has proven that you shouldn’t use inheritance.

So, namedtuple can be an improvement if it’s all you’ve got, but only in some cases, and it has its own weird baggage.


Enter The attr

So here’s where my favorite mandatory Python library comes in.

Let’s re-examine the problem above. How do I make Point3D with attrs?

1
2
import attr
@attr.s

Since this isn’t built into the language, we do have to have 2 lines of boilerplate to get us started: the import and the decorator saying we’re about to use it.

1
2
3
import attr
@attr.s
class Point3D(object):

Look, no inheritance! By using a class decorator, Point remains a Plain Old Python Class (albeit with some helpful double-underscore methods tacked on, as we’ll see momentarily).

1
2
3
4
import attr
@attr.s
class Point3D(object):
    x = attr.ib()

It has an attribute called x.

1
2
3
4
5
6
import attr
@attr.s
class Point3D(object):
    x = attr.ib()
    y = attr.ib()
    z = attr.ib()

And one called y and one called z and we’re done.

We’re done? Wait. What about a nice string representation?

1
2
>>> Point3D(1, 2, 3)
Point3D(x=1, y=2, z=3)

Comparison?

1
2
3
4
5
6
>>> Point3D(1, 2, 3) == Point3D(1, 2, 3)
True
>>> Point3D(3, 2, 1) == Point3D(1, 2, 3)
False
>>> Point3D(3, 2, 3) > Point3D(1, 2, 3)
True

Okay sure but what if I want to extract the data defined in explicit attributes in a format appropriate for JSON serialization?

1
2
>>> attr.asdict(Point3D(1, 2, 3))
{'y': 2, 'x': 1, 'z': 3}

Maybe that last one was a little on the nose. But nevertheless, it’s one of many things that becomes easier because attrs lets you declare the fields on your class, along with lots of potentially interesting metadata about them, and then get that metadata back out.

1
2
3
4
5
>>> import pprint
>>> pprint.pprint(attr.fields(Point3D))
(Attribute(name='x', default=NOTHING, validator=None, repr=True, cmp=True, hash=True, init=True, convert=None),
 Attribute(name='y', default=NOTHING, validator=None, repr=True, cmp=True, hash=True, init=True, convert=None),
 Attribute(name='z', default=NOTHING, validator=None, repr=True, cmp=True, hash=True, init=True, convert=None))

I am not going to dive into every interesting feature of attrs here; you can read the documentation for that. Plus, it’s well-maintained, so new goodies show up every so often and I might miss something important. But attrs does a few key things that, once you have them, you realize that Python was sorely missing before:

  1. It lets you define types concisely, as opposed to the normally quite verbose manual def __init__.... Types without typing.
  2. It lets you say what you mean directly with a declaration rather than expressing it in a roundabout imperative recipe. Instead of “I have a type, it’s called MyType, it has a constructor, in the constructor I assign the property ‘A’ to the parameter ‘A’ (and so on)”, you say “I have a type, it’s called MyType, it has an attribute called a”, and behavior is derived from that fact, rather than having to later guess about the fact by reverse engineering it from behavior (for example, running dir on an instance, or looking at self.__class__.__dict__).
  3. It provides useful default behavior, as opposed to Python’s sometimes-useful but often-backwards defaults.
  4. It adds a place for you to put a more rigorous implementation later, while starting out simple.

Let’s explore that last point.

Progressive Enhancement

While I’m not going to talk about every feature, I’d be remiss if I didn’t mention a few of them. As you can see from those mile-long repr()s for Attribute above, there are a number of interesting ones.

For example: you can validate attributes when they are passed into an @attr.s-ified class. Our Point3D, for example, should probably contain numbers. For simplicity’s sake, we could say that that means instances of float, like so:

1
2
3
4
5
6
7
import attr
from attr.validators import instance_of
@attr.s
class Point3D(object):
    x = attr.ib(validator=instance_of(float))
    y = attr.ib(validator=instance_of(int))
    z = attr.ib(validator=instance_of(int))

The fact that we were using attrs means we have a place to put this extra validation: we can just add type information to each attribute as we need it. Some of these facilities let us avoid other common mistakes. For example, this is a popular “spot the bug” Python interview question:

1
2
3
4
5
6
7
class Bag:
    def __init__(self, contents=[]):
        self._contents = contents
    def add(self, something):
        self._contents.append(something)
    def get(self):
        return self.contents[:]

Fixing it, of course, becomes this:

1
2
3
4
5
class Bag:
    def __init__(self, contents=None):
        if contents is None:
            contents = []
        self._contents = contents

adding two extra lines of code.

contents inadvertently becomes a global varible here, making all Bag objects not provided with a different list share the same list. With attrs this instead becomes:

1
2
3
4
5
6
7
@attr.s
class Bag:
    _contents = attr.ib(default=attr.Factory(list))
    def add(self, something):
        self._contents.append(something)
    def get(self):
        return self.contents[:]

There are several other features that attrs provides you with opportunities to make your classes both more convenient and more correct. Another great example? If you want to be strict about extraneous attributes on your objects (or more memory-efficient on CPython), you can just pass slots=True at the class level - e.g. @attr.s(slots=True) - to automatically turn your existing attrs declarations a matching __slots__ attribute. All of these handy features allow you to make better and more powerful use of your attr.ib() declarations.


The Python Of The Future

Some people are excited about eventually being able to program in Python 3 everywhere. What I’m looking forward to is being able to program in Python-with-attrs everywhere. It exerts a subtle, but positive, design influence in all the codebases I’ve see it used in.

Give it a try: you may find yourself surprised at places where you’ll now use a tidily explained class, where previously you might have used a sparsely-documented tuple, list, or a dict, and endure the occasional confusion from co-maintainers. Now that it’s so easy to have structured types that clearly point in the direction of their purpose (in their __repr__, in their __doc__, or even just in the names of their attributes), you might find you’ll use a lot more of them. Your code will be better for it; I know mine has been.


  1. Scare quotes here because the attributes aren’t meaningfully exposed to the caller, they’re just named publicly. This pattern, getting rid of private methods entirely and having only private attributes, probably deserves its own post... 

  2. And we hadn’t even gotten to the really exciting stuff yet: type validation on construction, default mutable values... 

Syndicated 2016-08-12 09:47:00 from Deciphering Glyph

Remember that thing I said in my pycon talk about native packaging being the main thing to worry about, and single-file binaries being at best a stepping stone to that and at worst a bit of a red herring? You don’t have to take it from me. From the authors of a widely-distributed command-line application that was rewritten from Python into Go specifically for easier distribution, and then rewritten in Python:

... [the] majority of people prefer native packages so distributing precompiled binaries wasn’t a big win for this type of project1 ...

I don’t want to say “I told you so”, but... no. Wait a second. That is exactly what I want to do. That is what I am doing.

I told you so.

Syndicated 2016-08-12 03:37:00 from Deciphering Glyph

Hello lazyweb,

I want to run some “legacy” software (Trac, specifically) on a Swarm cluster. The files that it needs to store are mostly effectively write-once (it’s the attachments database) but may need to be deleted (spammers and purveyors of malware occasionally try to upload things for spamming or C&C) so while mutability is necessary, there’s a very low risk of any write contention.

I can’t use a networked filesystem, or any volume drivers, so no easy-mode solutions. Basically I want to be able to deploy this on any swarm cluster, so no cheating and fiddling with the host.

Is there any software that I can easily run as a daemon that runs in the background, synchronizing the directory of a data volume between N containers where N is the number of hosts in my cluster?

I found this but it strikes me as ... somehow wrong ... to use that as a critical part of my data-center infrastructure. Maybe it would actually be a good start? But in addition to not being really designed for this task, it’s also not open source, which makes me a little nervous. This list, or even this smaller one is huge and bewildering. So I was hoping you could drop me a line if you’ve got an idea what I could use for this.

Syndicated 2016-08-11 05:28:00 from Deciphering Glyph

I like keeping a comprehensive an accurate addressbook that includes all past email addresses for my contacts, including those which are no longer valid. I do this because I want to be able to see conversations stretching back over the years as originating from that person.

Unfortunately this causes problems when sending mail sometimes. On macOS, at least as of El Capitan, neither the Mail application nor the Contacts application have any mechanism for indicating preference-order of email addresses that I’ve been able to find. Compounding this annoyance, when completing a recipient’s address based on their name, it displays all email addresses for a contact without showing their label, which means even if I label one “preferred” or “USE THIS ONE NOW”, or “zzz don’t use this hasn’t worked since 2005”, I can’t tell when I’m sending a message.

But it seems as though it defaults to sending messages to the most recent outgoing address for that contact that it can see in an email. For people I send email to regularly to this is easy enough. For people who I’m aware have changed their email address, but where I don’t actually want to send them a message, I think I figured out a little hack that makes it work: make a new folder called “Preferred Addresses Hack” (or something suitably), compose a new message addressed to the correct address, then drag the message out of drafts into the folder; since it has a recent date and is addressed to the right person, Mail.app will index it and auto-complete the correct address in the future.

However, since the previous behavior appeared somewhat non-deterministic, I might be tricking myself into believing that this hack worked. If you can confirm it, I’d appreciate it if you would let me know.

Syndicated 2016-08-08 23:24:00 from Deciphering Glyph

Don’t Trust Sourceforge, Ever

If you use a computer and you use the Internet, chances are you’ll eventually find some software that, for whatever reason, is still hosted on Sourceforge. In case you’re not familiar with it, Sourceforge is a publicly-available malware vector that also sometimes contains useful open source binary downloads, especially for Windows.


In addition to injecting malware into their downloads (a practice they claim, hopefully truthfully, to have stopped), Sourceforge also presents an initial download page over HTTPS, then redirects the user to HTTP for the download itself, snatching defeat from the jaws of victory. This is fantastically irresponsible, especially for a site offering un-sandboxed binaries for download, especially in the era of Let’s Encrypt where getting a TLS certificate takes approximately thirty seconds and exactly zero dollars.

So: if you can possibly find your downloads anywhere else, go there.


But, rarely, you will find yourself at the mercy of whatever responsible stewards1 are still operating Sourceforge if you want to get access to some useful software. As it happens, there is a loophole that will let you authenticate the binaries that you download from them so you won’t be left vulnerable to an evil barista: their “file release system”, the thing you use to upload your projects, will allow you to download other projects as well.

To use it, first, make yourself a sourceforge account. You may need to create a dummy project as well. Sourceforge maintains an HTTPS-accessible list of key fingerprints for all the SSH servers that they operate, so you can verify the public key below.

Then you’ll need to connect to their upload server over SFTP, and go to the path /home/frs/project/<the project’s name>/.../ to get the file.

I have written a little Python script2 that automates the translation of a Sourceforge file-browser download URL, one that you can get if you right-click on a download in the “files” section of a project’s website, and runs the relevant scp command to retrieve the file for you. This isn’t on PyPI or anything, and I’m not putting any effort into polishing it further; the best possible outcome of this blog post is that it immediately stops being necessary.


  1. Are you one of those people? I would prefer to be lauding your legacy of decades of valuable contributions to the open source community instead of ridiculing your dangerous incompetence, but repeated bug reports and support emails have gone unanswered. Please get in touch so we can discuss this. 

  2. Code:

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    35
    36
    37
    38
    39
    40
    41
    42
    43
    44
    45
    46
    #!/usr/bin/env python2
    
    import sys
    import os
    
    sfuri = sys.argv[1]
    
    # for example,
    # http://sourceforge.net/projects/refind/files/0.9.2/refind-bin-0.9.2.zip/download
    
    import re
    matched = re.match(
        r"https://sourceforge.net/projects/(.*)/files/(.*)/download",
        sfuri
    )
    
    if not matched:
        sys.stderr.write("Not a SourceForge download link.\n")
        sys.exit(1)
    
    project, path = matched.groups()
    
    sftppath = "/home/frs/project/{project}/{path}".format(project=project, path=path)
    
    def knows_about_web_sf_net():
        with open(
                os.path.expanduser("~/.ssh/known_hosts"), "rb"
        ) as read_known_hosts:
            data = read_known_hosts.read().split("\n")
            for line in data:
                if 'web.sourceforge.net' in line.split()[0]:
                    return True
        return False
    
    sfkey = """
    web.sourceforge.net ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA2uifHZbNexw6cXbyg1JnzDitL5VhYs0E65Hk/tLAPmcmm5GuiGeUoI/B0eUSNFsbqzwgwrttjnzKMKiGLN5CWVmlN1IXGGAfLYsQwK6wAu7kYFzkqP4jcwc5Jr9UPRpJdYIK733tSEmzab4qc5Oq8izKQKIaxXNe7FgmL15HjSpatFt9w/ot/CHS78FUAr3j3RwekHCm/jhPeqhlMAgC+jUgNJbFt3DlhDaRMa0NYamVzmX8D47rtmBbEDU3ld6AezWBPUR5Lh7ODOwlfVI58NAf/aYNlmvl2TZiauBCTa7OPYSyXJnIPbQXg6YQlDknNCr0K769EjeIlAfY87Z4tw==
    """
    
    if not knows_about_web_sf_net():
        with open(
                os.path.expanduser("~/.ssh/known_hosts"), "ab"
        ) as append_known_hosts:
            append_known_hosts.write(sfkey)
    cmd = "scp web.sourceforge.net:{sftppath} .".format(sftppath=sftppath)
    print(cmd)
    os.system(cmd)
    

Syndicated 2016-07-29 06:06:00 from Deciphering Glyph

Letters To The Editor: Re: Email

Since I removed comments from this blog, I’ve been asking y’all to email me when you have feedback, with the promise that I’d publish the good bits. Today I’m making good on that for the first time, with this lovely missive from Adam Doherty:


I just wanted to say thank you. As someone who is never able to say no, your article on email struck a chord with me. I have had Gmail since the beginning, since the days of hoping for an invitation. And the day I received my invitation was the the last day my inbox was ever empty.

Prior to reading your article I had over 40,000 unread messages. It used to be a sort of running joke; I never delete anything. Realistically though was I ever going to do anything with them?

With 40,000 unread messages in your inbox, you start to miss messages that are actually important. Messages that must become tasks, tasks that must be completed.

Last night I took your advice; and that is saying something - most of the things I read via HN are just noise. This however spoke to me directly.

I archived everything older than two weeks, was down to 477 messages and kept pruning. So much of the email we get on a daily basis is also noise. Those messages took me half a second to hit archive and move on.

I went to bed with zero messages in my inbox, woke up with 21, archived 19, actioned 2 and then archived those.

Seriously, thank you so very much. I am unburdened.


First, I’d like to thank Adam for writing in. I really do appreciate the feedback.

Second, I wanted to post this here not in service of showcasing my awesomeness1, but rather to demonstrate that getting to the bottom of your email can have a profound effect on your state of mind. Even if it’s a running joke, even if you don’t think it’s stressing you out, there’s a good chance that, somewhere in the back of your mind, it is. After all, if you really don’t care, what’s stopping you from hitting select all / archive right now?

At the very least, if you did that, your mail app would load faster.


  1. although, let there be no doubt, I am awesome 

Syndicated 2016-05-03 06:06:00 from Deciphering Glyph

Email Isn’t The Thing You’re Bad At

I’ve been using the Internet for a good 25 years now, and I’ve been lucky enough to have some perspective dating back farther than that. The common refrain for my entire tenure here:

We all get too much email.

A New, New, New, New Hope

Luckily, something is always on the cusp of replacing email. AOL instant messenger will totally replace it. Then it was blogging. RSS. MySpace. Then it was FriendFeed. Then Twitter. Then Facebook.

Today, it’s in vogue to talk about how Slack is going to replace email. As someone who has seen this play out a dozen times now, let me give you a little spoiler:

Slack is not going to replace email.

But Slack isn’t the problem here, either. It’s just another communication tool.

The problem of email overload is both ancient and persistent. If the problem were really with “email”, then, presumably, one of the nine million email apps that dot the app-stores like mushrooms sprouting from a globe-spanning mycelium would have just solved it by now, and we could all move on with our lives. Instead, it is permanently in vogue1 to talk about how overloaded we all are.

If not email, then what?

If you have twenty-four thousand unread emails in your Inbox, like some kind of goddamn animal, what you’re bad at is not email, it’s transactional interactions.

Different communication media have different characteristics, but the defining characteristic of email is that it is the primary mode of communication that we use, both professionally and personally, when we are asking someone else to perform a task.

Of course you might use any form of communication to communicate tasks to another person. But other forms - especially the currently popular real-time methods - appear as a bi-directional communication, and are largely immutable. Email’s distinguishing characteristic is that it is discrete; each message is its own entity with its own ID. Emails may also be annotated, whether with flags, replied-to markers, labels, placement in folders, archiving, or deleting. Contrast this with a group chat in IRC, iMessage, or Slack, where the log is mostly2 unchangeable, and the only available annotation is “did your scrollbar ever move down past this point”; each individual message has only one bit of associated information. Unless you have catlike reflexes and an unbelievably obsessive-compulsive personality, it is highly unlikely that you will carefully set the “read” flag on each and every message in an extended conversation.

All this makes email much more suitable for communicating a task, because the recipient can file it according to their system for tracking tasks, come back to it later, and generally treat the message itself as an artifact. By contrast if I were to just walk up to you on the street and say “hey can you do this for me”, you will almost certainly just forget.

The word “task” might seem heavy-weight for some of the things that email is used for, but tasks come in all sizes. One task might be “click this link to confirm your sign-up on this website”. Another might be “choose a time to get together for coffee”. Or “please pass along my resume to your hiring department”. Yet another might be “send me the final draft of the Henderson report”.

Email is also used for conveying information: here are the minutes from that meeting we were just in. Here is transcription of the whiteboard from that design session. Here are some photos from our family vacation. But even in these cases, a task is implied: read these minutes and see if they’re accurate; inspect this diagram and use it to inform your design; look at these photos and just enjoy them.

So here’s the thing that you’re bad at, which is why none of the fifty different email apps you’ve bought for your phone have fixed the problem: when you get these messages, you aren’t making a conscious decision about:

  1. how important the message is to you
  2. whether you want to act on them at all
  3. when you want to act on them
  4. what exact action you want to take
  5. what the consequences of taking or not taking that action will be

This means that when someone asks you to do a thing, you probably aren’t going to do it. You’re going to pretend to commit to it, and then you’re going to flake out when push comes to shove. You’re going to keep context-switching until all the deadlines have passed.

In other words:

The thing you are bad at is saying ‘no’ to people.

Sometimes it’s not obvious that what you’re doing is saying ‘no’. For many of us — and I certainly fall into this category — a lot of the messages we get are vaguely informational. They’re from random project mailing lists, perhaps they’re discussions between other people, and it’s unclear what we should do about them (or if we should do anything at all). We hang on to them (piling up in our Inboxes) because they might be relevant in the future. I am not advocating that you have to reply to every dumb mailing list email with a 5-part action plan and a Scrum meeting invite: that would be a disaster. You don’t have time for that. You really shouldn’t have time for that.

The trick about getting to Inbox Zero3 is not in somehow becoming an email-reading machine, but in realizing that most email is worthless, and that’s OK. If you’re not going to do anything with it, just archive it and forget about it. If you’re subscribed to a mailing list where only 1 out of 1000 messages actually represents something you should do about it, archive all the rest after only answering the question “is this the one I should do something about?”. You can answer that question after just glancing at the subject; there are times when checking my email I will be hitting “archive” with a 1-second frequency. If you are on a list where zero messages are ever interesting enough to read in their entirety or do anything about, then of course you should unsubscribe.

Once you’ve dug yourself into a hole with thousands of “I don’t know what I should do with this” messages, it’s time to declare email bankruptcy. If you have 24,000 messages in your Inbox, let me be real with you: you are never, ever going to answer all those messages. You do not need a smartwatch to tell you exactly how many messages you are never going to reply to.

We’re In This Together, Me Especially

A lot of guidance about what to do with your email addresses email overload as a personal problem. Over the years of developing my tips and tricks for dealing with it, I certainly saw it that way. But lately, I’m starting to see that it has pernicious social effects.

If you have 24,000 messages in your Inbox, that means you aren’t keeping track or setting priorities on which tasks you want to complete. But just because you’re not setting those priorities, that doesn’t mean nobody is. It means you are letting availability heuristic - whatever is “latest and loudest” - govern access to your attention, and therefore your time. By doing this, you are rewarding people (or #brands) who contact you repeatedly, over inappropriate channels, and generally try to flood your attention with their priorities instead of your own. This, in turn, creates a culture where it is considered reasonable and appropriate to assume that you need to do that in order to get someone’s attention.

Since we live in the era of subtext and implication, I should explicitly say that I’m not describing any specific work environment or community. I used to have an email startup, and so I thought about this stuff very heavily for almost a decade. I have seen email habits at dozens of companies, and I help people in the open source community with their email on a regular basis. So I’m not throwing shade: almost everybody is terrible at this.

And that is the one way that email, in the sense of the tools and programs we use to process it, is at fault: technology has made it easier and easier to ask people to do more and more things, without giving us better tools or training to deal with the increasingly huge array of demands on our time. It’s easier than ever to say “hey could you do this for me” and harder than ever to just say “no, too busy”.

Mostly, though, I want you to know that this isn’t just about you any more. It’s about someone much more important than you: me. I’m tired of sending reply after reply to people asking to “just circle back” or asking if I’ve seen their email. Yes, I’ve seen your email. I have a long backlog of tasks, and, like anyone, I have trouble managing them and getting them all done4, and I frequently have to decide that certain things are just not important enough to do. Sometimes it takes me a couple of weeks to get to a message. Sometimes I never do. But, it’s impossible to be mad at somebody for “just checking in” for the fourth time when this is probably the only possible way they ever manage to get anyone else to do anything.

I don’t want to end on a downer here, though. And I don’t have a book to sell you which will solve all your productivity problems. I know that if I lay out some incredibly elaborate system all at once, it’ll seem overwhelming. I know that if I point you at some amazing gadget that helps you keep track of what you want to do, you’ll either balk at the price or get lost fiddling with all its knobs and buttons and not getting a lot of benefit out of it. So if I’m describing a problem that you have here, here’s what I want you to do.

Step zero is setting aside some time. This will probably take you a few hours, but trust me; they will be well-spent.

Email Bankruptcy

First, you need to declare email bankruptcy. Select every message in your Inbox older than 2 weeks. Archive them all, right now. In the past, you might have to worry about deleting those messages, but modern email systems pretty much universally have more storage than you’ll ever need. So rest assured that if you actually need to do anything with these messages, they’ll all be in your archive. But anything in your Inbox right now older than a couple of weeks is just never going to get dealt with, and it’s time to accept that fact. Again, this part of the process is not about making a decision yet, it’s just about accepting a reality.

Mailbox Three

One extra tweak I would suggest here is to get rid of all of your email folders and filters. It seems like many folks with big email problems have tried to address this by ever-finer-grained classification of messages, ever more byzantine email rules. At least, it’s common for me, when looking over someone’s shoulder to see 24,000 messages, it’s common to also see 50 folders. Probably these aren’t helping you very much.

In older email systems, it was necessary to construct elaborate header-based filtering systems so that you can later identify those messages in certain specific ways, like “message X went to this mailing list”. However, this was an incomplete hack, a workaround for a missing feature. Almost all modern email clients (and if yours doesn’t do this, switch) allow you to locate messages like this via search.

Your mail system ought to have 3 folders:

  1. Inbox, which you process to discover tasks,
  2. Drafts, which you use to save progress on replies, and
  3. Archive, the folder which you access only by searching for information you need when performing a task.

Getting rid of unnecessary folders and queries and filter rules will remove things that you can fiddle with.

Moving individual units of trash between different heaps of trash is not being productive, and by removing all the different folders you can shuffle your messages into before actually acting upon them you will make better use of your time spent looking at your email client.

There’s one exception to this rule, which is filters that do nothing but cause a message to skip your Inbox and go straight to the archive. The reason that this type of filter is different is that there are certain sources or patterns of message which are not actionable, but rather, a useful source of reference material that is only available as a stream of emails. Messages like that should, indeed, not show up in your Inbox. But, there’s no reason to file them into a specific folder or set of folders; you can always find them with a search.

Make A Place For Tasks

Next, you need to get a task list. Your email is not a task list; tasks are things that you decided you’re going to do, not things that other people have asked you to do5. Critically, you are going to need to parse e-mails into tasks. To explain why, let’s have a little arithmetic aside.

Let’s say it only takes you 45 seconds to go from reading a message to deciding what it really means you should do; so, it only takes 20 seconds to go from looking at the message to remembering what you need to do about it. This means that by the time you get to 180 un-processed messages that you need to do something about in your Inbox, you’ll be spending an hour a day doing nothing but remembering what those messages mean, before you do anything related to actually living your life, even including checking for new messages.

What should you use for the task list? On some level, this doesn’t really matter. It only needs one really important property: you need to trust that if you put something onto it, you’ll see it at the appropriate time. How exactly that works depends heavily on your own personal relationship with your computers and devices; it might just be a physical piece of paper. But for most of us living in a multi-device world, something that synchronizes to some kind of cloud service is important, so Wunderlist or Remember the Milk are good places to start, with free accounts.

Turn Messages Into Tasks

The next step - and this is really the first day of the rest of your life - start at the oldest message in your Inbox, and work forward in time. Look at only one message at a time. Decide whether this message is a meaningful task that you should accomplish.

If you decide a message represents a task, then make a new task on your task list. Decide what the task actually is, and describe it in words; don’t create tasks like “answer this message”. Why do you need to answer it? Do you need to gather any information first?

If you need to access information from the message in order to accomplish the task, then be sure to note in your task how to get back to the email. Depending on what your mail client is, it may be easier or harder to do this6, but in the worst case, following the guidelines above about eliminating unnecessary folders and filing in your email client, just put a hint into your task list about how to search for the message in question unambiguously.

Once you’ve done that:

Archive the message immediately.

The record that you need to do something about the message now lives in your task list, not your email client. You’ve processed it, and so it should no longer remain in your inbox.

If you decide a message doesn’t represent a task, then:

Archive the message immediately.

Do not move on to the next message until you have archived this message. Do not look ahead7. The presence of a message in your Inbox means you need to make a decision about it. Follow the touch-move rule with your email. If you skip over messages habitually and decide you’ll “just get back to it in a minute”, that minute will turn into 4 months and you’ll be right back where you were before.

Circling back to the subject of this post; once again, this isn’t really specific to email. You should follow roughly the same workflow when someone asks you to do a task in a meeting, or in Slack, or on your Discourse board, or wherever, if you think that the task is actually important enough to do. Note the slack timestamp and a snippet of the message so you can search for it again, if there is a relevant attachment. The thing that makes email different is really just the presence of an email box.

Banish The Blue Dot

Almost all email clients have a way of tracking “unread” messages; they cheerfully display counters of them. Ignore this information; it is useless. Messages have two states: in your inbox (unprocessed) and in your archive (processed). “Read” vs. “Unread” can be, at best, of minimal utility when resuming an interrupted scanning session. But, you are always only ever looking at the oldest message first, right? So none of the messages below it should be unread anyway...

Be Ruthless

As you try to start translating your flood of inbound communications into an actionable set of tasks you can actually accomplish, you are going to notice that your task list is going to grow and grow just as your Inbox was before. This is the hardest step:

Decide you are not going to do those tasks, and simply delete them. Sometimes, a task’s entire life-cycle is to be created from an email, exist for ten minutes, and then have you come back to look at it and then delete it. This might feel pointless, but in going through that process, you are learning something extremely valuable: you are learning what sorts of things are not actually important enough to do you do.

If every single message you get from some automated system provokes this kind of reaction, that will give you a clue that said system is wasting your time, and just making you feel anxious about work you’re never really going to get to, which can then lead to you un-subscribing or filtering messages from that system.

Tasks Before Messages

To thine own self, not thy Inbox, be true.

Try to start your day by looking at the things you’ve consciously decided to do. Don’t look at your email, don’t look at Slack; look at your calendar, and look at your task list.

One of those tasks, probably, is a daily reminder to “check your email”, but that reminder is there more to remind you to only do it once than to prevent you from forgetting.

I say “try” because this part is always going to be a challenge; while I mentioned earlier that you don’t want to unthinkingly give in to availability heuristic, you also have to acknowledge that the reason it’s called a “cognitive bias” is because it’s part of human cognition. There will always be a constant anxious temptation to just check for new stuff; for those of us who have a predisposition towards excessive scanning behavior have it more than others.

Why Email?

We all need to make commitments in our daily lives. We need to do things for other people. And when we make a commitment, we want to be telling the truth. I want you to try to do all these things so you can be better at that. It’s impossible to truthfully make a commitment to spend some time to perform some task in the future if, realistically, you know that all your time in the future will be consumed by whatever the top 3 highest-priority angry voicemails you have on that day are.

Email is a challenging social problem, but I am tired of email, especially the user interface of email applications, getting the blame for what is, at its heart, a problem of interpersonal relations. It’s like noticing that you get a lot of bills through the mail, and then blaming the state of your finances on the colors of the paint in your apartment building’s mail room. Of course, the UI of an email app can encourage good or bad habits, but Gmail gave us a prominent “Archive” button a decade ago, and we still have all the same terrible habits that were plaguing Outlook users in the 90s.

Of course, there’s a lot more to “productivity” than just making a list of the things you’re going to do. Some tools can really help you manage that list a lot better. But all they can help you to do is to stop working on the wrong things, and start working on the right ones. Actually being more productive, in the sense of getting more units of work out of a day, is something you get from keeping yourself healthy, happy, and well-rested, not from an email filing system.

You can’t violate causality to put more hours into the day, and as a frail and finite human being, there’s only so much work you can reasonably squeeze in before you die.

The reason I care a lot about salvaging email specifically is that it remains the best medium for communication that allows you to be in control of your own time, and by extension, the best medium for allowing people to do creative work.

Asking someone to do something via SMS doesn’t scale; if you have hundreds of unread texts there’s no way to put them in order, no way to classify them as “finished” and “not finished”, so you need to keep it to the number of things you can fit in short term memory. Not to mention the fact that text messaging is almost by definition an interruption - by default, it causes a device in someone’s pocket to buzz. Asking someone to do something in group chat, such as IRC or Slack, is similarly time-dependent; if they are around, it becomes an interruption, and if they’re not around, you have to keep asking and asking over and over again, which makes it really inefficient for the asker (or the asker can use a @highlight, and assume that Slack will send the recipient, guess what, an email).

Social media often comes up as another possible replacement for email, but its sort order is even worse than “only the most recent and most frequently repeated”. Messages are instead sorted by value to advertisers or likeliness to increase ‘engagement’”, i.e. most likely to keep you looking at this social media site rather than doing any real work.

For those of us who require long stretches of uninterrupted time to produce something good – “creatives”, or whatever today’s awkward buzzword for intersection of writers, programmers, graphic designers, illustrators, and so on, is – we need an inbound task queue that we can have some level of control over. Something that we can check at a time of our choosing, something that we can apply filtering to in order to protect access to our attention, something that maintains the chain of request/reply for reference when we have to pick up a thread we’ve had to let go of for a while. Some way to be in touch with our customers, our users, and our fans, without being constantly interrupted. Because if we don’t give those who need to communicate with such a tool, they’ll just blast @everyone messages into our slack channels and @mentions onto Twitter and texting us Hey, got a minute? until we have to quit everything to try and get some work done.

Questions about this post?

Go ahead and send me an email.


Acknowledgements

As always, any errors or bad ideas are certainly my own.

First of all, Merlin Mann, whose writing and podcasting were the inspiration, direct or indirect, for many of my thoughts on this subject; and who sets a good example because he won’t answer your email.

Thanks also to David Reid for introducing me to Merlin's work, as well as Alex Gaynor, Tristan Seligmann, Donald Stufft, Cory Benfield, Piët Delport, Amber Brown, and Ashwini Oruganti for feedback on drafts.


  1. Email is so culturally pervasive that it is literally in Vogue, although in fairness this is not a reference to the overflowing-Inbox problem that I’m discussing here. 

  2. I find the “edit” function in Slack maddening; although I appreciate why it was added, it’s easy to retroactively completely change the meaning of an entire conversation in ways that make it very confusing for those reading later. You don’t even have to do this intentionally; sometimes you make a legitimate mistake, like forgetting the word “not”, and the next 5 or 6 messages are about resolving that confusion; then, you go back and edit, and it looks like your colleagues correcting you are a pedantic version of Mr. Magoo, unable to see that you were correct the first time. 

  3. There, I said it. Are you happy now? 

  4. Just to clarify: nothing in this post should be construed as me berating you for not getting more work done, or for ever failing to meet any commitment no matter how casual. Quite the opposite: what I’m saying you need to do is acknowledge that you’re going to screw up and rather than hold a thousand emails in your inbox in the vain hope that you won’t, just send a quick apology and move on. 

  5. Maybe you decided to do the thing because your boss asked you to do it and failing to do it would cost you your job, but nevertheless, that is a conscious decision that you are making; not everybody gets to have “boss” priority, and unless your job is a true Orwellian nightmare, not everything your boss says in email is an instant career-ending catastrophe. 

  6. In Gmail, you can usually just copy a link to the message itself. If you’re using OS X’s Mail.app, you can use this Python script to generate links that, when clicked, will open the Mail app:

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    from __future__ import (print_function, unicode_literals,
                            absolute_import, division)
    
    from ScriptingBridge import SBApplication
    import urllib
    
    mail = SBApplication.applicationWithBundleIdentifier_("com.apple.mail")
    
    for viewer in mail.messageViewers():
        for message in viewer.selectedMessages():
            for header in message.headers():
                name = header.name()
                if name.lower() == "message-id":
                    content = header.content()
                    print("message:" + urllib.quote(content))
    

    You can then paste these links into just about any task tracker; if they don’t become clickable, you can paste them into Safari’s URL bar or pass them to the open command-line tool. 

  7. The one exception here is that you can look ahead in the same thread to see if someone has already replied. 

Syndicated 2016-04-24 23:54:00 from Deciphering Glyph

Far too many things can stop the BLOB

It occurs to me that the lack of a standard, well-supported, memory-efficient interface for BLOBs in multiple programming languages is one of the primary driving factors of poor scalability characteristics of open source SaaS applications.

Applications like Gitlab, Redmine, Trac, Wordpress, and so on, all need to store potentially large files (“attachments”). Frequently, they elect to store these attachments (at least by default) in a dedicated filesystem directory. This leads to a number of tricky concurrency issues, as the filesystem has different (and divorced) concurrency semantics from the backend database, and resides only on the individual API nodes, rather than in the shared namespace of the attached database.

Some databases do support writing to BLOBs like files. Postgres, SQLite, and Oracle do, although it seems MySQL lags behind in this area (although I’d love to be corrected on this front). But many higher-level API bindings for these databases don’t expose support for BLOBs in an efficient way.

Directly using the filesystem, as opposed to a backing service, breaks the “expected” scaling behavior of the front-end portion of a web application. Using an object store, like Cloud Files or S3, is a good option to achieve high scalability for public-facing applications, but they creates additional deployment complexity.

So, as both a plea to others and a note to myself: if you’re writing a database-backed application that needs to store some data, please consider making “store it in the database as BLOBs” an option. And if your particular database client library doesn’t support it, consider filing a bug.

Syndicated 2016-04-20 01:01:00 from Deciphering Glyph

I think I’m using GitHub wrong.

I use a hodgepodge of https: and : (i.e. “ssh”) URL schemes for my local clones; sometimes I have a remote called “github” and sometimes I have one called “origin”. Sometimes I clone from a fork I made and sometimes I clone from the upstream.

I think the right way to use GitHub would instead be to always fork first, make my remote always be “origin”, and consistently name the upstream remote “upstream”. The problem with this, though, is that forks rapidly fall out of date, and I often want to automatically synchronize all the upstream branches.

Is there a script or a github option or something to synchronize a fork with upstream automatically, including all its branches and tags? I know there’s no comment field, but you can email me or reply on twitter.

Syndicated 2016-04-13 21:11:00 from Deciphering Glyph

Monads are simple to understand.

You can just think of them like a fleet of mysterious inverted pyramids ominously hovering over a landscape dotted with the tombs of ancient and terrible gods. Tombs from which they may awake at any moment if they are “evaluated”.

The IO loop is then the malevolent personification force of entropy, causing every action we take to push the universe further into the depths of uncontrolled chaos.

Simple!

Syndicated 2016-02-16 00:36:00 from Deciphering Glyph

23 older entries...

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!