Older blog entries for raz (starting at number 2)

A defensive strategy for accepting email over IPv6

Accepting email over IPv6 risks providing spammers with an easy entrance point because IP-address blocklisting is not likely to be viable for an address space as large as IPv6’s. The need to continue to accept email over IPv4 for the indefinite future provides a useful safety valve in that a receiver can push messages offered over IPv6 whose validity is uncertain back to the existing IPv4 service, thereby reducing the dependence upon – or even eliminating the need for – IPv6-address blocklists.

To take advantage of this, a receiver needs whitelists (manually maintained, automatically generated, user addressbooks, provided by a reputation data provider, …) and the ability to test and act on domain authentication (SPF, DKIM, DMARC, …) during the SMTP conversation. Any message failing authentication, or passing authentication but not matching a whitelist, need merely be given a temporary failure (4xx) response code. A well-behaved sending MTA (e.g. a non-spammer’s) receiving 4xx responses will work through the receiver’s listed MXs until it finds one that gives an authoritative (2xx/5xx) response.
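
As a rough sketch of the decision just described, the following shell function maps the two inputs (authentication result and whitelist match) to a reply. The function name `ipv6_smtp_decision`, its arguments and the exact 250/451 reply texts are assumptions made purely for illustration; all that the approach requires is a 2xx for accepted mail and some 4xx for everything else.

```shell
#!/bin/sh
# Hypothetical policy sketch for connections arriving over IPv6.
# $1: domain authentication result ("pass" or anything else)
# $2: whitelist match ("yes" or anything else)
# Prints the SMTP reply the receiver would give.
ipv6_smtp_decision() {
    auth_result=$1
    whitelist_match=$2

    if [ "$auth_result" = "pass" ] && [ "$whitelist_match" = "yes" ]; then
        # Authenticated and known-good: accept over IPv6.
        echo "250 OK"
    else
        # Anything uncertain gets a temporary failure, so that a
        # well-behaved sender moves on to the next MX (the IPv4 service).
        echo "451 Temporary failure, please try another MX"
    fi
}

ipv6_smtp_decision pass yes   # authenticated and whitelisted
ipv6_smtp_decision pass no    # authenticated but not whitelisted
ipv6_smtp_decision fail yes   # failed authentication
```

Note that nothing here ever produces a 5xx: rejection decisions are left entirely to the existing IPv4 service.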

The argument that email receivers will need to accept email over IPv4 for the indefinite future is well-known and almost certainly correct; however, organisations may find themselves wanting to accept email over IPv6 as well for at least two reasons:

  • The desire to pilot, experiment with or research acceptance of email over IPv6.
  • An externally imposed mandate that IPv6 be deployed for “all applications”.

The approach described here can be used in two different ways:

  • A defensive deployment from the outset for those who wish to get something working, but would prefer to deal up front with the risk of spammers exploiting the difficulties of IPv6-address blocklisting.
  • A fallback option for those who are willing to deploy without solving this problem, but wish to have a documented strategy for dealing with this problem when/if it arises.

In either case the benefit is the same: a production-use-ready approach for accepting at least some email over IPv6 with a safe fallback to IPv4 for the rest.

Ideally all of the relevant authentication mechanisms (SPF, DKIM and DMARC) can be processed and acted on during the SMTP transaction, but this approach can be adopted even if this is only true for SPF; the result will simply be that some of the email that could have been accepted over IPv6 will instead be pushed to IPv4.

Most types of whitelist data can be applied:

  • IPv6 address whitelists can be used as is.
    • A locally-maintained list of IPv6 addresses of mail-servers of trusted partners.
    • IPv6-address whitelists supplied by reputation data providers.
  • Domain whitelists can be used in conjunction with domain authentication (SPF (perhaps subject to DMARC’s alignment rules), DKIM, last-resort SPF data from a reputation data provider, …)
    • A locally-maintained list of domains of trusted partners.
    • A domain whitelist from a reputation data provider
  • In situations where end-user addressbooks are accessible during the SMTP conversation, the presence of a sender in the recipient’s addressbook can be treated as a whitelist match (subject to authentication checks as above).
    • For webmail providers this is pretty much a given
    • For others this is sometimes available from existing mail-server software, in other cases software can be used to automatically gather this data locally.

In general, content-based anti-spam filters need not be used for messages which have passed any of the above. A particular exception is malware checking: clearly, it is not desirable to deliver malware even if it’s from a source that’s known to behave well, e.g. because someone’s PC has become infected and is emailing exploits or phish to each of the user’s contacts.

Weaker signals might also be used to decide to accept a message subject to content-based anti-spam filters not detecting a problem. These include:

  • The existence of an rDNS entry for the source IP address, the existence of a matching forward DNS entry and the use by the connecting MTA of the same name in the HELO/EHLO string.
  • The connection originating from an AS, or a network within one, known to be particularly stringent in its containment of abuse. To avoid confusion, I’ll use the term “greenlisting” to refer to the listing of IPv6 addresses or networks as being allowed to connect but still subject to content-based filtering.
  • The RFC5322.From domain name being registered with a registrar known to be particularly stringent in de-registering abusers. This would of course have to be done in conjunction with domain authentication as above. (This is also somewhat hypothetical; I’m not sure that any registrar is currently strict enough for this purpose.)
  • Even without a domain whitelist entry, the historical behaviour of the RFC5322.From domain in sending mail to the receiver’s IPv4 service. Again, this would have to be done in conjunction with domain authentication.
  • The presence of well-formed, non-anonymised whois information for the RFC5322.From domain and/or the source IP address block.

These are all a little less robust than competent whitelisting, and may have to be tried on a “sacrificial lamb” basis; however, as with the broad strategy of building on an IPv4 fallback, this is easier and safer to do than it was in an IPv4-only universe.
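
The first of the weaker signals listed above (rDNS entry, matching forward entry, same name in HELO/EHLO) can be sketched as a shell function. This is only an illustration: the function name `rdns_helo_consistent` and its arguments are hypothetical, the DNS lookups are assumed to have been done already by the mail server, and addresses are compared as plain text, so they are assumed to be in the same normalised form.

```shell
#!/bin/sh
# Hypothetical check; no DNS queries are made here.
# $1: source IP address of the connection
# $2: name returned by the rDNS (PTR) lookup, empty if none
# $3: space-separated addresses that name resolves to (forward lookup)
# $4: name given by the connecting MTA in HELO/EHLO
# Returns 0 if all three conditions hold, 1 otherwise.
rdns_helo_consistent() {
    source_ip=$1
    rdns_name=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    forward_addrs=$3
    helo_name=$(printf '%s' "$4" | tr '[:upper:]' '[:lower:]')

    # 1. An rDNS entry must exist.
    [ -n "$rdns_name" ] || return 1

    # 2. The forward lookup of that name must include the source address
    #    (textual comparison; assumes normalised address strings).
    matched=no
    for addr in $forward_addrs; do
        if [ "$addr" = "$source_ip" ]; then
            matched=yes
        fi
    done
    [ "$matched" = "yes" ] || return 1

    # 3. The connecting MTA must use the same name in HELO/EHLO.
    [ "$helo_name" = "$rdns_name" ]
}

if rdns_helo_consistent "2001:db8::25" "mail.example.org" \
        "2001:db8::25" "MAIL.example.org"; then
    echo "weak signal present: accept subject to content filtering"
fi
```

A pass here is deliberately treated as weaker than a whitelist match: it only decides whether the message is handed to the content-based filters rather than pushed to IPv4.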

Astute readers will notice that what I am describing is an implementation of the Aspen Framework that Meng Wong described in his Sender Authentication Whitepaper 8 years (!) ago. I’d suggest that:

  • The concern about the infeasibility of IPv6-address blocklists and the certain availability of the IPv4 fallback for the indefinite future provides an opportunity to implement this approach for IPv6 receivers that never existed in an IPv4-only environment.

  • The period of time that this has taken should be a strong warning to people who blithely assume that email can simply be moved to IPv6 by mandate. Email is an unusually tough problem; progress is slow.

  • That things move so slowly makes incremental approaches like the one described here more valuable than they might otherwise be. (There’s little point piloting a partial approach that will be rendered obsolete when the “complete” approach arrives 6 months later. If you assume that a complete approach is many years away, then there is more to gain from the deployment of partial approaches.)

It is conceivable that this will eventually be the beginning of a migration strategy: that over time so much email will be able to be accepted on a “we know something good about this message” basis (rather than a “we know nothing bad about this message” basis) that it will become viable to reject outright any email about which nothing good is known. I don’t actually expect that this will be the case, but I also suspect that so much will change during the parallel running of delivery-to-MX over IPv4 and IPv6 that it’s not practical to predict how delivery-to-MX over IPv4 might be phased out. The important observation would appear to be that this approach provides a production-use-ready way to start.

Additional thoughts:

  • There is a legitimate concern about the additional workload that this will create – both for receivers and legitimate senders – in causing duplicate delivery of some/most/all legitimate email. I’d suggest that for early adopters this will not be a great concern, particularly while the total volume of email-over-IPv6 is small.

    • If many receivers adopt this approach when piloting accepting-over-IPv6 then the incentive to spammers to move to IPv6 will be greatly diminished in the first place, thus cutting much of the duplicate workload for receivers who senders can see are doing this. (This effect seems unlikely to be large enough to render the infeasibility of IPv6-address blocklists moot, but it would be a great side-effect!)

    • Early-adopter senders are more likely to adopt full authentication anyway; however, where whitelisting is insufficient, encountering large numbers of receivers who push traffic to IPv4 may impose costs that senders aren’t willing to incur. I’d suggest that operational experience will tell us how this plays out and that senders and receivers will be in a better position to work out what to do about this when/if there’s enough traffic for it to be an actual problem.

    • This problem is likely to be particularly acute for forwarders, for whom far less mail is likely to pass authentication despite being legitimate. As in other contexts, forwarded streams are likely to require special handling (e.g. by not delivering them via IPv6 except where DKIM passes, or treating delivery-via-IPv6 as a problem to solve later). It may also be the case that receivers can simply greenlist known-strict forwarders and apply content-based filtering as usual. (Note that such forwarders would not appear on useful blocklists anyway.)

  • There is another concern about 4xx responses causing poorly-behaved sending MTAs to delay even before trying other listed MXs, much as there is for greylisting. RFC5321 5.1 only specifies “In any case, the SMTP client SHOULD try at least two addresses.” If it turns out that a substantial number of sending MTAs limit themselves to just two addresses, then implementing this defensive approach would require listing only a single IPv6-reachable MX. This is sufficient from a fault-tolerance perspective (fallback to IPv4 being an intrinsic part of the design), but may run afoul of external mandates about MX configuration rules. Such rules could usually be adjusted as part of implementing this approach, but this may nonetheless end up being a show-stopper for the entire approach for some organisations. Only operational experience will tell for certain.

  • Also as for greylisting, there may be a problem with legitimate-but-poorly-behaved sending MTAs that never retry after a 4xx response. As these are rather small in number, the same approach that was used for greylisting is likely to be viable: the development of a database of known legitimate senders who don’t deal correctly with 4xx responses and simply greenlisting them. Mail from these sources should still be checked by content filters, of course.

  • There may arise a concern that the use of addressbook data in deciding how to respond during SMTP might expose an addressbook-harvesting risk. I’d suggest that this was not a concern because it would only apply where domain authentication had succeeded with known good senders (not something that a botnet could usually do by itself) and even then, would only apply if the harvester had guessed a known sender+recipient pair. This appears to be too small an attack surface to worry about but, as ever with security concerns, this needs to be monitored and may need to be the subject of future work.
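
The two-address concern in the list above can be explored with a small simulation. This is a sketch under stated assumptions, not a description of any real MTA: the function name `reaches_ipv4_within` is hypothetical, the sender is assumed to try addresses strictly in the order given, and every IPv6 address is assumed to answer 4xx (the worst case for a message whose validity is uncertain).

```shell
#!/bin/sh
# Hypothetical simulation of a sender that gives up after a fixed
# number of addresses.
# $1: how many addresses the sender is willing to try
# remaining arguments: address family ("6" or "4") of each address,
#   in the order in which the sender would try them
# Returns 0 if the sender reaches an IPv4 address before giving up.
reaches_ipv4_within() {
    limit=$1
    shift
    tried=0
    for family in "$@"; do
        # Stop once the sender has used up its attempts.
        [ "$tried" -lt "$limit" ] || break
        tried=$((tried + 1))
        if [ "$family" = "4" ]; then
            return 0
        fi
    done
    return 1
}

# One IPv6-reachable MX ahead of the IPv4 service: a two-address
# sender still gets an authoritative answer.
reaches_ipv4_within 2 6 4 && echo "one IPv6 MX: fallback reached"

# Two IPv6-reachable MXs ahead of the IPv4 service: the same sender
# never gets past the 4xx responses.
reaches_ipv4_within 2 6 6 4 || echo "two IPv6 MXs: fallback not reached"
```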

Relevant disclosure: I work for TrustSphere which supplies software that can be used for whitelist automation (TrustVault) and reputation data that can be used as described above (TrustCloud). On re-reading it occurs to me that this post makes a case for using TrustSphere’s products. I’d like to clarify that it is not the case that I believe the above (or wrote it without believing it!) because I work for TrustSphere but, rather, that I work for TrustSphere because I believe the above. See also my comments on this from a few years ago.

23 Nov 2012 (updated 23 Nov 2012 at 05:18 UTC) »
Towards ‘serverless’ social-networking

The rise of ‘cloud’ services and the rapid uptake of smartphones has created an unexplored – and perhaps quite large – niche for social software outside the control of advertiser-funded social network services (Facebook et al). While smartphone power and connectivity constraints make pure peer-to-peer social software on smartphones impractical, it is possible to construct a hybrid approach which moves much of the heavy lifting to undifferentiated/non-sticky services in the cloud while retaining owner/user control.

By contrast:

  • Many people, perhaps a majority, are perfectly happy to depend upon advertiser-funded social network services.

  • A visible minority is not and is therefore putting effort into personal server projects like FreedomBox to run a server in their own home which stores/shares/controls their own data and perhaps some of that of their friends. This approach avoids the power and connectivity barriers in smartphones, but requires the purchase, installation, connection, maintenance and physical securing of a device at the owner/user’s home and requires some technical expertise in dealing with the maintenance of the server operating system and software. Even if backups (and restores!) and upgrades are fully automated, diagnosing and correcting failures requires specialist expertise – and the time to use it – that the vast majority of people don’t have. This latter piece is a major part of the value that SaaS-providers generally – and social network services in particular – provide.

  • For people not concerned about governmental/law-enforcement interference, a [virtual] personal server in a data centre provides all of the other relevant benefits of a personal server and eliminates all of the physical aspects, but still requires specialist expertise in diagnosing and correcting failures.

  • For people who aren’t willing to run a server – whether virtual or real – but are willing to have their data in the hands of someone who isn’t selling advertising to fund their service and are willing to incur a small cost in time, money, inconvenience, etc., a variety of approaches are being explored. Notable amongst these are distributed/federated social networking software (e.g. Diaspora) and paid-subscription-only services (e.g. App.net).

  • Another group of people – myself included – would prefer not to run a server if possible – or are unable to – but would very much prefer that their data was under their own control. This is the unexplored niche.

The options, by what the owner/user is concerned about and what they are willing to do, are:

  • Concerned about governmental/law enforcement interference.
    • Will purchase, install, connect, maintain and physically secure device at residence. Will maintain server software: FreedomBox.
  • Concerned about control of data by others.
    • Will maintain server software: FreedomBox on a virtual server.
    • Will only use phone: P2P app with non-sticky service help.
  • Concerned about advertising-funded sites’ skewed incentives and/or constant unpleasant changing of the rules.
    • Won’t maintain server software. Willing to pay $/time/inconvenience for increased freedom: Diaspora on a friend’s server.
  • Not concerned: Facebook.

To understand where the additional niche exists, imagine that smartphones generally had:

  • effectively unlimited battery capacity (comparable to that of a PC plugged into a national grid)

  • effectively unlimited CPU capacity (smartphones are now so powerful that this is rarely a constraint, but it would be nice if a photo/video that the owner/user shared suddenly going viral didn’t make it impossible to use the phone for several hours)

  • effectively unlimited network capacity (enough that authorised people browsing the owner/user’s photos could be loading them directly from the owner/user’s phone as they viewed them)

  • a fixed IP address and no NAT between it and the public Internet (so it could serve data without help from hosted services)

In this environment, it would be possible to produce social network software that ran only on phones and talked only to peers on other phones. Unfortunately on current and likely future mobile phone networks, three of those things are always false and the fourth is usually false. It is possible, however, to use a certain class of network-hosted service as force-multipliers for an app running on a phone to give it capabilities [almost] as good as those four things, and to do so without giving control away:

  • The simplest approach uses an object-storage service (Amazon S3, Rackspace CloudFiles, OpenStack Swift, … possibly enhanced with a CDN for even better speed) to make objects that the user has shared (files, possibly encrypted and/or subject to access control) available to others. For asynchronous browsing by others of things that the user has shared, this immediately provides all four capabilities described above. Importantly it is possible to share to multiple services of this type at the same time and to add and remove services at will, meaning that the user is never tied to one provider.

  • To add timely notification (which improves interactivity and reduces polling workload), any of a number of IM services (notably IRC and XMPP/Jabber) can be used to deliver short ‘message available at https://storageservice/objectid’ notifications between apps in near-real-time. This is not ideal as (a) such services are not currently available on a pay-per-use IaaS/PaaS basis, meaning that the user is dependent upon the willingness of someone else to carry their traffic free of charge and (b) this use (machine-to-machine) may be outside the intended use of such services, meaning that this use may not be as reliable as typical IM use. To the extent that this use is possible, parallel use of multiple services is also possible because when the traffic is machine-to-machine, the difficulties of untangling multiple streams of messages can be resolved by automated means, meaning again that services can be added and removed at will and the user is never tied to one provider. (Note also that there are several other approaches to the timely notification problem, some of which may be considerably better options; IM services are simply the most obvious example.)

This is not strictly ‘serverless’, but it introduces the use of hosted services in a way which (a) doesn’t cede control to an advertiser-funded social network service and (b) doesn’t require that the owner/user be willing/able to take on the administration of a virtual/real server.

An important objection in both cases is that identifiers in domains controlled by others are still required (host names for the storage-services’ web-servers in the first case, nicknames/usernames in the second case). However, it is not necessary for any of these to take the traditional role of an email address as a personal identifier known to the user’s contacts; they are merely communication endpoints and, if the means of stating which ones to use is automated, use of multiple endpoints of multiple types can be sustained. This does require a less obvious means of representing identity, but note for comparison that until recently Facebook users used nothing analogous to an email address; users were located by their name and their proximity to others in the social graph. Each user has a unique identification number, but in general only developers need to know this. The recent addition of email addresses doesn’t materially change the means of locating people; it simply happens that Facebook has added email support. The same identifier-independence is true for the scheme proposed here: the use and propagation of multiple communication endpoints can happen out of the sight of owners/users.

Another important concern is that if too many people start using this approach, IM networks are more likely to start blocking this kind of use. I’d suggest – as a hypothetical example – that FreedomBox-like projects may provide a way to address this: in many cases someone owning a FreedomBox is likely to be willing to have their friends use the device to deal with real-time notification needs. The FreedomBox XMPP/Jabber server could perhaps be enhanced to allow the option for certificate-based authentication by any of the owner’s friends without requiring registration formalities, meaning that this approach could extend non-advertiser-controlled social networking software to a much, much larger audience than those who are willing to run a [virtual] FreedomBox themselves. Not everyone knows someone who’s willing to run their own server, but the pool of people who do know such a person is dozens or hundreds of times as large as the pool of people who are able to do so themselves, meaning that if this approach is of interest to FreedomBox-like projects then there may be an opportunity here to reach a much larger audience much sooner.

This post is not yet a call to action, more a partial statement of vision. I intend to write several more posts over the next few weeks/months fleshing this idea out.

(permalink at rolandturner.com)

Prefixing stdout and stderr with helpful markers

I’m testing a piece of logging code, so I care a lot about what goes where. I figured that there had to be a shell one-liner to allow me to mark stdout/stderr without any setup or code changes. Here it is:

(( some_command | sed '-es/^/stdout: /' >&3 ) 2>&1 | sed '-es/^/stderr: /') 3>&1

So, for a trivial example:

$ ((( echo out ; echo err >&2; ) | sed '-es/^/stdout: /' >&3 ) 2>&1 | sed '-es/^/stderr: /') 3>&1
stderr: err
stdout: out

As the example shows, the relative order of stdout and stderr lines may not be preserved.
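
For repeated use, the same file-descriptor trick can be wrapped in a function. The sketch below is a variation on the one-liner rather than a copy of it, and the name `mark_streams` is made up for this example. It uses `{ …; }` grouping instead of nested subshells and, unlike the one-liner, sends the marked stderr lines back to stderr so that the two streams can still be redirected separately. As with any pipeline, the exit status is that of the final `sed`, not of the command being run.

```shell
#!/bin/sh
# Run a command with every line of its stdout prefixed "stdout: "
# and every line of its stderr prefixed "stderr: ".
# Usage: mark_streams some_command [args...]
mark_streams() {
    {
        # fd 3 carries the marked stdout around the stderr pipeline.
        { "$@" | sed -e 's/^/stdout: /' >&3; } 2>&1 |
            sed -e 's/^/stderr: /' >&2
    } 3>&1
}

# Trivial demonstration command (hypothetical).
demo() { echo out; echo err >&2; }

mark_streams demo
```

Because the two `sed` processes run concurrently, the relative order of the two marked streams is still not guaranteed.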
