17 Dec 2012 raz   » (Observer)

A defensive strategy for accepting email over IPv6

Accepting email over IPv6 risks providing spammers with an easy entrance point because IP-address blocklisting is not likely to be viable for an address space as large as IPv6′s. The need to continue to accept email over IPv4 for the indefinite future provides a useful safety valve in that a receiver can push messages offered over IPv6 whose validity is uncertain back to the existing IPv4 service, thereby reducing the dependence upon – or even eliminating the need for – IPv6-address blocklists.

To take advantage of this a receiver needs whitelists (manually maintained, automatically generated, user addressbooks, provided by a reputation data provider, …) and the ability to test and act on domain authentication (SPF, DKIM, DMARC, …) during the SMTP conversation. Any message failing authentication, or passing authentication but not matching a whitelist, need merely be given a
temporary failure (4xx) response code. A well-behaved MTA (e.g. non-spammer) receiving 4xx responses will work through the receiver’s listed MXs until it finds one that gives an authoritative (2xx/5xx) response.

The argument that email receivers will need to accept email over IPv4 for the indefinite future is well-known and almost certainly correct, however organisations may find themselves wanting to accept email over IPv6 as well for at least two reasons:

  • The desire to pilot, experiment with or research acceptance of email over IPv6.
  • An externally imposed mandate that IPv6 be deployed for “all applications”.

The approach described here can be used in two different ways:

  • A defensive deployment from the outset for those who wish to get something working, but would prefer to deal up front with the risk of spammers exploiting the difficulties of IPv6-address blocklisting.
  • A fallback option for those who are willing to deploy without solving this problem, but wish to have a documented strategy for dealing with this problem when/if it arises.

In either case the benefit is the same: a production-use-ready approach for accepting at least some email over IPv6 with a safe fallback to IPv4 for the rest.

Ideally all of the relevant authentication mechanisms (SPF, DKIM and DMARC) can be processed and acted on during the SMTP transaction, but this approach can be adopted even if this is only true for SPF; the result will simply be that some of the email that could have been accepted over IPv6 will instead be pushed to IPv4.

Most types of whitelist data can be applied:

  • IPv6 address whitelists can be used as is.
    • A locally-maintained list of IPv6 addresses of mail-servers of trusted partners.
    • IPv6-address whitelists supplied by reputation data providers.
  • Domain whitelists can be used in conjunction with domain authentication (SPF (perhaps subject to DMARC’s alignment rules), DKIM, last-resort SPF data from a reputation data provider, …)
    • A locally-maintained list of domains of trusted partners.
    • A domain whitelist from a reputation data provider
  • In situations where end-user addressbooks are accessible during the SMTP conversation, the presence of a sender in the recipient’s addressbook can be treated as a whitelist match
    (subject to authentication checks as above)
    • For webmail providers this is pretty much a given
    • For others this is sometimes available from existing mail-server software, in other cases software can be used to automatically gather this data locally.

In general, content-based anti-spam filters need not be used for messages which have passed any of the above. A particular exception is malware checking: clearly, it is not desirable to deliver malware even if it’s from a source that’s known to behave well, e.g. because someone’s PC has become infected and is emailing exploits or phish to each of the user’s contacts.

Weaker signals might also be used to decide to accept a message subject to content-based anti-spam filters not detecting a problem. These include:

  • The existence of an rDNS entry for the source IP address, the existence of a matching forward DNS entry and the use by the connecting MTA of the same name in the HELO/EHLO string.
  • The connection originating from an AS, or a network within one, known to be particularly stringent in its containment of abuse. To avoid confusion, I’ll use the term
    “greenlisting” to refer to the listing of IPv6 addresses or networks as being allowed to connect but still subject to content-based filtering.
  • The RFC5322.From domain name being registered with a registrar
    known to be particularly stringent in de-registering abusers. This would of course have to be done in conjunction with domain authentication as above. (This is also somewhat hypothetical, I’m not sure that any registrar is currently strict enough for this purpose.)
  • Even without a domain whitelist entry, the historical behaviour of the RFC5322.From domain in sending mail to the receiver’s IPv4 service. Again, this would have to be done in conjunction with domain authentication.
  • The presence of well-formed, non-anonymised whois information for the RFC5322.From domain and/or the source IP address block.

These are all a little less robust than competent whitelisting, and may have to be tried on a “sacrificial lamb” basis, however as with the broad strategy of building on an IPv4 fallback, this is easier and safer to do than it was in an IPv4-only universe.

Astute readers will notice that what I am describing is an implementation of the Aspen Framework that Meng Wong described in his Sender Authentication Whitepaper 8 years (!) ago. I’d suggest that:

  • The concern about the infeasibility of IPv6-address blocklists and the certain availability of the IPv4 fallback for the indefinite future provides an opportunity to implement this approach for IPv6 receivers that never existed in an IPv4-only environment.

  • The period of time that this has taken should be a strong warning to people who blithely assume that email can simply be moved to IPv6 by mandate. Email is an unusually tough problem, progress is slow.

  • That things move so slowly makes incremental approaches like the one described here more valuable than they might otherwise be. (There’s little point piloting a partial approach that will be rendered obsolete when the “complete” approach arrives 6 months later. If you assume that a complete approach is many years away, then there is more to gain from the deployment of partial approaches.)

It is conceivable that this will eventually be the beginning of a migration strategy, that over time so much email will be able to be accepted on a “we know something good about this message” (rather than a “we know nothing bad about this
message” basis) that it will become viable to reject outright any email about which nothing good is known. I don’t actually expect that this will be the case, but also suspect that so much will change during the parallel running of delivery-to-MX over IPv4 and IPv6 that it’s not practical to predict how delivery-to-MX over IPv4 might be phased out. The important observation would appear to be that this approach provides a production-use-ready way to start.

Additional thoughts:

  • There is a legitimate concern about the additional workload that this will create – both for receivers and legitimate senders – in causing duplicate delivery of some/most/all legitimate email. I’d suggest that for early adopters this will not be a great concern, particularly while the total volume of email-over-IPv6 is small.

    • If many receivers adopt this approach when piloting accepting-over-IPv6 then the incentive to spammers to move to IPv6 will be greatly diminished in the first place, thus cutting much of the duplicate workload for receivers who senders can see are doing this. (This effect seems unlikely to be large enough to render the infeasibility of IPv6-address blocklists moot, but it would be a great side-effect!)

    • Early adopter senders are more likely to adopt full authentication anyway, however insufficient whitelisting may make encountering large numbers of receivers who push traffic to IPv4 cause costs that senders aren’t willing to incur. I’d
      suggest that operational experience will tell us how this plays out and that senders and receivers will be in a better position to work out what to do about this when/if there’s enough traffic for it to be an actual problem.

    • This problem is likely to be particularly acute for forwarders for whom far less mail is likely to pass authentication, despite being legitimate. As in other contexts, forwarded streams are likely to require special handling (e.g. by not delivering them via IPv6 except where DKIM passes, or treating delivery-via-IPv6 as a problem to solve later). It may also be the case the receivers can simply greenlist known-strict forwarders and apply content-based filtering as usual. (Note that such forwarders would not appear on useful blocklists anyway.)

  • There is another concern about 4xx responses causing poorly-behaved sending MTAs to delay even before trying other listed MXs, much as there is for greylisting. RFC5322 5.1 only specifies “In any case, the SMTP client SHOULD try at least two addresses.” If it turns out that a substantial number of
    sending MTAs limit themselves to just two addresses, then implementing this defensive approach would require listing only a single IPv6-reachable MX. This is sufficient from fault-tolerance perspective (fallback to IPv4 being an intrinsic part of the design), but may run afoul of external mandates about MX configuration rules. Such rules could usually be adjusted as part of implementing this approach, but this may nonetheless end up being a show-stopper for the entire approach for some organisations. Only operational experience will tell for certain.

  • Also as for greylisting, there may be a problem with legitimate-but-poorly-behaved sending MTAs that never retry after a 4xx response. As these are rather small in number, the same approach that was used for greylisting is likely to be viable: the development of a database of known legitimate senders who don’t deal correctly with 4xx responses and simply greenlisting them. Mail from these sources should be still be checked by content filters of course.

  • There may arise a concern that the use of addressbook data in deciding how to respond during SMTP might expose an addressbook-harvesting risk. I’d suggest that this was not a concern because it would only apply where domain authentication had succeeded with known good senders (not something that a botnet could usually do by itself) and even then, would only apply if the harvester had guessed a known sender+recipient pair. This appears to be too small an attack surface to worry about but, as ever with security concerns, this needs to be monitored and may need to be the subject of future work.

Relevant disclosure: I work for TrustSphere which supplies software that can be used for whitelist automation (TrustVault) and reputation data that can be used as described above (TrustCloud). On re-reading it occurs to me that this post makes a case for using TrustSphere’s products. I’d like to clarify that it is not the case that I believe the above (or wrote it without believing it!) because I work for TrustSphere but, rather, than I work for TrustSphere because I believe the above. See also my comments on this from a few years ago.

Latest blog entries     Older blog entries

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!