Who’s Got Your Mail? Characterizing Mail Service Provider Usage IMC ’21, November 2–4, 2021, Virtual Event, USA
When the SMTP protocol is used to relay a message (i.e., from
one MTA to another), the sending (i.e., outbound) MTA identifies its
partner MTA server by parsing e-mail addresses (i.e., user@domain)
to extract the associated domain names. For each (unique) domain
name in the destination address(es) of an e-mail, the sending MTA
looks up a DNS MX record. This MX record points to the server
to which receiving e-mail on behalf of the particular domain name
is delegated. By fully resolving this record, the sending MTA server
ultimately identifies and establishes a connection with the receiving
MTA server. In this mail relay mode, TCP port 25 is typically used
(there are other ports that are used occasionally, such as port 2525,
but these are not supported by IANA or IETF [39] and so we do not
consider them in this paper).
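
The first step described above, extracting the unique destination domains from a message's recipient addresses, can be sketched as follows. This is a minimal illustration using only the Python standard library, not the logic of any particular MTA; the function name `recipient_domains` and the sample addresses are assumptions made for the example, and the subsequent DNS MX lookup is deliberately left out.

```python
from email.utils import getaddresses


def recipient_domains(address_headers):
    """Return the unique, lowercased domains found in recipient address strings."""
    domains = []
    for _name, addr in getaddresses(address_headers):
        # Split on the last "@" so the domain is everything after it.
        local, sep, domain = addr.rpartition("@")
        if not sep or not local or not domain:
            continue  # skip entries that are not of the form user@domain
        domain = domain.lower().rstrip(".")
        if domain not in domains:
            domains.append(domain)  # keep first-seen order, drop duplicates
    return domains


# Hypothetical recipient header values, for illustration only.
recipients = ["Alice <alice@example.com>, bob@Example.org", "carol@example.com"]
print(recipient_domains(recipients))  # ['example.com', 'example.org']
```

Each domain returned here would then be the subject of one MX lookup by the sending MTA.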
2.2 Mail Exchanger Records
The Mail Exchanger (MX) record specifies which MTAs handle
inbound mail for a domain name [18, 24, 26] and is published in
the DNS zone of the domain. An MX record should itself contain
a valid domain name [23, 26]. Multiple MX records can be configured
in a zone, each with an assigned preference number. The
lowest preference number has the highest priority, and multiple MX
records can share the same priority for load balancing [18]. An MX record
can be made up, in part, of the registered domain name for which
it receives e-mail, yet resolve to completely separate infrastructure.
For instance, the MX record for our institution, ucsd.edu, contains
inbound.ucsd.edu, which in turn resolves to an IP address (A record)
owned and operated by ProofPoint, a well-established mail filtering
company wholly different from ucsd.edu.
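
The preference rule above (lowest number wins, ties are allowed) can be illustrated with a short sketch. It assumes the MX records have already been fetched and are given as (preference, exchange) pairs; the function name `primary_exchanges` and the example.net hostnames are hypothetical and serve only as an illustration.

```python
def primary_exchanges(mx_records):
    """Return the exchange names that share the lowest preference value."""
    if not mx_records:
        return []
    best = min(pref for pref, _host in mx_records)
    # Several records may share the best preference (e.g., for load balancing).
    return sorted(
        host.lower().rstrip(".") for pref, host in mx_records if pref == best
    )


# Hypothetical zone data, for illustration only.
records = [
    (20, "backup.example.net."),
    (10, "mx-a.example.net."),
    (10, "mx-b.example.net."),
]
print(primary_exchanges(records))  # ['mx-a.example.net', 'mx-b.example.net']
```

Returning every tied exchange, rather than a single one, keeps the load-balancing case visible to the caller.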
2.3 STARTTLS and TLS certificates
Modern SMTP implementations opportunistically support the STARTTLS
option which, in the mail relay context, allows the sending
MTA to initiate a TLS connection with the receiving MTA [11, 16].
If the receiving MTA supports STARTTLS, it will provide a TLS
certificate which can be used to bootstrap a TLS session providing
session confidentiality. To provide a valid certificate, the receiving
MTA must obtain a signed certificate from a trusted certificate
authority (CA) in which the MX domain name is specified in either
the Common Name (CN) or a Subject Alternative Name (SAN) field.
While ideally TLS certificates are validated by the sending MTA,
in practice SMTP sessions will continue even if the certificate does
not validate [13, 14]. Note that the SAN field is used when a single
certificate must support TLS connections across a range of domains.
For example, the certificate used by Gmail has Common Name
mx.google.com, and its SAN specifies other alternate domain names,
such as aspmx2.googlemail.com and mx1.smtp.goog (see footnote 3).
In these cases, the Common Name (CN) almost always specifies a
principal domain operated by the provider of the service.
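
The check that an MX domain name appears in the CN or a SAN entry can be sketched as a simple name comparison. This is a deliberately simplified illustration, not a complete certificate validator: it ignores chain and expiry checks, and it honors a wildcard only as the entire leftmost label. The function names `name_matches` and `certificate_covers` are assumptions made for the example; the hostnames are those mentioned in the text.

```python
def name_matches(pattern, hostname):
    """Compare one certificate name against a hostname, label by label.

    Simplification: "*" is honored only as the entire leftmost label.
    """
    p_labels = pattern.lower().rstrip(".").split(".")
    h_labels = hostname.lower().rstrip(".").split(".")
    if len(p_labels) != len(h_labels):
        return False
    if p_labels[0] == "*":
        # Require at least two fixed labels after the wildcard.
        return len(p_labels) > 2 and p_labels[1:] == h_labels[1:]
    return p_labels == h_labels


def certificate_covers(mx_host, common_name, san_names):
    """True if the MX hostname matches the CN or any SAN entry."""
    return any(name_matches(n, mx_host) for n in [common_name, *san_names])


cn = "mx.google.com"
sans = ["aspmx2.googlemail.com", "mx1.smtp.goog"]
print(certificate_covers("aspmx2.googlemail.com", cn, sans))  # True
print(certificate_covers("inbound.ucsd.edu", cn, sans))       # False
```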
Footnote 3: mx1.smtp.goog is a valid and resolvable domain owned by Google.
2.4 Related work
Considering its critical role, remarkably little contemporary analysis
exists of e-mail infrastructure and who provides it. Some of
the best known modern work in this space is the pair of 2015
papers authored by Durumeric et al. and Foster et al., which empirically
explored the use and configuration of privacy, authentication,
and integrity mechanisms at each stage of the e-mail delivery
pipeline [13, 14]. Notably, Durumeric et al. also provide one estimate
of the top mail providers as a part of their study, although their
methodology may underestimate the influence of major providers
(notably Microsoft). Rijswijk et al. [37, 38] investigated the growth
of three top mail providers over a relatively short, 50-day period,
and demonstrated the phasing out of Windows Live in favor of Office365,
among others. Their analysis, unlike ours, considers only the content
of MX records, and mail was not the focal point of their work.
Finally, in 2005, Afergan et al. [2] measured the loss, latency, and
errors of e-mail transmission over the course of a month with hundreds
of domains.
Somewhat further afield, there is a literature exploring how dangling
DNS records impact e-mail security, starting with the work of
Liu et al. [22], who explored e-mail as a special case of a general analysis
of dangling DNS issues. This work was recently expanded by
Reed and Reed in their technical report that focuses specifically on
dangling DNS MX records and their potential security impact [29].
Another direction of research, notably by Chen et al. [9] and Shen et
al. [32], studies the vulnerabilities of third-party mail providers and
how those vulnerabilities could be used to spoof e-mail messages.
In spite of these and related efforts, we have found very little
work focused on characterizing which organizations are, in fact,
responsible for providing mail service or how this responsibility
has changed over time. Indeed, perhaps the closest related work
is not from the academic literature, but from the recent Medium
post of Jason Trost which describes an analysis of MX records for
identifying e-mail security providers [36].
3 IDENTIFYING MAIL PROVIDERS
In this section, we first illustrate the challenges in identifying mail
service providers, in particular how MX records alone can be misleading,
and the strengths and weaknesses of using alternative
features. Given these limitations, we then present our priority-based
approach for identifying the mail provider for a given domain
name. For the purpose of this work, we focus on the primary
e-mail provider, which is identified by the MX record with the
highest priority (i.e., the lowest preference number). Finally, we
evaluate the accuracy of this approach
using randomly sampled domains from the three larger datasets
of domains on which we base much of our subsequent analysis
(described in detail in Section 4.3).
3.1 Challenges in Provider Identification
One approach, exemplified by Trost’s analysis [36], relies exclusively
on MX records to identify the mail provider. However, this
approach can be misleading when the purported MX domain resolves
to an IP address operated by a different entity.
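
To make the pitfall concrete, the sketch below shows what a purely MX-based guess might look like. It is not Trost's method or the approach of this paper; the function name `naive_provider_from_mx` is hypothetical, and taking the last two labels is only a rough stand-in for a registered-domain lookup (a real implementation would consult the Public Suffix List).

```python
def naive_provider_from_mx(mx_host, labels=2):
    """Guess a provider as the last `labels` labels of the MX hostname.

    Assumption: two labels approximate the registered domain.
    """
    parts = mx_host.lower().rstrip(".").split(".")
    return ".".join(parts[-labels:])


# The MX name from Section 2.2: the guess points at ucsd.edu, although the
# address it resolves to is operated by a different entity (ProofPoint).
print(naive_provider_from_mx("inbound.ucsd.edu"))  # ucsd.edu
```

The guess is computed from the name alone, so it cannot reflect who operates the address behind that name.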
Better accuracy can be achieved by incorporating additional
features, such as the autonomous system number (ASN) of the
IP address to which an MX record resolves, the content of Banner/EHLO
messages in the initial SMTP transaction, and TLS certificates
learned during an SMTP session. However, using multiple
features creates additional complexities. In particular, while
SMTP-level information is typically a more reliable indicator of