1. V. EXPLOITATION WORKFLOW OF BREAKSPF ATTACK
This section introduces the exploitation workflow of the BreakSPF attack, along with the techniques and potential challenges of each step. As shown in Figure 6, we divide the workflow of BreakSPF into the following six steps.
1.1. A. Domain Collection
First, we obtained a list of potential attack target domains from the Tranco domain list [21].
In addition, we collected the subdomains of Tranco Top 1M domains from a passive DNS dataset similar to Farsight DNSDB, provided by QiAnXin Company, since SPF records may be configured on subdomains for certain domains.
The passive DNS dataset is collected from public DNS resolvers known as 114DNS, the largest DNS provider in China [22].
Our experiments involve a total of 7,183,870 domains, which include Tranco Top 1M domain names and their subdomains.
1.2. B. SPF Scanning
We then scanned the SPF records of the domain set by querying their TXT resource records with XMap [23] and extracting the DNS responses that begin with the “v=spf1” prefix. Because of the redirect and include mechanisms (as introduced in Section II) used in SPF records, each SPF record can establish dependency relationships with other domains.
In this way, the complete SPF configuration of a domain constructs an SPF dependency tree. Each SPF record is a node of this tree structure. If we only scan the root node of the SPF dependency tree, it is hard to find vulnerable SPF records.
To evaluate the status of SPF deployment more comprehensively, we need to traverse the tree structure recursively.
We recorded the domains with multiple SPF records during the recursion since they are invalid configurations according to RFC 7208 [9]. After that, we further parsed the SPF records according to the SPF syntax rules [9], extracted the domain names corresponding to include and redirect mechanisms, and scanned them recursively using the depth-first search (DFS) algorithm. We recorded all scanning results to avoid performing duplicate searches.
We set the recursion depth to 10 during the scanning process. If the recursion depth exceeded 10, we stopped scanning and considered the SPF record of that domain invalid.
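The recursive scan described above can be sketched as follows. This is a minimal illustration, not the authors' scanner: the `TXT_RECORDS` dictionary is a hypothetical in-memory stand-in for live DNS responses, the status labels are invented for the example, and marking every ancestor of an over-deep chain as `too_deep` is one possible reading of the depth rule. Macro expansion in domain names is not handled.

```python
import re

# Hypothetical stand-in for DNS: domain -> TXT strings returned for it.
TXT_RECORDS = {
    "example.com": ["v=spf1 include:_spf.mail.example ip4:192.0.2.0/24 -all",
                    "site-verification=abc"],
    "_spf.mail.example": ["v=spf1 redirect=_spf2.mail.example"],
    "_spf2.mail.example": ["v=spf1 ip4:198.51.100.0/24 ~all"],
    "dup.example": ["v=spf1 -all", "v=spf1 +all"],
}

MAX_DEPTH = 10  # recursion limit stated in the text

# Matches include:<domain> (with optional qualifier) and redirect=<domain>.
DEP_RE = re.compile(r"^(?:[+\-~?]?include:|redirect=)(\S+)$", re.IGNORECASE)


def spf_records(txt_values):
    """Keep TXT strings whose version section is exactly 'v=spf1'."""
    found = []
    for value in txt_values:
        v = value.strip()
        if v.lower() == "v=spf1" or v.lower().startswith("v=spf1 "):
            found.append(v)
    return found


def dependencies(record):
    """Return the domains referenced by include and redirect terms."""
    deps = []
    for term in record.split()[1:]:  # skip the version section
        match = DEP_RE.match(term)
        if match:
            deps.append(match.group(1).lower().rstrip("."))
    return deps


def scan(domain, results, depth=0):
    """Depth-first scan of the SPF dependency tree rooted at `domain`.

    `results` caches every visited domain so nothing is scanned twice.
    Returns False if the depth limit was exceeded below this node.
    """
    if depth > MAX_DEPTH:
        return False
    if domain in results:
        return results[domain]["status"] != "too_deep"
    entry = {"status": "ok", "record": None, "deps": []}
    results[domain] = entry  # stored before recursing so cycles terminate
    records = spf_records(TXT_RECORDS.get(domain, []))
    if not records:
        entry["status"] = "no_spf"
    elif len(records) > 1:
        entry["status"] = "multiple_records"  # invalid per RFC 7208
    else:
        entry["record"] = records[0]
        entry["deps"] = dependencies(records[0])
        for dep in entry["deps"]:
            if not scan(dep, results, depth + 1):
                entry["status"] = "too_deep"
    return entry["status"] != "too_deep"
```

Calling `scan("example.com", results)` with an empty `results` dictionary fills it with one entry per visited domain; passing the same dictionary to later calls reuses the cached entries.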
1.3. C. Data Processing
After scanning SPF records, we process the scanning results. First, we can perform four types of analysis based on the SPF scanning results: adoption rate of SPF, grammatical analysis of SPF records, include mechanism analysis, and IP coverage of SPF records. Details of these analyses will be discussed in Section V. These results will provide essential data support for the BreakSPF attack.
Next, we established a reverse query mechanism for the SPF dependency tree. We re-parsed the SPF records and traversed the SPF dependency tree to record each node’s ancestors. This mechanism allows us to know which domains include the SPF records of a target domain, which is critical for the subsequent attack process. Using this mechanism, we can determine which domain names a vulnerable SPF record can affect and which email providers are used by popular domain names.
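One way to picture the reverse query mechanism is to invert the dependency edges and then walk upward from each node. The sketch below assumes the forward edges are already available as a dictionary; `DEPS`, its domain names, and `build_ancestors` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical forward edges: domain -> domains its SPF record includes
# or redirects to.
DEPS = {
    "shop.example": ["_spf.mailer.example"],
    "news.example": ["_spf.mailer.example", "_spf.other.example"],
    "_spf.mailer.example": ["_spf.pool.example"],
    "_spf.pool.example": [],
    "_spf.other.example": [],
}


def build_ancestors(deps):
    """Map each domain to the set of domains whose SPF tree contains it."""
    # Invert the edges: child -> direct parents.
    parents = defaultdict(set)
    for domain, children in deps.items():
        for child in children:
            parents[child].add(domain)

    def collect(node):
        # Iterative upward walk; `seen` also guards against cycles.
        seen = set()
        stack = list(parents.get(node, ()))
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            stack.extend(parents.get(current, ()))
        return seen

    nodes = set(deps) | set(parents)
    return {node: collect(node) for node in nodes}
```

With this mapping, the ancestors of `_spf.pool.example` are every domain that would be affected if that record were vulnerable.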
1.4. D. Database Building
The most critical step in the attack process is creating mappings from the IP addresses in the SPF records to their corresponding domain names. After establishing such a correlation database, an attacker can quickly determine whether a controlled or compromised IP address is included in the SPF record of a well-known domain name.
We parsed each SPF record and extracted IPv4 addresses from the “ip4:” tag. Because the IPv4 address space is large, we optimized the storage mode using a tree structure. We converted an IP address into a 32-bit integer and an IP address block into an integer range. We used the first number in this range as the key of the database and stored the domain name and CIDR prefix length under this key. For example, if the SPF record of example.com contains the IP address range 192.168.0.0/16, we store {“domain”:“example.com”, “cidr”:“16”} in the database entry corresponding to 3,232,235,520.
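The key computation in this example can be reproduced with Python's standard `ipaddress` module. This is a sketch under assumptions: a plain dictionary stands in for the database, the function names are hypothetical, and the prefix length is stored as an integer rather than a string for convenience.

```python
import ipaddress


def ip4_networks(record):
    """Extract the IPv4 networks declared by ip4: terms of an SPF record."""
    networks = []
    for term in record.split()[1:]:  # skip the version section
        term = term.lstrip("+-~?")   # drop an optional qualifier
        if not term.lower().startswith("ip4:"):
            continue
        try:
            # A bare address is treated as a /32; host bits are masked off.
            networks.append(ipaddress.IPv4Network(term[4:], strict=False))
        except ValueError:
            continue  # malformed ip4 value, ignored in this sketch
    return networks


def add_record(db, domain, record):
    """Store each network under the integer value of its first address."""
    for net in ip4_networks(record):
        key = int(net.network_address)
        db.setdefault(key, []).append({"domain": domain, "cidr": net.prefixlen})


db = {}
add_record(db, "example.com", "v=spf1 ip4:192.168.0.0/16 -all")
print(db)  # {3232235520: [{'domain': 'example.com', 'cidr': 16}]}
```

Several domains can share one key, which is why each key holds a list of entries.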
In this experiment, we ignored IPv6 addresses for the time being, since the IP addresses declared in SPF records are still dominated by IPv4 addresses. According to our measurement, only 2.2% of domains configure IPv6 addresses in their SPF records.
We designed a query mode for a single IP address in the SPF reversed database and provided a web application programming interface (Web API). Attackers can access this web interface from an IP address they control to learn which domain names that IP address is authorized to send email for, and can therefore spoof.
The web server provides both GET and POST request interfaces. For GET requests, the backend function analyzes the source IP address of the request; for POST requests, it analyzes the IP data sent in the request body. Once the backend function obtains the IP address submitted by the attacker, it traverses the CIDR prefix lengths from large to small (32 to 1), performs an AND operation on the IP address and the corresponding subnet mask, and converts the resulting subnet prefix into an integer that serves as the key for a database query.
The database then returns a JSON response. The response may contain multiple domain names, and the backend iterates through them individually. If the CIDR prefix length stored for the current domain name is less than or equal to the prefix length currently being enumerated, the entry is considered a successful hit.
The backend function records the hit and analyzes the next domain until the full cycle is complete. Finally, it de-duplicates the results by domain name and returns them to the attacker.
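The lookup loop can be sketched as below. The `DB` contents and the `lookup` function are hypothetical, and a dictionary again stands in for the database. The comparison between the stored and the enumerated prefix length matters because a masked address can coincide with the first address of a smaller block that does not actually contain the queried IP.

```python
import ipaddress

# Hypothetical database: first address of each block (as an integer)
# -> entries holding the domain and the CIDR prefix length.
DB = {
    3232235520: [{"domain": "example.com", "cidr": 16}],   # 192.168.0.0/16
    3232235776: [{"domain": "example.org", "cidr": 24},    # 192.168.1.0/24
                 {"domain": "example.com", "cidr": 24}],
    3232236032: [{"domain": "example.net", "cidr": 24}],   # 192.168.2.0/24
}


def lookup(db, ip):
    """Return the de-duplicated domains whose stored blocks contain `ip`."""
    addr = int(ipaddress.IPv4Address(ip))
    hits = set()  # a set de-duplicates by domain name
    for prefix in range(32, 0, -1):  # prefix lengths 32 down to 1
        mask = (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF
        key = addr & mask
        for entry in db.get(key, []):
            # The stored block covers the address only if it is at least
            # as large as the block currently being enumerated.
            if entry["cidr"] <= prefix:
                hits.add(entry["domain"])
    return sorted(hits)


print(lookup(DB, "192.168.1.77"))  # ['example.com', 'example.org']
print(lookup(DB, "192.168.3.1"))   # ['example.com']
```

For 192.168.3.1, the /23 mask yields 192.168.2.0, which is a key in the database, but the stored /24 block for example.net does not contain the address, so the prefix comparison rejects it.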
1.5. E. IP Collection
After setting up the reversed SPF database, attackers can easily launch the BreakSPF attack by obtaining feasible IP addresses and sending spoofed emails based on the query results from the database. The use of shared infrastructures, such as cloud services, enables attackers to acquire large volumes of IP addresses.
To conduct a comprehensive assessment of the SPF vulnerability status, we tried to obtain as many IP addresses as possible. The more diverse the sources of IP addresses, the more of the address space we can cover, and the more complete our experimental results will be. Therefore, we compiled a list of the ways attackers can currently obtain public IP addresses on the Internet, which includes cloud servers, proxy services, serverless functions, CI/CD tools, and CDN services.
There are differences between these categories in terms of acquiring and using IPs to send spoofing emails. We explain the details of each category in Section VII. After obtaining IP addresses using the above methods, we leveraged the web API of the SPF reversed database to query and identify domains vulnerable to spoofing using these IP addresses. Meanwhile, the backend function of the web API recorded the query results for our subsequent data analysis.
Our system is designed to be extensible, allowing for the inclusion of additional IP acquisition methods within the existing framework if they are discovered in the future.
1.6. F. Email Spoofing Attack
The final step of the BreakSPF attack model is conducting email spoofing attacks.
Attackers need to select a domain name whose SPF record covers the obtained IP address to send spoofing emails.
Attackers can use a programming language to establish an SMTP connection with the victim’s email service, acting as a Message Transfer Agent (MTA).
Since the sender’s IP address is included in the SPF record of the sender’s domain name, these carefully crafted spoofing emails can pass SPF and DMARC verification, making them difficult to detect even for technical experts.