Apache Virtual Host Request Matching Logic

When setting up the Apache web server, one of the most important and time-consuming steps is configuring the virtual hosts. Defining multiple virtual hosts on the same machine is one thing, but a small configuration mistake can mean that the one you expect is never picked for incoming requests or client connections. To avoid that, we need a solid understanding of how virtual hosts are matched with requests, or in other words, which of several virtual hosts httpd decides to serve an incoming request from.

This article details how exactly Apache picks the most eligible VirtualHost configuration block to serve a client, based on parameters like the IP address and port combination the connection was accepted on, the HTTP Host header, etc. Before learning about the virtual host resolution algorithm, I’d highly recommend you get comfortable with a few prerequisite topics.

These topics are prerequisites to understanding request matching fully. Let me briefly touch on them here so that you get some idea even without reading a dedicated guide on each.

Prerequisite Topics

Virtual hosts themselves are different blocks of configurations or servers used to serve different websites or hosts on the same server machine. They are defined by <VirtualHost> blocks.

All the <VirtualHost> configuration blocks or containers are known as virtual servers. All the virtual servers are contained inside the main server. Another way of looking at it: all the configuration outside the virtual host blocks is part of the main server configuration.

Every virtual host is defined by one or more address[:port] pairs, where the port is optional and the address can be an IP address, a domain name or a wildcard (* or _default_). Here’s an example:

<VirtualHost *:80>...</VirtualHost>
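To make the address forms a little more concrete, here is a minimal sketch. The addresses (203.0.113.10, 203.0.113.11, 2001:db8::11) and document roots are placeholders I picked for illustration, and the main server config is assumed to contain a matching Listen 80.

```apache
# A single specific IPv4 address and port
<VirtualHost 203.0.113.10:80>
  DocumentRoot /var/www/site-a
</VirtualHost>

# Two address:port pairs on the same virtual host.
# IPv6 literals are written inside square brackets.
<VirtualHost 203.0.113.11:80 [2001:db8::11]:80>
  DocumentRoot /var/www/site-b
</VirtualHost>
```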

If you have exactly one virtual host for an IP-port combination, then that is an IP-based virtual host. If multiple virtual hosts share the same IP-port combination, then they are known as name-based virtual hosts.
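A rough side-by-side sketch of the two kinds may help. The addresses, hostnames (site-a.example and so on) and paths are hypothetical, and a Listen 80 in the main server config is assumed.

```apache
# IP-based: this is the only virtual host defined for 203.0.113.10:80
<VirtualHost 203.0.113.10:80>
  ServerName site-a.example
  DocumentRoot /var/www/site-a
</VirtualHost>

# Name-based: these two virtual hosts share 203.0.113.20:80,
# so they have to be told apart by name
<VirtualHost 203.0.113.20:80>
  ServerName site-b.example
  DocumentRoot /var/www/site-b
</VirtualHost>

<VirtualHost 203.0.113.20:80>
  ServerName site-c.example
  DocumentRoot /var/www/site-c
</VirtualHost>
```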

With these concepts out of the way in their simplest form, let’s proceed with the request matching process.

Request Matching

For an incoming request, only one <VirtualHost> will be matched and selected as the winner. By winner, I mean it will be used as the request server. This resolution happens in multiple steps.

Step 1: Address and Port Based Lookup

Virtual hosts are defined with one or more addresses. All the addresses together are known as the vhost’s address set. For example, here are two virtual hosts with different addresses:

<VirtualHost 143.110.176.71:80>
  ...
</VirtualHost>

<VirtualHost 10.122.0.2:80>
  ...
</VirtualHost>

When a client connection is first received on some address and port on the machine, httpd looks for all the VirtualHost definitions that have the same IP address and port. Hence a request to 143.110.176.71:80 will match the first virtual host above, whereas a local request to 10.122.0.2:80 will match the second virtual host.

But what if a connection was received on an address that hasn’t been used in any virtual host, like 127.0.0.1:80 (local) or [2400:6180:100:d0::b03:5001]:80 (public IPv6)? Or say the first virtual host above is not there and a request to that IP is made? In such cases Apache will first see if it can find any wildcard matches:

<VirtualHost *:80>
  ...
</VirtualHost>

<VirtualHost _default_>
  ...
</VirtualHost>

If yes, then that virtual host will be picked. This means virtual hosts with a specific IP address get precedence over wildcards.

If no, then the request is passed on to the main server to be handled and served. What will be served in this case though? It’ll depend on the main server’s DocumentRoot, Location, Directory, etc. directives. In most default Apache installations, there’s some file lying in the default document root that is served.
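The precedence described above can be sketched in a single config. The document roots here are hypothetical and only meant to show which block would answer which connection.

```apache
# Main server config. Its DocumentRoot is only used when no
# virtual host matches the address and port of the connection.
Listen 80
DocumentRoot /var/www/html

# Specific address: picked for connections received on 143.110.176.71:80
<VirtualHost 143.110.176.71:80>
  DocumentRoot /var/www/public-site
</VirtualHost>

# Wildcard: picked for port 80 connections received on any other
# local address, such as 127.0.0.1:80 or 10.122.0.2:80
<VirtualHost *:80>
  DocumentRoot /var/www/fallback-site
</VirtualHost>
```

If the *:80 block were removed, a request to 127.0.0.1:80 would have no matching virtual host and would be served by the main server from /var/www/html.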

What if there are multiple virtual hosts with the same IP address and port? These are the name-based virtual hosts mentioned earlier. Example:

# Main Server Config
Listen 80

<VirtualHost 143.110.176.71:80>
  ServerName foo.com
  DocumentRoot /home/foo/public_html
</VirtualHost>

<VirtualHost 143.110.176.71:80>
  ServerName bar.com
  DocumentRoot /home/bar/public_html
</VirtualHost>

<VirtualHost 143.110.176.71:80>
  DocumentRoot /var/www/public_html
</VirtualHost>

Summary:

  1. Match the address-port combination on which the client connection was received against all the virtual host definitions. Specific IP address definitions get priority over wildcards.
  2. If no matches are found, then the request is served by the main server.
  3. If only one match is found, then the request is served from that virtual host.
  4. If multiple matches are found, i.e., there are multiple name-based virtual hosts, then the next step is executed.

Step 2: Name-based Virtual Host Filtering

If the request contains a Host header field, the virtual hosts matched in the previous step are searched for a matching ServerName or ServerAlias. Hence, in our example above, a request to foo.com will match the first virtual host and a request to bar.com the second one. The Host header may contain a port number as well, but that’ll be ignored in favour of the connection’s port number, which is more trustworthy.

But what if we get a request for baz.com on the same IP-port combo but don’t have a ServerName or ServerAlias entry for it? Or what if the Host header is absent? In that case the first match from the previous step is used. This means that for baz.com the virtual host of foo.com will be used. The first match is decided by the order in which the virtual hosts have been defined in the configuration files. Hence, the order of vhosts in the configuration matters.

This first virtual host defined in a list of name-based virtual hosts is also known as the default or primary server.
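Since the first definition wins by default, one common approach is to put a deliberate catch-all virtual host ahead of the real sites. Below is a minimal sketch based on the foo.com and bar.com example; the name catchall.invalid and the /var/www/catchall path are placeholders of my own.

```apache
# Defined first, so it becomes the default for 143.110.176.71:80.
# Requests for unknown hosts (like baz.com) or without a Host
# header end up here instead of on foo.com.
<VirtualHost 143.110.176.71:80>
  ServerName catchall.invalid
  DocumentRoot /var/www/catchall
</VirtualHost>

<VirtualHost 143.110.176.71:80>
  ServerName foo.com
  DocumentRoot /home/foo/public_html
</VirtualHost>

<VirtualHost 143.110.176.71:80>
  ServerName bar.com
  DocumentRoot /home/bar/public_html
</VirtualHost>
```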

Summary:

  1. If the Host header is present, then the list of name-based virtual hosts is checked against each ServerName and ServerAlias to find the matching virtual host for the request.
    1. If no match is found, then the first virtual host in the list is picked.
    2. If one match is found, then that is picked.
    3. If multiple matches are found, then the first matching one in configuration order is picked.
  2. If the Host header is absent, then that also counts as no match, causing the first virtual host in the list to be picked.
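The examples so far only use ServerName, so here is a small sketch of ServerAlias taking part in the same lookup. The alias names are my own additions to the foo.com example.

```apache
<VirtualHost 143.110.176.71:80>
  ServerName foo.com
  # Extra names that should also select this virtual host.
  # ServerAlias accepts several names and allows wildcards.
  ServerAlias www.foo.com *.foo.com
  DocumentRoot /home/foo/public_html
</VirtualHost>
```

With this in place, a request with Host: www.foo.com or Host: shop.foo.com would be matched to this virtual host by name rather than falling through to the default one.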

A small note on what happens if ServerName and ServerAlias are missing:

  1. If ServerName is missing in a virtual host, then the ServerAlias is checked for matching.
  2. If ServerAlias is also missing, then the ServerName from the main server config is inherited by the virtual host and used as its server name.
  3. If the main server doesn’t specify ServerName, then the system hostname is used as the default server name by the virtual host.
  4. If the system hostname lookup fails, then a reverse lookup is performed on the first system IP address to find a suitable hostname to be used as the default server name.
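To avoid relying on an inherited or deduced name, each name-based virtual host can simply declare its own ServerName. As a sketch, this is the third virtual host from the earlier example with an explicit name; default.example is a hypothetical hostname.

```apache
<VirtualHost 143.110.176.71:80>
  # Explicit name, so matching does not depend on the main
  # server's ServerName or on the system hostname
  ServerName default.example
  DocumentRoot /var/www/public_html
</VirtualHost>
```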

Debugging Virtual Hosts

Once you’ve set up your virtual hosts, you can dump the parsed vhost settings for debugging purposes. For instance, this is what this blog’s parsed config looks like:

$ apachectl -D DUMP_VHOSTS # or httpd -D DUMP_VHOSTS
VirtualHost configuration:
*:80                   is a NameVirtualHost
         default server catchall (/etc/apache2/sites-enabled/000-catchall.conf:1)
         port 80 namevhost catchall (/etc/apache2/sites-enabled/000-catchall.conf:1)
         port 80 namevhost codingshower.com (/etc/apache2/sites-enabled/codingshower.conf:1)

The output above is really useful. It tells us that our configuration has a couple of name-based vhosts. Once the connection is matched with the addr:port (*:80 in this case), there are two virtual hosts with hostnames catchall and codingshower.com (both coming from ServerName) that may be picked depending on the Host header. If no server name matches, then the selection falls back to default server catchall, which is the default or primary server.

More on debugging virtual host configuration settings can be found here.
