Apache Virtual Host Request Matching Logic
When setting up the Apache Web Server, one of the most important and time-consuming steps is configuring the virtual hosts. Setting up multiple virtual hosts on the same machine is one thing, but a misconfiguration can mean that the intended virtual host never gets picked for an incoming request or client connection. To avoid that, we need a solid understanding of how virtual hosts are matched with requests, or in other words, which of the configured virtual hosts httpd decides to serve an incoming request from.
This article details how exactly Apache picks the most eligible VirtualHost configuration block to serve a client, based on parameters like the IP address and port combination the request was accepted on, the HTTP Host header, etc. Before learning about the virtual host resolution logic or algorithm, I’d highly recommend you understand (if not, then read up on) the following topics properly:
These topics are definitely prerequisites to understanding request matching entirely. Still, let me briefly touch on some of them so that you get some idea even without reading the guides linked above.
Virtual hosts are separate blocks of configuration, or servers, used to serve different websites or hosts from the same server machine. They are defined by <VirtualHost> configuration blocks or containers and are also known as virtual servers. All the virtual servers are contained inside the main server. Another way of looking at it: all the configuration outside the virtual host blocks is part of the main server configuration.
Every virtual host is defined by one or more address[:port] pairs, where the port is optional and the address can be an IP address, a domain name or a wildcard (* or _default_). Here’s an example:
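As a minimal sketch of those address forms (the addresses and paths below are placeholders chosen for illustration, and matching Listen directives are assumed to exist in the main server config):

```apache
# A specific IPv4 address and port
<VirtualHost 192.0.2.10:80>
    DocumentRoot "/var/www/site-a"
</VirtualHost>

# More than one address for the same virtual host (IPv6 goes in brackets)
<VirtualHost 192.0.2.11:80 [2001:db8::11]:80>
    DocumentRoot "/var/www/site-b"
</VirtualHost>

# Wildcard address: any IP address of the machine, port 8080
<VirtualHost *:8080>
    DocumentRoot "/var/www/site-c"
</VirtualHost>
```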
If you have exactly one virtual host for an IP-port combination, then that is an IP-based virtual host. If, on the other hand, multiple virtual hosts are defined for the same IP-port combination, i.e., they share the same IP-port combo, then these are known as name-based virtual hosts.
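The difference is easiest to see side by side. In this sketch the hostnames, addresses and paths are hypothetical:

```apache
# IP-based: 192.0.2.20:80 appears in exactly one virtual host
<VirtualHost 192.0.2.20:80>
    ServerName solo.example.com
    DocumentRoot "/var/www/solo"
</VirtualHost>

# Name-based: two virtual hosts share 192.0.2.21:80,
# so the Host header has to decide between them
<VirtualHost 192.0.2.21:80>
    ServerName one.example.com
    DocumentRoot "/var/www/one"
</VirtualHost>

<VirtualHost 192.0.2.21:80>
    ServerName two.example.com
    DocumentRoot "/var/www/two"
</VirtualHost>
```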
With these concepts out of the way in their simplest form, let’s proceed with the request matching process.
For an incoming request, only one
<VirtualHost> will be matched and selected as the winner. By winner, I mean it will be used as the request server. This resolution happens in multiple steps.
Step 1: Address and Port Based Lookup
Virtual hosts are defined with one or more addresses. All the addresses together are known as the vhost’s address set. For example here are two virtual hosts with different addresses.
```apache
<VirtualHost 220.127.116.11:80>
    ...
</VirtualHost>

<VirtualHost 10.122.0.2:80>
    ...
</VirtualHost>
```
When a client connection is first received on some address and port on the machine, httpd looks for all the VirtualHost definitions that have the same IP address and port. Hence a request to 220.127.116.11:80 will match the first virtual host above, whereas a local request to 10.122.0.2:80 will match the second virtual host.
But what if a connection was received on an address that hasn’t been used in any virtual host, like 127.0.0.1:80 (local) or [2400:6180:100:d0::b03:5001]:80 (public IPv6)? Or say the first virtual host above is not there and a request to that IP is made? In such a case Apache will first see if it can find any wildcard matches:
```apache
<VirtualHost *:80>
    ...
</VirtualHost>

<VirtualHost _default_>
    ...
</VirtualHost>
```
If yes, then that virtual host will be picked. This means virtual hosts with a specific IP address get precedence over wildcards.
If no, then the request is passed on to the main server to be handled and served. What will be served in this case, though? That depends on the main server's DocumentRoot, Directory, etc. directives. In most default Apache installations, there’s some file lying in the default document root that is served.
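To make the precedence concrete, here is a small sketch assuming a machine that listens on port 80 on all of its addresses. The IP address and paths are placeholders, not taken from a real setup:

```apache
# Main server config: used only when no virtual host matches the connection
Listen 80
DocumentRoot "/var/www/html"
<Directory "/var/www/html">
    Require all granted
</Directory>

# Picked for connections accepted on 192.0.2.10:80 (exact address match)
<VirtualHost 192.0.2.10:80>
    DocumentRoot "/var/www/specific"
</VirtualHost>

# Picked for connections on port 80 of any other address, e.g. 127.0.0.1:80
<VirtualHost *:80>
    DocumentRoot "/var/www/wildcard"
</VirtualHost>
```

With the wildcard block removed, a request to 127.0.0.1:80 would fall through to the main server and be served from /var/www/html.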
What if there are multiple virtual hosts with the same IP address and port? Note: these are also known as name-based virtual hosts. Example:
```apache
# Main Server Config
Listen 80

<VirtualHost 22.214.171.124:80>
    ServerName foo.com
    DocumentRoot /home/foo/public_html
</VirtualHost>

<VirtualHost 22.214.171.124:80>
    ServerName bar.com
    DocumentRoot /home/bar/public_html
</VirtualHost>

<VirtualHost 22.214.171.124:80>
    DocumentRoot /var/www/public_html
</VirtualHost>
```
To summarise Step 1:
- Match the address-port combination on which the client connection was received against all the virtual host definitions. Specific IP address definitions get priority over wildcards.
- If no matches are found, then the request is served by the main server.
- If only one match is found, then the request is served from that virtual host.
- If multiple matches are found, i.e., there are multiple name-based virtual hosts, then the next step is executed.
Step 2: Name-based Virtual Host Filtering
If the request contains a Host header field, the virtual hosts matched in the previous step are searched for a matching ServerName or ServerAlias. Hence, in our example above, a request to foo.com will match the first virtual host and a request to bar.com the second one. The Host header may contain a port number as well, but that’ll be ignored in favour of the connection’s port number, which is more trustworthy.
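A name-based virtual host can answer to several hostnames by combining ServerName with ServerAlias. A minimal sketch, with placeholder hostnames, IP address and paths:

```apache
<VirtualHost 192.0.2.30:80>
    # Matches "Host: example.com"
    ServerName example.com
    # Also matches www.example.com and any other subdomain (wildcards are allowed here)
    ServerAlias www.example.com *.example.com
    DocumentRoot "/var/www/example"
</VirtualHost>

<VirtualHost 192.0.2.30:80>
    ServerName example.org
    DocumentRoot "/var/www/example-org"
</VirtualHost>
```

Here a request with Host: blog.example.com lands in the first block through the wildcard alias, while Host: example.org selects the second.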
But what if we get a request for baz.com, which we want to serve from the same IP-port combo but don’t have a ServerName or ServerAlias entry for? Or what if the Host header is absent? In that case the first match from the previous step is used. This means for baz.com the virtual host of foo.com will be used. The first match is decided by the order in which the virtual hosts have been defined in the configuration files. Hence, the order of vhosts in the configuration matters.
This first virtual host defined in a list of name-based virtual hosts is also known as the default or primary server.
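Since the first block doubles as the fallback, one option is to put a dedicated catch-all virtual host first rather than letting a real site absorb unknown hostnames. A sketch under that assumption (the catch-all name, IP address and catch-all path are hypothetical):

```apache
# Defined first, so it is the default for 192.0.2.30:80.
# Requests for unknown names such as baz.com, or without a Host header, end up here.
<VirtualHost 192.0.2.30:80>
    ServerName catchall.invalid
    DocumentRoot "/var/www/catchall"
</VirtualHost>

<VirtualHost 192.0.2.30:80>
    ServerName foo.com
    DocumentRoot "/home/foo/public_html"
</VirtualHost>

<VirtualHost 192.0.2.30:80>
    ServerName bar.com
    DocumentRoot "/home/bar/public_html"
</VirtualHost>
```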
To summarise Step 2:
- If the Host header is present, then the list of name-based virtual hosts is checked against the ServerName and ServerAlias values to find the matching virtual host for the request.
  - If no match is found, then the first virtual host is picked.
  - If one match is found, then that is picked.
  - If multiple matches are found, then the first one of them is picked.
- If the Host header is absent, then that leads to no match as well, causing the first virtual host to be picked for serving.
A small note on what happens if ServerName or ServerAlias are missing:
- If ServerName is missing in a virtual host, then the ServerAlias is checked for matching.
- If ServerAlias is also missing, then the default ServerName from the main server config is inherited by the virtual host to be used as its server name.
- If the main server doesn’t specify ServerName, then the system hostname is used as the default server name by the virtual host.
- If the system hostname lookup fails, then a reverse lookup is performed on the first system IP address to find a suitable hostname to be used as the default server name.
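To avoid depending on hostname lookups at all, the names can be set explicitly. The sketch below illustrates the fallback chain described above; the hostnames, IP address and path are placeholders:

```apache
# Main server config: a virtual host without its own ServerName
# would inherit this value, per the notes above
ServerName server.example.com

<VirtualHost 192.0.2.40:80>
    # No ServerName here, so name matching relies on ServerAlias
    ServerAlias legacy.example.com
    DocumentRoot "/var/www/legacy"
</VirtualHost>
```

Giving every virtual host its own ServerName is usually the less surprising choice, since the inherited name then never comes into play.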
Debugging Virtual Hosts
Once you’ve set up your virtual hosts, you can dump the parsed vhost settings for debugging purposes. For instance, this is what this blog’s parsed config looks like:
```
$ apachectl -D DUMP_VHOSTS # or httpd -D DUMP_VHOSTS
VirtualHost configuration:
*:80                   is a NameVirtualHost
         default server catchall (/etc/apache2/sites-enabled/000-catchall.conf:1)
         port 80 namevhost catchall (/etc/apache2/sites-enabled/000-catchall.conf:1)
         port 80 namevhost codingshower.com (/etc/apache2/sites-enabled/codingshower.conf:1)
```
The output above is really useful. In this case we know that our configuration has a couple of name-based vhosts, and once the connection is matched with the address and port (*:80 in this case), there are two virtual hosts with hostnames catchall and codingshower.com (both coming from ServerName) that may be matched and picked depending upon the Host header. If no server name match is found, then the virtual host marked as default server catchall, which is the default or primary server, will be selected.
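The configuration files behind that dump are not shown here, so the following is only a guess at the minimum that would produce similar output. The file paths and server names come from the dump itself; everything else is omitted, and it is assumed that the files in sites-enabled are included in alphabetical order:

```apache
# /etc/apache2/sites-enabled/000-catchall.conf
# Included first, so it becomes the default server for *:80
<VirtualHost *:80>
    ServerName catchall
</VirtualHost>

# /etc/apache2/sites-enabled/codingshower.conf
<VirtualHost *:80>
    ServerName codingshower.com
</VirtualHost>
```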
More on debugging virtual host configuration settings can be found here.