Directives and contexts

A beneficial part of nginx being a reverse proxy is that it fits into a large number of server setups and can handle many things that other web servers simply aren’t designed for. A popular question is “Why even bother with nginx when Apache httpd is available?”

The answer lies in the way the two programs are designed. The majority of Apache setups use prefork mode, where a certain number of processes are spawned and the dynamic language is embedded in each process. This setup is synchronous, meaning that each process can handle only one request at a time, whether that request is for a PHP script or an image file.

In contrast, nginx uses an asynchronous event-based design where each spawned process can handle thousands of concurrent connections. The downside here is that nginx will, for security and technical reasons, not embed programming languages into its own process – this means that to handle those we will need to reverse proxy to a backend, such as Apache, PHP-FPM, and so on. Thankfully, as nginx is a reverse proxy first and foremost, this is extremely easy to do and still allows us major benefits, even when keeping Apache in use.

Let’s take a look at a use case where Apache is used as an application server, as described earlier, rather than just a web server. We have embedded PHP, Perl, or Python into Apache, which has the primary disadvantage of making each request costly: the Apache process is kept busy until the request has been fully served, even if it’s a request for a static file. Our online service has become popular and we now find that our server cannot keep up with the increased demand. In this scenario, introducing nginx as a spoon-feeding layer would be ideal. The nginx server sits between our end user and Apache; when a request comes in, nginx reverse proxies it to Apache if it is for a dynamic file, while handling any static file requests itself. This means we offload much of the request handling from the expensive Apache processes to the lightweight nginx processes, and increase the number of end users we can serve before having to spend money on more powerful hardware.
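To make this concrete, a spoon-feeding setup might be sketched as follows; the hostname, the backend port 8080, and the list of static file extensions are illustrative assumptions, not values from the text:

server {
    listen 80;
    server_name example.com;
    root /var/www/website;

    # nginx serves static assets itself, straight from disk
    location ~* \.(jpg|png|gif|css|js|ico)$ {
        expires 30d;
    }

    # everything else is reverse proxied to Apache on port 8080
    location / {
        proxy_pass http://127.0.0.1:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}

Here the cheap nginx workers absorb all the static traffic and the slow-client transfer time, while Apache only sees the dynamic requests.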
Another example scenario is one where we have an application being used from all over the world. We don’t have any static files, so we can’t easily offload a number of requests from Apache. In this use case, our PHP process is busy from the time the request comes in until the user has finished downloading the response. Sadly, not everyone in the world has fast internet and, as a result, the sending process can be busy for a relatively significant period of time. Suppose our visitor is on an old 56k modem with a maximum download speed of 5 KB per second: it will take them five seconds to download a 25 KB gzipped HTML file generated by PHP. That’s five seconds during which our process cannot handle any other request. When we introduce nginx into this setup, PHP spends only microseconds generating the response, while nginx spends the five seconds transferring it to the end user. Because nginx is asynchronous, it will happily handle other connections in the meantime, and thus we significantly increase the number of concurrent requests we can handle.

In the previous two examples I used scenarios where nginx sits in front of Apache, but naturally this is not a requirement. nginx is capable of reverse proxying via, for instance, FastCGI, uWSGI, SCGI, HTTP, or even TCP (through a plugin), enabling backends such as PHP-FPM, Gunicorn, Thin, and Passenger.

————————–
Quick start – Creating your first virtual host

Step 1 – Directives and contexts

To understand what we’ll be covering in this section, let me first introduce a bit of terminology that the nginx community at large uses. Two central concepts in the nginx configuration file are directives and contexts. A directive is basically an identifier for one of the various configuration options. Contexts refer to the different sections of the nginx configuration file. This term is important because the documentation often states in which contexts a directive is allowed to appear.
A glance at the standard configuration file should reveal that nginx uses a layered configuration format where blocks are denoted by curly brackets {} . These blocks are what are referred to as contexts.
The topmost context is called main, and it is not denoted as a block; rather, it is the configuration file itself. The main context has only a few directives we’re really interested in, the two major ones being worker_processes and user. These directives control how many worker processes nginx should run and which user/group nginx should run them under.

Within the main context there are two possible subcontexts, the first one being called events. This block handles the directives that deal with the event-polling nature of nginx. Mostly we can ignore every directive in here, as nginx can configure this optimally by itself; however, there is one interesting directive, namely worker_connections. This directive controls the number of connections each worker can handle. It’s important to note here that nginx is a terminating proxy, so if you HTTP proxy to a backend, such as Apache httpd, each request will use up two connections.
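As a sketch, the top of an nginx.conf might combine the main and events contexts like this; the specific values shown are illustrative examples, not recommendations:

# main context: the configuration file itself
user  www-data;
worker_processes  4;

events {
    # each worker can handle up to 1024 connections; remember that
    # proxying to a backend uses two connections per request
    worker_connections  1024;
}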
The second subcontext is the interesting one called http . This context deals with everything related to HTTP, and this is what we will be working with almost all of the time. While there are directives that are configured in the http context, for now we’ll focus on a subcontext within http called server . The server context is the nginx equivalent of a virtual host. This context is used to handle configuration directives based on the host name your sites are under.

Within the server context, we have another subcontext called location . The location context is what we use to match the URI. Basically, a request to nginx will flow through each of our contexts, matching first the server block with the hostname provided by the client, and secondly the location context with the URI provided by the client.
Depending on the installation method, there might not be any server blocks in the nginx.conf file. Typically, system package managers take advantage of the include directive, which allows us to do an in-place inclusion into our configuration file. This lets us separate out each virtual host and keep our configuration file more organized. If there aren’t any server blocks, check the bottom of the file for an include directive and look at the directory it includes from; it should contain a file with a server block.
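For illustration, such an include directive often looks like the following; the exact path is distribution-specific and assumed here:

http {
    # each file in this directory typically holds one server block
    include /etc/nginx/sites-enabled/*;
}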

————————-
Step 2 – Define your first virtual host

Finally, let us define our first server block!
server {
    listen 80;
    server_name example.com;
    root /var/www/website;
}

That is basically all we need and, strictly speaking, we don’t even need to define which port to listen on, as port 80 is the default. However, it’s generally good practice to keep it there in case we want to search for all virtual hosts on port 80 later on.

=====================================================================

Quick start – Interacting with backends

Obviously, this virtual host is quite boring: all it does is serve static files, and while that is certainly useful, it’s practically never all we want to do. Something more interesting would be to serve PHP requests, perhaps even for a framework with a front controller pattern and search-engine-friendly URLs.

Step 1 – A quick backend communication example
Communicating with a backend is done by passing the request to the backend if certain conditions are met. For example, in the following server block:

server {
    listen 80;
    server_name example.com;
    root /var/www/website;
    index index.php;

    location / {
        try_files $uri $uri/ /index.php;
    }

    location ~ \.php$ {
        include fastcgi.conf;
        fastcgi_pass 127.0.0.1:9000;
    }
}
Here we are using a regular expression location block to define what should happen when a request with a URI ending in .php comes in. If the URI does not end in .php but is, for instance, /contact-us/, location / is used instead. That location first tries to find a file on disk using our root directive and the URI. If that’s not found, it searches for a directory instead and uses our index directive to find an index file. If that is not found either, it finally rewrites internally to /index.php and restarts location evaluation; with the URI now ending in .php, the PHP location will be used and the request will be sent to PHP.

Step 2 – Location blocks
As we’ll pass requests to a backend by using location blocks, it’ll be useful to understand the different types of location blocks available. Did you notice in the preceding section how the location blocks use different modifiers before the URI? In the first location there is no modifier, and in the second a ~ is used. This modifier changes how nginx matches the location to the URI sent by the end user. The modifiers and rules are as follows:

Modifier      Result

No modifier   Matches as a prefix value. location / will match any URI beginning with /, while location /foo will match any URI beginning with /foo.
=             Matches as an exact value. location = /foo will only match the exact URI /foo, not /foobar or even /foo/.
~             Matches as a case-sensitive regular expression using the PCRE library.
~*            Matches as a case-insensitive regular expression using the PCRE library.
^~            Matches as a prefix value that takes precedence over regular expressions.

With all of these different location modifiers, nginx needs a way to know which one to use if multiple matches occur. To do this nginx assigns each type of modifier a certain specificity, which helps to determine how important a location is.

Modifier      Specificity

=             The most specific modifier possible, as it matches only the exact string. If this location matches, it is chosen first.
^~            Used specifically when you want a prefix match to take precedence over a regular expression location. If multiple locations of this type match, the longest match is used.
~ and ~*      nginx has no way to decide how specific a regular expression is, so these are matched in the order they are defined. If multiple regular expression locations match, the first one defined is used.
No modifier   Finally, if nothing else matches, a standard prefix match is used. If multiple prefix locations match, the longest match is used.
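As an illustration of these rules, consider this hypothetical set of locations (the bodies are omitted) and which one nginx would choose for a few sample URIs:

location = /foo { … }        # exact match
location ^~ /static/ { … }   # prefix match that suppresses regex matching
location ~* \.php$ { … }     # case-insensitive regular expression
location /foo { … }          # plain prefix match

# /foo           -> location = /foo        (exact match always wins)
# /static/a.php  -> location ^~ /static/   (^~ prevents the regex from being checked)
# /foo/bar.php   -> location ~* \.php$     (a regex beats a plain prefix)
# /foobar        -> location /foo          (longest plain prefix, no regex matches)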

Knowing how nginx chooses a location is quite essential because of how nginx inheritance works. The common trait of every directive is that it only ever inherits downwards, never up and never across contexts. In effect, this means that nginx cannot apply two locations at the same time. As soon as we internally rewrite a request and locations are re-evaluated, nginx forgets about the directives in the old location and only cares about the ones in the new location.

For an illustration of this behavior, see the following server block:
server {
    root /home/bill/www;
    index index.php;

    location /phpmyadmin {
        root /var/www;
        try_files $uri /phpmyadmin/index.php;
    }

    location ~* \.php$ {
        fastcgi_pass php_upstream;
    }
}

When a request comes in for /phpmyadmin/image/foo.jpg, the /phpmyadmin location will be considered the most specific and try_files will find the image. In contrast, if a request comes in for /phpmyadmin, it will first use the /phpmyadmin location, and then try_files will rewrite the request into the PHP location. When this happens, everything from the previous location is discarded; the root is now inherited from the server context, making it /home/bill/www instead, and the request results in a 404 error.

Instead, what we need to do here is use a sublocation so that nginx does not have to inherit across.

server {
    root /home/bill/www;
    index index.php;

    location ^~ /phpmyadmin {
        root /var/www;

        location ~* \.php$ {
            fastcgi_pass php_upstream;
        }
    }

    location ~* \.php$ {
        fastcgi_pass php_upstream;
    }
}

In this example we don’t need try_files, as we have no need to rewrite the request. If the URI matches /phpmyadmin/, that location will be chosen over the PHP location at the bottom, and if the URI then also matches the PHP sublocation, the request will flow into it, keeping the root directive from the parent location.

The positive aspect of the preceding scenario is that it is always simple to tell which directives apply to any given request: just follow the rewrites to the final location and check the directives in the parent contexts. There are no complicated inheritance paths across contexts with some values being overridden by new directives while others persist.

Related to location blocks is something called a named location. A named location is essentially a location that isn’t reached via the URI, but rather by internal references. A named location is denoted by a @ prefix.

location @error404 { … }

This location is useful when you want to logically separate out some directives but don’t want that part of the configuration to be accessible through the URI. The named location above might be used for an error page, for example, where it would only be called when a request results in a 404 error.

error_page 404 @error404;
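Putting the two directives together, a minimal sketch of a 404 handler might look like this; the response text is made up for illustration:

server {
    listen 80;
    root /var/www/website;

    error_page 404 @error404;

    location @error404 {
        # only reachable internally, never directly via a URI
        return 404 "Sorry, that page does not exist.\n";
    }
}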

 
