As Zach mentioned in one of his recent blog posts, we’ve been looking at different ways in which we can handle a lot more concurrent traffic to both Atlas and Voila. Currently, we’re serving API requests with Apache. Apache is great; it’s been around since the early days of the web, and remains the single most popular HTTP server in use today. If you’ve used Apache a bit, you’ll be familiar with setting it up for any number of sites and uses.
Fortunately, in the 20-ish years Apache has been around, newer and more nimble successors have arrived. As is the way at MetaBroadcast, we’re always keen to try newer alternatives, especially if it means we’ll be providing better performance or reliability to our users. The contender for the server throne: Nginx (pronounced “engine X”).
“Apache is like Microsoft Word, it has a million options but you only need six. Nginx does those six things, and it does five of them 50 times faster than Apache.” (quote from Chris Lea)
what makes nginx so good
One of the main issues with Apache is how it handles incoming requests. It likes to spawn a thread (or, depending on configuration, a whole process) to deal with each user’s request, which works fine under low traffic, or on instances with huge amounts of memory available. Eventually though, even on a very powerful server, you’ll hit the C10K problem: with around ten thousand simultaneous connections, you’ll be trying to run more threads than your system can reliably handle.
In Nginx, you define a number of workers (typically this will match the number of CPUs on your instance), and each of these workers can handle thousands of simultaneous requests. Each worker is also non-blocking: instead of waiting on one slow connection, it carries on serving whichever connections are ready, so a single busy request doesn’t hold up the rest.
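As a rough illustration of the worker model described here, the relevant part of a top-level nginx.conf might look something like the sketch below. The numbers are illustrative assumptions, not our production settings.

```nginx
# Top-level nginx.conf fragment (illustrative values only).

# Start one worker process per CPU core; a fixed number such as 4 also works.
worker_processes auto;

events {
    # Upper bound on simultaneous connections handled by EACH worker.
    # 4096 is an assumed figure; tune it to your traffic and file descriptor limits.
    worker_connections 4096;
}
```

With these values, a four-core instance could in principle hold around 16,000 connections open across its workers, limits on file descriptors permitting.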
Part of what has made Apache less suitable these days is that it’s become a bit bloated; in trying to be able to handle everything, it has gained some unnecessary modules and features, none of which we’re particularly benefitting from. In our case, Apache is mostly sitting in front of a JVM or some other service, adding logging and user authorisation where required – both of which Nginx is more than capable of doing.
By simplifying how it handles requests and threading, Nginx is not only better at handling high numbers of concurrent users with less memory than Apache, it also does all of that a lot faster.
sounds great, where do I sign?
There are a few small caveats to Nginx. Firstly, although it deals with the concept of a ‘vhost’ in a similar way to Apache, the syntax of Nginx’s configuration files is very different. It isn’t complex, and in many ways it’s actually a bit more straightforward, but it lacks the familiarity that everyone here has with how Apache works. Luckily, the configuration files aren’t too difficult to learn, nor do they need to be updated often.
Nginx (at least in its basic installation) doesn’t handle .htaccess files or HTTP auth groups. We don’t use the former, but we do rely on the latter to manage who has access to different vhosts. We can achieve this either through the third-party AuthDigest module, or by splitting the users in our single auth file into separate files that reflect the groups we had defined previously.
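The second option, one password file per former group, could look roughly like the sketch below, using Nginx’s built-in basic auth. The hostname, file path, group name and backend address are all hypothetical placeholders rather than our real setup.

```nginx
# Hypothetical vhost that only users from the former "admins" group may access.
server {
    listen 80;
    server_name admin.example.com;   # placeholder hostname

    location / {
        # Prompt for credentials using HTTP basic auth.
        auth_basic "Restricted";

        # One htpasswd-style file per former auth group (path is an assumption).
        auth_basic_user_file /etc/nginx/auth/admins.htpasswd;

        # Hand authenticated requests to the backend (assumed local address).
        proxy_pass http://127.0.0.1:8080;
    }
}
```

A user who belonged to several groups would simply appear in several of these files, which is a little repetitive but easy to generate from the original group definitions.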
One of the problems we encountered that we didn’t expect was in how Nginx handles DNS resolution. To help it achieve its great speeds, it caches the IPs of its backend proxy resources on startup. This would be great if we were running fixed servers whose IPs we could maintain. We are, however, using highly dynamic AWS infrastructure. Instances come and go, Elastic Load Balancers scale IPs dynamically, and even Amazon’s DNS service itself can move around. Our solution was to employ Dnsmasq and Nginx’s resolver feature, which essentially proxies DNS queries back and prevents Nginx from remembering IP addresses for too long.
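To give a flavour of how this fits together, here is a minimal sketch of such a vhost rather than our exact production config. It assumes Dnsmasq is listening on 127.0.0.1; the backend hostname, port, log path and 30-second validity are all placeholders.

```nginx
server {
    # Accept all incoming HTTP requests on port 80.
    listen 80 default_server;

    # Log every request (path is a placeholder).
    access_log /var/log/nginx/api.access.log;

    # Send DNS lookups to the local Dnsmasq instance (assumed address)
    # and treat cached answers as valid for at most 30 seconds.
    resolver 127.0.0.1 valid=30s;

    location / {
        # Putting the upstream in a variable makes Nginx resolve the name
        # at request time via the resolver above, not just once at startup.
        set $backend "http://backend.internal.example:8080";  # hypothetical JVM backend

        proxy_pass $backend;
        proxy_set_header Host $host;
    }
}
```

The variable in proxy_pass is the important part: with a literal hostname there, Nginx looks the name up once when it loads its configuration and keeps using that IP.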
In just a few short lines, we can define an Nginx vhost that will accept all incoming HTTP requests, log them, and proxy them to the JVM application. This is a basic use case, but it’s the most common example of what we use Apache, and now Nginx, for. Given how we use Puppet and want to make managing the new vhost configuration as simple as possible, I’ve been setting things up with jfryman’s Nginx module for Puppet, which has made things much easier to maintain.
Have you moved to Nginx recently, or been looking at changing to a different web server setup? Let us know on Twitter what you think!