Adam Horwich

how do you solve a problem like patching

In an ideal world, all software would be well written and reliable. Sadly that will never ever happen. Vulnerabilities will always be discovered, and patches will always need applying. How we monitor and manage this process is the real headache. Should we be overly cautious or gung-ho? And how do we achieve that manageable middle ground? Today I’m going to explore the way we’ve been thinking about patching our Operating Systems.

knowing is half the battle

Before we even think about what we’re going to apply, we really ought to know the state of affairs and the severity of the vulnerability. Unsurprisingly, the market for analysing and monitoring large infrastructure in an easy-to-understand (business-friendly) manner is a highly lucrative one. There are lots of commercial tools available which do this well, but that goes against our philosophies. Even Canonical’s Landscape product is unreasonably priced, which is rather disappointing. Perhaps they should release a cut-down product which provides the essential functionality a System Administrator requires to keep on top of their infrastructure, without any unnecessary flair? Alas, even though Red Hat did this with their Spacewalk product, it does not play well with Ubuntu. On the open source front we have a few options, some integrated with deployment management products (Puppet, Chef) like Foreman, and others more passive like Pakiti. Neither of these tools is a solution in itself for us, though. The former requires a Puppetmaster (which we’re not currently implementing, but it may be something to revisit in the future), and the latter only has ropey support for Ubuntu until version 3 is released.

the sledgehammer approach

There are some basic tools readily available to keep your system up to date, like cron-apt. This package offers some scripting, control, and notification around package updates, and with a modicum of tweaking it can be made to only apply patches from the security updates repository. The only problem here is that, as much as we’d want to be in step immediately with the latest packages, we’d rather not (a) update all systems at the same time, or (b) apply all available updates. Sometimes updates go bad, and bad things happen. We’d rather err on the side of caution and allow a human to deploy potentially risky changes.
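For reference, the “modicum of tweaking” boils down to pointing cron-apt at a source list containing only the security repositories (the same /etc/apt/security.sources.list file we generate later on). Something along these lines in /etc/cron-apt/config would do it; treat this as a sketch rather than our exact configuration:

#Restrict cron-apt to the security-only source list, and only mail when something was upgraded
OPTIONS="-o quiet=1 -o Dir::Etc::SourceList=/etc/apt/security.sources.list"
MAILON="upgrade"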

caution comes at a cost

So, in the absence of nice, neat services (Landscape), we’re faced with a two-pronged approach. We need a service to monitor the patch status, and tools to deploy updates. The lesser of two evils, I guess. If Spacewalk on Ubuntu weren’t a hacky mess, maybe that would suffice. Alas, no.

compromises compromises compromises

This, then, is what we came up with: package monitoring in Nagios and an AWS-integrated SSH tool to deploy updates. By leveraging our existing monitoring infrastructure, we’re not giving support engineers more tools to learn or different locations to visit. And by writing a multi-SSH client, we’re re-using code already in place for our instance deployment. So the setup cost was relatively low. There were some nuances in getting it all to play nicely, which I’ll cover below.

package monitoring in nagios

Ideally this would have been done by Pakiti, as it’s a tool I’ve worked with before, and it’s rather good at identifying high-severity security updates… in CentOS. Unfortunately, it’s not so hot with Debian, and not really worth the investment to get it to where we want it when the developers are promising a redesigned version with more generic support. So instead, with some minor modifications, we’re using the check_debian_packages plugin and adding it to our default suite of NRPE checks for hosts. With one change. We only want it to look at packages available in the security repository:

my $CMD_APT = "/usr/bin/apt-get upgrade -s -o Dir::Etc::SourceList=/etc/apt/security.sources.list";

So we simulate an apt-get upgrade, and provide a repo source list which only contains the security repos. But that’s non-standard in your typical Ubuntu deploy, so we had to apply a bit of finesse to get this in place. And this is where we come back to our blunt instrument. Alongside deploying updates, cron-apt also provides a nice way of performing an apt-get update to refresh the repo caches. This is essential for keeping on top of new patches as they come out. We’d rather not update the cache every time we perform a check, so a daily asynchronous check is the order of the day.
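For completeness, wiring the modified plugin into NRPE is just another command definition, something along these lines (the plugin path is an assumption and will depend on where your Nagios plugins live):

#nrpe.cfg entry (sketch) -- adjust the path to wherever your plugins are installed
command[check_security_packages]=/usr/lib/nagios/plugins/check_debian_packages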

puppeteering with sledgehammers

While we’re busy installing cron-apt on our servers, we’re also generating that separate security sources.list file:

exec { "security-updates-cron":
                command => 'grep "-security" /etc/apt/sources.list | grep -v "#" > /etc/apt/security.sources.list',
                creates => '/etc/apt/security.sources.list',
                path => [ "/bin/", "/sbin/" , "/usr/bin/", "/usr/sbin/" ],
        }

Here we’re doing a simple grep and piping the output to a new file. The ‘creates =>’ entry means that this exec will only run if that file does not exist. As we’re not in the practice of performing dist-upgrades (i.e. moving from 11.10 to 12.04) this should never need to change. Our philosophy is that all servers are expendable (it’s a necessity in the cloud), so if we were ever to need to upgrade a release, we’d just deploy another instance.
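For the curious, on a stock 12.04 host the generated file ends up looking something like this (purely illustrative, and obviously dependent on whatever mirrors your sources.list points at):

deb http://security.ubuntu.com/ubuntu precise-security main restricted
deb http://security.ubuntu.com/ubuntu precise-security universe
deb http://security.ubuntu.com/ubuntu precise-security multiverse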

with great power comes great… paranoia

Sure, so now we know what patches are available. But how do we then decide what to patch? Well, that’s a good question, and one that’s still a work in progress. Again, integrated products like Landscape and Pakiti would be great here, as they ingest CVE/OVAL data to mark up the high-severity patches, but we don’t necessarily have that. Instead we have mailing lists and notifications. At least from these we can match up against the list of updates we know are yet to be applied on a host. So we come to our tool, aws-ssh.sh. This takes a command, an action, and an optional parameter, and compiles a command to be executed on all AWS instances. It means we don’t allow arbitrary commands to be run over all instances, because that could be terrifying. We also employ some validation in case some stray ampersands or semicolons get into the mix. We do trust our engineers, but we are also mindful that this command will be executed over EVERYTHING.
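To give a flavour of that validation, here’s a simplified sketch rather than the exact code in aws-ssh.sh ($PACKAGE stands in for whatever the -p flag was parsed into): we effectively whitelist the characters a Debian package name can legitimately contain and bail out on anything else.

#Sketch: only accept lower-case letters, digits, '+', '-' and '.' in the package parameter
if [[ ! "$PACKAGE" =~ ^[a-z0-9.+-]+$ ]]; then
        echo "aws-ssh.sh: refusing to run, '$PACKAGE' contains characters we don't trust" >&2
        exit 1
fi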

sequential versus concurrent ssh

In its present incarnation, we use a for loop to iterate over instances we run in the cloud. Ideally we would use a Cluster SSH tool, but given that this would require extra software for all engineers, I opted for a more traditional approach.

./aws-ssh.sh -c package -a upgrade -e stage -p all
#Generates...
ssh $USER@$INSTANCE "sudo aptitude safe-upgrade"

The above command will run an aptitude safe-upgrade command on all instances in our AWS infrastructure which are tagged as stage servers, and will attempt to deploy all updates ‘safely’.

./aws-ssh.sh -c package -a upgrade -e prod -p security
#Generates...
ssh $USER@$INSTANCE "sudo aptitude -o Dir::Etc::SourceList=/etc/apt/security.sources.list safe-upgrade"

This time the script has been told we only want to apply security updates to our production servers, and so the compiled command includes that apt repository list we’ve generated on all our hosts. Of course, if we want to only upgrade a single package, then we can supply that package name as the -p flag.
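So, for example, patching a single package across production:

./aws-ssh.sh -c package -a upgrade -e prod -p openssl
#Generates something along the lines of...
ssh $USER@$INSTANCE "sudo aptitude safe-upgrade openssl"

(openssl is just an illustrative package name here, and the generated command is an extrapolation of the pattern above rather than verbatim script output.)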

how we discriminately deploy patches

#The -e flag is picked up and added to the AWS_ENV variable. If not supplied, it is set to *
TAGCMD="ec2-describe-tags"
EC2CMD="ec2-describe-instances"
#Split command output on newlines only, so each output line becomes one array element
IFS='
'
#Collect all running instance details
INSTANCES=( `$EC2CMD | grep "running"` )
#Collect all Instance IDs tagged with the supplied environment
TAGS=( `$TAGCMD | grep Env | grep "${AWS_ENV}" | awk '{ print $3 }'` )
#Iterate over all collected instances
for t in "${INSTANCES[@]}"
do

        INSTANCE=`echo $t | awk '{ print $4 }'`
        INSTANCE_ID=`echo $t | awk '{ print $2 }'`
        #If our instance is also in the list of instances with the supplied tag, then proceed
        if echo ${TAGS[@]} | grep -q "$INSTANCE_ID"; then
                ssh $USER@$INSTANCE "sudo $CMD $ACTION"
        fi
done
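
The excerpt above skips the part where the -c, -a and -p flags get turned into $CMD and $ACTION. Here’s a hypothetical sketch of that mapping, consistent with the generated commands shown earlier; the $COMMAND and $PARAM variable names are assumptions rather than the real script:

#Hypothetical flag-to-command mapping (not the full aws-ssh.sh)
case "$COMMAND:$ACTION" in
        package:upgrade)
                case "$PARAM" in
                        security) CMD="aptitude -o Dir::Etc::SourceList=/etc/apt/security.sources.list"
                                  ACTION="safe-upgrade" ;;
                        all|"")   CMD="aptitude"
                                  ACTION="safe-upgrade" ;;
                        *)        CMD="aptitude"
                                  ACTION="safe-upgrade $PARAM" ;;
                esac
                ;;
        *)
                echo "Unsupported command/action: $COMMAND $ACTION" >&2
                exit 1
                ;;
esac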

open to ideas

It gets the job done and has enough safeguards and monitoring in place to protect business continuity, so we’re happy. But, with tools like Foreman, we may well consider re-opening the door to running a Puppetmaster in the future, to see how well that could serve our requirements. We’ll also be keeping an eye on Landscape, Spacewalk, and Pakiti developments that could make our lives easier, but I’d also be very keen to hear your thoughts on the subject and how you’ve solved the problem.
