Amazon are great (now, thinking about it, this probably isn’t the first time I’ve started a sentence like that, and probably won’t be the last). But there’s a time and a place for all things, and I should probably start by talking about the three phases of growth. By which I mean the Cloud Service client, the Dedicated Server rental, and the Datacentre colocation. And, more importantly, how a business aligns to these phases, and what to consider when transitioning. What follows is an investigation into that unknown.
we’re gonna need a bigger boat
Clouds are brilliant, we love them. It’s so simple and quick to get something set up and running and globally available. Amazon have a great service they’ve put together, and over the past 4 years, they’ve helped us grow tremendously. So then, why would we ever want to change?
Firstly, we’re always looking to be more efficient with the way we allocate and utilise resources. Amazon provide fixed packages of ‘instances’ and sometimes you’re paying twice as much when you only needed a little more grunt.
Secondly, we don’t like magic. There’s a lot of wizardry and magic that happens behind the scenes at Amazon, and frankly that scares us a little.
Thirdly, we love consistency and transparency. While we’ve designed our architectures to withstand some of the troublesome problems Amazon have, they’re not particularly forthcoming about some of the lower-level technical features of their infrastructure. And once you start caring about these things more than just having another server to throw at a problem, things get a little challenging.
why rock the boat?
So for the past few months, it’s been my task to cover three things. One, to look at what the market is like against Amazon. Two, to assess the feasibility of a physical infrastructure. And three, above all, to find some justifiable, significant cost savings to vindicate a wholesale move. We’re not alone in this dilemma either; you’ll find lots of interesting articles about companies making that leap. There’s a lovely article by Mixpanel about why they moved away from Rackspace (we already ruled Rackspace out) to Amazon, and then a follow-up article a year later on why they moved away from clouds entirely. The UK Government too have an article on their decisions to use Infrastructure as a Service clouds. Interestingly, neither of them talked about investigating the benefits of the middle ground: running your own Cloud infrastructure.
let’s start at the beginning
But I’m getting ahead of myself. I should first talk a little about my background. I started off in the High Performance Computing world of a large Research Council. That scale of company demands its own Datacentre, and makes large capital purchases with formal tendering processes. It’s the kind of world where your office is [physically] above your servers. External websites are corporately managed, so it’s the internal or international projects that you’ll be architecting. I consider this the top of the infrastructure food chain. I then moved on to the world of Streaming Media. This is the realm of colocation and 24/7 live streaming. You’ll have servers in your office, but most of your infrastructure will be self-purchased and sitting in some rented racks in a Datacentre or two. Purchases are made on the project level, but there’s a shrewd balance of efficiency versus bespoke requirements. Virtualisation also plays an important part in this. Then, presently at MetaBroadcast, there’s the startup challenge. Virtualisation is your best friend, and economies of scale a distant dream.
sowing the seeds
All companies grow though. And judging when to leap between your broadly painted infrastructure pigeon holes is no easy feat. Move too soon and you find yourself hampered by overheads, too late and you’re burning money on the cloud while you procrastinate. It’s been my job to assess both where we are, and how we move forward.
So, what are my findings? Well, exploring the ocean of Cloud providers out there proved pretty pointless. There are savings to be made, but they are typically less than 10% and they are riddled with compromise, owing to a lack of Amazon’s magic/economies of scale. We tried to look at the problem on a per-project basis, which was reasonably effective, but once you factor in your full infrastructure (and future growth expectations), nothing really sounds appealing. Essentially we proved that, barring any surprises from Google’s Compute Engine (which is far too early in its dev cycle for production services), there’s no financial incentive to move away from Amazon to another Cloud provider. There are many *other* reasons to move, but the cost benefit is marginal.
growing your own
If you can’t beat them, join them? Sure, why not. If we’re already getting the best value/features from our Cloud provider, then the next logical step is to consider the next growth pigeon hole. For us, that’s looking at dedicated servers. We currently have approximately 60 AWS Instances of varying sizes, so it’s not quite a matter of transposing those to a physical box. Nor could it ever be that simple, owing to some of the S3 and Route53 DNS magic we rely on. No, in fact, the solution is more elegant than that: building our own private Cloud infrastructure on dedicated servers, which cost a third of the price of the equivalent CPU hours in Amazon’s Cloud. Translating monetary cost to man cost isn’t much of an issue if you have a plan and a migration strategy.
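To make that trade-off a little more concrete, here is a rough sketch of the arithmetic in Python. Everything in it is an assumption for illustration: the hourly rate and the migration cost are made-up placeholders rather than real prices, and it pretends all 60 instances are the same size, which they are not. The only figure taken from above is the one-third price ratio.

```python
HOURS_PER_MONTH = 730  # average hours in a month


def monthly_cost(hourly_rate, instance_count, hours=HOURS_PER_MONTH):
    """Cost of running instance_count machines around the clock for a month."""
    return hourly_rate * instance_count * hours


def breakeven_months(cloud_monthly, dedicated_monthly, migration_cost):
    """Months of savings needed to pay back a one-off migration cost."""
    saving = cloud_monthly - dedicated_monthly
    if saving <= 0:
        raise ValueError("no monthly saving, so the move never pays for itself")
    return migration_cost / saving


# Hypothetical placeholder figures, NOT real prices.
CLOUD_RATE = 0.30                 # cost per instance-hour on the public cloud
DEDICATED_RATE = CLOUD_RATE / 3   # the "third of the price" ratio from the text
MIGRATION_COST = 20000.0          # one-off engineering time, expressed as money

cloud = monthly_cost(CLOUD_RATE, 60)
dedicated = monthly_cost(DEDICATED_RATE, 60)

print("cloud per month:     %.2f" % cloud)
print("dedicated per month: %.2f" % dedicated)
print("months to break even: %.1f" % breakeven_months(cloud, dedicated, MIGRATION_COST))
```

Swapping in real quotes and a real estimate of engineering time is the whole point; the shape of the calculation is all this is meant to show.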
openstack, I choo choo choose you
There are a few good open source Cloud platforms floating about, but OpenStack is arguably the best. It’s [mostly] conveniently integrated into Ubuntu 12.04 LTS, it has major industry backing, there’s a shiny frontend, and it is largely compatible with AWS tools. The drawback is EVERYTHING ELSE. You’re pretty much left fending for yourself on networking, as it’s a fairly barebones implementation on OpenStack. We settled in the end on the Quantum Open vSwitch plugin. Our architecture model will see us emulating the availability zone conundrum with infrastructure in multiple Datacentres, so we need a networking plugin that’s VLAN friendly if we want to communicate between instances on private networks. What we’ve tested so far is the ability to reliably deploy an OpenStack Server (no mean feat, if you’ve ever tried it!), deploy a VPN with our existing Puppet bootstrap tools, and then use the VPN to communicate with instances in our OpenStack infrastructure on their private network range. This model is how we currently interact with our instances on Amazon, so the ability to achieve this was fairly fundamental to our plans.
there’s much left to do
Proving a VPN network is only the beginning. We still have questions we need to answer:
How isolated should each OpenStack server be? Ideally there should be no common component, but this raises further complications on authentication and duplicated efforts. We may want to consider high availability components of OpenStack that replicate between Datacentres.
What authentication model should we use? Following on from the questions raised about isolation, a common and easily maintained authentication process should be implemented to lessen complexity for engineers.
How mature is OpenStack? Just last week a new version of OpenStack (Folsom) was released. It has a six-monthly release cycle and there are still many core features being added and changed. We need to pick a fixed point to base an infrastructure on, and the current deployment processes make ‘Essex’ an unsuitable candidate.
How do we replicate Amazon’s DNS magic? It’s such a blessing to be able to resolve hostnames to internal IPs in AWS when you’re connected via a VPN. It allows us to allocate a single hostname for services, which we can test/restrict both internally and externally. With servers on our own private cloud outside of Amazon’s network range, we can’t utilise this. We may need to manage our own internal DNS on OpenStack to replicate this feature.
How does S3 fit into all this? We make great use of Amazon’s storage solutions like S3 and Glacier. If we move away from Amazon, those are services we can continue to use, but at an increased bandwidth cost. Do we opt to replicate them or take the hit?
Information Security Policies! More than anything though, the biggest challenge is to find a dedicated server provider who can meet your clients’ Infosec requirements.
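On the DNS question above, the behaviour we’d want to replicate is usually called split-horizon DNS: one hostname, with a different answer depending on where the query comes from. The Python sketch below only illustrates the lookup logic; it is not how Amazon do it, nor a real DNS server, and the hostname, VPN range, and addresses are all hypothetical placeholders.

```python
import ipaddress

# Hypothetical VPN client range and records, for illustration only.
VPN_RANGE = ipaddress.ip_network("10.8.0.0/24")

RECORDS = {
    "service.example.com": {
        "internal": "10.0.1.15",     # private address of the instance
        "external": "203.0.113.20",  # public-facing address
    },
}


def resolve(hostname, client_ip):
    """Return the internal address to VPN clients and the external one to everyone else."""
    record = RECORDS[hostname]
    on_vpn = ipaddress.ip_address(client_ip) in VPN_RANGE
    return record["internal"] if on_vpn else record["external"]


print(resolve("service.example.com", "10.8.0.7"))      # a client on the VPN
print(resolve("service.example.com", "198.51.100.4"))  # a client on the internet
```

In practice this logic would sit inside a DNS server rather than application code (BIND’s views are one way of doing it), but the decision being made is the same.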
so far so good
Overall though, the progress is promising. We’re working at a sensible pace for the size of our company, but I think so far all signs point to us being able to practically set up our own private cloud on dedicated servers. Being able to flexibly assign resources to instances, run your own hardware, and have full visibility over your infrastructure offers such big wins over Amazon that we are committed to seeing just how feasible it all is. Tune in over the next few months as I continue to explore OpenStack, strengthen deployment processes, build cross-compatible tools, and negotiate infrastructure requirements with service providers.