Adam Horwich

of hybrid clouds and men

building a better future

Over the past 6 months, through various stages, I’ve been investigating the practicalities of moving away from a sole cloud provider and entering into the world of part physical, part cloud infrastructure. The design goal was a simple one: make the tools and interactions transparent for engineers and ops alike. The best way of achieving this is with IaaS (Infrastructure as a Service) technologies like OpenStack and Eucalyptus, which I have rambled on about at great length, covering both their glorious strengths and depressing shortcomings. The advantage of the physical infrastructure is that you get more control over your instances and can cut serious headroom out of the cost of running servers. On paper it was brilliant, elegant, and achievable. Today I’ll be talking about why dreams != reality.

cloudy with a chance of doom

The two hot contenders to drive our physical infrastructure were OpenStack and Eucalyptus. CloudStack, while impressive, was ruled out at the time owing to compatibility issues with Ubuntu. This has since been addressed in the Apache CloudStack release and I have high hopes for the project in the future.

Unfortunately the message is the same here: these tools are not quite production ready. Well, not for small shops or small projects. There needs to be a lot of investment to get the services up to scratch and address the various high availability shortcomings. Both OpenStack and Eucalyptus are also heavily in development, with new features and radical changes being made. It’s very easy to sit and wait for these to be released before committing to a long-lasting relationship. Certainly for OpenStack, this was our overall conclusion, and with its 6-monthly release roadmap, this will likely be achieved within 6 to 18 months. Eucalyptus too is promising some nice new features, but its new 3.2.0 release is no longer compatible with Ubuntu!

At version 3.2.0, Eucalyptus no longer supports installations on CentOS 5, RHEL 5, or Ubuntu. Eucalyptus has only tested and will only support CentOS 6 or RHEL 6.

of eggs and baskets

We quite like Amazon, despite all evidence to the contrary, so any radical overhaul of our infrastructure will likely involve Amazon quite substantially. Not just because of services like S3 and Route53: Amazon also poses an ideal disaster recovery option in the event that our co-location provider of choice does not deliver in the way we’d hope. So crunching the cost saving numbers isn’t just about how much it would cost to run your servers on rented/owned colo boxes, but also about bridging services between datacentres to meet SLAs. This principle is all the more important when considering the smaller dedicated server providers, which don’t have the presence or scalability of, say, RackSpace. It just wouldn’t be prudent to jump in with both feet.

collect all the metrics

In order to actually determine how much this hybrid solution will cost, we need to start evaluating how much traffic our instances are sending and to where. Amazon recently (and helpfully) introduced Cost Allocation Reports, which Tom made a post about last week, and which allow us to discover per instance how much traffic is being sent outside of Amazon and how much between instances. It’s not perfect, as it’s hard to determine how much traffic is being sent locally within an availability zone, but it certainly helps in generating ballpark estimates. The best way to leverage this information is through the cunning use of tags in the AWS EC2 console. By assigning unique names to instances, and in our case secondary classification tags, we can discover how much bandwidth an individual instance is using, as well as a whole service. Knowing this, we can then estimate how much traffic would need to travel between our physical and AWS infrastructures, and deduce what the forecast bandwidth costs would be. The downside, of course, to multiple hosting providers is that you’re paying twice for bandwidth: there’s the cost incurred sending traffic out of Amazon, but also the in/out costs of your physical service provider. It quickly begins to add up.
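To make that concrete, here’s a minimal sketch of the kind of aggregation involved: summing outbound data transfer per tag from a Cost Allocation Report CSV. The column names ("UsageType", "UsageQuantity", "user:Name") and the filename are assumptions for illustration, so check them against the headers in your own report before trusting the numbers.

    import csv
    from collections import defaultdict

    def transfer_by_tag(report_path, tag_column="user:Name"):
        # Sum outbound data transfer per tag value. Column names are
        # assumptions -- verify them against your own report's headers.
        totals = defaultdict(float)
        with open(report_path) as report:
            for row in csv.DictReader(report):
                usage_type = row.get("UsageType", "")
                # DataTransfer-Out usage covers traffic leaving Amazon;
                # regional and intra-AZ traffic shows up under other types.
                if "DataTransfer-Out" in usage_type:
                    tag = row.get(tag_column) or "untagged"
                    totals[tag] += float(row.get("UsageQuantity") or 0)
        return dict(totals)

    if __name__ == "__main__":
        for tag, amount in sorted(transfer_by_tag("billing-report.csv").items()):
            print("%-30s %10.2f" % (tag, amount))

With per-tag totals like these it’s straightforward to roll instance-level numbers up into per-service figures and feed them into the bandwidth forecasts.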

making the numbers work

To maintain that desired cost saving benefit without much additional investment in devops insanity, there are concessions that would need to be made. For example, you may wish to eliminate active dependencies between providers, so you run a standalone service with your new provider, and your downtime is the time it takes to spin up instances and restore from backups back in Amazon. Or you batch your backups in an active/passive standby model (sketched below), so you’re only sending updates to your secondary systems in Amazon while production traffic and requests only go to your new infrastructure on physical hosts. There’s also the use of services like S3, which we use a fair bit and which would cost much more from a remote hosting provider. These things generally have economies of scale too, so the larger your infrastructure, the better your cost savings can be.
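As a rough sketch of what that active/passive batching might look like in practice, something as simple as a nightly delta sync keeps the AWS-bound traffic proportional to what actually changed. The hostnames and paths below are made-up placeholders, not our real setup.

    #!/usr/bin/env python
    import subprocess

    # Production runs on the physical hosts; a nightly cron job pushes only
    # the day's backup delta to a standby box in Amazon. Paths and hostnames
    # here are placeholders for illustration.
    SOURCE = "/var/backups/nightly/"
    STANDBY = "standby.example.com:/var/backups/nightly/"

    # rsync only transfers the delta, so the bandwidth billed on both sides
    # is proportional to what changed since the last run.
    subprocess.check_call(["rsync", "-az", "--delete", SOURCE, STANDBY])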

resilience is futile

Well, not quite, but from a cost perspective at least. You’re going to want to build your own service resilience into your new physical infrastructure, not just to cope with provider outages, but also because of the aforementioned OpenStack infancy issues. Additionally, you’d still want to be running some of your infrastructure on Amazon, so you’re not quite replacing like for like, at least not until you’re happy and confident with your new provider.

adding it all up

When you see the perceived cost savings of running your own instances versus AWS instances, it’s easy to get excited. But once you add in the extra bandwidth charges (which originally were a minority of costs) and build in extra resilience, you’re suddenly approaching the original cost of just sticking with Amazon. Certainly, the Amazon pricing model might just be more shrewd than it first appears. With the uncertainties identified in cloud technologies and our inexperience with physical hosting providers, the conclusion we reached last week was to put investigations on hold for now. We intend to re-evaluate the market in roughly 9 months’ time, by which point we should see exciting developments in IaaS providers and hopefully more cost-effective dedicated server providers.
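For a feel of how the sums turn out, here’s a back-of-the-envelope comparison. Every number is an illustrative placeholder, not a real quote or our actual figures; the point is only the shape of the arithmetic.

    # All numbers are illustrative placeholders, not real AWS or colo pricing.
    AWS_ONLY          = 10000.0  # monthly cost of running everything on AWS
    COLO_INSTANCES    = 6000.0   # same workload on rented/owned colo boxes

    BRIDGE_GB         = 5000.0   # monthly traffic between AWS and the colo
    AWS_EGRESS_PER_GB = 0.12     # paying once to leave Amazon...
    COLO_PER_GB       = 0.05     # ...and again at the physical provider

    RESILIENCE        = 2500.0   # spare capacity, standby instances in AWS, etc.

    hybrid = COLO_INSTANCES + BRIDGE_GB * (AWS_EGRESS_PER_GB + COLO_PER_GB) + RESILIENCE
    print("AWS only: %.0f / month, hybrid: %.0f / month" % (AWS_ONLY, hybrid))
    # The headline instance saving shrinks fast once bandwidth is billed twice
    # and resilience is added back in.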
