Garry Wilson

making kubernetes reliable

We’re now well on our way towards migrating our infrastructure into Kubernetes. We’re working on defining all of our services as Deployments, including Atlas and now Voila. This also involves creating a Docker image of each service, giving us a simple container that we can deploy anywhere and that should work reliably.
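To give a flavour of what that looks like, here’s a minimal sketch of a Deployment for one of these services. The service name, image path and port are placeholders for the example, not our real configuration.

    apiVersion: extensions/v1beta1
    kind: Deployment
    metadata:
      name: atlas                 # placeholder name; one Deployment per service
    spec:
      replicas: 3                 # run three copies, spread across the worker instances
      template:
        metadata:
          labels:
            app: atlas
        spec:
          containers:
          - name: atlas
            image: registry.example.com/atlas:latest   # hypothetical registry path
            ports:
            - containerPort: 8080                      # illustrative port

Kubernetes then takes care of keeping three copies of that container running, wherever in the cluster they happen to fit.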

While the services are being created, I’ve been working on ironing out the details of how to make Kubernetes reliable and highly available.

multiple masters

As with any critical part of our infrastructure, we want to make sure there is more than one instance of each component running at a time. Since AWS offer us three availability zones, we’ll have a Kubernetes master instance in each zone. We’ll also spread the Kubernetes worker instances (the ones which actually run the Docker images) across zones, too.

In order to cluster Kubernetes correctly, and avoid split-brain scenarios in which two masters are trying to make changes independently of each other, it’s important to know what’s being installed on each master:

  • etcd2 — this is where Kubernetes stores its configuration. We run three nodes so the cluster keeps quorum (a majority of two) even if one node is lost.
  • kube-apiserver — listens for API commands and manages the etcd2 configuration.
  • kube-controller-manager — ensures the correct number of pods of each service are running.
  • kube-scheduler — places new pods on suitable worker instances.

clustering etcd2

As the main data store driving Kubernetes, and also backing the Flannel networking layer that connects our containers, etcd is the single most important component in the cluster. It’s also one of the very first services we set up on the masters, as most of the other services depend on it.
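As a rough sketch of what that clustering looks like, assuming a CoreOS-style cloud-config (the member names and addresses below are invented for the example), each master carries an etcd2 section along these lines:

    #cloud-config
    coreos:
      etcd2:
        name: master-a                        # a unique name per master, e.g. one per availability zone
        listen-peer-urls: http://10.0.1.10:2380
        listen-client-urls: http://0.0.0.0:2379
        initial-advertise-peer-urls: http://10.0.1.10:2380
        advertise-client-urls: http://10.0.1.10:2379
        initial-cluster-state: new
        initial-cluster: master-a=http://10.0.1.10:2380,master-b=http://10.0.2.10:2380,master-c=http://10.0.3.10:2380

With all three members listed up front, the cluster bootstraps statically and has quorum as soon as two of the three nodes are up.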

kubernetes components

The Kubernetes API server, controller-manager and scheduler all actually run within Kubernetes as Pods. This means each one is automatically restarted if it fails, and it reduces the number of different systems we have in place.

The API server is stateless, writes any configuration to etcd2 immediately, and is queried by the other components. This allows us to run three copies of it at any one time without fear of interference. In fact, we use the flag --apiserver-count=3 to make it aware there are others performing the same role.
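For illustration, the relevant excerpt of a kube-apiserver pod manifest might look something like this; the image version, etcd addresses and IP range are made up for the example rather than taken from our setup.

    # excerpt from a kube-apiserver pod spec (illustrative values)
    containers:
    - name: kube-apiserver
      image: gcr.io/google_containers/hyperkube:v1.2.4     # hypothetical version
      command:
      - /hyperkube
      - apiserver
      - --etcd-servers=http://10.0.1.10:2379,http://10.0.2.10:2379,http://10.0.3.10:2379
      - --apiserver-count=3                                # there are three API servers in the cluster
      - --service-cluster-ip-range=10.3.0.0/24
      - --bind-address=0.0.0.0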

The controller-manager and scheduler are a little different. As they’re responsible for keeping the correct number of pods in the correct places, only one scheduler and one controller-manager pod should be in control at a time, so there’s no conflict. Again, a simple flag (--leader-elect=true) lets us set up leader elections, with etcd keeping track of which pod is currently in charge.
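Illustratively again, with the same caveats on the image version and addresses, the scheduler pod just needs the flag switched on, and the controller-manager is configured the same way:

    # excerpt from a kube-scheduler pod spec (illustrative values)
    containers:
    - name: kube-scheduler
      image: gcr.io/google_containers/hyperkube:v1.2.4     # hypothetical version
      command:
      - /hyperkube
      - scheduler
      - --master=http://127.0.0.1:8080   # talk to the local API server
      - --leader-elect=true              # only the elected leader schedules pods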

getting ready to roll

There will be more blog posts as we get closer to switching over to Kubernetes. Not only is it going to improve almost all aspects of our infrastructure, but it’s also exciting technology to work with.

If you enjoyed the read, drop us a comment below or share the article, follow us on Twitter or subscribe to our #MetaBeers newsletter. Before you go, grab a PDF of the article, and let us know if it’s time we worked together.
