Thomas Maroulis

tuning cassandra, an intro of sorts

As you might have been able to glean from this, and that, and the other, and also this one, and… (I’m going to stop now), here at MetaBroadcast we use the distributed key-value store Cassandra to persist our data. Cassandra is a fantastic tool for the job and with it we’ve been able to achieve API response times that Mongo can only dream about. That said like all tools Cassandra has its quirks and peculiarities that you need to be aware of and it needs some TLC or the way it’s best known, tuning.

This is a complicated and long subject that we will likely revisit at some point in the future. For now I will try to give you an introduction to some of the things to be aware of, how to actually test various configurations and generally how to get started on the road to getting more performance out of your cluster.


Let’s start with the obvious. Cassandra is touted as having fantastic write performance and if you look at write latency graphs you will note that that is not an idle boast. The trick of course is that to be able to achieve that great write performance Cassandra “cheats” (I use the term extremely loosely) by doing only part of the necessary work at the time of the write. New data is written first to the commit log, then when the commit log is flushed to disk to an in memory structure called a memtable. When that is full it is in turn flushed to disk to a data structure called an SSTable that is immutable. If a new write happens on the same row after the SSTable has been created then that data will end up on a different SSTable.

Due to this when executing a read Cassandra may have to combine row fragments from multiple SSTables and also from any unflushed memtables in order to get the most up to date version of the row. Suffice to say this degrades read performance and also ends up using more disk space since you could be saving multiple versions of the same column that end up overwriting each other. In order to deal with this and ensure the number of SSTables to be read for a single operation is not unbounded Cassandra periodically performs an operation that compacts multiple SSTables to a single new one.

Compactions are not the only consideration when tuning Cassandra, but they are often the most important one to keep in mind.

There are different compaction strategies that you can choose from. Depending on the nature of your data each have their advantages and disadvantages. In this post we are not going to discuss different strategies however. Rather I will direct you to this excellent article from the official blog that compares the Size-Tiered and Levelled Compaction strategies. Instead I will talk a bit about how you test different strategies and compaction configurations.

one test is worth a thousand expert opinions, aka write-survey mode

We could talk for hours about the differences between various compaction strategies and compaction configuration options, but in the end of the day the only way to be absolutely sure that you have a configuration that is appropriate for you is to test it for your data model and under a representative load.

One way to accomplish this is by setting up a test environment that is modelled to simulate your production environment. This has the downside that you have to spend a not insignificant amount of time ensuring the simulation is accurate enough. With enough effort you will be able to minimise the risk that your test results will be skewed by discrepancies between the prod and test environments, but you will never be able to get it to 0.

Unfortunately the only other alternative is to run your tests directly on your production environment which unless you are really confident in your infrastructure’s resilience to roaming chaos monkeys has its own, more often than not unacceptable, risks.

Likely for us, the guys at Cassandra have provided us with a third much more palatable alternative. Enter write-survey mode. Under this mode a new Cassandra node will do all the normal steps a node does when joining the cluster except the last one. It will not announce itself as a full member and start serving reads. This allows you to setup a new node that will receive realistic write traffic thus letting you test the node’s ability to cope with it, but which will never serve reads. Therefore if the node falls over on its head the cluster’s ability to serve production traffic is not impeded and you don’t have to worry about your experiments impacting production systems. The impact of this node on the rest of the cluster is very light since all the other nodes have to do is to forward a subset of data to the new node.

For more details about write-survey mode you can have a look here.

That is all for this week. In this post we’ve covered the basics about what compactions are, how they work and how you can experiment with your cluster without PagerDuty having a fit and the systems engineers getting a nervous breakdown. When we revisit this we will move into more meaty subjects, namely what kind of data Cassandra is good at handling and how to tune it so it can handle a load that does not quite fit into that “ideal load” mould.

If you enjoyed the read, drop us a comment below or share the article, follow us on Twitter or subscribe to our #MetaBeers newsletter. Before you go, grab a PDF of the article, and let us know if it’s time we worked together.

blog comments powered by Disqus