Thomas Maroulis

testing the untested

One of the most challenging aspects of testing software is adding unit tests to a codebase that up to that point was almost entirely untested. As we’ve discussed before, one of the important premises of testing is that it improves the quality and structure of your code by pushing you towards writing code that is testable. In other words, it creates a feedback loop that rewards code with low coupling and high cohesion.

While it does not have to be this way axiomatically, a commonly observed pattern in the wild is that untested code tends to display a high degree of coupling. This creates a negative feedback loop of its own: code is untested, the lack of tests allows coupling to grow, and the coupling makes it harder to add tests later, and so on.

I should clarify what I mean by untested. After all, a codebase can be well covered by integration and end-to-end tests and still be structured poorly. An end-to-end test makes no assertions about how the different components of the code connect to each other, except one: that for this particular test case the way they are connected does not produce an error. Unfortunately, that assertion is not strong enough to also guarantee maintainable code. For that we need unit tests. I am not going to expand on the reasons here since they are beyond the scope of this article; if you are unfamiliar with the test pyramid, or with why relying solely on E2E tests is a bad idea, I have linked a few very good articles at the end that I would strongly recommend.

dealing with the untested

Everything I have written so far can be (correctly) interpreted as an urging to write your tests at the same time as your code and to use them to help shape the code’s design, but that advice is insufficient by itself. The reality of the job is that we all have to deal with untested code at some point. It might not be our own, though I would challenge anyone who claims to have never written untested code; it might be a legacy system from long ago that you’ve been asked to bring up to standard; or it might be something else entirely. Whatever the circumstances, this is something we have to deal with.

The challenge is to break out of the negative feedback loop, and to do so in a way that is safe and does not introduce bugs. This is further hampered by the fact that highly coupled, untested code tends to be quite brittle, so we find ourselves needing to change the code to make it testable while being unable to change it without the risk of inadvertently breaking something: the exact negative feedback loop we are trying to escape.

There are a few ways of dealing with this.


One technique that can be very effective is to use refactoring tools that rely on static analysis of the code to make potentially wide-reaching but safe changes. Using such tools we can apply a sequence of safe, incremental refactors that gently coax the code into a state that is at least partially testable, add some tests, and then iterate until we bring it to a good state. This technique can be powerful and surprisingly quick to apply, but it comes with some downsides. The first is that it can be quite complex and has a fairly steep learning curve. You need a high degree of confidence both in your ability to apply the technique and in the tools you use, so in my experience of trying to teach it, developers need a bit of practice before they feel comfortable using it in earnest on critical code.
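To make the idea concrete, here is a minimal sketch (in Python, with invented names) of the kind of safe, incremental refactor such tools can perform: parameterising a hard-wired dependency so that a seam appears where a test can take control, while the default preserves existing behaviour for all current callers.

```python
import datetime


# Before: the clock is hard-wired, so greet() cannot be tested
# deterministically.
class Greeter:
    def greet(self):
        hour = datetime.datetime.now().hour
        return "good morning" if hour < 12 else "good afternoon"


# After: a "parameterise constructor" refactor introduces a seam.
# The default argument keeps existing behaviour, so no caller breaks.
class TestableGreeter:
    def __init__(self, clock=datetime.datetime.now):
        self._clock = clock

    def greet(self):
        hour = self._clock().hour
        return "good morning" if hour < 12 else "good afternoon"


# A test can now control time through the seam.
morning = TestableGreeter(clock=lambda: datetime.datetime(2024, 1, 1, 9))
print(morning.greet())  # -> good morning
```

Each such step is behaviour-preserving on its own, which is what makes a long chain of them safe even without tests at the start.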

The second disadvantage comes from using unit tests as stepping stones between refactors. Unit tests tend to be coupled to the interface of the unit they test. If in the course of the refactor we change those interfaces, we invalidate a lot of those tests, and they end up as throwaway effort. Worse, because we are actively invalidating the tests, we get no benefit from them while changing those interfaces, which is precisely the kind of risky change we would want a safety net for.
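As a hypothetical illustration of that coupling, the unit test below exercises the unit through its public interface; if the refactor changes that interface (say, tax_for starts taking an invoice object instead of a number), the test is invalidated even though the behaviour is identical:

```python
# A unit whose interface we expect to change during the refactor.
class TaxCalculator:
    def tax_for(self, net_amount):
        return net_amount * 0.2


# This test is coupled to the tax_for(net_amount) signature: rename the
# method or change its parameter and the test stops running and must be
# rewritten, even though the behaviour (20% tax) is unchanged.
def test_tax_calculator():
    assert TaxCalculator().tax_for(100) == 20.0


test_tax_calculator()
```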

For a more in-depth look into some specific techniques for adding tests to untested code I would recommend Working Effectively with Legacy Code by Michael Feathers which I have linked to below.

using end-to-end tests as a safety net

Another, complementary, and very useful technique is to use E2E and integration tests. As discussed previously, these kinds of tests make few to no assertions about the internal structure of the code. When using tests to drive code design this is a bad thing, but in this context it is actually an advantage, since those tests will never be invalidated as long as we don’t make any business logic changes (which we shouldn’t during a refactor anyway). We can therefore write such tests at the beginning to create an initial safety net around the code before we start doing potentially risky refactors.

This naturally leads to the following process:

  1. Write high level E2E tests around the legacy code
  2. Refactor until we have a set of testable components
  3. Unit test those components
  4. Repeat from step 2
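Step 1 amounts to characterisation testing: we pin down what the code currently does, whatever that may be, by asserting on its observed outputs rather than on a spec. A minimal sketch, with an invented legacy function standing in for the real system:

```python
# A hypothetical legacy function whose exact pricing rules nobody
# remembers any more.
def legacy_price(quantity, customer_type):
    price = quantity * 10
    if customer_type == "gold":
        price = price * 0.9
    if quantity > 100:
        price = price - 50
    return price


# Characterisation tests: we ran the code, observed these outputs, and
# pinned them down. If any refactor changes one of these results, we
# have changed behaviour, not just structure.
assert legacy_price(10, "standard") == 100
assert legacy_price(10, "gold") == 90
assert legacy_price(200, "gold") == 1750
```

For a real service the same idea applies at the E2E level: capture the responses the system gives today and assert that refactored code still produces them.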

effective end-to-end tests

One challenge with this approach is that it can be difficult to add effective E2E tests to a legacy codebase unless you already have a very good understanding of the expected behaviour of the service and of any other service it depends on. Tests also serve to document expected behaviour, so this is another area where their absence becomes an obstacle.

I have recently been experimenting with a tool, recommended to me by Tommy Situ, that offers an elegant solution to this problem in the context of HTTP-based microservices. The tool is called Hoverfly, and it is an HTTP proxy written in Go that can be configured to simulate an API. There is also a JUnit library for it that allows you to write tests that transparently use the Hoverfly server for any HTTP calls they make.

The most interesting aspect for me is how easy it is to switch from simulating an API to letting your code talk to the real API while Hoverfly records the exchanges. You can write the tests with Hoverfly configured in capture mode, let it record the exchanges, and then switch it to simulation mode to decouple your tests from those APIs. This way you can write tests around your service without full knowledge of its expected behaviours, and then use those tests as part of characterisation testing to improve your understanding before the refactoring itself.
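To illustrate the capture-then-simulate idea in miniature (this is a simplified, self-contained Python sketch of the pattern, not Hoverfly’s actual API; real_call and the file format are invented), a client can record real exchanges on the first run and replay them on every run after that:

```python
import json
import os


class RecordingClient:
    """Sketch of the capture/simulate pattern Hoverfly automates:
    in capture mode real responses are recorded to a file; in
    simulation mode the file answers instead of the real service."""

    def __init__(self, real_call, store="exchanges.json"):
        self._real_call = real_call
        self._store = store
        # Simulate if recordings already exist, otherwise capture.
        if os.path.exists(store):
            self._mode = "simulate"
            with open(store) as f:
                self._recorded = json.load(f)
        else:
            self._mode = "capture"
            self._recorded = {}

    def get(self, url):
        if self._mode == "simulate":
            return self._recorded[url]
        response = self._real_call(url)  # hit the real API once
        self._recorded[url] = response
        with open(self._store, "w") as f:  # persist for future runs
            json.dump(self._recorded, f)
        return response
```

The first test run exercises the real dependency and records its behaviour; subsequent runs are fast, deterministic, and decoupled from the network, which is exactly the property we want in a refactoring safety net.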

That’s all for this week. I hope this was interesting and I would love to hear your thoughts about refactoring legacy code.

further reading

