Our team is currently working with a client on a medium sized, medium complexity Java application which has quite low test coverage. We are introducing characterisation tests to snapshot functionality. These will give us the confidence to refactor away technical debt and extend the application without regression. One of the problems we are experiencing is the concurrent nature of the application. I have have worked on applications in the past which supported very high concurrency without issue but this application is different. I have not fully thought through why this application does differ but there are some obvious points:
- This application spawns threads in Java code a lot. In previous applications we have always avoided this complexity by utilising somebody else’s thread pool code.
- I am used to stateless service classes which operate on domain objects. The stateless service classes obviously have no concurrency issues and the domain objects can be protected using synchronisation blocks. This application seems to have a lot more stateful objects that interact (this is anecdotal, I am have not analysed the code specifically for this attribute).
One of the first refactorings we are looking at is to remove all the Thread.sleep calls from test classes. The CI server reports significant number of test failures which turn out to be false positives. In a significant number of cases the use of Thread.sleep is to blame. I have seen two slightly different uses of Thread.sleep in the test code.
- The test spawns a thread which is calling some method of the class under test whilst the main test thread interacts with the class under test in some other way. The test thread calls Thread.sleep to ensure that the second thread has time to complete its processing before the test verifies the post conditions.
- The class under test contains some internal thread spawning code. The test thread again needs to execute a Thread.sleep to remove the chances of a race condition before firing the asserts.
Both these approaches suffer from the same problems.
- The Thread.sleep might be long enough to allow the second thread to complete processing on one machine (e.g. the developers high spec workstation) but it is not long enough to allow the thread to complete its processing on a heavily loaded, differently configured, usually more resource constrained CI server. Under certain load situations the test fails. It works in others. The use of Thread.sleep has made the test non-deterministic.
- Often the response to the above problem is to make the sleep longer. Yesterday I saw a very simple test which took over thirteen seconds to execute. Most of that test duration was sleeps. Refactoring to remove the sleeps resulted in a test that executed in 0.4 seconds. Still a slowish test but a vast improvement. The last application I worked on had 70% coverage with 2200 tests. If each one had taken thirteen seconds to execute then a test run would have taken almost eight hours. In reality that suite took just over a minute on my workstation to complete. You can legitimately ask a developer to run a test suite which takes one minute before every checkin and repeat that execution on the CI server after checkin. The same is not true of a test suite that takes eight hours. You are probably severely impacting the teams velocity and working practices if the build before checkin takes eight minutes. There are very few excuses for tests with arbitrary delays built into them.
To resolve both issues we introduce a count down latch.
Where the test spawns a thread, the latch is decremented inside the spawned thread and where the test code had a sleep a latch.await(timeout) is used. We always specify a timeout to prevent a test that hangs in some odd situation. The timeout can be very generous, e.g. ten seconds where before a one second sleep was used. The latch will only wait until the work is done in the other thread and the race condition has passed. On your high spec workstation it might well not wait at all. On the overloaded CI server it will take longer, but only as long as it needs. A truly massive delay is probably not a great idea as there is a point where you want the test to fail to indicate there is a serious resource issue somewhere.
Where the class under test spawns a thread (an anti-pattern I suspect) then we amend the code so it creates a latch which it then returns to callers. The only user of this latch is the test code. Intrusive as it is, it is often the only way to safely test the code without more significant refactoring.
There are some larger issues here. Is the code fundamentally wrong in its use of threading? Should it be recoded to use a more consistent and simple concurrency model and rely more on third party thread pool support?
At risk of straying from my comfort zone of simple, pragmatic, software delivery, deep down I have never been very happy about the implications of complicated multi-threaded code and automated testing. You can write a class augmented with a simple and straightforward test class which verifies the classes operation and illustrates its use. You can apply coverage tools such as Emma and Cobutura which can give a measure of the amount of code under test and even the amount of complexity that is not being tested. I am not convinced it is always possible to write simple tests that ‘prove’ that a class works as expected when multiple threads are involved (note I say always and simple).
I do not know of any tools that can give you an assurance that you code will always work no matter what threads are involved. Perhaps a paradigm shift such as that introduced by languages such as Scala and Erlang will remove this issue?
There is some good advice available regarding testing concurrent code and I am sure lots of very clever people have spent lots of time thinking this through but its certainly not straight in my head yet.