Thursday, November 15, 2007

Demystifying Stubs, Fakes, and Mock Objects

This blog entry is my attempt to explain the different kinds of test-only objects that can be substituted for the real thing in automated tests: Stubs, Fakes, and Mocks. It can be easy to get confused, so I hope this entry helps!

Stubs, fakes, and mocks are all objects that replace the objects that would normally be used in the actual application. What purpose do they serve?

First, using such test-only objects promotes good design. Writing code so that classes depend on interfaces and abstract classes, which can be implemented either by full-fledged production code or by a test-only class of some kind, helps to reduce the overall coupling in an application.

Second, test-only objects frequently come up when an application is accessing another system. Making sure that system is available and produces the same response to a given input every time may be a problem (consider a service that provides current stock prices, for example). Simply configuring something like a relational database or a web server so that tests can run against it can also be an issue. More generally, using a test-only object makes it easier to set up the initial conditions for tests: such objects can make it simpler to provide the initial state that the function under test will respond to. The time it takes for tests to run can also be a factor; using simple test-only objects can speed up a test run substantially.

Finally, in some cases you may want to develop logic that depends on classes that haven't been written yet or that you don't have access to - say another team is working on that functionality and you can't use their code. In such cases, you can write code against objects that, for the purposes of testing your own logic, implement the interfaces you expect to see in the yet-to-be-written API.

The core idea behind the use of test-only objects is that we often want to write tests for the application logic we're currently working on without having to worry about the behaviour of some of the external code our logic is using.

Let's consider an example. I once wrote a program that allowed a user to schedule polling of different sensors - e.g. read sensor 'A' once an hour, sensor 'B' every minute, and sensor 'C' every second. I wanted to test my scheduling logic, but I wanted to make it independent of the actual sensors: I didn't want to have my application connected to real sensors just to make my unit tests run. The scheduling system also, of course, read the computer's system time to see whether it was time to read a given sensor, and I wanted my tests to be independent of the actual system time as well: what I wanted was to set the time in my test and make sure my program responded by reading the correct sensors - and avoided reading the wrong ones. I didn't want to have to set the actual computer's system time inside of my tests! So, for both of these cases - connecting to the sensors, and setting and reading the time - I created special test-only objects to stand in for the ones that would be used in the actual application.

If you're interested in ways to instrument your code so that you can substitute these kinds of test-only objects, and the trade-offs involved, Jeff Langr's Don't Mock Me article is a great reference. I should note that his use of the word "mock" is more generic than the one that's often used: he means "mock" in the general sense of any kind of test object. As we'll see a bit further down, "mock object" often has a specific meaning that's different from stubs and fakes.

Now, on to Stubs, Fakes, and Mocks.

Stub: A stub is an object that always does the same thing. It's very simple and very dumb. In our sensor-polling example above, the system time seems like a useful entity to replace with a stub. Let's suppose our scheduling code was in a class called Scheduler. This class might have a method called getSystemTime(). For the purpose of testing, we might create a TestingScheduler class that extends Scheduler and overrides the getSystemTime() method. Now we can set the system time in the constructor of this test-specific class, e.g.:

public class TestingScheduler extends Scheduler {
    // The canned time this stub will always report.
    private int timeInMillisForTest;

    public TestingScheduler(int timeInMillisForTest) {
        this.timeInMillisForTest = timeInMillisForTest;
    }

    public int getSystemTime() {
        return timeInMillisForTest;
    }
}

When a TestingScheduler object is used as part of a test, the rest of the Scheduler logic works normally, but it's now getting the time that's been set in the test instead of the actual system time.

Fake: A fake is a more sophisticated kind of test object. The idea is that the object actually exhibits some real behaviour, yet in some essential ways it is not the real thing. Fakes can be used to speed up the time it takes tests to run and/or to simplify configuration. For example, a project I am currently working on uses Oracle's Toplink as an object-relational mapper (ORM), which allows data in Java objects to be transparently saved to and retrieved from a relational database. To make tests that use this framework run faster, we wrote a much-simplified memory-only implementation of Toplink's interfaces. This version doesn't know about transactions and doesn't actually persist data, but it works well enough to allow many of our tests to run against it - and since the actual Oracle database isn't involved, the tests run over an order of magnitude faster. Going back to the scheduler example, we developed a piece of software that could behave as though it were a real sensor. That way we were able to run a variety of fairly complicated tests to make sure our application could communicate with sensors correctly, without actually having to hook the tests up to a real sensor. Any time you write code that simulates an external service - some sensors, a web server, or what have you - you're creating a fake.
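
To make this concrete, here's a minimal sketch of a fake in the same spirit. (The SensorReadingStore interface and both classes are invented for this illustration - they're not from the actual project.)

import java.util.HashMap;
import java.util.Map;

// Hypothetical interface; in production it would be implemented by a
// class that talks to the real database.
public interface SensorReadingStore {
    void save(String sensorId, int reading);
    Integer findLatest(String sensorId);
}

// The fake: it exhibits real behaviour (what you save, you can read back),
// but nothing is persisted and there are no transactions.
public class InMemorySensorReadingStore implements SensorReadingStore {
    private final Map<String, Integer> readings = new HashMap<String, Integer>();

    public void save(String sensorId, int reading) {
        readings.put(sensorId, reading);
    }

    public Integer findLatest(String sensorId) {
        return readings.get(sensorId);
    }
}

In a test you hand the code under test an InMemorySensorReadingStore; in production you wire in the database-backed implementation.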

You can find a simple example of a fake in the TestNode class in my loop finder example. TestNode implements the Node interface for the purposes of the unit tests. Classes that are actually part of the application have their own, more complex implementations of this interface - but we're not interested in testing those implementations here. This allows us to write tests that run in isolation from the rest of the application. From the perspective of the overall design, this approach helps us reduce coupling between classes: the LoopFinder class depends only on the Node interface rather than on any specific implementation. That's an example of how making code easier to test concomitantly improves the design.
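
The real code is in the linked example; a bare-bones sketch of the idea might look something like this (the method bodies here are illustrative only, not the actual TestNode):

import java.util.ArrayList;
import java.util.List;

// The application's abstraction: something a LoopFinder can traverse.
public interface Node {
    List<Node> getNeighbours();
}

// A bare-bones test-only implementation: just enough to let tests build
// small graphs without touching the rest of the application.
public class TestNode implements Node {
    private final List<Node> neighbours = new ArrayList<Node>();

    public void addNeighbour(Node node) {
        neighbours.add(node);
    }

    public List<Node> getNeighbours() {
        return neighbours;
    }
}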

Mock: Mock objects can be the most confusing to understand. First of all, one can argue that the two types of test classes mentioned above are mocks - after all, they both "mock out" or "simulate" a real class, and in fact mocks are a certain kind of stub or fake. However, the additional feature mock objects offer on top of acting as simple stubs or fakes is that they provide a flexible way to specify more directly how your function under test should actually operate. In this sense they also act as a kind of recording device: they keep track of which of the mock object's methods are called, with what kind of parameters, and how many times. If your function under test fails to exercise the mock as specified in the test, the test fails. That's why developing with mock objects is often called "interaction testing": you're not only writing a test which confirms that the state after a given method call matches the expected values; you're also specifying how the objects used by the function under test, which of course have been replaced with mocks, ought to be exercised within a given test.

To sum up: A mock object framework can make sure a) that the method under test, when executed, will in fact call certain functions on the mock object (or objects) it interacts with, and b) that the method under test will react in an appropriate way to whatever the mock objects do - this second part is no different from what stubs and fakes offer.

We've already seen how stubs and fakes can be used, so let's create a hand-rolled example of the kind of thing that mock object frameworks can help with. Going back to the scheduler we've already talked about, say it processes a queue of ScheduledItem objects (ScheduledItem might be an interface). If it's time to run one of these items, the scheduler calls the item's execute method. In our test, we can create a queue of mock items such that only one of them is supposed to be executed. A simple way of implementing this mock item might look something like this:

public interface ScheduledItem {
    public void execute();
    public int getNextExecutionTime();
}

public class MockScheduledItem implements ScheduledItem {
    private boolean wasExecuted;
    private int nextRun;

    public MockScheduledItem(int nextRun) {
        this.nextRun = nextRun;
    }

    public void execute() {
        wasExecuted = true;
    }

    public int getNextExecutionTime() {
        return nextRun;
    }

    public boolean getWasExecuted() {
        return wasExecuted;
    }
}

Our test might look something like this:
public void testScheduler_MakeSureTheRightItemIsExecuted() {
    // setup
    MockScheduledItem shouldRun = new MockScheduledItem(1000);
    MockScheduledItem shouldNotRun = new MockScheduledItem(2000);
    Scheduler scheduler = new TestingScheduler(1100);
    scheduler.add(shouldNotRun);
    scheduler.add(shouldRun);

    // execute
    scheduler.processQueue();

    // verify
    assertTrue(shouldRun.getWasExecuted());
    assertFalse(shouldNotRun.getWasExecuted());
}
That's a really simple, hand-rolled example of a mock object. The test just makes sure that the processQueue method ran the execute method on the first item, but not on the second one. Of course this example is very simple. We could make it a little fancier by counting the number of times the execute method is called and making sure it's called exactly once during the test (see the sketch below). Then we could start to implement functionality that makes sure functions belonging to a given mock object are called with particular arguments, in a particular order, and so on. Mock object frameworks support this kind of functionality out of the box: you can take any class in your application and create a mock version of it to be used as part of a test. There are mocking frameworks for many different programming languages.
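
For instance, a counting version of our hand-rolled mock might look like this (a sketch; mock frameworks generate this kind of bookkeeping for you):

public class CountingMockScheduledItem implements ScheduledItem {
    private int executeCallCount;
    private int nextRun;

    public CountingMockScheduledItem(int nextRun) {
        this.nextRun = nextRun;
    }

    public void execute() {
        // Record every call instead of just flipping a flag.
        executeCallCount++;
    }

    public int getNextExecutionTime() {
        return nextRun;
    }

    public int getExecuteCallCount() {
        return executeCallCount;
    }
}

The verify step of the test would then use assertEquals(1, shouldRun.getExecuteCallCount()) instead of a simple boolean check.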

Before you dive in, consider my word of caution: in the great spectrum between pure black-box and pure white-box testing, using mock objects is about as "white-box" as it gets. You're saying things along the lines of "I want to make sure that when I call function X on object A (the function and object under test), functions Y on object B and Z on object C will be called in that order, with certain specific arguments." When the test you write is making sure that something *happened* as a result of your test, it tends to be easier to understand what the test is trying to do. On the other hand, if your test is just making sure that some functions were called, what does that really mean? Potentially very little.

Also, because your mock objects are basically fakes or stubs, you have no guarantee that the behaviour of the actual objects being mocked out will stay consistent with the mocks. In other words, you can create a mock version of an object that adds one to its argument whereas the real function subtracts one. If you change the behaviour of a function that is being mocked in a test somewhere, you have to be careful to adjust the mock accordingly. If you don't, you'll wind up with passing tests, but you may still have introduced a bug into the application. This kind of problem tends to become more likely as the sophistication of the fake implementation increases - pure fakes suffer from the same weakness - and mock tests where the specified interactions become complicated, and the mock itself is a sophisticated fake that can respond to a wide variety of interactions, compound the likelihood of running into it.

On a more basic level, simply refactoring code can be difficult with mock objects. Mock frameworks sometimes use strings to represent the mock object's methods internally, so renaming a method with a refactoring tool may not actually update the mock, and your tests will suddenly fail just because you renamed a function. Even a slightly more complicated refactoring, like breaking one method into two, can cause mock-based tests to fail trivially, telling you that yes indeed, you actually changed some code.
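
To make that drift problem concrete, here's a contrived sketch (all names invented for illustration):

public interface Adjuster {
    int adjust(int value);
}

// Suppose the real implementation was changed to subtract one...
public class RealAdjuster implements Adjuster {
    public int adjust(int value) {
        return value - 1;
    }
}

// ...but the hand-rolled mock still adds one. Tests that exercise the
// code under test against this mock keep passing, even though production
// behaviour no longer matches what the tests assume.
public class MockAdjuster implements Adjuster {
    public int adjust(int value) {
        return value + 1;
    }
}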

As you can tell, I am not a huge fan of extensive use of mock objects in the sense of specifying interactions. I believe that such objects can indeed be useful in specific cases, but that's not how I think about writing my code. When I write a test, I try to keep it simple and concentrate on what I can expect to happen as a result of running that test, not on what the execution path of the function under test will look like. There are of course cases where this type of interaction testing is useful - I think the scheduler example above is a good case in point: you want your test to make sure a method is called, but that's it; you're not interested in what the real implementation of that method may do. All in all, I tend to prefer to stick with simple hand-rolled stubs, fakes, and mocks in my TDD practice. Your mileage may vary. Martin Fowler has also written about the distinction between mocks and stubs/fakes.

I know that when I first encountered "mock objects", I had some trouble figuring out exactly what the term meant and what all the fuss was about. If you've found this blog entry because you were experiencing the same confusion, I hope it's been of some help.

8 comments:

Steve Freeman said...

Of course I'd say this, but I don't think you've understood mocks at all. On the contrary, we tend to think of interaction testing as Black Box - all you can see are the object's interactions with its neighbours, not its internals. It's like adjusting the volume on my hi-fi: I can see that the output gets stronger, but I don't see how.

Writing any kind of unit test in isolation, interaction-based or otherwise, does not tell you that your system is working. We always start a task with a higher-level test to drive the functionality and to make sure the pieces hang together. Unit tests are there to help you out with the implementation, not a measure of progress.

I suggest you read a bit more widely, such as the explanation in Dave Astels' book, or our own Mock Roles, Not Objects paper. There's also a lot of guidance in our series of Test Smells postings.

Vladimir Levin said...

Thanks for the input, Steven. The distinction between white box and black box testing is probably not very interesting. What I call white box, you may call black box, but I don't think that's really what matters. Anyhow, I think I know what you mean, but still, you're writing tests that address the fact that specific functions are getting called inside the method under test. That's white box in my book.

In any case, the distinction between top-down vs. bottom-up is probably more significant. Both ways of doing things have their issues, but I personally believe bottom-up is fundamentally a more robust and less risky approach. That's why I prefer more traditional TDD and only use mocks sparingly. If I am writing tests for class A and I realize I'll need to implement class B, I stop working on class A and start using TDD to create class B instead. Once that's done, I go back to using TDD for class A, but I don't mock out the calls to class B.

When a piece of software cleaves naturally into modules with a very clear interface between them, so that one of them can in essence be treated as a library for the other, I consider that a reasonable candidate for mocking. Still, I wouldn't be in favour of writing tests using mock objects as a matter of course.

In general I think making assumptions about what the interface should look like in the absence of real functionality, over any kind of prolonged period of time, is dangerous.

I will however have a look at the links you've provided. Cheers!

Steve Freeman said...

Well, I can always hope to "turn" you :)

For myself, I often find that I build the wrong objects when I go bottom-up. It turns out that my design assumptions were wrong.

The point about discovering interfaces is to write them from the perspective (and in the domain) of the client, so that the client code is consistent. The real functionality is driven by the relevant acceptance tests. I don't find this dangerous.

Thanks

S.

Vladimir Levin said...

Steven,

Ok, I've read the PDF and skimmed the testing smells entry. I would rather e-mail this to you, but nowhere on your blog or company web site can I find contact information! If you read this comment, feel free to post or e-mail a reply.

I can understand developing an interface "from the client's point of view" at a very high level - say with each method in some class corresponding to an XP story or task, or a RUP use case. However, I don't feel so comfortable digging ever deeper into the logic of the application with code that's built on sand. I like the feeling, when I've got another test passing, that I've "accomplished" something. I can't really see getting the same feeling after writing some mock tests. I try to combine top-down and bottom-up thinking by writing tests at a higher level, commenting them out when I discover the need for some lower-level functionality, and then getting back to them when I finish developing the lower-level code.

I was able to bootstrap myself into "traditional" TDD more or less on my own - using a combination of reading and just trying it, often on small projects. However, because mocks don't really appeal to me on an intuitive level, I don't think it is likely I'll be able to "get it" by trying it by myself. Even looking at the examples in the PDF, I find them very abstruse. My eye doesn't fall easily on what the damn method under test is, and the tests read (to me) rather like truisms. I feel like asking, why not write the code first and have the testing framework generate the mock tests *from* the code. It would seem easier... I do always try to have an open mind though. If you are aware of any good workshops, I'll try to attend one in the next year. Perhaps I will understand this kind of approach better in the context of a hands-on tutorial.

Also, if you can point to a good tutorial that shows how to write a complete, but small, project with this "mock-driven" approach, I would be interested to read it. I TDD'd a sudoku solver (http://www.geocities.com/vladimir_levin/sudoku.html). Is there something like that I could read about with respect to mocks?

Steve Howell said...

I'd be interested in specific examples of where bottom-up design leads to creating the wrong objects.

Steve Freeman said...

Sorry for the delay (and I've fixed my website).

The way we approach it is to take very thin slices through the system. We'll start with some kind of end-to-end acceptance test, which keeps us honest and pointing in the right direction. Then we usually work our way "in" from wherever the relevant behaviour is triggered.

So, although it may feel like building on sand, we don't build much before we hit bedrock (to stretch the metaphor). In the process of working our way down, we might write some interaction or some state tests; it all depends.

We've been through another generation since the paper and jMock2 is, we think, a little easier to read. Unfortunately, there's a limit to what we can do with a language as clunky as Java, so every framework will take a little effort.

The line about truisms is interesting. Don't forget that the code didn't exist before we stated the obvious, and a system made up of obvious things is something we aspire to.

I think generating useful tests from code is a research project; it's been keeping the founders of Agitar busy for years.

Keep an eye on Agile2008. We've submitted a tutorial.

We're working on writing up a soup-to-nuts project, but it's taking too long :)

S.

Steve Freeman said...

@Steve Howell

I can't think of an example off the top of my head, but I know I've done it. "I'm really going to need a Customer class. Oh, it's an Account".

Maybe it's just a matter of where you do your refactoring and it's just part of the cost of learning about the domain.

Vladimir Levin said...

Thanks Steven!

I think we definitely agree on one very important point: it's a good idea to develop an application in "vertical" slices - i.e. develop features one at a time in their entirety. Both my approach of developing these slices more or less bottom-up and your approach of defining interfaces and drilling in from the top down are probably safe enough in this context. In both cases there is plenty of time to review the code and the design before moving on to the next feature. I think the traditional bottom-up approach that I tend to favor is a bit more conservative. However, based on what I've read, I am willing to say that the mock-driven way can likely work too, even though it doesn't feel "right" to me. I'd love to see some worked examples of real programs that were written in this way. Sometimes it's useful to *develop* intuition using a particular technique.
