Use Fakes, Not Mocks
TLDR: Unit tests should use fakes, not mocks, when an external dependency is involved. A fake simulates the real external dependency, whereas a mock only looks enough like the real thing to satisfy a particular test. In general, unit tests should almost always use fakes.
Ideally, unit tests would just check that functions return the correct values for some given input. However, in practice, many unit tests check functions whose return values depend on some state. When the state is stored in an internal dependency, usually in the form of a mutable object, the unit test can be constructed by simply setting the state to the required starting value at the beginning of the test.
The following is a simple test case where all of the state is stored in Person instances. The test just checks that the has_birthday method correctly mutates the state of the Person instances.
class Person:
    def __init__(self, name: str, age: int) -> None:
        self.name = name
        self.age = age

    def has_birthday(self) -> None:
        self.age += 1

def test_birthday() -> None:
    alice = Person("Alice", 42)
    assert alice.age == 42
    alice.has_birthday()
    assert alice.age == 43

    bob = Person("Bob", 101)
    assert bob.age == 101
    bob.has_birthday()
    assert bob.age == 102
But when the state is stored in an external dependency, something else must be done. Unit tests should be deterministic, so any time an external dependency is introduced, there is a chance that the tests will not execute exactly the same way each time. The most common external dependency is an API that is accessed over a network. You should avoid calling the API, even if a sandbox endpoint is provided. (Note: this only applies to unit tests; a service like this has to be called in integration tests.) This is because the endpoint is not reliable. If the endpoint has an outage, the tests cannot run, and so a valid PR cannot be merged into main. Large teams with large test suites can also easily overwhelm a service with test-only traffic; it is a bad idea to have to limit the size of the test suite to keep an external service running.
These external dependencies can also be less obvious. Any time a program needs to interface with the operating system on which it is running, it is also calling an external dependency. Unit tests that write to the local file system fail when running on a machine with no spare disk space. Unit tests that need to open a local port require that the kernel allow them local network access, which is not guaranteed in all containerized environments.
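For example, a test like the following sketch silently depends on the operating system (save_report and its test are hypothetical, not part of the example code below):

from pathlib import Path

def save_report(path: Path, contents: str) -> None:
    # Writing to the local file system is an external dependency:
    # whether this succeeds depends on disk space, permissions, and
    # the state of the directory, none of which the test controls.
    path.write_text(contents, encoding="utf-8")

def test_save_report() -> None:
    # This "unit" test fails on a full disk or a read-only file system.
    target = Path("/tmp/report.txt")
    save_report(target, "hello")
    assert target.read_text(encoding="utf-8") == "hello"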
But the chunks of code that call external dependencies, big or small, still need to be tested. To accomplish this, programmers usually create a test-only version of the client API. There are broadly two flavors of test-only clients:
- Mocks: A mock client implements the same function signatures as the real client, but the functions only return values that are useful for test code.
- Fakes: A fake client also implements the same function signatures, but as far as the calling code is concerned, a fake should work exactly the same as the real client.
More specifically, both a mock and a fake would have the same function signatures, but where a mock would implement just enough functionality to run some tests, the fake would appear to work exactly the same as the real API. A fake avoids calling the real dependency by simulating it internally. That way, the tests can be deterministic.
It is worth noting that some authors use the terms “mocks” and “fakes” interchangeably. In this write-up, I am using a narrower definition of both terms than those authors. When authors do make a distinction between mocks and fakes, their definitions usually line up with the ones used here.
With that in mind: You should always prefer fakes to mocks.
A typical mock may hard-code the response to a given request. This is useful when a unit test needs to call an external dependency and the author needs to be sure that the same response will be returned each time.
Say we introduce a Redis-like key-value store. We do not need to implement it, but say its client looks something like the following:
from typing import Optional

class KeyValueClient:
    def set(self, key: str, value: str) -> bool:
        """Return true if the operation is successful."""
        # implementation omitted

    def get(self, key: str) -> Optional[str]:
        """Get the value stored at a given key if it exists and no error occurs."""
        # implementation omitted
Say we then wanted to store each Person instance in the above example in the key-value store.
from hashlib import sha256
from json import dumps as json_dumps, loads as json_loads
from typing import Optional

def rehash(x: str) -> str:
    m = sha256()
    m.update(x.encode("utf-8"))
    return m.hexdigest()

class Person:
    def __init__(self, kv_client: KeyValueClient, name: str, age: int) -> None:
        # set member variables
        self.name = name
        self.age = age
        # calculate a unique(ish) key for the person
        self.__key = rehash(name)
        # store the data at the key
        self.__client = kv_client
        value = json_dumps({"name": self.name, "age": self.age})
        assert self.__client.set(self.__key, value)

    def has_birthday(self) -> None:
        self.age += 1
        value = json_dumps({"name": self.name, "age": self.age})
        assert self.__client.set(self.__key, value)

def load_person(client: KeyValueClient, name: str) -> Optional[Person]:
    value = client.get(rehash(name))
    if value is None:
        return None
    parsed = json_loads(value)
    return Person(client, parsed["name"], parsed["age"])
The same tests could then be re-implemented by adding a mock.
class KeyValueMock(KeyValueClient):
    def __init__(self) -> None:
        pass

    def set(self, key: str, value: str) -> bool:
        return True  # hard coded

    def get(self, key: str) -> Optional[str]:
        if key == rehash("Alice"):
            return "{\"name\": \"Alice\", \"age\": 43}"
        return None

def test_birthday() -> None:
    client = KeyValueMock()
    alice = Person(client, "Alice", 42)
    assert alice.age == 42
    alice.has_birthday()
    assert alice.age == 43
    loaded_alice = load_person(client, "Alice")
    assert loaded_alice.age == 43
In this example, the return values of the mock client's set and get methods are both hard coded. In particular, a mocked set call will never fail, but it also does not actually mutate any state. Likewise, as long as the caller queries the mock client for Alice's key, the client will return a JSON blob with Alice's age set to 43. This works because in the test code, the blob stored in the key-value store is only loaded when the load_person function is called. At that time, the calling code expects the age to be 43, so that value is simply hard coded.
However, even this simple example demonstrates some of the problems with mocks. Because mocks hard-code the return values, any time a new test case needs to be added, the mock client needs to be updated. The above example does not include a "Bob" test. To add one, we would need to add a second if statement to the KeyValueMock implementation's get method.
class KeyValueMock(KeyValueClient):
    # other methods omitted

    def get(self, key: str) -> Optional[str]:
        if key == rehash("Alice"):
            return "{\"name\": \"Alice\", \"age\": 43}"
        if key == rehash("Bob"):
            return "{\"name\": \"Bob\", \"age\": 102}"
        return None
To avoid an ever-growing get method, we could instead write a different mock client for each of the tests, but then, instead of maintaining a single monster method, the author is now maintaining a bank of mock clients.
class AliceKeyValueMock(KeyValueClient):
    # other methods omitted

    def get(self, key: str) -> Optional[str]:
        if key == rehash("Alice"):
            return "{\"name\": \"Alice\", \"age\": 43}"
        return None

class BobKeyValueMock(KeyValueClient):
    # other methods omitted

    def get(self, key: str) -> Optional[str]:
        if key == rehash("Bob"):
            return "{\"name\": \"Bob\", \"age\": 102}"
        return None
You might think we could simplify things by implementing a single mock class that takes the person's name and age as arguments.
class KeyValueMock(KeyValueClient):
    def __init__(self, name: str, age: int) -> None:
        self.__key = rehash(name)
        self.__blob = json_dumps({"name": name, "age": age})

    def set(self, key: str, value: str) -> bool:
        return True  # hard coded

    def get(self, key: str) -> Optional[str]:
        if key == self.__key:
            return self.__blob
        return None

def test_person() -> None:
    alice_client = KeyValueMock("Alice", 43)
    alice = Person(alice_client, "Alice", 42)
    assert alice.age == 42
    alice.has_birthday()
    assert alice.age == 43
    loaded_alice = load_person(alice_client, "Alice")
    assert loaded_alice.age == 43

    bob_client = KeyValueMock("Bob", 102)
    bob = Person(bob_client, "Bob", 101)
    assert bob.age == 101
    bob.has_birthday()
    assert bob.age == 102
    loaded_bob = load_person(bob_client, "Bob")
    assert loaded_bob.age == 102
However, this implementation only allows test code to query for one blob value per key. There are ways around this. Another solution you might jump to is to keep a reference to the mock and have the test code update its internal state as needed. This technique does not translate well to the Person test case, but the following test would not have been possible with any of the previous mock implementations.
class KeyValueMock(KeyValueClient):
    # other methods omitted

    def __init__(self) -> None:
        self.__blob = json_dumps({"name": "Alice", "age": 42})

    def set_blob(self, blob: str) -> None:
        self.__blob = blob

    def get(self, key: str) -> Optional[str]:
        if key == rehash("Alice"):
            return self.__blob
        return None

def test_person() -> None:
    client = KeyValueMock()
    alice = Person(client, "Alice", 42)
    assert alice.age == 42
    client.set_blob(json_dumps({"name": "Alice", "age": 42}))
    loaded_alice = load_person(client, "Alice")
    assert loaded_alice.age == 42

    alice.has_birthday()
    assert alice.age == 43
    client.set_blob(json_dumps({"name": "Alice", "age": 43}))
    loaded_alice = load_person(client, "Alice")
    assert loaded_alice.age == 43

    alice.has_birthday()
    assert alice.age == 44
    client.set_blob(json_dumps({"name": "Alice", "age": 44}))
    loaded_alice = load_person(client, "Alice")
    assert loaded_alice.age == 44
At this point, the need for a fake should be obvious. Clearly, all of these problems could be addressed by making the KeyValueMock actually implement a key-value store. This type of test-only client is known as a fake.
from typing import Dict

class KeyValueFake(KeyValueClient):
    def __init__(self) -> None:
        self.__dict: Dict[str, str] = {}

    def set(self, key: str, value: str) -> bool:
        self.__dict[key] = value
        return True

    def get(self, key: str) -> Optional[str]:
        if key not in self.__dict:
            return None
        return self.__dict[key]

def test_person() -> None:
    client = KeyValueFake()
    loaded_alice = load_person(client, "Alice")
    assert loaded_alice is None
    loaded_bob = load_person(client, "Bob")
    assert loaded_bob is None

    alice = Person(client, "Alice", 42)
    assert alice.age == 42
    loaded_alice = load_person(client, "Alice")
    assert loaded_alice is not None and loaded_alice.age == 42

    alice.has_birthday()
    assert alice.age == 43
    loaded_alice = load_person(client, "Alice")
    assert loaded_alice is not None and loaded_alice.age == 43

    alice.has_birthday()
    assert alice.age == 44
    loaded_alice = load_person(client, "Alice")
    assert loaded_alice is not None and loaded_alice.age == 44

    bob = Person(client, "Bob", 101)
    assert bob.age == 101
    loaded_bob = load_person(client, "Bob")
    assert loaded_bob is not None and loaded_bob.age == 101

    bob.has_birthday()
    assert bob.age == 102
    loaded_bob = load_person(client, "Bob")
    assert loaded_bob is not None and loaded_bob.age == 102
Because a fake implements the full API, tests can be extended without further edits to the test-only client. As long as the fake’s API appears to work the same way as the real dependency, test suites can grow in length and complexity without additional overhead.
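For example, a new test that overwrites an existing person can reuse the KeyValueFake above without modifying it. This extra test is just an illustrative sketch, not part of the original suite:

def test_person_overwrite() -> None:
    client = KeyValueFake()
    Person(client, "Alice", 42)
    # Constructing a second Person with the same name overwrites the stored
    # blob, because the fake behaves like a real key-value store.
    Person(client, "Alice", 50)
    loaded_alice = load_person(client, "Alice")
    assert loaded_alice is not None and loaded_alice.age == 50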
It is worth mentioning that in this case, the fake is shorter than the mocks it replaces. For any client API, there is a point past which it is easier to implement a full fake than to maintain a list of hard-coded return values, but that line moves as the API becomes more complex. A fake SQL store would have to implement a large subset of the SQL standard; it is difficult to justify that much effort when the test code itself is only a few lines long.
But this is only a problem if the test author has to maintain their own fakes. A single official fake requires less maintenance overall, and every downstream user of the dependency can take advantage of it. Further, for complex external dependencies like SQL stores, the dependency's original authors are in a much better position to simulate their own code correctly. As a dependency grows in complexity, the chances that an end user can fake the API correctly diminish, and the risk that a bug in the fake lets bad code pass a unit test grows. Even a fake maintained by the service's original authors will probably not match the real behavior exactly, but it will come much closer than a fake written by end users for their own test cases.
You might argue that a mock is better than a fake for modeling unusual behavior. A mock can just produce the weird output directly, whereas a fake has to be configured by getting the client into a state where that unusual return value is possible. This can be challenging, but if the dependency cannot be put into a state where a specific response can be generated, then the test requiring that response is not checking behavior that could occur in practice.
For example, say a test uses an accounting service, and we want to check that the code works even if the balance is negative. Suppose the accounting service rejects any transaction that would cause the balance to become negative. In that case, a fake would also prohibit the balance from becoming negative, so such a test could not be implemented with a fake. That is actually a sign that the test is not checking anything useful. It would be better for the application code to assert that the balance is always zero or positive instead of handling a case that cannot happen.
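As a sketch of that last point, assuming a hypothetical AccountClient whose real service enforces the non-negative invariant, the application code can assert the invariant rather than branch on it:

class AccountClient:
    """Hypothetical accounting client; the real service rejects any
    transaction that would make the balance negative."""

    def get_balance(self, account_id: str) -> int:
        """Return the account balance in cents."""
        # implementation omitted

def format_balance(client: AccountClient, account_id: str) -> str:
    balance = client.get_balance(account_id)
    # The service guarantees a non-negative balance, so assert the
    # invariant instead of writing unreachable handling code for it.
    assert balance >= 0
    return f"${balance / 100:.2f}"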
Sometimes, a test author is trying to check that their code correctly handles a dependency failing. I have seen fakes that allow a user to configure simulated network latency or intermittent failures. These are great options to provide when applicable, but it is better to have the fake fail as often as the real dependency's SLA would allow. The fake should use the same seed for its pseudo-random number generator in each run, so that the tests remain deterministic, but that randomness should decide when calls actually fail. Any code that uses a client should handle its errors: when a dependency can fail up to some error budget, error handling should already be baked into any code using that dependency.
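A minimal sketch of that idea, using the KeyValueClient interface from above (the one percent failure rate and the seed are made-up numbers, not tied to any real SLA):

import random

class FlakyKeyValueFake(KeyValueClient):
    def __init__(self, failure_rate: float = 0.01, seed: int = 1234) -> None:
        # A fixed seed keeps the tests deterministic: the same calls
        # fail on every run.
        self.__rng = random.Random(seed)
        self.__failure_rate = failure_rate
        self.__dict: Dict[str, str] = {}

    def set(self, key: str, value: str) -> bool:
        if self.__rng.random() < self.__failure_rate:
            return False  # simulated intermittent failure
        self.__dict[key] = value
        return True

    def get(self, key: str) -> Optional[str]:
        if self.__rng.random() < self.__failure_rate:
            return None  # looks like a missing key, just as with the real client
        return self.__dict.get(key)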
The major exception to using fakes is testing error handling during catastrophic failure. If code is designed to handle a class of errors that cannot occur while the dependency is operating within its SLA, then one cannot use a fake. In this case, one has to implement a bespoke mock that responds with the desired errors.
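For the key-value example, such a mock could be as simple as the following sketch, which simulates a total outage; no fake operating within the SLA would ever behave like this:

class AlwaysFailingKeyValueMock(KeyValueClient):
    """Test-only client that simulates a total outage of the key-value store."""

    def set(self, key: str, value: str) -> bool:
        return False

    def get(self, key: str) -> Optional[str]:
        return None

def test_person_during_outage() -> None:
    client = AlwaysFailingKeyValueMock()
    # Person.__init__ asserts that set() succeeds, so construction
    # fails when every write is rejected.
    raised = False
    try:
        Person(client, "Alice", 42)
    except AssertionError:
        raised = True
    assert raised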
Nonetheless, fakes solve most of the problems associated with mocks. The tests are easier to maintain, and they can be run without setting up a specific test environment. Test suites can more easily grow in both size and complexity because developers can be confident their code integrates correctly.