How I Approach Testing

Tests Shouldn't Be "The Problem"

Testing is one of the most important things I do on a daily basis. Our team builds payment processing software. When we gave a presentation to our new manager, I made sure to drive home the importance of our test effort:

A bug in our system is not just a nuisance, it means people don’t get their paycheck. It may mean someone misses their rent payment. It may mean someone can’t buy food for their family. Bugs in our system affect people’s lives.

Now the software you are working on may not have the same level of impact as ours, but that doesn’t mean lives aren’t affected. People deal with buggy software every day. Think about how much of your brain is reserved for remembering workarounds for software that doesn’t behave as it should. Reducing your bug count could collectively add years back to people’s lives.

In practice I spend more time on tests than I do on the code. The exception is when I am in exploration mode and the key outcome is learning. One line of code may spawn half a dozen tests. Because of this, writing tests has to be both quick and easy while still allowing me to come back later and understand what is going on.

I am very partial to Behavior-Driven Development-style (BDD) tests. These tests specify the behavior of the system, not how a particular piece is coded, and these behaviors are usually expressed as business requirements. BDD tests generally have three parts: Arrange, Act and Assert (also known as Given/When/Then).

Why Are You Testing?

One time, during a presentation on readable tests, I asked for a volunteer to share their test code. My boss at the time offered. Looking at the tests I said, “Hrmm, ConstructorTest1, it looks like the purpose of this test is to get to 100% code coverage.” Code coverage is a laudable goal, but using code coverage as your driver doesn’t result in the best tests; you are merely finding the shortest path to exercise code that was already written. On the other hand, when you set out to test behavior, you are thinking about code in the context of how it fits into the system.

For instance, if I’m writing a method that will generate a payment instruction I’m not wondering “can the method handle a negative number without crashing?” I’m thinking “what should the behavior be if a negative number is supplied” and “how does that behavior change if I adjust the context?” This will often lead to discussions with the product owner, such as “do we want to allow refunds using this API by sending negative payment values?”

Incidentally, if code coverage is your goal then you should work towards test-first development. If you write the minimum amount of code necessary to get each test to pass, you end up with near-100% code coverage automatically, and that’s a beautiful thing.

Structure of a Test

When our team writes tests, we follow a fairly standard layout. Here is an example test in C# using [Machine.Specifications] (MSpec):

[Subject(typeof(Foo), "CheckIsValid")]
class FooTest_CheckIsValid_WithValidData {
    const string ValidInput = "non-empty string";

    Establish a_profanity_filter = () => {
        MockProfanityFilter = new Mock<IProfanityFilter>();
        MockProfanityFilter.Setup(m => m.HasProfanity(Moq.It.IsAny<string>()))
            .Returns(false);

        TestSubject = new Foo(MockProfanityFilter.Object);
    };

    Because of_passing_valid_data = () => ActualResult = TestSubject.CheckIsValid(ValidInput);

    It should_return_no_error = () => ActualResult.ShouldEqual(May.NoValue);

    It should_check_profanity_filter = () => MockProfanityFilter.Verify(m => m.HasProfanity(ValidInput));

    static Mock<IProfanityFilter> MockProfanityFilter;
    static Foo TestSubject;
    static May<string> ActualResult;
}
As a quick introduction to MSpec, it uses lambdas to define the Given/When/Then (Establish/Because/It) of the test. The Subject attribute defines the test and also groups test output hierarchically. You can read this test as “Foo:CheckIsValid, when given a profanity filter, because of passing valid data, it should return no error and it should check profanity filter.”

At the very top of the test we declare class members that establish key context for the test. In this case, we put the input we are going to be using. Both the name and the value help describe what valid input is. It’s not a complete list, as we will spend more time defining and testing invalid input in subsequent tests. At the bottom of the class are incidental members: variables that handle bits of the test but aren’t giving us a lot of information. The detail of assigning TestSubject a value is more important than seeing how it is declared.

In the Establish section we define the context for the test, including setting up any of the test subject’s dependencies. In this case we create a mock profanity filter that always returns false; part of what makes input valid is that it is not profane. At the very bottom of this setup the test subject is created, and we can see all of the parts that are passed to it.

Note that the entire setup is present in this class. There are no base classes or common code hiding several directories away. Everything needed for the test is visible in a single class on a single screen.

Some people like automock or other helpers here. I tend to avoid things like that. To me it’s more important that I can see all of the dependencies and how they are defined than to rely on magic that I can’t see. It also lets me see when the subject starts getting too large to be easily testable. If I can’t set up all of the dependencies in just a few lines of code, chances are the class or method is doing too much.

Once we’ve defined the context, we have the Because clause that does the actual work. In most cases this is a single line of code acting on the subject; here we call the method we’re testing with the input and capture the output.

Finally, we have the assertions. These are the It clauses, and each one defines a different result we expect to see. There is only one assertion per It so that any failure is unambiguous. Also, we only assert on the things we care about for the behavior. If the subject writes a log message we don’t check it, unless the log message is part of the business logic (e.g. logging an audit message when an account is deleted).

Let’s look at another example in JavaScript (actually LiveScript). This is from a Hexo plugin that gets picture data from Unsplash and drives the awesome pictures that front every blog post. I have removed a lot of the cases here for brevity.

describe \parseUnsplash ->
  sut = parseUnsplash
  const default_crop = \entropy

  specify "it should return undefined for a local file" ->
    expect(sut "foo.jpg").to.be.undefined

  describe "when using long form specification" ->
    specify "it should return undefined for a different site" ->
      expect(sut "").to.be.undefined

    specify "it should match http" ->
      expect(sut "").to.exist

    context "without a specified crop" ->
      const expected_id = \0987
      result = sut "#{expected_id}"

      specify "it should return id" -> expect(result.id).to.equal expected_id
      specify "it should return default crop" ->
        expect(result.crop).to.equal default_crop

    context "with a specified crop" ->
      const expected_id = \1a2b3c4d
      const expected_crop = \face
      result = sut "#{expected_id}##{expected_crop}"

      specify "it should return id" -> expect(result.id).to.equal expected_id
      specify "it should return expected crop" ->
        expect(result.crop).to.equal expected_crop

Here LiveScript, Mocha and Chai combine to offer a very succinct syntax (the -> operator defines a function, which is passed as the second parameter to the specify, describe, etc. functions). We don’t have much in the way of context, so we just set up the subject under test (sut), which is simply an imported function. At the beginning we set the value for default_crop since it holds true across the specification. First we have a simple test for the local-file case. We then jump to the long-form version of specifying the Unsplash image, which in this case is a URL. There are several cases nested under this section. For the case “with a specified crop” we put the pertinent data for the case at the top of the block, like we did in the C# example.

Each describe entry gives us another nested context that is exposed in the test results.

    parseUnsplash
      ✓ it should return undefined for a local file
      when using long form specification
        ✓ it should return undefined for a different site
        ✓ it should match http
        without a specified crop
          ✓ it should return id
          ✓ it should return default crop
        with a specified crop
          ✓ it should return id
          ✓ it should return expected crop

This allows you to build up rather complicated contexts on top of the earlier, simpler ones. You can also see how the specification reads as plain English:

when using long form specification, without a specified crop, it should return default crop

Note that for simple contexts I omit the “when” clause altogether: “it should return undefined for a local file.” Sometimes the context is so simple that it doesn’t need a separate clause. Experiment with your test structure and find what works for you and what doesn’t.

That repository actually shows off a bunch of different patterns I use when trying to simplify the tests and make them readable.

Deciding What to Test

Almost as important as how to structure tests is what to test. In the next installment I’ll talk about how I go about deciding this and how to divide the behavior into testable chunks.

These are the top principles I use for effective automated testing:

  • Use BDD to test the behavior instead of the code
  • Present the entire context; avoid hiding test setup in other locations
  • Structure tests so the important stuff is up front
  • Avoid “magic”

Any comments or questions? Leave them below; I’d love to hear what works for you.