Thursday, October 25, 2012

Refactoring-Safe Switch on Enum

A switch statement in Java on an enumerated type looks something like this:

enum TrafficLight {
    GREEN,
    ORANGE,
    RED
}

TrafficLight trafficLight = TrafficLight.GREEN;

switch (trafficLight) {
    case GREEN:
        doGreen();
        break;
    case ORANGE:
        doOrange();
        break;
    case RED:
        doRed();
        break;
}

This works, job done, I'm going home. Hopefully no one ever changes the enum.

The java manual writes:
You should use enum types any time you need to represent a fixed set of constants. That includes natural enum types such as the planets in our solar system and data sets where you know all possible values at compile time—for example, the choices on a menu, command line flags, and so on.
I see 2 possible problems with this definition:
  1. This text proves that what we think is set in stone can still change... planets in our solar system? It was probably written before 2006 when Pluto still counted as the 9th planet in our solar system. 
  2. "Compile time" vs. programming time. The enum values are known when writing the switch code. Who's compile time? Of the switch or the enum? "Choises of a menu" or "command line flags" can surely be modified and recompiled later on, without touching every line of code that made use of those enums. Especially when the enum comes from another code base (library).
Most enums I come across are not as unshiftable as the Months (JANUARY to DECEMBER).

So: What happens if someone changes TrafficLight after I wrote and compiled my switch? Possible scenarios:
  1. Element is renamed: someone decides to rename ORANGE to YELLOW.  
    1. Enum is defined in an external lib: compilation of switch code fails, we need to change. That's good, the switch statement must be updated and won't cause problems.
    2. Enum is in same code base: refactoring renames switch block too, case closed.
  2. Element is removed: That's good, the switch statement must be updated to compile, case closed. 
  3. Element is added: someone adds GREEN_BLINKING. We won't notice. If the developer who added the enum value does not go through all the usages then it's a possible cause of bugs.

Basic solution

switch (trafficLight) {
    case GREEN:
        doGreen();
        break;
    case ORANGE:
        doOrange();
        break;
    case RED:
        doRed();
        break;
    default:
        throw new UnsupportedOperationException("Unsupported enum value: "+trafficLight+"!");
}

Now we notice it as soon as the code is executed with TrafficLight.GREEN_BLINKING.
This still has the potential to go into production with a bug in the case that GREEN_BLINKING did not occur when testing. Maybe it's a very rarely used value. That's likely, because if it was an obvious element of the enum then it would have been in right from the start. And it's likely too that there is no unit test covering this value... if someone went to add one now then he would have updated the switch code too.

Thus a safer syntax is:

Better solution

enum TrafficLight {
    GREEN,
    ORANGE,
    RED;
    public static void assertSize(int expectedItems) {
        assert values().length == expectedItems : "Update the code calling this with "+
                                                  expectedItems+" instead of "+values().length+"!";
    }
}

TrafficLight trafficLight = TrafficLight.GREEN;

TrafficLight.assertSize(3);
switch (trafficLight) {
    case GREEN:
        doGreen();
        break;
    case ORANGE:
        doOrange();
        break;
    case RED:
        doRed();
        break;
    default:
        throw new UnsupportedOperationException("Unsupported enum value: "+trafficLight+"!");
}

This guarantees that, as long as this path is executed and assertions are turned on, we'll see.

Alternative solution

Java's switch still bears potential for bugs. Unfortunately there is no switch syntax that automatically breaks. Forgetting a break statement is a common programming error. If-then-else could be used, but that's ugly and has other issues.

Another technique is to force the enum values to implement the functionality directly:

enum TrafficLight {
    GREEN {
        @Override public void handle() {
            handleGreen();
        }
    },
    ORANGE {
        @Override public void handle() {
            handleOrange();
        }
    },
    RED {
        @Override public void handle() {
            handleRed();
        }
    };
    public abstract void handle();
}

This way one can't simply add a value and break existing code. Unfortunately this technique isn't always feasible, for example when the enum is from an external dependency.

Conclusions

Suggestion for Java: add an assertSize(), expect() or similar method to enum.
Suggestion for IDEs: auto-generate such code when creating an enum or when switching on one.

History and evolution

My original use of this pattern was to have the enum's assertSize() method throw AssertionError directly instead of using the assert statement, and then throw AssertionError("Dead code reached!") again in the switch default block. That combination works, however, it always aborts the program - even in production. After rethinking this topic I've come to realize that it's bad.

There are many scenarios where a program could recover and go on from such a situation. For example a server program serving clients. A web server or web service, where each individual request might fail. If one enters such an execution path, the main routine can still catch that, and return an internal server error to the user. An abort would be fatal.

Tuesday, October 23, 2012

@GeneratedCode: Document when code was auto-generated

IDEs such as IntelliJ and Eclipse provide functionality to generate methods including toString(), equals() and hashCode().

When making changes to existing code, and I need to update such possibly auto-generated methods, I often wonder whether I can just drop and re-create them, or if they include some magic I'm not aware of.

A very simple example class in Java with equals and hashCode:

public class Person {

    private final String givenName;
    private final String surname;
    private final org.joda.time.LocalDate dateOfBirth;

    public Person(String givenName, String surname, LocalDate dateOfBirth) {
        this.givenName = givenName;
        this.surname = surname;
        this.dateOfBirth = dateOfBirth;
    }
 
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;

        Person person = (Person) o;

        if (dateOfBirth != null ? !dateOfBirth.equals(person.dateOfBirth) : person.dateOfBirth != null) return false;
        if (givenName != null ? !givenName.equals(person.givenName) : person.givenName != null) return false;
        if (surname != null ? !surname.equals(person.surname) : person.surname != null) return false;

        return true;
    }

    @Override
    public int hashCode() {
        int result = givenName != null ? givenName.hashCode() : 0;
        result = 31 * result + (surname != null ? surname.hashCode() : 0);
        result = 31 * result + (dateOfBirth != null ? dateOfBirth.hashCode() : 0);
        return result;
    }

}

Scenario: add another field called gender. Even though the class really is very basic, it already takes a moment to understand. Should I hack in the additional field, or re-create the methods from scratch?

With more complex classes it gets worse.

My solution: I document whenever I have the IDE create such code for me. I've chosen a new @GeneratedCode annotation. A Javadoc comment would work too. Once I modify the code by hand I remove the annotation.

A suggestion for IDE developers: The generated code could include an annotation, Javadoc or other visual mark. And, even better, it could remove that automatically when the code gets modified by the programmer and differs from the original generated one. And (more wishful thinking) when refactoring the class (adding a new field) it could even ask if that new field should be included in the method, which could simply be done be re-creating.

Thursday, October 18, 2012

My Vocabulary Enum Pattern

When writing a library to be used by other developers, adding code documentation (javadoc) is essential.

Unfortunately javadoc often requires text duplication - which violates the don't repeat yourself (dry) principle.

This causes

  • incomplete documentation
  • outdated documentation
  • wasted time
In open source software I find this problem all the time.

One case where this becomes eminent is the following: a library that provides a builder for a simple value object. Where do you document the values? 
  1. In the setter of the builder? 
  2. In the Constructor of the value object? 
  3. In the private field of the value object?
  4. In the getter of the value object?
My solution to this is an enum called Vocabulary that contains all such values.

/**
 * Dictionary for terms used.
 *
 * <p>Javadoc can link to these values with details explanations, instead of copy/pasting 
 * the same limited and possibly outdated text.</p>
 */
public enum Vocabulary {

    /*
    Even though Java enums follow the convention to have all values in UPPER CASE,
    there is no need to follow this pattern here. Use the terms in the case that
    they appear in code.
     */

    /**
     * The api-key also known as user-id. 
     * A UUID of 44 characters such as "da39a3ee-5e6b4b0d-3255bfef-95601890-afd80709"
     */
    apiKey,

    /**
     * Such as "/service/ping". Starts with but doesn't end in a slash.
     */
    servicePath,

}

Then the builder and value object look like:

class MyValueObject {
    /**
     * @see Vocabulary #apiKey
     */
    public String getApiKey() { return apiKey; }
}

Within the IDE it's quick to get the explanation (ctrl-q in IntelliJ). And in the generated javadoc html pages it's just a click away. It would be nicer if there was a template syntax/conversion to inline the comment (is there?). Another thing I like about this is that it's easy on my eye - there is little noise in the javadoc. Once I understand the concept behind a term, I don't have to read it again.

Note: I do not add objects to the Vocabulary that are of a specific custom type. A String or Integer needs to be in because it's too general, the data type has no information about permitted value ranges. A MyType however already has the javadoc in its own class and thus usually does not need further documentation when used as a method parameter or return type.