Monday, November 4, 2013

Java toString(): the Program Logic vs. Debug Dilemma

I'll start with my real-world problem of how to implement toString() for my new class, follow with an analysis of how Java uses toString(), and finish with my conclusions.


Real world problem: my class

Here's a version of my new class, stripped down for simplicity:

import com.ibm.icu.lang.UCharacter; // UCharacter comes from ICU4J

/**
 * Represents a possibly multi-byte character and provides information about it.
 * <p>A UnicodeCharacter is the equivalent of a Java "code point".</p>
 */
public final class UnicodeCharacter {
  private final int codepoint;
  public UnicodeCharacter(int codepoint) {
    this.codepoint = codepoint;
  }
  public int getCodepoint() {
    return codepoint;
  }
  /**
   * @return Array with usually 1 char, 2 chars (a surrogate pair) for multi-byte.
   */
  public char[] getChars() {
    return UCharacter.toChars(codepoint);
  }
  /**
   * @return Whether the {@link #getCodepoint codepoint} needs a surrogate pair.
   */
  public boolean isMultiByte() {
    return getChars().length > 1;
  }
  /**
   * @return For example the category UCharacterCategory.UPPERCASE_LETTER
   */
  public Byte getCategory() {
    return (byte) UCharacter.getType(codepoint);
  }
  /**
   * See {@link UCharacter#getName}
   */
  public String getName() {
    return UCharacter.getName(codepoint);
  }
}


Now for toString(), what should it be?

Suggestion 1: for the human user, debug output

  @Override
  public String toString() {
    // getString() returns the character as a String (see suggestion 2)
    return "UnicodeCharacter{" +
        "cp=" + codepoint +
        ",string='" + getString() + '\'' +
        ",name=" + getName() +
        '}';
  }
Example output:
  • UnicodeCharacter{cp=65,string='A',name=LATIN CAPITAL LETTER A}
  • UnicodeCharacter{cp=1040,string='А',name=CYRILLIC CAPITAL LETTER A}
  • UnicodeCharacter{cp=32,string=' ',name=SPACE} 

Suggestion 2: for the machine, program logic, concatenable

  @Override
  public String toString() {
    return UTF16.valueOf(codepoint); // com.ibm.icu.text.UTF16 from ICU4J
  }

Example output for the same 3 (yes 3) characters:
  • A
  • А
  •  

Both have their advantages and drawbacks. The class certainly needs a concatenable method, but that could be named getString(). The debug output is nice to have: just by looking at the characters you can't tell whether it's the Latin or the Cyrillic A, or what kind of whitespace it is.
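To make the ambiguity concrete, here is a minimal sketch. It assumes only the JDK's java.lang.Character (Java 7 or later) rather than the ICU4J classes my real class uses, and the class name LookalikeDemo is made up for illustration.

```java
// Hypothetical demo class; uses java.lang.Character instead of ICU4J's UCharacter.
final class LookalikeDemo {

    /** Returns the one- or two-char string for a code point. */
    static String asString(int codepoint) {
        return new String(Character.toChars(codepoint));
    }

    /** Returns the Unicode name of a code point, e.g. "SPACE". */
    static String nameOf(int codepoint) {
        return Character.getName(codepoint);
    }

    public static void main(String[] args) {
        int latinA = 0x41;     // 65
        int cyrillicA = 0x410; // 1040
        // Both strings look alike in most fonts, only the names tell them apart.
        System.out.println(asString(latinA) + " -> " + nameOf(latinA));
        System.out.println(asString(cyrillicA) + " -> " + nameOf(cyrillicA));
        // The strings themselves are not equal.
        System.out.println(asString(latinA).equals(asString(cyrillicA))); // prints "false"
    }
}
```

The plain string is what program logic wants, while the name is what a human reading a log wants. One method cannot serve both.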

Unfortunately, in this case, expanding the object in the debugger doesn't help, because the code point is the object's only field. The other information (character and name) is computed.




Analysis of Java's toString()

Here's what Java has to say about Object.toString():
    Returns a string representation of the object. In general, the toString method returns a string that "textually represents" this object. The result should be a concise but informative representation that is easy for a person to read.
Both my suggestions follow the specification. The one with debug info is easier to read for a person. But I'm not sold yet.

Who calls toString()?

  • Java itself:
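    When doing string concatenation: String s = myString + myCar;
    This is roughly the same as String s = String.valueOf(myString).concat(String.valueOf(myCar)); so toString() is called on non-null operands, and a null operand becomes "null" instead of throwing.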
  • JDK methods:
    String.valueOf(Object obj), and thus every method that uses this such as PrintStream.print(Object o).
    Arrays.toString(Object[] a) for every object in the array a.
    StringBuilder and StringBuffer: the toString() method is used as the build method.
  • Logging:
    System.out.println().
    Your favorite logging framework.
  • Debugging:
    Your favorite IDE in the debugger.
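The callers listed above can be checked with a small sketch. CallersDemo and its nested Car class are hypothetical names for illustration; the JDK calls (String.valueOf, Arrays.toString, StringBuilder.append) are the real ones.

```java
import java.util.Arrays;

// Hypothetical demo: every helper below ends up calling Car.toString().
final class CallersDemo {

    /** Hypothetical class whose toString() is easy to spot in the output. */
    static final class Car {
        @Override
        public String toString() {
            return "<car>";
        }
    }

    /** The + operator converts its operand the way String.valueOf(o) does. */
    static String viaConcat(Object o) {
        return "x" + o;
    }

    static String viaValueOf(Object o) {
        return String.valueOf(o);
    }

    static String viaArrays(Object... elements) {
        return Arrays.toString(elements);
    }

    static String viaBuilder(Object o) {
        return new StringBuilder().append(o).toString();
    }

    public static void main(String[] args) {
        Car car = new Car();
        System.out.println(viaConcat(car));      // prints "x<car>"
        System.out.println(viaValueOf(car));     // prints "<car>"
        System.out.println(viaArrays(car, car)); // prints "[<car>, <car>]"
        System.out.println(viaBuilder(car));     // prints "<car>"
        System.out.println(viaConcat(null));     // prints "xnull", no exception
    }
}
```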

Hrm. So there are mainly 2 uses:
  1. String representation: toString() returns the object's value "as string", as closely as possible.
    For such classes it is absolutely required to override toString(), and to do it in this way.
  2. Debug information: the object's field values, formatted for a human reader.
    For example IntelliJ IDEA's default toString() template generates this kind.
    It's just nice to have.
Sometimes, as an additional benefit, the object provides a constructor accepting that string as a parameter to re-create it. Example: Integer.toString() and new Integer(String).
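A minimal sketch of that round trip, with a made-up class name RoundTripDemo. It uses Integer.parseInt(String) instead of the constructor, since new Integer(String) has been deprecated since Java 9.

```java
// Hypothetical demo of the "string value can re-create the object" property.
final class RoundTripDemo {

    /** Converts the value to its string form and parses it back. */
    static int roundTrip(int value) {
        String text = Integer.toString(value); // e.g. "-43"
        return Integer.parseInt(text);
    }

    public static void main(String[] args) {
        System.out.println(Integer.toString(-43)); // prints "-43"
        System.out.println(roundTrip(-43) == -43); // prints "true"
    }
}
```

A debug-style toString() such as suggestion 1 would not allow this without writing a parser for its own format.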

How does the JDK define toString() in its own classes?

For some simple value classes there's not much choice. Integer for example: returning "-43" makes sense.

Character could return more than just the character as a string, but it does not. String could report its length and truncate itself if it's too long, but it does not. StringBuilder and StringBuffer could report the appended chunks separately, say how many there are, and truncate, but they don't. If they did, the classes would need a separate method for string concatenation, and concatenation with + would not work anymore.

Now here's an observation: String, StringBuilder and StringBuffer all implement CharSequence (Character does not). The interface was added in JDK 1.4, and it redeclares toString() just to say something about it:
    Returns a string containing the characters in this sequence in the same order as this sequence. The length of the string will be the length of this sequence.
So that's why.

Conclusions

My class UnicodeCharacter is a wrapper around a Unicode code point, just like Character is a wrapper around the char primitive. It's a character type that also supports those that don't fit into a char. And as such it really should implement the CharSequence interface. Then the decision is made: toString() must be suggestion 2, only returning the character's string value.
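Here is a rough sketch of what that could look like. It is not my final class: the name UnicodeCharacterSketch is made up, it uses java.lang.Character instead of ICU4J, and caching the string in a field is just one possible choice.

```java
// Sketch only: a code point wrapper that satisfies the CharSequence contract.
final class UnicodeCharacterSketch implements CharSequence {
    private final int codepoint;
    private final String string; // cached one- or two-char value

    UnicodeCharacterSketch(int codepoint) {
        this.codepoint = codepoint;
        // Character.toChars throws IllegalArgumentException for invalid code points.
        this.string = new String(Character.toChars(codepoint));
    }

    int getCodepoint() {
        return codepoint;
    }

    @Override
    public int length() {
        return string.length(); // 1, or 2 for a surrogate pair
    }

    @Override
    public char charAt(int index) {
        return string.charAt(index);
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        return string.subSequence(start, end);
    }

    /** Suggestion 2: exactly the characters of this sequence, nothing else. */
    @Override
    public String toString() {
        return string;
    }

    public static void main(String[] args) {
        UnicodeCharacterSketch a = new UnicodeCharacterSketch(0x410);      // Cyrillic A
        UnicodeCharacterSketch clef = new UnicodeCharacterSketch(0x1D11E); // needs 2 chars
        System.out.println(a.length());    // prints "1"
        System.out.println(clef.length()); // prints "2"
        // Concatenation and StringBuilder.append(CharSequence) both work as expected.
        String joined = "[" + a + clef + "]";
        StringBuilder sb = new StringBuilder("[").append(a).append(clef).append("]");
        System.out.println(joined.contentEquals(sb)); // prints "true"
    }
}
```

With the interface in place, toString().length() has to equal length(), so a debug-style string is no longer an option for toString().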

In some rare cases it would be nice to have 2 different methods: one for the string value (toString()) and one for the debug info (toDebug() or toDebugString()). The method could be defined in Object, with a default implementation: calling toString().
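Since Object can't be changed, the idea can only be approximated. The sketch below assumes Java 8 default methods and a hypothetical interface named DebugPrintable; nothing like it exists in the JDK, and the Plain and Letter classes are made up for illustration.

```java
// Hypothetical demo: a separate debug method that falls back to toString().
final class DebugStringDemo {

    /** Hypothetical interface; the JDK has no such method on Object. */
    interface DebugPrintable {
        /** Falls back to toString() unless a class has more to say. */
        default String toDebugString() {
            return toString();
        }
    }

    /** Keeps the default: the debug info equals the string value. */
    static final class Plain implements DebugPrintable {
        @Override
        public String toString() {
            return "plain";
        }
    }

    /** String value for program logic, richer output for humans. */
    static final class Letter implements DebugPrintable {
        private final int codepoint;

        Letter(int codepoint) {
            this.codepoint = codepoint;
        }

        @Override
        public String toString() {
            return new String(Character.toChars(codepoint));
        }

        @Override
        public String toDebugString() {
            return "Letter{cp=" + codepoint
                + ",string='" + this + '\''
                + ",name=" + Character.getName(codepoint)
                + '}';
        }
    }

    public static void main(String[] args) {
        System.out.println(new Plain().toDebugString());    // prints "plain"
        System.out.println(new Letter(65));                 // prints "A"
        System.out.println(new Letter(65).toDebugString()); // prints "Letter{cp=65,string='A',name=LATIN CAPITAL LETTER A}"
    }
}
```

The drawback is that debuggers and logging frameworks know nothing about such a method and keep calling toString().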


3 comments:

  1. Thank you for the great article!

    I usually follow a kind of toString() contract:

    toString() as a data converter "as string" only for wrappers of primitive data types, e.g. Integer, Boolean, Double, Date.
    The important thing here is that the resulting string representation of the value can be parsed back to get the original value.

    Another case is when we need to show the object to a user. In this case it may be better to create another method named something like getDisplayName(), getTitle() or getCaption(). For example, a User class can contain getDisplayName(), which returns the first name and second name with a space between them.

    In all other cases toString() is used for logging output. Since I use Groovy, I do it with the @ToString annotation, which generates this method at compile time.
    For example, the User class can contain a toString() method that returns the user's login or email.



    1. I expanded this comment into an article on my blog:
      http://stokito.wordpress.com/2013/11/07/tostring-contract/

  2. Interesting topic, one I have thought about many times myself.
    I think the example you are using is pretty uncommon. This is a rather technical class, and in most cases you don't want your classes to implement CharSequence. In this case I think you reached the right conclusion, but in general domain objects rarely have a business method that should be named toString().
    In almost every case toString() is a technical method used for logging, debugging and error messages at different levels of testing, and as such it should not be used in business logic.
