Tuesday, May 20, 2014

If your project is still hosted on SourceForge, I automatically assume it's dead.


SourceForge was the first free project hosting platform for open source software. It was immensely popular. It was the hub during the early open source years on the web - it was the GitHub of its day.

Because of its historical popularity, a large number of projects is still hosted
there. Some are popular, some are dead - just like in any source code
repository. Many developers have since moved their code to more modern platforms:
first to Google Code, now to GitHub.


Here are 3 reasons why you should move away from SourceForge.

1) SourceForge feels dead. Many active projects moved away. 

When I look for software, I don't mind too much where it's hosted. Maturity,
stability, community size and active development are the factors I look for.

I don't advocate moving your whole project with source code and bugtracker every
couple of years to the next hip kid on the block.

But the fact that so many moved on means that what's left is probably abandoned. When I search for software and both a GitHub and a SourceForge project come up, I tend to click the GitHub one first. Also, because of my search history, I probably get to see the GitHub links first. I'm certainly not alone here.

2) Developers have GitHub accounts

Open Source needs contributors and users. And the easiest way for developers to pitch in or file a bug report is when they already know the system and have an account. If it's complicated, I pass.

3) Scam ads: this stinks!

Here's the reason for my blog post.

Once again I was downloading a (totally outdated and abandoned, but familiar and working) piece of software that is still on SourceForge, when I got to see this ad:



It leads to a suspicious-looking website with a Microsoft Partner logo and a Windows logo, offering Windows driver updates. The kind my mother would click, thinking it's from Microsoft.


And here's the profile on web of trust: https://www.mywot.com/en/scorecard/driverupdate.net


The site even manages to get a top rating on WoT by spamming good ratings, even though all the important comments say it's a scam. WoT, please fix.
"they say I need over a dozen drivers updated. Confirming with the manufacturer, I am up to date. ..." --pmheart6
"why do all the positive comments use poor english with typical chinese mistakes?? A dishonest sales trap. It's difficult to uninstall and must be done manually." --KITRVL



SourceForge changed ownership in 2012, and as the Wikipedia article writes: "More recently additional revenue generation schemes, such as bundleware models have been trialled, with the goal of further improving sourceforge's revenue."

This looks to me like flogging a dead horse. It's all downhill from here.

Please spare other people from such frustration. Even if the platform was good once, now is the time to move the code away. To BitBucket or Google Code or GitHub or wherever. Thanks.

Tuesday, April 29, 2014

Java: .equals() or == on enum values?

Both do pretty much the same thing: checking whether two enum values are the same. And if you search your code base, you'll probably find both kinds. There are some subtle differences. What are they - and which syntax should I use?

Same same - but different

Enum values are guaranteed singletons - and thus the == reference comparison is safe. Enum.equals() itself is final and performs exactly that reference comparison.
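In fact, this is essentially the whole implementation of equals() in java.lang.Enum:

public final boolean equals(Object other) {
    return this == other;
}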



So if they execute the same code, why should I bother?

Learning the hard way - actual bug

I produced a bug which broke the build. Luckily we did have a unit test in place that caught it. But still I would have seen it earlier, had I used the "correct" syntax.

So which one is it?

And the winner is: ==

It provides compile-time safety. While MyColor.BLACK.equals(acceptsAnything) gives a warning at most, the check MyColor.BLACK == something won't even compile if something is not an instance of MyColor.
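A small illustration (using the hypothetical MyColor enum from above):

enum MyColor { BLACK, WHITE }

// Compiles with at most a warning - and can never be true:
boolean a = MyColor.BLACK.equals("BLACK");

// Does not compile: incomparable types MyColor and String.
// boolean b = MyColor.BLACK == "BLACK";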



This makes it refactoring-safe. If you go - like me - and change the type of something later on, you'll notice instantly.

Another difference is the support for null. The check with == allows null on both sides. It doesn't throw; however, it may also "hide" a null value where there shouldn't be one.
The check with equals() allows null only on the right side, and throws a NullPointerException if the left side is null. That may be desired sometimes. Anyway, I vote for not letting null slip into this comparison at all. It should be checked/handled before, instead of being smart. (See my article about null handling.)
So the comparison with == is "safer" at run-time too, in that it never throws an NPE - but it possibly hides a bug when the left side should never have been null.
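In code (again with the hypothetical MyColor):

MyColor color = null;

boolean a = (color == MyColor.BLACK);    // false - no exception, the null slips through
boolean b = color.equals(MyColor.BLACK); // throws NullPointerException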

Then there's the argument of performance. It's irrelevant. I'm not going there.

And the visual aspect. Which one looks nicer?
When I can have compile-time safety, I don't care about the looks. I was used to .equals() simply because that's how Strings are compared. But in retrospect that's a pretty lame explanation.
On this StackOverflow question, which is exactly about this topic, Kevin Bourrillion from Guava commented that "== may appear incorrect to the reader until he looks at the types" and concluded that "In that sense, it's less distracting to read '.equals()'". Au contraire! When I see CONSTANTS around == I instantly know they're either enums or primitives of matching types. If not, the code is either red because it doesn't compile, or carries a yellow warning from my intelligent IDE saying that .equals() should be used - for example with Strings or primitive wrappers like Integer.

After that incident, at the company we decided to go with the reference equality comparison ==.

Tuesday, April 22, 2014

Java: Evolution of Handling Null References

Using null in programming is playing with fire. It's powerful and sometimes the right choice, but the infamous NullPointerException can sneak in quickly. That's why I advocate for every software project to include a section about handling null in the coding guidelines.

The first time I saw a Java exception was in the 90s, in the web browser's status bar while surfing the net: an NPE caused by an applet. I had no idea what it meant and what I had done wrong. Today I know that the programmer let a null reference slip in where it wasn't expected.

When looking at open source software written in Java, I often come across code where null references are used but not documented. Sometimes preconditions are in place. When digging deeper it gets really tricky to figure out which variables are allowed to be null and which are not. For the author it was obvious at the time of writing the code... but software becomes better when written and reviewed by many, and thus it is important to make it clear for everyone, everywhere.

Software constantly ships with NPEs that are only detected later on. Redeployments and bugfix releases are expensive - and it's a pity, because this kind of bug could be eliminated almost completely.


Here's my personal evolution of dealing with null in Java.

Level 1: No Plan

Using it wherever, no information about it in the Javadoc.
Fixing NPEs as they occur, seeing no problem in that.
public class Foo {
    
    public String getText() {
        return null;
    }
    
}

Level 2: Document Null - Sometimes

Realizing that NPEs are a problem that could and should be avoided. Detecting them late is expensive, especially after deployment.
Starting to document variables, arguments and return values that can be null... incomplete.
public class Foo {
    
    /**
     * @return the text, or null
     */
    public String getText() {
        return null;
    }
    
}

Level 3: Add "Null" to Method Names

Realizing it's still a problem. Having Javadoc is nice, but useless when not read.
Starting to name methods with null such as getFooOrNull() to force it to be seen.
public class Foo {
    
    /**
     * @return the text, or null
     */
    public String getTextOrNull() {
        return null;
    }
    
}

Level 4: Code Annotations

Using JetBrains' null annotations: annotating all variables with @Nullable and @NotNull. This is a huge step forward. The crippled method name pattern 'OrNull' is obsolete. The code is automatically documented.
public class Foo {
    
    @Nullable
    public String getText() {
        return null;
    }
    
}
And the best part: the IDE checks the code and warns on a mistake.


Using these annotations strictly, I can't remember having caused a single NPE in years. The drawback is more typing, more text on the screen. But then, we use static typing - and not specifying nullability is just incomplete.

Level 5: Guava's Optional

The concept: instead of using the null reference directly, wrap it in another object that permits null. The getter methods on it - myOptional.get(), myOptional.orNull() and myOptional.or(alternative) - force the user to think about what to do when it's null. The API becomes extremely clean.

If you don't know about it yet, read the Guava page about using and avoiding null.

It took me a couple of days to get used to this. And I did produce a handful of bugs initially, because I had used Optional.of() instead of Optional.fromNullable() by mistake.
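Here's a minimal sketch of that trap, and of the getters mentioned above (the null-producing call is made up for the demo):

import com.google.common.base.Optional;

public class OptionalDemo {
    public static void main(String[] args) {
        String mightBeNull = System.getProperty("no.such.property"); // null here

        Optional<String> safe = Optional.fromNullable(mightBeNull); // absent for null
        // Optional.of(mightBeNull) would throw a NullPointerException right here!

        System.out.println(safe.or("default")); // prints "default"
        System.out.println(safe.orNull());      // prints "null"
    }
}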

Although step 4 with annotations already got rid of NPEs and improved the code confidence, this was another big step forward. The API of the Optional class is clean, and the API of code using it is consistent and clear. A developer only has to learn the concept once, and because it's from Guava, sooner or later everyone will know it.

The Guava team uses @Nullable annotations for the few places where null is permitted. Anywhere else there are no annotations.

Level 6: Java 8 has Optional built in

Oracle has a nice article on it. The API is slightly different, and I have not made the transition to Java 8 yet.
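For comparison, here's a minimal sketch using the java.util.Optional names (I haven't used it in production yet):

import java.util.Optional;

public class Java8OptionalDemo {
    public static void main(String[] args) {
        String mightBeNull = System.getProperty("no.such.property"); // null here

        Optional<String> text = Optional.ofNullable(mightBeNull);
        System.out.println(text.orElse("default")); // Guava's or() is called orElse() here
        text.ifPresent(System.out::println);        // runs only if a value is present
    }
}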



Finger Pointing

The bugtracker for the Glassfish software currently has 1309 matches for NullPointerException. Wow. (Go to the issue navigator, select the 11 projects on the left starting with "glassfish", and add the query NullPointerException, hit enter. The software is session-based, can't paste a link...)

The Grizzly project has 69.

This is a codebase that's still on level 1 regarding null. Exactly two years ago I recommended improving the Javadoc and introducing null annotations. The task was accepted and got priority "Major" assigned, but it is still open and things haven't changed.

I had also mentioned in that task the undocumented method int getQueueLimit() which returns the magical -1 for 'no limit'. Nowadays - being a level 5 guru ;-) - I'd instantly turn this into a self-documenting Optional. This forces users to think about the exceptional case. No horrible bugs from computations with -1 can occur. Users would then just do something like getQueueLimit().or(Integer.MAX_VALUE) or whatever suits their case - short, clear and safe.



My Recommendations

1) Avoid Null when possible.

Consider the null object pattern.
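A minimal sketch of the pattern (the Log interface is made up for illustration):

interface Log {
    void write(String message);

    // The null object: safe to call, does nothing.
    Log NO_OP = new Log() {
        @Override
        public void write(String message) { /* intentionally empty */ }
    };
}

// Callers pass Log.NO_OP instead of null - no null checks needed downstream.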

Return empty collections, not null.
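For example:

import java.util.Collections;
import java.util.List;

public class Names {
    public List<String> getNames() {
        // Callers can iterate safely, no null check needed.
        return Collections.emptyList();
    }
}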

Throw UnsupportedOperationException() instead of returning null.
Don't do this:
public Object foo() {
     return null; //todo
}
It is done sometimes as a quick way to satisfy the compiler. It's the no-impl pattern.

Do this instead:
public Object foo() {
     throw new UnsupportedOperationException(); //todo
}
If you return null, then that null can travel a long way before it throws a nasty NullPointerException somewhere. By throwing the UnsupportedOperationException directly, the cause is clear in the stack trace.

2) Use Guava's Optional.

Wherever you would use null, use Optional. 
Except in very low level code.

3) Use JetBrains' Null Annotations.


If you allow null, then use the @Nullable annotation. This is a must. It automatically tells the user and the IDE that null is possible/permitted.

Use the @NotNull annotation everywhere else. This is optional. The Guava people do not, I do.
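A short example using both (the annotations live in the org.jetbrains.annotations package):

import org.jetbrains.annotations.NotNull;
import org.jetbrains.annotations.Nullable;

public class Foo {
    @Nullable
    public String findText() {
        return null; // permitted - and documented
    }

    public int length(@NotNull String text) {
        return text.length(); // the IDE warns when a nullable value is passed in
    }
}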

To get the annotations using Maven:

        <dependency>
            <groupId>com.intellij</groupId>
            <artifactId>annotations</artifactId>
            <version>12.0</version>
        </dependency>


Tuesday, March 11, 2014

Software Versioning and Bugfixes

There are many ways to assign version identifiers to software builds.

The goals of software versioning, as I understand them:
  1. Give each build (release, snapshot) a unique identifier.
  2. Apparent order: it must be obvious which of two versions is newer.
  3. Apparent changes: whether a release contains breaking changes, new functionality, or just bug fixes.
  4. Possibility to release updates later on that "fit in between".
I like the increasingly popular semantic versioning. It accommodates all these goals.

Apparent Order

1.9.0 -> 1.10.0 -> 1.11.0
This is important for me as a user: it lets me select the newest version quickly.
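As a sketch of what "newer" means numerically, here's a simplified comparator for MAJOR.MINOR.PATCH versions (pre-release suffixes ignored) - a plain string comparison would wrongly sort 1.10.0 before 1.9.0:

public class SemVer {
    static int compare(String a, String b) {
        String[] pa = a.split("\\.");
        String[] pb = b.split("\\.");
        for (int i = 0; i < 3; i++) {
            int cmp = Integer.compare(Integer.parseInt(pa[i]), Integer.parseInt(pb[i]));
            if (cmp != 0) {
                return cmp;
            }
        }
        return 0; // equal
    }
}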

Apparent Changes

Going from 1.9.0 to 2.0.0 may mean that some of my use cases need to be adjusted, some things don't work the same anymore. I might prefer to stay on the "1" branch.

Going from 1.9.0 to 1.10.0 means added functionality; it seems pretty safe. Still, experience taught me that new features bring new bugs - even if I don't make use of the new stuff.

Going from 1.9.0 to 1.9.1 means there are just bug fixes. It is almost a safe bet to upgrade this. A quick look over the change-list helps me to decide where I might be affected. The risk is very low that this introduces new bugs.

Release Updates

When version 1.10 has been released after 1.9, and bugs are then found in 1.9.0 that should be fixed, not everyone will be happy with a 1.11 that contains new functionality along with the bug fixes. Hence the three numbers.

H2 Database thinks otherwise

Unfortunately, this software library that I use, like and value follows a different versioning scheme. Just like Google Chrome, they have been publishing new versions with what is effectively a single, continuously increasing version number.

For a recent history see http://www.h2database.com/html/changelog.html
And longer: http://mvnrepository.com/artifact/com.h2database/h2

There are no bug fix releases. Each new version brings new features, improvements, bug fixes - and new bugs, which becomes clear when you follow the issues. But to learn in which version a bug was introduced, and so on, you need to dig through the bug tracker.

For a web browser, this release cycle is OK. Give me the latest; I can cope with some level of annoyances and crashes now and then.

A database, however, is the backbone of many applications. It is the piece of software that one relies on, and the worst to break. As such, I really wish it would maintain two code branches: one for the latest developments, and one that lags behind, to which only fixes are applied.

Whenever I consider updating this library in my code, I hesitate. Is it really necessary? Most times I answer no and move on. Updating means reading all recent changes, and running tests.

I only use this database for read-only scenarios, not as a live database with active changes.

My suggestion to H2 and in general

  1. Move code from Google Code to GitHub
  2. Change to semantic versioning
  3. Create a branch that lags behind, and let contributors do the work of maintaining it (applying just the fixes)

Monday, December 2, 2013

IntelliJ IDEA: add Project Files to Version Control?

Do the contents of the .idea project folder and the .iml module files belong in version control?

Yes, says the IntelliJ team:
Here is what you need to share:
  • All the files under .idea directory in the project root except the workspace.xml and tasks.xml files which store user specific settings
  • All the .iml module files that can be located in different module directories

No for the .iml files, say Maven users who are annoyed by ever-changing .iml files; see these two older but still open bugs:

What are the drawbacks if I don't share these files? None, from my experience. When doing a fresh checkout of a project, IDEA asks whether I'd like to create a new project. Yes please, and all files are generated correctly. 

The only annoyance we have had at the company from time to time is when someone accidentally adds an .iml file to the VCS. There's a feature in IDEA to stop it from asking:

Settings -> Version Control -> Ignored Files

This setting has to be applied in every project separately, and thus it's not very helpful in this case: for every new project I check out, IDEA asks me whether I want to add the .iml files to version control (one by one), and I can just tell it to stop asking:


Conclusion

For the time being we keep IDEA files out of the VCS. Our projects are usually multi-module Maven, with lots of .iml files. Your mileage may vary.



Friday, November 29, 2013

Tabs and Saved Session Folders for PuTTY

For many years I've been using the PuTTY network terminal to connect to remote servers when doing sysadmin tasks. And when I do, my monitors quickly fill up with those black windows. Remember web browsing before tabs? Like that. I totally lose track.

The other annoying thing is that it does not let me categorize my saved sessions. They are all in one alphabetical list. That might work when you have just a handful, but my list grew to dozens. Meh.


So I checked again today on the official site for an update: 



With such a user interface on the site, it would be too much to expect gimmicks in the program. Yes, there was an update. Security fixes.

So I googled for what else there is. And apparently there are plenty of wrappers around PuTTY that bring exactly what I want: Tabs and Session Folders. I don't know how they could hide from me for so long - maybe because neither the official site nor the Wikipedia page mention any.

I've decided to go with this open source and actively maintained one: superputty


Conclusion

Same putty, more gui. Awesome.

Friday, November 8, 2013

Java: Should I assert or throw AssertionError?

Assertions were introduced to the Java language in JDK 1.4, back in 2002. It brought the assert statement
assert x != null : "x was null!";
and the throwable error class
if (x==null) throw new AssertionError("x was null!");
Which one should I use in my code?



Well then, what's the difference?

The assert statement

Unless assertions are actively turned on when running the program (-ea), they are not executed. They are still in the bytecode, but they don't run.
  • Bugs go unnoticed.
    These bugs should have been detected already in unit tests and the testing phase before deployment when running with assertions on. But experience taught us that some slip through.
  • No performance loss.
    There are a few rare scenarios where it matters.
  • Side effects are possible.
    Badly programmed assertions can cause nasty side effects. Because the assert code is executed in some runs and not in others, the program may behave slightly differently (see the sketch after this list).
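A classic example of such a side effect (list and item stand for any collection and element):

// Bug: the element is only removed when assertions are enabled (-ea)!
assert list.remove(item) : "item was not in the list";

// Side-effect-free version: do the work unconditionally, assert the result.
boolean removed = list.remove(item);
assert removed : "item was not in the list";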
Using the assert statement gives the runtime configuration control over whether assertions run or not. The choice is left to the person who runs the program; it can be made from run to run, and no recompile is needed.

The AssertionError

The asserting code is always executed. Changing your mind about execution means a recompile and re-deployment.
  • Bugs always show up.
    Even in production they abort the program.
  • Performance loss.
    In most cases it's irrelevant.
  • Guaranteed to have no side effects.
    The executed code is always the same.

 

Vital or nice to have?

Programs were written in Java long before assertions were available. Some developers came up with their own self-made assertion logic - not as clean and short as the real thing. And most probably didn't bother.
Assertions are not allowed to alter the program logic, were not always available, and are not mandatory. One can very well run the same program with them turned off. Conclusion: nice to have.

However, it is not guaranteed that a specific bug ever shows up without assertions running. Let's take a flight simulator for example. Assume a computation bug that only occurs in a rarely executed code path. The bug can cause the airplane to fly at slightly reduced speed, and no one ever notices. Or it can be a number overflow, causing the plane to fly backwards. That will surely be seen each time.

Would assertions help in production code?

It depends on the domain. Assertion means abort. Is that what you'd want, to prevent the worst? Or would you rather try to go on, hoping that it's a minor, negligible bug?
  1. End-of-day accounting program: rather abort, alert the technicians, they fix it, and go on. No damage done, and bug fixed.
  2. Real-time program where thousands of employees depend on it, and an abort means a big loss of work time. No abort. Hope that the bug eventually shows up as a side effect, and it can be traced and fixed.
  3. Program where an abort is the worst case scenario, like a flight simulator: go on.

Conclusions

assert, not AssertionError

When writing library code, assert is the right choice. It leaves the power with the user to decide whether assertions should be on or off - whether they favor detection and abort, or letting bugs go unnoticed.

When writing an application, use assert as well. If you want to enforce assertions, you can still hack in this piece of code, and if you change your mind later, you don't have to go through all assertions and replace them:
static {
    boolean assertsEnabled = false;
    assert assertsEnabled = true; // Intentional side effect
    if (!assertsEnabled)
        throw new RuntimeException("Asserts must be enabled (-ea)!");
} 
Also, with assert, assertions can be enabled at a per-package level:
java -ea:com.example... MyApp

AssertionError to satisfy compilation

Sometimes, when reaching code that should never be reached, an AssertionError is thrown to make the code compile. Example:
switch (trafficLight) {
    case GREEN:
        return doGreen();
    case ORANGE:
        return doOrange();
    case RED:
        return doRed();
    default: 
        //Ugh! I gotta do something, but have no clue what to do!
        //Let's abort the app and turn the semaphore offline.
        throw new AssertionError("Dead code reached");
} 
An assert statement would not compile when a return is required.

However, in some cases, it might be more suited to throw an exception instead of an error, for example an UnsupportedOperationException. In the above case an exception barrier could catch that, then turn all semaphores at the intersection to blinking orange for a minute, and then restart the green interval as it would from a clean program start and continue normal operation. That would have several advantages:
  • be cheaper than sending a technician
  • the semaphore is off for short time only
  • while it's off it's blinking orange rather than being totally dark, which seems safer
The problem would be detected in both cases. With an AssertionError it's in the output for free. And with UnsupportedOperationException it's the task of the one catching the exception to log it.

So again, the choice is between a hard abort, or giving the program the chance to recover and continue if a higher level decides to do so.


What I'm missing: detection and logging!

In the production scenarios 2 and 3 from above I'd want a 3rd way of handling assertions, which is not offered by Java's assertion feature.

Java has 2 strategies:
  1. Don't even check, hence no abort
  2. Check, and conditionally abort or continue
And I'm missing a 3rd one with the behavior:
  3. Run the assertion code to detect bugs, but log instead of abort.
    It's a clear bug, it's detected, so log it. There's no cheaper way to detect it than right here, in the assert code that was written already. (A sketch follows below.)
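A sketch of what such a third mode could look like - the Check class and its names are made up:

import java.util.logging.Level;
import java.util.logging.Logger;

final class Check {
    private static final Logger LOG = Logger.getLogger(Check.class.getName());

    private Check() {
    }

    // Always evaluates the condition; on failure it logs with a stack trace
    // instead of aborting, so production keeps running.
    static void that(boolean condition, String message) {
        if (!condition) {
            LOG.log(Level.SEVERE, "Assertion failed: " + message, new AssertionError(message));
        }
    }
}

Usage: Check.that(x != null, "x was null"); - the check always runs, the bug is logged, and the program continues.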
 

Configure your IDE to run with assertions

It strikes me that the default run configurations in the most common IDEs do not have assertions enabled. I have actually spent way too much time chasing a bug that an assertion would have flagged instantly - but assertions weren't on.

In IntelliJ IDEA you need to modify the defaults for all run configuration types you're using: Application, JUnit, TestNG, ... Changing the defaults does not change the run configurations you've created earlier. And you need to do this in every project separately. (Please add a comment if you know how to set this once and for all!)


For Eclipse there's an explanation here: http://stackoverflow.com/questions/5509082/eclipse-enable-assertions - also see the 2nd answer about JUnit tests.