Sunday, December 2, 2012

My PHP WTF Of The Day: max_input_vars

I recently moved a legacy in-house LAMP application from a dated server (hardware failures) to a virtual machine, restoring it from backups. To make the transition as smooth as possible, I chose the same software components: CentOS, MySQL 5 (the dumps are not compatible with 5.1 and newer thanks to a charset collation bug), PHP 5.1.6. Because it's an in-house app behind firewalls I don't need to worry about security updates.

Smoke signal testing passed, and so I gave the OK for the users to continue their work.

After weeks of use, a problem was reported with just one function of the app, which would sometimes return a blank screen. It took me hours of debugging (read: echo) to figure out what was going on, digging through some old PHP code (fun!): it appeared that only 1000 POST variables arrived on the server. (Well, 1006 actually, but 2 were added by PHP, and that sounded like a PHP-style limitation of 1000.) A quick Google search revealed that PHP introduced a new feature that limits the number of POST variables. For safety reasons.

The variable is called "max_input_vars" with a default of 1000. PHP states that this feature was introduced in 5.3.9, but I'm running 5.1.6 and the limit is enforced.

Because the server is for production, it was running with on-screen warnings turned off. PHP says that it "prints a warning and cuts". For me, that's a real WTF. A post request should be processed as all-or-nothing. It should instead refuse the request completely. But for a technology named "personal home page" the priorities are different.
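To make the all-or-nothing idea concrete, here is a minimal sketch in Java (not PHP, and not how PHP implements max_input_vars). The class StrictFormParser, its exception and the parse() signature are made up for illustration. The only point is that a request over the limit is rejected as a whole instead of being cut off.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, for illustration only.
public class StrictFormParser {

    /** Thrown instead of silently dropping the variables beyond the limit. */
    public static class TooManyInputVarsException extends RuntimeException {
        public TooManyInputVarsException(int limit) {
            super("Request rejected: more than " + limit + " input variables");
        }
    }

    /** Parses "a=1&b=2" style form data; rejects the whole body when it exceeds maxInputVars. */
    public static Map<String, String> parse(String body, int maxInputVars) {
        Map<String, String> vars = new LinkedHashMap<String, String>();
        if (body.isEmpty()) {
            return vars;
        }
        String[] pairs = body.split("&");
        if (pairs.length > maxInputVars) {
            // All-or-nothing: nothing is processed when the limit is exceeded.
            throw new TooManyInputVarsException(maxInputVars);
        }
        for (String pair : pairs) {
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            vars.put(decode(name), decode(value));
        }
        return vars;
    }

    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }
}
```

The caller can then turn the exception into an error response, so the user notices that the form was not saved instead of getting half of it stored.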

Cutting the post data is as if a browser would enforce a URL length limit of 256 and just cut there, or a database would just truncate a string too long to fit in a field (oh, no, wait, mysql does this without the strict option, and so did earlier versions of postgres ...).

The purpose of this new feature is to harden security... what it did for me was harden my hate-hate relationship with PHP.

Monday, November 19, 2012

PrettyTime and Joda playing nice

PrettyTime is a library for printing elapsed times such as "2 hours ago". Joda-Time is the library that replaces the JDK's ancient and problematic date and time classes.

Unfortunately, PrettyTime uses JDK's date class only, and doesn't welcome Joda classes such as LocalDateTime to be passed in directly. Also, it does not support time zones yet (apparently that's low priority).

Because of its great i18n support (lots of translations) I live with this handicap, and convert:

    LocalDateTime now = new LocalDateTime(DateTimeZone.UTC); // right-hand side assumed: a UTC wall-clock time

    //store this instant to db:
    long time = now.toDate().getTime();

    //then load it from db again:
    LocalDateTime nowAgain = new LocalDateTime(time);
    assert now.equals(nowAgain);

    //wrong: printing this is off by the local time zone difference to UTC:
    System.out.println(new PrettyTime().format(nowAgain.toDate()));

    //correctly adjusted:
    Date javaUtilDate = nowAgain.toDateTime(DateTimeZone.UTC).toDate();
    System.out.println(new PrettyTime().format(javaUtilDate));

This prints "moments ago". Phew! Time for beer.
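Why the unadjusted conversion is off: the same wall-clock fields denote different instants depending on the zone they are interpreted in. The following JDK-only sketch (no Joda, no PrettyTime) shows that gap. The class name WallClockShift and the zone Europe/Zurich are arbitrary choices for illustration.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;

public class WallClockShift {

    /** Interprets the wall-clock fields in the given zone and returns epoch milliseconds. */
    static long toInstant(int year, int month, int day, int hour, int minute, TimeZone zone) {
        Calendar cal = new GregorianCalendar(zone);
        cal.clear();                                     // drop "now", keep the zone
        cal.set(year, month - 1, day, hour, minute, 0);  // Calendar months are 0-based
        return cal.getTimeInMillis();
    }

    public static void main(String[] args) {
        TimeZone utc = TimeZone.getTimeZone("UTC");
        TimeZone local = TimeZone.getTimeZone("Europe/Zurich"); // example zone, UTC+1 in November

        long asUtc = toInstant(2012, 11, 19, 12, 0, utc);
        long asLocal = toInstant(2012, 11, 19, 12, 0, local);

        // The two instants differ by exactly the zone offset.
        System.out.println((asUtc - asLocal) / 3600000L + " hour(s) apart");
    }
}
```

An elapsed-time printer that receives the wrong one of these two instants reports an age that is off by the zone offset, which is the effect described above.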

Friday, November 16, 2012

Play framework: develop with custom domain and port

What it's about

This howto shows you how to use a custom host name and port number when developing with the play framework v2.

Why would you do that? 

In my case it's because I want the development environment to be as close to the final production environment as it can be.


You have play v2 installed, you start it from the command line, and you're developing with remote debugging in IntelliJ or Eclipse or whatever.
Your desired host name is with port 80.


If your host name is not in the DNS then add a hosts entry. It must point to your local IP address, not to the loopback address (127.0.0.1).

By default, Skype listens on port 80 for incoming connections. Either shut down Skype completely for the time of development, or configure it to NOT listen on port 80. Just shutting it down for the moment of starting the embedded play server (Netty) isn't enough, because after a while Skype gets a hold of the port again.

In the console, type "play debug" as usual. But this time run with "run 80".

Test it in the browser.

Final notes

There is no configuration within play for the port or host. If you deploy your app then you probably have that kind of config in Tomcat or so.

More info:

As of now (version 2.0.4) running on a certain port (such as run 80) does not complain when another application or service is occupying that port already.
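Since play doesn't complain, a quick check before starting it can save some head-scratching. This is a small sketch using only the JDK. The class name PortCheck is made up, and it only tests whether the current user can bind the port at this moment.

```java
import java.io.IOException;
import java.net.ServerSocket;

public class PortCheck {

    /** Returns true if a server socket can be bound to the port right now. */
    static boolean canBind(int port) {
        try (ServerSocket socket = new ServerSocket(port)) {
            return true;   // bound successfully; the socket is closed again on exit
        } catch (IOException e) {
            return false;  // in use, or binding is not permitted for this user
        }
    }

    public static void main(String[] args) {
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 80;
        System.out.println("Port " + port
                + (canBind(port) ? " is free" : " cannot be bound (in use or not permitted)"));
    }
}
```

Note that on Unix-like systems ports below 1024 usually require elevated privileges, so a "cannot be bound" for port 80 does not necessarily mean that another program holds it.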

Monday, November 5, 2012

How important is the domain name today?

Even though domain names have become less important in the past decade, we can still find the old patterns in those of successful websites.

Why domain names were so important

For those of you who weren't there in the mid 90s, here's a bit of history. Don't laugh.
When the www became popular, domain names were a new thing. Not every business had a website. New internet businesses were founded with the domain name as company name.

An advertisement for a business such as amazon in the mass media (think television, or newspaper (in paper form)) had to make it clear that you can go and type that into the browser's address bar. Thus it needed to tell you the web address ... and that meant the full web address. Yes, including the top level extension .com. And yes, with the www, because that was the ultimate "web" indicator. And, not joking, including http://, because browsers required you to tell them what protocol you want. It could have been ftp, or gopher, how could it guess.

The older generation (my mother) was going bananas on all these strange characters... http//:? //http?  www.http? http@www?

Email was new too. My mother's email address was or so.

To not add to the confusion, you'd better stick to the .com extension like everyone else.

If you mistyped the address in the browser, all you got was an error message. Sorry. You could then goo... no wait, you could search for it on Altavista. If you had a bookmark or knew the address by heart. It was (Yeah, the www in there is useless, but it probably wouldn't load the site without ...). Autocomplete wasn't invented yet.

Years later, because http became the most popular protocol, and "the internet" meant the web, browsers started to add the http:// prefix automatically if omitted.

Then, even more artificial intelligence was built in: converting a word such as amazon into What a salvation!

OK, it wasn't real AI, but it helped. And because everyone important used .com, it worked well. Everyone? well, mostly. The white house is on a .gov address. And because the .com belonged to someone showing pictures of naked women (safe for work), browser programmers went back to work to come up with even more magic: exception rules.

And today? 

Having .com in the business name isn't important anymore. Everyone's online. (Online? Yes, online/offline you know, modems...)

Amazon 2004
Amazon 2012

People don't type addresses anymore. The address bar is just to copy/paste, or for auto-complete, or to google the name because it's quicker, or to spell-check a word.

Users will find your site based on the name, no matter what address you have.

Patterns in domain names of the top visited sites

Here's what I've found when looking over Alexa's top ranked sites list, mostly the top 100.

They're all .com

And if they are not, then
  • Either they redirect .com to the primary domain (#6, #43, #146, #160
  • Or it's a country specific domain (16x Google such as
  • Legal issues (#69
  • Domain hack  #80

2 syllables

Practically all internet brands have a 2-syllable, 1-2 word domain name, without hyphen, ending in .com (goo-gle, face-book, you-tube, bai-du, twit-ter, ama-zon, linked-in (arguably), blog-spot, tao-bao, si-na, yan-dex, word-press, e-bay, ...).
Exceptions are wikipedia (known as "wi-ki"), 1-syllable domains (live, bing), and abbreviations (qq, msn, vk).

Number domains

There are some number domains: #107, #117, #157, #223

No hyphen

There are practically no domains with a hyphen. The first real website with a dash is #334, and that's because it's their business name. In case you're bored you can also take a look at #342 and his Wikipedia page. The first site that went with a domain name with a minus in it is #442, but apparently only until they got enough cash to buy the real one, and are redirecting now to (#6177). So there really isn't any worth mentioning. A hyphen is a no-no.

Domain hacks

Domain hacks are becoming more popular. The well known #523 is now at Google uses #962 for its url shortening service. (I surely missed some important ones, please comment.)

Typo domains, missing schwa

A new trend is domain names with a spelling mistake. Simply because every possible name you could come up with isn't available as dot com. It's often the schwa in the last syllable that's omitted. Examples are #36, #201 Twitter #8 initially launched as #717 is more than a typo, but falls into that category. (I surely missed some important ones, please comment.)

Conclusion about today's domain selections

When choosing a domain name for a new global site (the next twitter killer), people go for
  • a relatively short (not 3 or 4 words),
  • possibly with a typo (because nothing else is available)
  • never hyphenated
  • dot-com domain.
  • Or they invent a new word (that's hard to remember).
  • Or alternatively a very short domain hack.

And the future?

Next year, in 2013, up to 2000 new top level domains will become active. Even though we have lots of global tld's already (not cctld's, those are fine), no one important uses them. Will this mess make .com even more important, or will lots of domain hacks appear? What's your opinion?

What do you do when you're in need of a new, attractive name? Do you try to buy one? Even though most domain names are inactive, the owners usually have a different opinion about their value. Do you misspell, hyphenate, choosealongname, invent a new word, go .net or .org, or look for a domain hack?

WWW or not?

Up until today the web still hasn't found consensus about whether addresses should use the www or not. It's a useless leftover from the past. Most websites still use it, including the tops (google, facebook, yahoo, amazon, ebay, bing, msn, paypal ...), newer ones (twitter, wordpress, stripe) don't.  

What's important is that both addresses are functional, and that one redirects to the other.
Example rewrite rule for Apache to redirect to non-www:

    RewriteCond %{HTTP_HOST} ^www\.(.*)$ [NC]
    RewriteRule ^(.*)$ http://%1/$1 [R=301,L]

It has become common practice to not use www with 3rd level domains, but to optionally make it a redirect. Example:, with a forwarding from
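For those who don't read mod_rewrite fluently, the decision made by the Apache rule above can be mirrored in a few lines of Java. This is only an illustration of the logic, not something to deploy. The class and method names are made up, and the path is assumed to arrive without a leading slash, as it does for rules placed in a .htaccess file.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WwwRedirect {

    // Same pattern as the RewriteCond; CASE_INSENSITIVE plays the role of [NC].
    private static final Pattern WWW_HOST =
            Pattern.compile("^www\\.(.*)$", Pattern.CASE_INSENSITIVE);

    /**
     * Returns the URL to answer with a 301 redirect, or null when the host
     * has no www prefix and the request should be served as is.
     */
    static String redirectTarget(String host, String path) {
        Matcher m = WWW_HOST.matcher(host);
        if (!m.matches()) {
            return null;
        }
        // m.group(1) corresponds to %1, path corresponds to $1 in the rule.
        return "http://" + m.group(1) + "/" + path;
    }

    public static void main(String[] args) {
        System.out.println(redirectTarget("www.example.com", "about/team"));
        System.out.println(redirectTarget("example.com", "about/team"));
    }
}
```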

Thursday, November 1, 2012

Google: from Innovator to Average Quality Copycat?

Even though Google keeps shutting down products that did not take off or that don't pay off, they still pursue way too many new businesses and new products to become great at any one of them. Well, some are great. A few. But more and more are becoming just average.

Why is that?

Apparently Google has reached a critical size, and maintaining the existing stack demands huge resources. For developing new products, such as Google Drive, there just aren't enough a-list candidates available. And there is competition on the job market. While 10 years ago any candidate would choose Google over Microsoft, there are far more attractive players on the market now. Plus: starting your own business has never been cheaper.

At the same time other, agile players appear on the market, focusing on just one product.

Why do I use Google products even though there are viable alternatives?

  • Almost 100% uptime.
  • It's usually above average in functionality. 
  • The free version is not crippled beyond usefulness.
  • It's integrated with all the other products, same login, 1 click away.
  • "Don't be evil" motto. They don't run out of money and need to find new sources of income.
  • I don't want to learn a new product every 2 years. A new player might disappear quickly when running out of funding.

3 product categories

Products, not just for software, can be grouped into 3 categories:
  1. Those that make you freak out from time to time.
  2. Those that just work as they should.
  3. Those that, besides working, have some awesome things that you wouldn't even think of, but once you see them, instantly understand and love.
Microsoft products used to be in group 1. Your MS Word crashed and you didn't have the document stored? Windows blue screen?

Google was a breeze. Google products kept coming out with those category 3 features. Lots of innovation. Page rank. Text ads. Chrome is full of them. The software update process for example. Or how chrome opens tabs next to the current one instead of at the end. It took one instant, one click, and my thought was "Omg, why did no one think of this before!".

Enter Google Drive

Recently, Google products dropped to category 2. And the reason for this blog post is Google Drive, which is a clear category 1 product, and should be labeled Beta at most. But the Windows app "about" message gives it a stable version 1.5, and Wikipedia labels it as "stable". It only came out 5 months ago, but because Google entered that profitable market late, they apparently can't afford to label an "online backup service" as beta at this time.

Google Docs was good. My experience with the Google Drive Windows tool is bad so far. On the first day, after installation, and copying 2GB of data into the folder, it could not synchronize all files (had to re-try several times), gave useless error messages "an unknown issue occurred", crashed twice.
Now, a couple weeks later, I added a new folder with a 50MB file in it. Then I wanted to rename that folder, and Windows Explorer told me I can't because the file is locked (that must have been Google Drive). So instead (I did not wait) I created a new folder, and just moved the file itself (that worked). Job done, I thought. The next day, after waking Windows up from sleep mode, Google Drive tells me it crashed. It doesn't restart automatically. So I better check if my files are synchronized. As I drill down the folders I notice that they all have the green checkmark icon (synchronized). Arriving at the leaf, surprise surprise, the file is not synchronized.

Only now I realize that the web is full of articles about how Google Drive sucks, of people claiming that they lost files. A 26% hate rate is devastating (whereas Dropbox reaches almost 100% love rate from 600k people).

My experience with Dropbox

Way better. I've been using it for almost 2 years. Transfer rates are much slower, but that's not an important criterion for me. While using it almost daily, it never crashed, and I experienced the following 2 issues:
  1. Sometimes, when opening an MS Excel document (yes I still have some of those), Dropbox updates the change timestamp. This sucks, I don't want an entry in the changelog.
  2. One time, after waking up another device that was not synchronized in weeks, Dropbox thought that some files were conflicting, even though no one touched them in the meantime, and their content was identical too. What Dropbox did was keeping both "versions", one with a different file name (conflict and timestamp). Annoying, but no data lost.

My conclusion: Don't use Google Drive as a backup solution, have another backup somewhere. Or best, don't use Google Drive except for the Google Docs online documents.

A word to Google 

Please focus on the important products, innovate, and don't shut down the ones I'm using. And keep working on the robot car, I can't wait, the world will be a better place.

A word to Google Skeptics

This is good news for those who were concerned that Google would grow too big, take over the world, and know all about you. Nah. Remember the talks about breaking up Microsoft because it was too powerful, 12 years ago?

Thursday, October 25, 2012

Refactoring-Safe Switch on Enum

A switch statement in Java on an enumerated type looks something like this:

enum TrafficLight {
    GREEN, ORANGE, RED
}

TrafficLight trafficLight = TrafficLight.GREEN;

switch (trafficLight) {
    case GREEN:
        // ...
        break;
    case ORANGE:
        // ...
        break;
    case RED:
        // ...
        break;
}
This works, job done, I'm going home. Hopefully no one ever changes the enum.

The Java tutorial writes:
You should use enum types any time you need to represent a fixed set of constants. That includes natural enum types such as the planets in our solar system and data sets where you know all possible values at compile time—for example, the choices on a menu, command line flags, and so on.
I see 2 possible problems with this definition:
  1. This text proves that what we think is set in stone can still change... planets in our solar system? It was probably written before 2006 when Pluto still counted as the 9th planet in our solar system. 
  2. "Compile time" vs. programming time. The enum values are known when writing the switch code. Whose compile time? Of the switch or the enum? "Choices on a menu" or "command line flags" can surely be modified and recompiled later on, without touching every line of code that made use of those enums. Especially when the enum comes from another code base (library).
Most enums I come across are not as unshiftable as the Months (JANUARY to DECEMBER).

So: What happens if someone changes TrafficLight after I wrote and compiled my switch? Possible scenarios:
  1. Element is renamed: someone decides to rename ORANGE to YELLOW.  
    1. Enum is defined in an external lib: compilation of switch code fails, we need to change. That's good, the switch statement must be updated and won't cause problems.
    2. Enum is in same code base: refactoring renames switch block too, case closed.
  2. Element is removed: That's good, the switch statement must be updated to compile, case closed. 
  3. Element is added: someone adds GREEN_BLINKING. We won't notice. If the developer who added the enum value does not go through all the usages then it's a possible cause of bugs.

Basic solution

switch (trafficLight) {
    case GREEN:  /* ... */ break;
    case ORANGE: /* ... */ break;
    case RED:    /* ... */ break;
    default:
        throw new UnsupportedOperationException("Unsupported enum value: "+trafficLight+"!");
}

Now we notice it as soon as the code is executed with TrafficLight.GREEN_BLINKING.
This still has the potential to go into production with a bug in the case that GREEN_BLINKING did not occur when testing. Maybe it's a very rarely used value. That's likely, because if it was an obvious element of the enum then it would have been in right from the start. And it's likely too that there is no unit test covering this value... if someone went to add one now then he would have updated the switch code too.

Thus a safer syntax is:

Better solution

enum TrafficLight {
    GREEN, ORANGE, RED;

    public static void assertSize(int expectedItems) {
        assert values().length == expectedItems : "Update the code calling this with "+
                                                  expectedItems+" instead of "+values().length+"!";
    }
}

TrafficLight trafficLight = TrafficLight.GREEN;

TrafficLight.assertSize(3);
switch (trafficLight) {
    case GREEN:  /* ... */ break;
    case ORANGE: /* ... */ break;
    case RED:    /* ... */ break;
    default:
        throw new UnsupportedOperationException("Unsupported enum value: "+trafficLight+"!");
}

This guarantees that we'll notice the change, as long as this path is executed and assertions are turned on, even if the new value itself never shows up in testing.
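To try the two guards side by side, here is a self-contained sketch. It assumes that GREEN_BLINKING was added to the enum after action() had been written for three values. The class name SwitchGuardDemo and the method action() are made up for the demo.

```java
public class SwitchGuardDemo {

    enum TrafficLight {
        GREEN, ORANGE, RED, GREEN_BLINKING; // GREEN_BLINKING was added later

        static void assertSize(int expectedItems) {
            assert values().length == expectedItems
                    : "Expected " + expectedItems + " values but found " + values().length;
        }
    }

    /** Written at a time when the enum still had three values. */
    static String action(TrafficLight light) {
        TrafficLight.assertSize(3); // with -ea this fails for ANY value passed in
        switch (light) {
            case GREEN:  return "go";
            case ORANGE: return "slow down";
            case RED:    return "stop";
            default:     // without -ea only the unknown value itself ends up here
                throw new UnsupportedOperationException("Unsupported enum value: " + light + "!");
        }
    }

    public static void main(String[] args) {
        try {
            System.out.println(action(TrafficLight.GREEN));
        } catch (AssertionError e) {
            System.out.println("Assertion noticed the changed enum: " + e.getMessage());
        }
    }
}
```

Run with java -ea, the very first call fails in assertSize(), even for GREEN. Run without -ea, GREEN is handled normally and only GREEN_BLINKING would reach the default branch.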

Alternative solution

Java's switch still bears potential for bugs. Unfortunately there is no switch syntax that automatically breaks. Forgetting a break statement is a common programming error. If-then-else could be used, but that's ugly and has other issues.

Another technique is to force the enum values to implement the functionality directly:

enum TrafficLight {
    GREEN {
        @Override public void handle() { /* ... */ }
    },
    ORANGE {
        @Override public void handle() { /* ... */ }
    },
    RED {
        @Override public void handle() { /* ... */ }
    };

    public abstract void handle();
}

This way one can't simply add a value and break existing code. Unfortunately this technique isn't always feasible, for example when the enum is from an external dependency.


Suggestion for Java: add an assertSize(), expect() or similar method to enum.
Suggestion for IDEs: auto-generate such code when creating an enum or when switching on one.

History and evolution

My original use of this pattern was to have the enum's assertSize() method throw AssertionError directly instead of using the assert statement, and then throw AssertionError("Dead code reached!") again in the switch default block. That combination works, however, it always aborts the program - even in production. After rethinking this topic I've come to realize that it's bad.

There are many scenarios where a program could recover and go on from such a situation. For example a server program serving clients. A web server or web service, where each individual request might fail. If one enters such an execution path, the main routine can still catch that, and return an internal server error to the user. An abort would be fatal.
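A minimal sketch of that recovery, with the HTTP part reduced to status strings. The names RequestLoopSketch, handle() and serve() are made up, and GREEN_BLINKING stands for a value that was added after handle() was written.

```java
public class RequestLoopSketch {

    enum TrafficLight { GREEN, ORANGE, RED, GREEN_BLINKING }

    /** Business logic that predates GREEN_BLINKING. */
    static String handle(TrafficLight light) {
        switch (light) {
            case GREEN:  return "go";
            case ORANGE: return "slow down";
            case RED:    return "stop";
            default:
                throw new UnsupportedOperationException("Unsupported enum value: " + light + "!");
        }
    }

    /** The "main routine": one failing request must not take the server down. */
    static String serve(TrafficLight light) {
        try {
            return "200 " + handle(light);
        } catch (RuntimeException e) {
            // A real server would log the exception here.
            return "500 Internal Server Error";
        }
    }

    public static void main(String[] args) {
        System.out.println(serve(TrafficLight.GREEN));
        System.out.println(serve(TrafficLight.GREEN_BLINKING));
        System.out.println(serve(TrafficLight.RED)); // still served after the failure
    }
}
```

The request with the unknown value fails on its own, and the requests after it are answered as usual.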

Tuesday, October 23, 2012

@GeneratedCode: Document when code was auto-generated

IDEs such as IntelliJ and Eclipse provide functionality to generate methods including toString(), equals() and hashCode().

When making changes to existing code, and I need to update such possibly auto-generated methods, I often wonder whether I can just drop and re-create them, or if they include some magic I'm not aware of.

A very simple example class in Java with equals and hashCode:

import org.joda.time.LocalDate;

public class Person {

    private final String givenName;
    private final String surname;
    private final LocalDate dateOfBirth;

    public Person(String givenName, String surname, LocalDate dateOfBirth) {
        this.givenName = givenName;
        this.surname = surname;
        this.dateOfBirth = dateOfBirth;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;

        Person person = (Person) o;

        if (dateOfBirth != null ? !dateOfBirth.equals(person.dateOfBirth) : person.dateOfBirth != null) return false;
        if (givenName != null ? !givenName.equals(person.givenName) : person.givenName != null) return false;
        if (surname != null ? !surname.equals(person.surname) : person.surname != null) return false;

        return true;
    }

    @Override
    public int hashCode() {
        int result = givenName != null ? givenName.hashCode() : 0;
        result = 31 * result + (surname != null ? surname.hashCode() : 0);
        result = 31 * result + (dateOfBirth != null ? dateOfBirth.hashCode() : 0);
        return result;
    }
}


Scenario: add another field called gender. Even though the class really is very basic, it already takes a moment to understand. Should I hack in the additional field, or re-create the methods from scratch?

With more complex classes it gets worse.

My solution: I document whenever I have the IDE create such code for me. I've chosen a new @GeneratedCode annotation. A Javadoc comment would work too. Once I modify the code by hand I remove the annotation.
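Such an annotation could look roughly like the sketch below. The retention, the targets and the optional value() note are assumptions made for this example, as are the wrapper class GeneratedCodeSketch and the demo class Point.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Objects;

public class GeneratedCodeSketch {

    /** Marks code that the IDE produced and nobody has edited by hand since. */
    @Documented
    @Retention(RetentionPolicy.SOURCE) // pure documentation, not needed at runtime
    @Target({ElementType.METHOD, ElementType.CONSTRUCTOR, ElementType.TYPE})
    public @interface GeneratedCode {
        /** Optional note, for example which fields were included. */
        String value() default "";
    }

    public static final class Point {
        private final int x;
        private final int y;

        public Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @GeneratedCode("equals() and hashCode() over x, y")
        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (o == null || getClass() != o.getClass()) return false;
            Point other = (Point) o;
            return x == other.x && y == other.y;
        }

        @GeneratedCode("equals() and hashCode() over x, y")
        @Override
        public int hashCode() {
            return Objects.hash(x, y);
        }
    }
}
```

When a field is added later, the annotation tells me that both methods can be dropped and re-created without losing anything hand-written.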

A suggestion for IDE developers: The generated code could include an annotation, Javadoc or other visual mark. And, even better, it could remove that automatically when the code gets modified by the programmer and differs from the original generated one. And (more wishful thinking) when refactoring the class (adding a new field) it could even ask if that new field should be included in the method, which could simply be done by re-creating it.

Thursday, October 18, 2012

My Vocabulary Enum Pattern

When writing a library to be used by other developers, adding code documentation (javadoc) is essential.

Unfortunately javadoc often requires text duplication - which violates the don't repeat yourself (DRY) principle.

This causes

  • incomplete documentation
  • outdated documentation
  • wasted time
In open source software I find this problem all the time.

One case where this becomes evident is the following: a library that provides a builder for a simple value object. Where do you document the values? 
  1. In the setter of the builder? 
  2. In the Constructor of the value object? 
  3. In the private field of the value object?
  4. In the getter of the value object?
My solution to this is an enum called Vocabulary that contains all such values.

/**
 * Dictionary for terms used.
 * <p>Javadoc can link to these values with detailed explanations, instead of copy/pasting
 * the same limited and possibly outdated text.</p>
 */
public enum Vocabulary {

    /*
     * Even though Java enums follow the convention to have all values in UPPER CASE,
     * there is no need to follow this pattern here. Use the terms in the case that
     * they appear in code.
     */

    /**
     * The api-key also known as user-id.
     * A UUID of 44 characters such as "da39a3ee-5e6b4b0d-3255bfef-95601890-afd80709"
     */
    apiKey,

    /**
     * Such as "/service/ping". Starts with but doesn't end in a slash.
     */
    servicePath // constant name assumed, the original name is not preserved here
}

Then the builder and value object look like:

class MyValueObject {
    private String apiKey;

    /** @see Vocabulary#apiKey */
    public String getApiKey() { return apiKey; }
}

Within the IDE it's quick to get the explanation (ctrl-q in IntelliJ). And in the generated javadoc html pages it's just a click away. It would be nicer if there was a template syntax/conversion to inline the comment (is there?). Another thing I like about this is that it's easy on my eye - there is little noise in the javadoc. Once I understand the concept behind a term, I don't have to read it again.

Note: I do not add objects to the Vocabulary that are of a specific custom type. A String or Integer needs to be in because it's too general, the data type has no information about permitted value ranges. A MyType however already has the javadoc in its own class and thus usually does not need further documentation when used as a method parameter or return type.
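For completeness, a compact sketch of how the builder and the value object can both point to the same Vocabulary entry. Only apiKey comes from the example above. The wrapper class VocabularySketch, the builder class and its method names are made up for illustration.

```java
public class VocabularySketch {

    /** Dictionary of terms; the explanation lives here and nowhere else. */
    public enum Vocabulary {
        /** The api-key, also known as user-id. */
        apiKey
    }

    /** Immutable value object. */
    public static final class MyValueObject {
        private final String apiKey;

        /** @param apiKey see {@link Vocabulary#apiKey} */
        private MyValueObject(String apiKey) {
            this.apiKey = apiKey;
        }

        /** @see Vocabulary#apiKey */
        public String getApiKey() {
            return apiKey;
        }
    }

    /** Builder; its setter carries a link instead of a copy of the text. */
    public static final class MyValueObjectBuilder {
        private String apiKey;

        /** @see Vocabulary#apiKey */
        public MyValueObjectBuilder apiKey(String apiKey) {
            this.apiKey = apiKey;
            return this;
        }

        public MyValueObject build() {
            if (apiKey == null) {
                throw new IllegalStateException("apiKey is required");
            }
            return new MyValueObject(apiKey);
        }
    }
}
```

All four places from the list above (builder setter, constructor, field, getter) can link to the one entry, so a change to the definition of the term is made exactly once.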