24 September 2009

Google Outage: The "Fail Whale" Fails to Amuse

Many IT managers are anxiously watching the landscape to judge whether the time is right to move applications to The Cloud.

This May 2009 report by ZDNet's Larry Dignan on a short Google outage is not the sort of message that Google, Microsoft's Azure team, or Salesforce.com would like to distribute. As others have pointed out, it's not that internal data centers are immune to outages. On-premises outages, despite claims of 99.999% uptime, can be just as difficult to correct and just as pervasive in their effects on the enterprise.

This is not an issue of cloud vs. on-premises IT infrastructure. Rather, the issue is one of perception and of how outsourced cloud outages are handled.

Google apparently put up its "fail whale" (see the ZDNet post for a screenshot), but didn't post anything to Twitter until after the problem, from its point of view, was "solved." Cloud outages can take down every application, which is arguably less likely to happen with some types of in-house outages. The loss of control and lack of information exacerbate the shroud of mystery that surrounds these normally high-reliability systems. One sees the same thing at the airport. When passengers are kept regularly informed, they may not be happy, but they are less unhappy than when they are kept in the dark and left to foster rumors.
