30 August 2010
VA DMV Closes Due to Statewide Glitch
It's 2010, and yet Virginia's entire statewide DMV system is being taken down for a full day by a hardware failure. Here's the state's message to its citizens: "DMV will not be able to process driver's licenses or ID cards in its 74 customer service centers on Monday, August 30 due to a statewide computer system problem affecting those transactions only." As reported by ZDNet, contractor Northrop Grumman is blaming the outage on the failure of two circuit boards manufactured by EMC -- i.e., a storage area network (SAN) failure. I must agree with ZDNet's Dignan that something doesn't seem right about the explanation project managers are giving. Disaster planning? Redundancy? Single point of failure? And that's just the beginning. What if there'd been a real disaster? After all, this is Virginia, home of many federal agencies and much of the federal workforce.
16 June 2010
Twitter Outages: Network Simulation, Anyone?
Whether because of increased use during the World Cup or for other reasons, Twitter is having service level management problems and possibly other issues. In their June 11 post acknowledging the problems and blaming them on a "perfect storm" of issues, Twitter's engineers attributed the outages to capacity planning and configuration management errors. While some, such as Forrester's Gualtieri, quoted by Computerworld, accept this level of transparency, it's minimal at best. My recent TechRepublic post on capacity planning for backup has some suggestions that Twitter might want to consider; the methods the firm currently uses, one guesses, haven't involved careful simulation or other prudent measures.
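Neither Twitter's post nor this one describes a specific model, so the following is only a generic example of what "simulation" can mean in capacity planning: a minimal Python simulation of a single-server queue, assuming random (Poisson) arrivals and random service times, the textbook M/M/1 model. The function name, parameters, and load levels are illustrative assumptions, not anything Twitter has described.

```python
import random


def simulate_mean_wait(arrival_rate: float, service_rate: float,
                       n_requests: int = 200_000, seed: int = 1) -> float:
    """Estimate the mean queueing delay at a single first-come, first-served server.

    Assumes Poisson arrivals and exponentially distributed service times
    (an M/M/1 queue), a deliberate simplification of any real web service.
    """
    rng = random.Random(seed)
    wait = 0.0        # queueing delay of the current request
    total_wait = 0.0
    for _ in range(n_requests):
        total_wait += wait
        service = rng.expovariate(service_rate)   # time to serve this request
        gap = rng.expovariate(arrival_rate)       # time until the next request arrives
        # Lindley recursion: the next request waits for any leftover work.
        wait = max(0.0, wait + service - gap)
    return total_wait / n_requests


# Same server (1 request per time unit), increasing offered load.
for load in (0.5, 0.7, 0.9, 0.95):
    simulated = simulate_mean_wait(arrival_rate=load, service_rate=1.0)
    theory = load / (1.0 - load)   # M/M/1 mean wait in queue when service_rate = 1
    print(f"utilization {load:.2f}: simulated {simulated:.2f}, theory {theory:.2f}")
```

Even this toy model shows why average load is a poor guide: queueing delay grows nonlinearly as utilization approaches 100%, so a modest traffic spike on an already busy system can produce a disproportionate slowdown.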
Labels: capacity planning, simulation, Twitter
07 June 2010
iPhone Contract Perk May Have Sunk AT&T Servers
In the days or months before today's iPhone 4 announcement, Apple and AT&T very likely held a number of planning sessions. The result was an agreement by AT&T to offer some iPhone customers currently under contract the option to upgrade to iPhone 4 this month. The unsurprising result was undoubtedly a peak in traffic to AT&T's iPhone account management page. The surprising result was that, despite the advance planning, the AT&T site may have been unprepared for the traffic.
According to a Computerworld post by Gregg Keizer, the site was either taken offline intentionally or, probably more likely, could not handle the volume.
01 June 2010
Configuration Mgmt Failure Caused Military GPS Outage
It's been said more times than this glitch reporter could count, but a net-centric military must make certain assumptions about which services are "always-on." GPS is one of those. But apparently, according to an AP report, "as many as 10,000 U.S. military GPS receivers were rendered useless for days. . ."
The problem was blamed on "incompatible software." According to the report, an Air Force defense contractor installed software in certain Trimble Navigation receivers that was incompatible with another element of the system -- a ground control system that received an update in January 2010. The update was part of a new generation of GPS satellites ("Block IIF").
A more IT-savvy writer might have referred to this as a configuration management failure, but at least the AP kept after the Air Force to provide a narrative for the problem.
The AP story concludes with a discussion of cybersecurity risks to the GPS software. While the discussion covers jamming and straightforward outages, it does not fully explore the risk of insider threats. (Note: The Trimble "recon" handheld shown is illustrative of the company's products -- not necessarily the one involved in this glitch report.)
Labels: configuration management, GPS, USAF
01 March 2010
It's Not A Game: PS3 Firmware Bug Affects Product Worldwide
"No offline game play" appears to be the result of a firmware bug, according to CNET's coverage of a worldwide glitch affecting almost all PlayStation 3 (PS3) game consoles. The bug, reported by the console software as "error 8001050F" (this site offers a humorous "fix" video), is variously attributed to a calendar problem or to issues with "trophy support." A calendar issue, direct or indirect, is a reasonable suspect given the long history of weak software testing of leap year conditions. A recent example appeared in a preview version of Microsoft SQL Server 2008.
Photo courtesy of Wikimedia Commons.
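The reports don't say how the PS3 firmware actually computes dates, so what follows is only a generic illustration of the leap year boundary testing mentioned above: a minimal Python sketch that checks a hand-written Gregorian leap year rule against the standard library's `calendar` and `datetime` modules. The function names and the table of test years are illustrative assumptions, not code from Sony or Microsoft.

```python
import calendar
import datetime


def is_leap_year(year: int) -> bool:
    """Gregorian rule: every 4th year, except centuries not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def day_after_feb_28(year: int) -> datetime.date:
    """The date rollover that leap year bugs tend to break."""
    return datetime.date(year, 2, 28) + datetime.timedelta(days=1)


# Boundary years: 1900 and 2100 trip up a naive "divisible by 4" check,
# and 2000 trips up a naive "centuries are never leap years" check.
EXPECTED = {1900: False, 2000: True, 2008: True, 2010: False, 2012: True, 2100: False}


def check_leap_logic() -> list:
    """Return the years where any check disagrees with the expected table."""
    failures = []
    for year, expected in EXPECTED.items():
        days_in_feb = calendar.monthrange(year, 2)[1]
        results = (
            is_leap_year(year),
            calendar.isleap(year),
            days_in_feb == 29,
            day_after_feb_28(year).month == 2,  # Feb 29 only exists in leap years
        )
        if any(r != expected for r in results):
            failures.append(year)
    return failures


print(check_leap_logic())        # an empty list means every check passed
print(day_after_feb_28(2010))    # 2010-03-01
```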
26 December 2009
Software Test Failure Apparent Cause of Latest BlackBerry Outage
In the second outage this month at Research In Motion (RIM), maker of the popular BlackBerry smartphones, messaging services including email were affected for all North American customers. Phone service was unaffected. While no announcements are archived on RIM's Press Releases page, it's tempting to recall the April 2007 outage, which was blamed on an update intended to improve cache performance. This time, the cause appears to have been an error in a new release of Messenger, the client application for RIM's devices.
18 December 2009
BlackBerry Email Outage at RIM Affected All Carriers
See http://bit.ly/7dxqfP. Yesterday's outage affected all carriers. Details to follow if and when RIM provides them.