While the postmortem is still ongoing, a May 2017 British Airways computer system glitch appears to have had much in common with a 2016 Southwest Airlines outage. Both involved seemingly well-understood hardware failures that triggered a cascade of problems, which in turn delayed an orderly recovery.
Planning for such outages is nontrivial. As the scale of data and network connectivity increases, post-failure recovery processing becomes harder to model. That said, the public perception that airlines should be doing more to mitigate these outages is understandable. Second-guessing has already begun. Was it a lapse in cybersecurity? Massive outsourcing? Loss of talent? Cost-cutting?
It may instead have been a failure to properly model British Airways' systems and the processes required to recover from a hardware outage.
H. Herodotou, B. Ding, S. Balakrishnan, G. Outhred, and P. Fitter, "Scalable near real-time failure localization of data center networks," in Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ser. KDD '14. New York, NY, USA: ACM, 2014, pp. 1689-1698. [Online]. Available: http://doi.acm.org/10.1145/2623330.2623365