
When Lightning Strikes Twice: Critical Power Resilience Lessons from Business Disasters

High-profile disasters like TSB's £330 million service failure and British Airways' £58 million outage demonstrate that power resilience is foundational to business continuity. While traditional on-site redundancy protects against component failures, these cases reveal a critical lesson: true resilience requires geographic diversity across multiple sites to protect against catastrophic single-location events.

In today's digital economy, power outages aren't just inconvenient; they're existential threats. While connectivity often dominates data centre discussions, recent high-profile disasters have reminded businesses that power resilience remains the foundation of business continuity. As organisations increasingly adopt multi-site strategies, understanding the hard-won lessons from those who faced power-related catastrophes has never been more crucial.

The True Cost of Power Failure

When UK bank TSB experienced a catastrophic power failure at their primary data centre in 2018, what was intended as a routine system upgrade spiralled into a disaster affecting 1.9 million customers. The 10-day service disruption cost them over £330 million in compensation and remediation costs, not to mention the immeasurable reputational damage and the subsequent loss of 80,000 customers.

British Airways faced a similar nightmare in May 2017 when a power surge at their data centre near Heathrow Airport led to a global IT outage. The 72-hour disruption caused the cancellation of 726 flights, stranded 75,000 passengers, and cost the airline an estimated £58 million, all because backup power systems at their single primary facility failed to engage properly.

"We had backup systems, but they were all in the same physical location," admitted the BA CTO in a subsequent interview. "The cascading failure of primary and secondary systems taught us that geographic diversity isn't just nice- to have, it's essential."

Beyond Traditional Backup: The Multi-Site Imperative

Traditional N+N redundancy within a single facility provides protection against component failure but remains vulnerable to site-wide disasters. Consider these recent examples:

Manufacturing sector: Toyota lost an estimated $1.2 million per hour during a 2019 regional power grid failure in Japan that affected their supposedly "redundant" power systems, all located within the same industrial zone. The outage impacted production at 14 factories, resulting in 10,000 fewer vehicles being manufactured.

Healthcare services: NHS Royal London Hospital faced a critical situation in 2018 when flooding damaged both primary and secondary power systems in their basement-level infrastructure room. Despite having generator backup, the single-site design meant all recovery systems were compromised simultaneously, forcing the diversion of emergency patients and the cancellation of scheduled surgeries.

Cloud services: Amazon Web Services experienced a significant outage in its US-EAST-1 region in 2021 when power issues at a data centre in Northern Virginia cascaded through their systems. The five-hour disruption affected major brands like Disney+, Ticketmaster, and Venmo, demonstrating how concentrated power dependencies can impact even the most sophisticated cloud infrastructures.

The Power of Distribution: Key Implementation Strategies

Organisations that successfully weathered major disasters share common approaches to power resilience:

1. Geographic Power Diversity

The most effective multi-site strategies ensure data centres draw power from different regional grids. This approach protected JPMorgan Chase during Hurricane Sandy in 2012. While many Wall Street firms went dark as Lower Manhattan flooded, JPMorgan maintained operations through their strategically located backup facilities in Delaware and Ohio, locations chosen specifically because they operated on entirely separate power grids.

2. Comprehensive Power Assessment

Leading organisations now conduct "power chain isolation analysis" to identify any hidden single points of failure. This methodology helped PayPal strengthen their infrastructure after a 2019 incident revealed that while their facilities appeared independent, they shared upstream connections to the same power transmission station, a vulnerability they promptly addressed through true geographic diversification.
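
To make the idea concrete, the sketch below models a hypothetical power delivery chain as a simple mapping from each site to its upstream nodes and flags any node shared by more than one site. The site names and topology are illustrative only; a real power chain isolation analysis would be built on surveyed utility and substation data.

    from collections import defaultdict

    # Each site mapped to its upstream power nodes, nearest first (hypothetical data).
    power_chains = {
        "dc-east": ["substation-a", "transmission-x", "grid-region-1"],
        "dc-west": ["substation-b", "transmission-x", "grid-region-1"],  # hidden shared links
        "dc-north": ["substation-c", "transmission-y", "grid-region-2"],
    }

    def hidden_single_points(chains):
        """Return upstream nodes that appear in more than one site's power chain."""
        users = defaultdict(set)
        for site, chain in chains.items():
            for node in chain:
                users[node].add(site)
        return {node: sorted(sites) for node, sites in users.items() if len(sites) > 1}

    for node, affected in hidden_single_points(power_chains).items():
        print(f"{node} is shared by: {', '.join(affected)}")

In this toy example the two primary sites turn out to share the same transmission station and regional grid, exactly the kind of upstream overlap the PayPal incident exposed.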

3. Regular Real-World Testing

Businesses that survived major incidents typically practised full power-failover scenarios regularly. Netflix's renowned "Chaos Monkey" approach, deliberately causing system failures to test resilience, exemplifies this philosophy. When AWS experienced a major outage in 2021, Netflix services remained largely unaffected, demonstrating how their commitment to testing power and service redundancy across multiple AWS regions paid dividends during a real crisis.
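
As a rough illustration of that philosophy, the sketch below runs a chaos-style drill: it randomly "fails" one of several regional deployments and checks that the service could still be served from the survivors. The region labels and health check are placeholders, not Netflix's or AWS's actual tooling.

    import random

    REGIONS = ["eu-west-1", "us-east-1", "ap-southeast-2"]  # illustrative region labels

    def healthy(region, failed_regions):
        """Stand-in health check: a region is healthy unless the drill has failed it."""
        return region not in failed_regions

    def run_drill(regions):
        victim = random.choice(regions)  # simulate losing one region's power
        survivors = [r for r in regions if healthy(r, {victim})]
        assert survivors, "No surviving region: single point of failure detected"
        print(f"Drill: lost {victim}; traffic can fail over to {', '.join(survivors)}")

    run_drill(REGIONS)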

Implementation Guide: Building True Power Resilience

For organisations looking to enhance their power resilience strategy:

  1. Assess upstream dependencies: Look beyond your facility walls to understand the complete power delivery chain.
  2. Implement cross-regional backup: Ensure secondary sites aren't vulnerable to the same regional power events.
  3. Consider power source diversity: Google's data centres now complement grid power with on-site renewable generation at many locations, creating true source diversity that has helped them maintain their promised 99.99% uptime despite regional power challenges.
  4. Test realistically: Regular exercises should simulate complete power loss scenarios and measure actual recovery performance; a minimal sketch of such a drill follows this list.
  5. Plan for extended outages: Modern resilience strategies include provisions for extended power disruptions, including guaranteed fuel delivery contracts and alternative temporary power arrangements.
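
The sketch below illustrates step 4 under stated assumptions: it times a simulated full power-loss failover and compares the result against a recovery time objective. The failover_to_secondary function and the five-minute RTO are placeholders for whichever runbook and targets an organisation actually uses.

    import time

    RTO_SECONDS = 300  # example target: secondary site serving traffic within five minutes

    def failover_to_secondary():
        """Placeholder for the real runbook: start generators, shift load, repoint DNS, etc."""
        time.sleep(2)  # stands in for the real orchestration work
        return True

    def run_power_loss_exercise():
        start = time.monotonic()
        recovered = failover_to_secondary()
        elapsed = time.monotonic() - start
        status = "PASS" if recovered and elapsed <= RTO_SECONDS else "FAIL"
        print(f"Recovery took {elapsed:.1f}s against a {RTO_SECONDS}s RTO: {status}")

    run_power_loss_exercise()

Recording these timings over successive exercises also gives a concrete baseline for the extended-outage planning described in step 5.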

Conclusion: The Multi-Site Advantage

As businesses become increasingly dependent on uninterrupted digital operations, the lesson is clear: true resilience requires geographic diversity in power infrastructure. The most successful organisations have moved beyond simple component redundancy within a single site to embrace true multi-site resilience, distributing not just their computing resources but their fundamental power dependencies across geographically distinct locations.

The 2021 OVHcloud data centre fire in Strasbourg, France provides perhaps the starkest recent reminder of single-site vulnerability. Despite robust on-site power systems, the catastrophic fire destroyed an entire facility and damaged others nearby, resulting in permanent data loss for some customers who lacked multi-site redundancy. In contrast, customers who had deployed across OVHcloud's geographically dispersed data centres experienced minimal disruption.

For business leaders, the question isn't whether a power disruption will occur, but how effectively their organisation can weather the inevitable storm. As these disasters have demonstrated, the answer increasingly depends on embracing a multi-site approach to power resilience.
