
2024 Uptime Institute Annual Outage Analysis and WHY Failures Keep Occurring

Dive into the "2024 Uptime Institute Annual Outage Analysis and WHY Failures Keep Occurring" to uncover the stark realities behind data center outages and their impact on industry standards. Despite Tier-III data centers' theoretical reliability, actual performance significantly lags, with outage rates 1,010 times worse than expected. This analysis delves into the root causes, predominantly highlighting maintenance mismanagement as the primary culprit. With 74% of failures tied to critical infrastructure issues, the need for a robust solution has never been more apparent. The Amerruss Resilience Program emerges as a beacon of hope, offering unparalleled expertise and a proven track record of zero unplanned outages, far exceeding industry expectations. Discover how Amerruss LLC can transform your data center's reliability, ensuring your operations remain uninterrupted in an increasingly digital world.


The executive summary of the Uptime Institute Annual Outage Analysis for 2024 came out last Friday. It is available as a PDF and is also covered in a webinar.


The odds of a Tier-III data center having an unplanned outage in any given year are approximately "1 in 5,555," which translates to once every 5,555 years on average. 


But the actual results for (largely) Tier-III data centers, especially in the colocation space, are “1 in 5.5.”


The actual results in the industry are 1,010 times WORSE than they’re supposed to be for the given Tier rating.
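
For readers who want to check the arithmetic, the ratio follows directly from the two odds quoted above. A minimal sketch in Python (the variable names are ours, for illustration only):

```python
# Odds quoted in the text: expected "1 in 5,555" vs. observed "1 in 5.5" per year.
expected_one_in = 5555.0
observed_one_in = 5.5

# Convert each "1 in N" figure to an annual probability of an unplanned outage.
expected_annual_probability = 1 / expected_one_in
observed_annual_probability = 1 / observed_one_in

# How many times more likely an outage is in practice than on paper.
times_worse = observed_annual_probability / expected_annual_probability

print(f"Observed outages are about {times_worse:,.0f} times more frequent than expected")
```

Dividing 5,555 by 5.5 gives the same 1,010 figure.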


For “severe” outages, the actual results in the industry for Tier-III facilities are 75 times WORSE than expected.


WHAT SYSTEMS ARE CAUSING THE FAILURES?


Now let’s look at where the outages are happening in these facilities:

[Image: breakdown of outage causes by system]

74% of outages are due to mission-critical infrastructure failures!


One does not need to be an expert in the field to quickly conclude that there’s a major issue in the critical facilities space.  These numbers are absolutely TERRIBLE.


Going back to a previous article, if the aviation industry performed as wretchedly as the data center industry, we’d see 370 commercial plane crashes per year, or one every day.  With such odds, would YOU fly commercially???


WHY DO THE CRITICAL FACILITIES KEEP FAILING?


I have written before that maintenance mismanagement, NOT human error, is the #1 cause of data center outages.  The latest report reinforces that conclusion.  The above image shows that 74% of outages are due to failures of critical facilities' infrastructure, but the breakdown of which systems fail really tells the tale:


UPS FAILURES, or UPS BATTERY FAILURES?


Modern UPS systems are extremely reliable, and have a long service life.  

UPS batteries are extremely reliable, but they do NOT have a long service life, and require extremely diligent monitoring of condition.  To be blunt, THEY are the #1 cause of data center failures, as a specific system.  (Don’t you find it interesting that they’re NOT listed in the causes of outages?)  


UPS batteries have a relatively short life (theoretically 5 years, on average) and must be carefully monitored.  The individual battery cells in a string degrade unevenly; some fail within a year or two.  This matters because most batteries today are the VRLA type, and if a single cell fails, the entire battery string fails, much like a weak link in a chain.  “Bad” cells MUST be swapped out as SOON as they are detected.
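
To illustrate why a single bad cell matters so much, here is a rough sketch in Python. It assumes cells fail independently and with the same probability, which real batteries do not strictly obey. The cell count and per-cell failure probability are made-up numbers for illustration, not figures from the Uptime Institute report.

```python
def string_failure_probability(cell_failure_probability: float, cells_in_string: int) -> float:
    """Probability that a series string contains at least one failed cell.

    In a series string one failed cell takes down the whole string, so the
    string survives only if every single cell survives.
    """
    string_survives = (1.0 - cell_failure_probability) ** cells_in_string
    return 1.0 - string_survives


# Hypothetical inputs: a 2% chance that any given cell fails within a year,
# and 40 cells wired in series in one string.
p = string_failure_probability(0.02, 40)
print(f"Chance the string has at least one failed cell: {p:.0%}")
```

With these illustrative inputs, the chance of at least one bad cell in the string comes out a little over one half, even though each individual cell looks very reliable. That is the weak-link effect in numbers, and it is why condition monitoring has to happen at the cell level.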


THE OTHER CAUSES


The next four causes of data center outages are all systems that run extremely reliably for decades, provided they are properly maintained, periodically exercised, and promptly repaired when ANY defect is found.


Those of us who’ve been in the business a long time easily recognize how damning the above graph truly is…  these results can ONLY come from gross maintenance mismanagement, period.


In contrast, the independent audit program we've established at Amerruss LLC charts a new course in ensuring data center resilience.  Through careful monitoring and auditing, our program has consistently guaranteed optimal performance, thereby reducing the need for excessive redundancy and lowering operational costs. 


Our success story with Vanguard, a global financial giant with over $7.5 trillion in assets, epitomizes our capability. Over 6 ½ years, Vanguard's extensive portfolio of more than 60 sites, including Tier-II and Tier-III facilities, experienced zero unplanned outages, a feat considered statistically impossible according to industry standards.


The Amerruss Resilience Program extends a profound value proposition: not only meeting but exceeding industry standards and expectations.  Our program's foundation is built on deep technical expertise, bolstered by real-time feedback through quarterly audits and a proactive approach to maintenance and risk management.


For companies dependent on data centers for their operational continuity and competitive advantage, the choice becomes clear.  The Amerruss Resilience Program emerges as THE model of reliability and security in an industry fraught with uncertainties.


We invite you to engage with us at Amerruss LLC.  Let's explore how we can secure your IT infrastructure against unforeseen disruptions, ensuring your data center operations deliver the resilience your business requires.   Don’t leave your data center's resilience to chance. Reach out, and let us safeguard your IT assets with the utmost standards of reliability and efficiency.





