Microsoft suffered a widespread outage on its Azure cloud platform on February 29 and it’s trying to make amends to its customers with a 33 percent credit for “affected billing month(s),” according to the company. Given Azure’s global footprint, the bug may have stretched into the next day, March 1, for some customers.
Azure customers, “regardless of whether their service was impacted,” should see the credits applied to the next billing period.
Although cloud outages are hardly new — Amazon, Google and Microsoft have all suffered “unplanned downtime” — they continue to pose a particularly thorny challenge for cloud providers. A widespread outage on a cloud the size of Microsoft’s can have the knock-on effect of downing the online services of hundreds or thousands of businesses and startups.
Microsoft’s mea culpa provides an uncharacteristically transparent look into the factors that went into the Leap Year outage and the steps that the company is taking to prevent it and similar occurrences in the future.
In an Azure Blog post, Bill Laing, corporate vice president of Microsoft’s server and cloud division, explained how Azure’s infrastructure lost its footing on that ill-fated day. It boils down to a date-based software bug that affected Azure’s Access Control Service, Windows Azure Service Bus, SQL Azure Portal, and Data Sync Services. (Windows Azure Storage and SQL Azure were unaffected.)
According to Laing, the Leap Day outage was caused by how security certificates are managed by Azure’s virtual machine “guest agents” (GA), “host agents” (HA) and the fabric controllers that oversee clusters of 1,000 servers each. Laing writes, “When the GA creates the transfer certificate, it gives it a one year validity range. It uses midnight [UTC] of the current day as the valid-from date and one year from that date as the valid-to date.”
The problem with that setup is that February 29 exists only in leap years, roughly once every four years, so the same calendar date one year later may not exist.
“The leap day bug is that the GA calculated the valid-to date by simply taking the current date and adding one to its year. That meant that any GA that tried to create a transfer certificate on leap day set a valid-to date of February 29, 2013, an invalid date that caused the certificate creation to fail,” he writes.
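The failure mode Laing describes is easy to reproduce. What follows is a minimal Python sketch, not Microsoft’s actual code: the function names `naive_valid_to` and `safe_valid_to` are hypothetical, and falling back to February 28 is only one of several reasonable fixes (adding 365 days or rolling forward to March 1 are others).

```python
from datetime import date


def naive_valid_to(valid_from: date) -> date:
    # Mirrors the bug described in the article: add one to the year and
    # keep the month and day. Raises ValueError when valid_from is
    # February 29, because the following year has no such date.
    return valid_from.replace(year=valid_from.year + 1)


def safe_valid_to(valid_from: date) -> date:
    # One possible fix (an assumption, not Microsoft's stated remedy):
    # fall back to February 28 when the target year has no February 29.
    try:
        return valid_from.replace(year=valid_from.year + 1)
    except ValueError:
        return valid_from.replace(year=valid_from.year + 1, day=28)


leap_day = date(2012, 2, 29)

try:
    naive_valid_to(leap_day)
except ValueError as err:
    print("naive calculation failed:", err)

print("safe calculation:", safe_valid_to(leap_day))  # 2013-02-28
```

On any other day of the year the two functions return the same result, which is why a bug like this can sit unnoticed until the calendar reaches February 29.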
“When a GA fails to create its certificates, it terminates,” Laing writes. “The HA has a 25-minute timeout for hearing from the GA. When a GA doesn’t connect within that timeout, the HA reinitializes the VM’s OS and restarts it.”
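The restart behavior Laing describes can be illustrated with a toy model. This is a sketch under stated assumptions, not a reconstruction of Azure’s host agent: only the 25-minute timeout comes from Laing’s account, while the `supervise` function, the `VmOutcome` record and the `max_restarts` limit are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable

GA_TIMEOUT_MINUTES = 25  # timeout quoted in Laing's account


@dataclass
class VmOutcome:
    restarts: int
    minutes_elapsed: int
    healthy: bool


def supervise(agent_starts_ok: Callable[[], bool],
              max_restarts: int = 3) -> VmOutcome:
    """Toy host-agent loop: wait for the guest agent, and on a timeout
    reinitialize the VM and try again. max_restarts is an assumed limit."""
    restarts = 0
    minutes = 0
    while True:
        if agent_starts_ok():
            return VmOutcome(restarts, minutes, healthy=True)
        # The guest agent never connected, so the full timeout elapses.
        minutes += GA_TIMEOUT_MINUTES
        if restarts == max_restarts:
            # Give up and leave the decision to a higher-level controller.
            return VmOutcome(restarts, minutes, healthy=False)
        restarts += 1


# A deterministic bug fails the same way on every attempt.
print(supervise(lambda: False))
```

Because a date bug fails identically on every attempt, each restart in this model only burns another 25-minute timeout without making progress. Restarting is a sensible response to a transient fault but not to a deterministic software error.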
From there, entire clusters teetered on the brink. After a prolonged period of inaccessibility, the fabric controller called for human intervention, first for the affected servers and eventually for the fabric controller itself. Ultimately, large swaths of Azure’s infrastructure were affected, and the platform wasn’t fully brought back online until early on March 1.
Laing says that this trial by fire has given Microsoft clearer insights into Azure’s cloud configuration and management shortcomings. To prevent a Leap Day bug or other mishap from having such a widespread effect in the future, the company is taking new steps to strengthen Azure.
These include improved testing and better code analysis tools that will look out for time-related bugs. Microsoft has already analyzed its own code, says Laing. The company is also working to improve its fault isolation technology to better distinguish whether failures stem from hardware or software — in this case the fabric controllers incorrectly attributed the error to faulty hardware.
Pedro Hernandez is a contributor to the IT Business Edge Network, the network for technology professionals. Follow him on Twitter @ecoINSITE.