Major Microsoft Azure outage was caused by a simple typo


A Microsoft Azure DevOps outage in the South Brazil region, which lasted more than 10 hours, was caused by a typo in the code that led to 17 production databases being deleted.

Having apologized to impacted customers for the outage, Microsoft has now issued a full post-mortem, sharing details of the investigation that ran from when the outage began at 12:10 UTC on May 24 until it was resolved at 22:31 UTC the same day.

Microsoft principal software engineering manager Eric Mattingly shared details of the code base upgrade, which formed part of Sprint 222. Inside the pull request was a typo in the snapshot deletion job, which ended up deleting the Azure SQL Server rather than the individual Azure SQL Database.

Coding error

Mattingly explained: “when the job deleted the Azure SQL Server, it also deleted all seventeen production databases for the scale unit,” confirming that no data had been lost during the accidental process.
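To illustrate why targeting the server instead of a single database is so destructive, here is a minimal sketch using a toy in-memory model. It is not Microsoft's code and does not use the Azure SDK; the class names, the resource names, and the assumption that the snapshot database sits on the same server as the 17 production databases are all hypothetical.

```python
# Toy in-memory model; not the Azure SDK. All names here are hypothetical.
from dataclasses import dataclass, field


@dataclass
class SqlServer:
    name: str
    databases: set[str] = field(default_factory=set)


class Cloud:
    def __init__(self) -> None:
        self.servers: dict[str, SqlServer] = {}

    def delete_database(self, server: str, database: str) -> None:
        # Intended operation: remove one snapshot database only.
        self.servers[server].databases.remove(database)

    def delete_server(self, server: str) -> None:
        # Deleting the parent server removes every database it hosts.
        del self.servers[server]


def build_scale_unit() -> Cloud:
    # Assumed layout: 17 production databases plus one old snapshot.
    cloud = Cloud()
    dbs = {f"prod-db-{i}" for i in range(1, 18)} | {"old-snapshot"}
    cloud.servers["scale-unit-sql"] = SqlServer("scale-unit-sql", dbs)
    return cloud


def count_databases(cloud: Cloud) -> int:
    return sum(len(s.databases) for s in cloud.servers.values())


intended = build_scale_unit()
intended.delete_database("scale-unit-sql", "old-snapshot")

mistaken = build_scale_unit()
mistaken.delete_server("scale-unit-sql")

print(count_databases(intended))  # 17
print(count_databases(mistaken))  # 0
```

In this model the two calls differ by a single method name, yet one leaves all 17 production databases in place and the other removes them along with their parent.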


The outage was detected within 20 minutes, at which point the company’s on-call engineers got to work. According to the event log, however, the root cause was not identified until 16:04, almost four hours after the outage had begun.
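The intervals quoted in the article can be checked with a short calculation. This is only a sketch: it uses the three UTC timestamps reported above (12:10, 16:04 and 22:31, all on the same day), and the helper name `at` is an illustrative choice.

```python
from datetime import datetime


def at(hhmm: str) -> datetime:
    # Parse a same-day UTC clock time such as "12:10".
    return datetime.strptime(hhmm, "%H:%M")


outage_start = at("12:10")
root_cause_found = at("16:04")
resolved = at("22:31")

time_to_root_cause = root_cause_found - outage_start
total_duration = resolved - outage_start

print(time_to_root_cause)  # 3:54:00
print(total_duration)      # 10:21:00
```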

Microsoft blamed the more than 10-hour fix time on the fact that customers themselves are unable to restore Azure SQL Servers, as well as backup redundancy complications and a “complex set of issues with [its] web servers.”

Having learned from its mistake, Microsoft has now promised to roll out Azure Resource Manager Locks to its key resources, in an effort to prevent future accidental deletion.
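The idea behind a delete lock can be sketched in a few lines. The example below is a toy model and not the Azure Resource Manager API: the `Resource` class, the `ResourceLockedError` exception and the lock name are hypothetical, chosen only to show that a locked resource refuses deletion until the lock is explicitly removed.

```python
# Toy model of a delete lock; not the Azure SDK. All names are hypothetical.
class ResourceLockedError(Exception):
    """Raised when a delete is attempted on a locked resource."""


class Resource:
    def __init__(self, name: str) -> None:
        self.name = name
        self.locks: set[str] = set()
        self.deleted = False

    def add_lock(self, lock_name: str) -> None:
        self.locks.add(lock_name)

    def remove_lock(self, lock_name: str) -> None:
        self.locks.discard(lock_name)

    def delete(self) -> None:
        # Any lock at all blocks deletion in this simplified model.
        if self.locks:
            raise ResourceLockedError(
                f"{self.name} is locked by {sorted(self.locks)}"
            )
        self.deleted = True


server = Resource("scale-unit-sql")
server.add_lock("do-not-delete")

try:
    server.delete()  # An accidental delete is rejected while the lock exists.
except ResourceLockedError as err:
    print(f"blocked: {err}")

server.remove_lock("do-not-delete")  # A deliberate, separate step.
server.delete()
print(server.deleted)  # True
```

The design point is that deletion becomes a two-step action, so a single mistaken call, like the one described in the post-mortem, would fail rather than succeed silently.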

Despite a same-day fix, customers in the region were left without access to some services for several hours, emphasizing how easy it is for things to go wrong and the importance of having backup plans to reduce reliance on single service providers, including cloud storage and other off-prem infrastructure.


With several years’ experience freelancing in tech and automotive circles, Craig’s specific interests lie in technology that is designed to better our lives, including AI and ML, productivity aids, and smart fitness. He is also passionate about cars and the decarbonisation of personal transportation. As an avid bargain-hunter, you can be sure that any deal Craig finds is top value!
