The Outlook.com Outage: What Really Happened

Microsoft has now taken to the Outlook.com Service Status page to explain, in great detail, what the issue was, why it happened, and what has been done to ensure it (hopefully) doesn't happen again.

Rod Trent

August 22, 2013


Microsoft's Outlook.com service was unavailable for several days last week. The outage was sporadic and didn't affect all customers all the time. There has been speculation about the true causes of the outage. Even I was left wondering if the outage might have had something to do with upgrades to the system to allow Microsoft to release Skype for Outlook.com this week.

Microsoft has now taken to the Service Status page to explain, in great detail, what the issue was, why it happened, and what has been done to ensure it (hopefully) doesn't happen again.

Here's Microsoft's explanation…

Update and Resolution of Recent Outlook.com Outage

We want to apologize to our customers who were affected by the outage on Outlook.com this week. We have restored access to all accounts and have made changes so that the service will be more resilient in the future. We realize that we have a responsibility to the customers who use our services to communicate and share with the people they care most about, and we apologize for letting those customers down this week. Our first priority is to the health of the services, and we will learn from this incident and work to improve the experience of all our customers. As part of that, we would also like to provide more detail about what happened.

This incident was a result of a failure in a caching service that interfaces with devices using Exchange ActiveSync, including most smart phones. The failure caused these devices to receive an error and continuously try to connect to our service. This resulted in a flood of traffic that our services did not handle properly, with the effect that some customers were unable to access their Outlook.com email and unable to share their SkyDrive files via email.

In order to stabilize the overall email service, we temporarily blocked access via Exchange ActiveSync. This allowed us to restore access to Outlook.com via the web and restore the sharing features of SkyDrive. These parts of the service were fully stabilized within a few hours of the initial incident. A significant backlog of Exchange ActiveSync requests accumulated as we worked to stabilize access. To avoid another flood of traffic, we needed to restore access to Exchange ActiveSync slowly, which meant that some customers remained impacted for a longer period of time.

We have learned from this incident, and have made two key changes to harden our systems against future failure – one that involved increasing network bandwidth in the affected part of the system, and one that involved changing the way error handling is done for devices using Exchange ActiveSync. We will continue to monitor the system and make additional changes as needed to keep the service healthy. We are now fully through the backlog and have restored service so all customers should have normal access from all of their devices.

We want to apologize to everyone who was affected by the outage, and we appreciate the patience you have shown us as we worked through the issues.
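Microsoft doesn't say what its error-handling change actually looks like, so the following is only an illustration of the general problem it describes: clients that retry immediately after an error can pile up into a flood. One common way to soften that is exponential backoff with random jitter, where each failed attempt waits longer, and by a randomized amount, before trying again. The function names and the `base` and `cap` values below are assumptions made for this sketch, not anything taken from Exchange ActiveSync or Microsoft's fix.

```python
import random


def retry_ceilings(attempts, base=1.0, cap=300.0):
    """Upper bound (seconds) on the wait before each retry.

    Hypothetical helper: the bound doubles per attempt and is clamped at `cap`.
    """
    return [min(cap, base * (2 ** attempt)) for attempt in range(attempts)]


def backoff_delays(attempts, base=1.0, cap=300.0, rng=None):
    """Pick a random wait between 0 and the ceiling for each retry.

    Randomizing the wait ("jitter") keeps many devices that failed at the
    same moment from all retrying at the same moment.
    """
    rng = rng or random.Random()
    return [rng.uniform(0.0, ceiling)
            for ceiling in retry_ceilings(attempts, base, cap)]


if __name__ == "__main__":
    for attempt, delay in enumerate(backoff_delays(6), start=1):
        print(f"retry {attempt}: wait {delay:.2f}s")
```

With these assumed defaults, the ceiling grows from 1 second to a maximum of 300 seconds, so a device that keeps failing backs away from the service instead of hammering it.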

 

The original explanation is posted here: Outlook.com problem report

 
