April 2011: A Bad Month for Uptime and Security in the Cloud
May 31, 2011
The area of North Carolina where I live was besieged by tornadoes on the evening of April 16, as was much of the rest of the state and several neighboring ones. (Of course, NC’s troubles were unfortunately dwarfed 11 days later by a second round of tornadoes that swept through Alabama and adjacent states, as I’m sure most readers know.) One such cyclonic cloud passed a mere mile south of my house. Thankfully, my town lost only some property, but it was still a scary sight and a scary night to live through.
If you’re reading this, you probably also know that many folks in the IT biz were hit by a different sort of cloud trouble last month. Those clouds weren’t cyclonic but computational, as in Amazon’s Elastic Compute Cloud (EC2). EC2 failed in nearby Virginia on April 21, around 5 AM Eastern time. (I had nothing to do with it, honest.) That failure left many customers without their cloud services, including Reddit, Foursquare, Netflix, and other popular Net services, to say nothing of the hundreds of other, smaller cloud customers. Nor was it a brief outage: it took about four days for EC2 to return to normal.
Amazon is not, however, the only cloud vendor that has had trouble. You may recall that Microsoft somehow lost the cloud storage for its “Danger” phone users for a while, and Google Docs has gone down several times in the past few years, sometimes for as long as half a day and sometimes without any comment at all from Google. We’re all still living with the fallout from the Epsilon breach that I discussed last month. (Have you noticed all the annoying new spam you’ve been getting lately?) And who could have missed that little “oops” from Sony? It is perhaps the company most strident in declaring the sanctity of its intellectual property; to this day it has not apologized for installing rootkits on its customers’ PCs, declaring that “it may be your PC, but it’s our content.” What a shame that a firm with a stance like that would handle your private information so cavalierly, at least if you’re a PlayStation Network user.
I don’t mean for this article to be a simple cloud-beating session. I’m not suggesting that Amazon, Microsoft, or Google is doing a lousy job running its cloud. (Nor am I suggesting that Sony or Epsilon was unforgivably negligent or massively incompetent in allowing its breach. In their case, I’m not suggesting it; I’m stating it as a fact.) Quite to the contrary, it seems to me that if anyone can run a cloud service as well as possible, it’s Amazon, Google, and Microsoft. My guess is that in five years those three will be seen as three of the most trusted names in public cloud services.
No, what worries me enough to write this piece is the very fact that they probably will turn out to be the best cloud providers, and so their standards of quality will become the benchmark, even if that means four down days a year. I wonder how many companies needed an important document during Amazon’s four-day blackout but couldn’t get to it. I wonder how many companies desperately needed to pull up a particular spreadsheet stored in the cloud to get the numbers for, say, an April 22 deadline to file a bid with a client, couldn’t because of the outage, and so lost a large contract. I wonder whether all the money I’ve spent over the past couple of years on Kindle editions of books and online versions of purchased movies was unwisely spent.
As I write this, Amazon’s cloud is aloft again, and so we’re back to talking about world events, new movies, and the like. Cloud apologists will continue to cite statistics showing that, on the whole, cloud services have better uptime than the average data center. But what will the cloud vendors take from this outage? If Amazon’s business isn’t significantly hurt by it, I suspect that all cloud vendors will feel just a little less nervous about the occasional denial of service to customers, and that worries me. It also occurs to me that when I lose network services because one of the servers I take care of has failed, I know exactly how hard I’m working to get my stuff back up, and knowing that gives me a feeling of control. I wouldn’t have that feeling if all I could do to track the progress of an Amazon outage was keep the “Amazon uptime status page” open in my web browser, pressing F5 every couple of minutes like some sort of cargo-cult fetish object, hoping to see little green lights appear where little red ones currently sit. (I also wonder how long I’d keep a job as a network administrator if I couldn’t get my employer’s network up for four days.)
Hey, if nothing else, this may lead to a new corporate excuse: “Gosh, Mr. Minasi, we really wish we could help you… but the cloud is down.” That’ll make a convenient excuse, given what we call clouds when they’re down at ground level, eh?