If you were hoping to access Reddit, Quora, Foursquare, Hootsuite, SCVNGR, Heroku, Wildfire, parts of the New York Times, ProPublica or about 70 other sites last Thursday, you may have been out of luck.
The sites went offline, some for up to 36 hours, through no fault of their own: the problem lay in the cloud. More specifically, it lay with Amazon's cloud computing service, Amazon Elastic Compute Cloud, or EC2. Even so, Amazon's damage control has kept the news fairly quiet, given the number of sites and people affected. Most people would not even know that Amazon was hosting their favorite site.
In what some news services are describing as a killer blow to the blossoming cloud services industry, the undisputed leader of the pack failed, and with it the promise of scalable, flexible, cost-effective and highly efficient enterprise solutions lost considerable credibility.
CNN likened it to the sinking of the Titanic of online services. Mashable dubbed it "Cloudgate" or the "Cloudpocalypse". Not wanting to be as over-dramatic as these reputable news services, The Insider is more concerned with the repercussions the outage will have on a burgeoning sector of our industry.
Until now, most concerns about the cloud have centered on data security rather than on the reliability of the systems themselves. It seems quite incredible that Amazon, with all its brilliant, tried-and-tested technology, could have suffered such a high-profile glitch.
The trouble was apparently due to "excessive re-mirroring of its Elastic Block Storage (EBS) volumes." The crash started at Amazon's Northern Virginia data center, located in one of its East Coast availability zones. In its status log, Amazon said that a "networking event" set off a domino effect across other availability zones in that region, in which many of its storage volumes began creating new mirror copies of themselves. This filled up Amazon's available storage capacity and prevented some sites from accessing their data.