Skype goes down, not out

John C. Tanner
15 Feb 2011

There's no way I can prove this, but it's a reasonable assumption that at least some telco executives felt that the Skype blackout of December 22 was an early Christmas present.

Maybe not. After all, telcos may pride themselves on network reliability, but as Asian carriers know from the Boxing Day earthquake of 2006 and the earthquake/typhoon double-punch of 2009, networks can fail far beyond a routine reroute.

However, that was physical damage from natural disasters. Skype's problems were technical. Worse, it was Skype's own P2P architecture that let it down - the same architecture that Skype has often touted as making it more reliable than proper telco networks.

Oh dear.

To summarize the official explanation from Skype CIO Lars Rabbe:

The failure was related to "supernodes" in the Skype network - computers that serve as phone directories to help Skype users find each other. Because a cluster of support servers handling offline instant messaging became overloaded, and because of a bug in a widely used version of the Skype for Windows client, between 25% and 30% of Skype's supernodes (i.e. the ones running the same buggy Windows client) crashed. The resulting surge of traffic on the remaining supernodes, exacerbated by millions of users restarting their crashed Windows clients at the same time, essentially forced most of the remaining supernodes to shut down in self-defense.
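For readers who want to see why losing under a third of the supernodes can take down the rest, here is a toy sketch of that kind of cascade. It is not Skype's code or its actual protection logic: the function name, the capacity figures, the load figure, and the rule that load is shared evenly among surviving nodes are all assumptions made up for illustration.

```python
def simulate_cascade(capacities, total_load, crashed_count):
    """Return how many supernodes are still up once the cascade settles.

    capacities    -- hypothetical per-node capacity values (arbitrary units)
    total_load    -- total directory traffic, assumed to be shared evenly
    crashed_count -- number of nodes (from the front of the list) lost to the bug
    """
    alive = list(capacities[crashed_count:])
    while alive:
        share = total_load / len(alive)
        # Assumption: a node shuts itself down when its share exceeds its capacity.
        survivors = [c for c in alive if c >= share]
        if len(survivors) == len(alive):
            break  # everyone can carry the load, so the network is stable
        alive = survivors  # the load is redistributed over fewer nodes
    return len(alive)


# Hypothetical network of 100 supernodes with mixed capacities.
capacities = [100] * 70 + [130] * 20 + [400] * 10

for crashed in (0, 10, 30):
    remaining = simulate_cascade(capacities, total_load=8000, crashed_count=crashed)
    print(f"{crashed} crashed -> {remaining} still running")
```

With these made-up numbers, losing 10 nodes is absorbed, but losing 30 pushes the per-node share past what the weakest survivors can handle, and each round of shutdowns raises the share for whoever is left. A restart surge like the one Rabbe described would correspond to raising `total_load` at the worst possible moment.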

The result: Skype was unavailable for many users for at least 24 hours, and it took Skype engineers two days to build enough extra supernodes to bring everything back to normal.

Rabbe said Skype was taking measures to prevent a similar failure in future - chiefly, working on better ways to get automatic software fixes to its users (evidently the Windows "bug" had been detected before the failure, and Skype already had a fix for it) and improving its software testing procedures, as well as finding ways to detect supernode problems more quickly.

Some critics will undoubtedly characterize the episode as a serious blow to Skype's credibility as a communications provider. Overall, though, I don't see Skype suffering too much from this - not as long as it follows through with its goals to make the improvements necessary to keep this type of failure from happening again. In a way, Skype is lucky this happened while its user base was still mainly consumers using the free service. Either way, when Skype next updates its subscriber figures, they're not likely to be lower.
