Network failures like a scene from Dumb and Dumber

Metaratings
11 Feb 2016
Article

As featured in DisruptiveViews

It seems that no matter what network operators do to put the most up-to-date infrastructure in place, provide the fastest speeds imaginable and offer the best services they can lay their hands on, if a network fails for one minute it all comes to naught.

The latest ‘victim’ of a massive network dropout was Telstra in Australia. Its mobile network is the envy of many. Its transformation projects spanning many years are almost legendary, the area it covers to serve a meagre population of around 24 million people is vast, and the technology it has deployed to get the most out of every megahertz of spectrum is truly remarkable. Yet ‘human error’ yesterday resulted in a network crash and a market backlash that can best be described as brutal.

The Service Status page at Telstra’s website was showing red lights across the board in all major cities and at the height of the interruption 4,663 reports of Telstra problems were logged by aussieoutages.com. The largest clusters of service faults were in the major state capitals of Sydney, Brisbane and Melbourne. Perth, Adelaide and Hobart were also affected.

Needless to say, customers vented their frustration via the most popular social networks. One has to wonder what lengths they had to go to, considering their network was down. Perhaps they used a friend’s phone?

Trending on social networks is normally a marketer’s dream, but not when it is all negative or making fun of you.

Some emergency services and many businesses were affected. The Sydney Morning Herald reported that “a handful of customers have complained that they are unable to operate their businesses without the network, while others queried whether they would be charged for failed call attempts and interrupted data usage.” One irate customer threatened to not pay his bill this month in protest. Good luck with that.

Telstra’s chief operations officer Kate McKenzie told reporters at a press conference just after 4pm, “We apologise right across our customer base. This is an embarrassing human error.” This occurred after a malfunctioning node on the network was taken offline. Instead of reconnecting the servers to one of the network’s operational nodes, a Telstra worker routed customers back to the dud connection point.
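The failure mode is worth spelling out. Here is a purely illustrative sketch, with hypothetical names and no claim to reflect Telstra’s actual systems, of the basic safeguard involved: when a node is pulled from service, traffic should only be re-homed to a target that currently passes a health check, so nobody ends up routed back to the dud connection point.

```python
# Illustrative sketch only (hypothetical names, not Telstra's real systems):
# pick a failover target from the nodes that currently pass a health check,
# never the node that has just been taken offline.

def pick_failover_target(failed_node, node_health):
    """Return a healthy node other than the failed one, or None if there isn't one."""
    for node, healthy in node_health.items():
        if node != failed_node and healthy:
            return node
    return None

# Example: node-3 has just been taken out of service.
node_health = {"node-1": True, "node-2": True, "node-3": False}
target = pick_failover_target("node-3", node_health)

if target is None:
    print("No healthy node available; leave traffic where it is.")
else:
    print(f"Re-homing customers from node-3 to {target}.")
```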

McKenzie said the company was working quickly on a way of providing affected customers ‘free data,’ but did not disclose how much data would be offered.

When Vodafone’s Australian network was plagued with technical issues in 2010, reportedly caused by a poor upgrade program, it suffered a mass exodus of customers and threats of a class action. But today the most damage comes from negative press and social network lambasting. All those millions spent on marketing, flushed away by a single ‘human error’, may have been better spent on network resilience and disaster mitigation.

Last week, BT’s broadband network suffered major downtime that affected BT home and business customers across the UK. The problem left most without a working internet connection, and many were also stripped of email and telephone services.

BT admitted that a faulty router was behind the widespread outage. Media outlets were quick to jump on the news. The Inquirer, for one, headlined with “BT blames broadband blackout on borked (sic) router – Services have now been fully restored after equipment cock-up, says firm.” But again, social media was far more rapid and caustic in putting BT down.

All these ‘incidents’ highlight three things. Firstly, customers will no longer tolerate loss of connectivity. It wasn’t such a big deal when they were only making phone calls, but today they notice every anomaly because they are constantly using their mobile devices.

Secondly, their ability, via social media, to spread the word about any network failure is impossible to stem and extremely costly in negative press.

Thirdly, why are these major outages being caused by really basic and silly issues? What happened to the days of 99.999% uptime? Perhaps our obsession with super-duper speeds and fancy digital services should be balanced by some better built-in network resilience and fallback systems.
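For context, a rough back-of-the-envelope calculation (my own figures, not anything quoted by Telstra or BT) shows just how little slack those old availability targets allow: ‘five nines’ leaves room for only a little over five minutes of downtime in an entire year.

```python
# Rough arithmetic only: downtime permitted per year at each availability level.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes

for availability, label in [(0.999, "three nines"),
                            (0.9999, "four nines"),
                            (0.99999, "five nines")]:
    downtime = MINUTES_PER_YEAR * (1 - availability)
    print(f"{label}: ~{downtime:.1f} minutes of downtime per year")

# five nines works out at roughly 5.3 minutes per year -- less than the
# duration of either of the outages described above.
```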

Our obsession with external risk and threats may be diverting attention from much more basic internal risks highlighted by these recent events. Maybe it’s just time to get back to basics and concentrate on what CSPs should be doing first and foremost – providing connectivity – all the time.
