Cost optimization, convergence and productivity improvement are critical factors driving enterprises and network operators to adopt unified communications (UC), VoIP, telepresence/videoconferencing, cloud-based services, collaboration and virtualization. The underlying IP/MPLS transport is a key ingredient of this migration, since it gives enterprises and network operators the ability to launch services quickly and cost-effectively over managed or contracted cloud architectures. As enterprises and network operators place more critical services on IP networks, the ability to assure the delivery of these services over IP is becoming crucial.
Along with the growth in real-time IP service adoption, we have witnessed the evolution of solutions geared toward assuring these services. However, Heavy Reading's analysis shows that significant gaps still exist between the capabilities needed and those currently available in the market. Because human beings, not just computers, are typically the ultimate end users of these real-time services, the emphasis is shifting from "how is the network performing" to "what is the user experience" for these applications.
Business-critical real-time services are defined as those communication and collaboration services that are extremely sensitive to "conversational quality" – where the timing of the transmitted and received information is an essential component of the perceived service quality. Latency, packet drops and jitter are typically used to measure this conversational quality in IP networks – but many factors, ranging from the simple to the complex, contribute to service-quality degradation. Examples of simple factors are network congestion, hardware faults and incorrect equipment configuration. Complex factors are usually routing related, typically of short duration, and difficult to isolate: route flaps, BGP peer resets, missing routes/black holes, routing loops, etc.
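As an illustration of how these three measurements can be derived from per-packet timestamps, the following is a minimal sketch rather than a description of any particular monitoring product. It assumes a hypothetical `Packet` record with a sequence number and send/receive times taken from a common clock, and it uses the smoothed interarrival jitter estimator defined for RTP in RFC 3550.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Packet:
    """Hypothetical per-packet record; times are in seconds on a shared clock."""
    seq: int
    sent: float
    received: float


def mean_latency(packets: List[Packet]) -> float:
    """Average one-way delay of the packets that arrived."""
    return sum(p.received - p.sent for p in packets) / len(packets)


def interarrival_jitter(packets: List[Packet]) -> float:
    """Smoothed jitter estimate in the style of RFC 3550: J += (|D| - J) / 16."""
    jitter = 0.0
    for prev, cur in zip(packets, packets[1:]):
        # D is the change in transit time between two consecutive packets.
        d = (cur.received - prev.received) - (cur.sent - prev.sent)
        jitter += (abs(d) - jitter) / 16.0
    return jitter


def loss_ratio(packets: List[Packet], expected: int) -> float:
    """Fraction of expected packets that never arrived (duplicates ignored)."""
    return 1.0 - len({p.seq for p in packets}) / expected


# Example: four packets were sent, sequence number 2 was dropped,
# and every packet that arrived took 50 ms to cross the network.
stream = [
    Packet(seq=0, sent=0.00, received=0.05),
    Packet(seq=1, sent=0.02, received=0.07),
    Packet(seq=3, sent=0.06, received=0.11),
]
print(mean_latency(stream), interarrival_jitter(stream), loss_ratio(stream, 4))
```

A constant transit time yields a jitter of essentially zero even though one packet was lost, which is why the three measurements are usually reported together rather than singly.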
The underlying topology of IP networks is constantly changing. Heavy Reading's research indicates that the paths the routing protocols form through the network are always adapting to real-time events. The resulting instability can be long-lived and persistent, short-lived, or periodic. Regardless, all these events can have a significant impact on the paths taken by packets through the network and on the ensuing end-user QoE. Network instabilities can be related to software/hardware defects, configuration errors, and transient physical/data-link issues. These triggers result in routing convergence events in which the IP routing protocols must recalculate shortest paths. Convergence can be slow, and packet delivery is neither guaranteed nor deterministic while it is in progress. Additionally, IP networks are susceptible to oscillating routing changes, called route flaps, which cause re-convergence to occur repeatedly. The result is poor end-to-end network performance: increased packet drops and out-of-order delivery, both of which degrade the efficiency with which packets are forwarded through the network.
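To make the notion of a route flap concrete, the sketch below flags prefixes whose reachability changes state repeatedly within a short window. The event format (timestamp, prefix, "up"/"down"), the 60-second window and the threshold of four transitions are all illustrative assumptions, not values taken from Heavy Reading's research or from any router implementation.

```python
from collections import defaultdict, deque
from typing import Iterable, Set, Tuple

# Hypothetical event: (timestamp in seconds, prefix, "up" or "down").
RouteEvent = Tuple[float, str, str]


def find_flapping_prefixes(
    events: Iterable[RouteEvent],
    window: float = 60.0,   # assumed observation window, in seconds
    threshold: int = 4,     # assumed number of transitions that counts as a flap
) -> Set[str]:
    """Return prefixes with at least `threshold` state changes inside `window`.

    Events must be sorted by timestamp. The first event seen for a prefix
    is counted as a transition from an unknown state.
    """
    recent = defaultdict(deque)  # prefix -> timestamps of recent transitions
    last_state = {}
    flapping = set()
    for ts, prefix, state in events:
        if last_state.get(prefix) == state:
            continue  # repeated announcement of the same state, not a change
        last_state[prefix] = state
        times = recent[prefix]
        times.append(ts)
        # Discard transitions that have fallen out of the sliding window.
        while ts - times[0] > window:
            times.popleft()
        if len(times) >= threshold:
            flapping.add(prefix)
    return flapping


events = [
    (0.0, "10.0.0.0/24", "down"),
    (0.0, "10.0.1.0/24", "down"),
    (5.0, "10.0.0.0/24", "up"),
    (10.0, "10.0.0.0/24", "down"),
    (15.0, "10.0.0.0/24", "up"),
    (100.0, "10.0.1.0/24", "up"),
]
print(find_flapping_prefixes(events))
```

In this example the first prefix oscillates four times in 15 seconds and is reported, while the second recovers once after a long outage and is not. A single outage and a flap call for different responses, so the distinction matters operationally.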
Legacy Internet services, such as email, HTTP (Web) and file transfers, tolerate delay and retransmissions better than real-time services do, so they are not as significantly affected by these instabilities. Real-time services, however, require guaranteed and deterministic packet delivery. They do not have the luxury of retransmissions and are extremely sensitive to the degraded forwarding that routing instabilities cause.
Current mechanisms for IP traffic monitoring and data collection are not sufficient for analyzing these issues in real-time services. Monitoring based on log analysis, SNMP polling, periodic KPI gathering, probes, etc., cannot see the network events themselves, only their end result. So when the service management/operations team tries to determine the root cause, it is almost impossible to accurately establish causality between network behavior and service behavior. As a result, skilled IP and service staff spend hours, if not days, manually correlating data across multiple network domains.
Real-time services represent a significant change in the nature of traffic carried by IP networks. Because real-time services are extremely susceptible to packet loss, delay, jitter, and transient routing changes, assuring them requires extra rigor in monitoring both the communication and the underlying network. It also demands the ability to correlate across multiple network layers and organizational silos to gain insight into service performance and end-user experience.
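The kind of cross-layer correlation described here can be sketched in a few lines, assuming both routing events and service-degradation intervals are available with timestamps from a synchronized clock. The record shapes, the `correlate` function and the five-second lead time are hypothetical choices made for illustration only; a production system would also need topology awareness to confirm that an event lay on the affected path.

```python
from typing import Dict, List, Tuple

# Hypothetical records, both timestamped in seconds from the same clock.
Degradation = Tuple[float, float, str]   # (start, end, session identifier)
RoutingEvent = Tuple[float, str]         # (timestamp, description)


def correlate(
    degradations: List[Degradation],
    routing_events: List[RoutingEvent],
    lead: float = 5.0,  # assumed: how far before the degradation a cause may occur
) -> Dict[str, List[str]]:
    """Map each degraded session to routing events that overlap it in time.

    Temporal overlap only suggests candidates for root cause; it does not
    prove that the event affected the path the session actually used.
    """
    candidates = {}
    for start, end, session in degradations:
        candidates[session] = [
            description
            for ts, description in routing_events
            if start - lead <= ts <= end
        ]
    return candidates


degradations = [(100.0, 110.0, "call-1"), (300.0, 305.0, "call-2")]
routing_events = [
    (97.0, "BGP peer reset"),
    (200.0, "route flap"),
    (304.0, "IGP SPF run"),
]
print(correlate(degradations, routing_events))
```

Even this simple time-window join narrows the search from every event in the network to a short list per affected session, which is the step that today tends to be done by hand.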