Planning VoIP Networks

When planning to implement VoIP on a network, a network designer needs to pay particular attention to ensuring that sufficient bandwidth is available to support voice traffic on WAN links. In previous sections you already learned that the codec chosen will impact the bandwidth requirements associated with a voice call. However, other elements that need to be considered include the overhead associated with the RTP, UDP, and IP headers, as well as Layer 2 framing. In order to determine the required bandwidth, two main values first need to be calculated – the size of a voice packet, and the voice packet per second rate.

To calculate the size of a voice packet, you must add together the size of the RTP, UDP, and IP headers, as well as payload size and Layer 2 framing. For example, let’s say that you intend to use the G.729 codec without RTP header compression over a PPP link. In this case, the combined RTP/UDP/IP header size would be 40 bytes, as you learned earlier. The payload size would be 20 bytes, and the PPP framing an additional 6 bytes, adding up to 66 bytes total. Converting this number to bits yields a packet size of 66 x 8, or 528 bits total.

To determine the voice packet per second rate, divide the codec bit rate by the payload size of a packet (expressed in bits). Earlier in this chapter you learned that the G.729 codec uses a bit rate of 8 kbps, or 8000 bps. The payload of a G.729 voice packet is 20 bytes, or 160 bits. Therefore, the packet per second rate is 50 (8000 divided by 160).

Finally, to determine the bandwidth per call, multiply the total voice packet size by the number of packets per second. In this case, the calculation is 528 bits for the total packet size, multiplied by a packet per second rate of 50, for a total of 26400 bps (26.4 kbps). In other words, a call using the G.729 codec and no header compression requires approximately 26.4 kbps of bandwidth. When header compression is used (assuming a header size of 2 bytes rather than 40), the packet shrinks to 28 bytes (224 bits), and the same calculation yields a bandwidth requirement of 11.2 kbps, so IP RTP header compression is definitely worth exploring. For example, if the bandwidth on the WAN link to be dedicated to VoIP traffic was 256 kbps, the link could handle approximately 9 simultaneous calls without IP RTP header compression, or 22 with it. Don’t forget that voice conversations are duplex – in other words, with header compression enabled, a total of 11 simultaneous conversations across the WAN link could occur, or 11 voice data streams in each direction.
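The three steps above (packet size, packet rate, bandwidth per call) can be sketched in a few lines of Python. This is only an illustration of the arithmetic in this section; the function names are made up for the example, and the 2-byte compressed header is the same assumption used in the text.

```python
def voice_packet_bits(payload_bytes, header_bytes, l2_bytes):
    """Size of one voice packet on the wire, in bits."""
    return (payload_bytes + header_bytes + l2_bytes) * 8


def packets_per_second(codec_bps, payload_bytes):
    """Codec bit rate divided by the payload size in bits."""
    return codec_bps / (payload_bytes * 8)


def bandwidth_per_call_bps(codec_bps, payload_bytes, header_bytes, l2_bytes):
    """Packet size in bits multiplied by the packet-per-second rate."""
    size = voice_packet_bits(payload_bytes, header_bytes, l2_bytes)
    return size * packets_per_second(codec_bps, payload_bytes)


def max_calls(link_bps, per_call_bps):
    """Whole number of voice streams that fit in the given bandwidth."""
    return int(link_bps // per_call_bps)


# G.729 (8000 bps, 20-byte payload) over PPP (6 bytes of framing)
uncompressed = bandwidth_per_call_bps(8000, 20, 40, 6)  # 26400.0 bps
compressed = bandwidth_per_call_bps(8000, 20, 2, 6)     # 11200.0 bps

print(max_calls(256_000, uncompressed))  # 9
print(max_calls(256_000, compressed))    # 22
```

Changing the payload, header, or framing arguments lets you repeat the estimate for other codecs or Layer 2 encapsulations, provided you supply the correct sizes for them.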

When planning a network to carry voice traffic, the numbers above provide a fairly accurate estimation of WAN bandwidth requirements. However, as you learned earlier, a typical voice call may consist of anywhere between 30 and 40 percent silence. In order to take advantage of these silences (and not transmit packets of “silence”), Voice Activity Detection (VAD) can be implemented using Cisco CallManager. When enabled, periods of silence are suppressed, and not packetized for transmission across the network. This can result in substantial bandwidth savings, which would subsequently be available for other network application traffic.

Packet Loss and Echo on VoIP Networks

Packet loss can occur on any network for a variety of reasons including congested links (packets dropped when buffers are full), routing problems, incorrectly configured equipment, and more. Because voice traffic uses UDP as its transport protocol, dropped packets are lost and are not resent. In a voice conversation, this is heard as speech that is cut short or clipped – portions of the voice conversation might be lost, where one user speaking “Hello, may I speak to Dan?” might sound more like “Hello, may… Dan?”, or similar.

The codecs outlined in previous articles are typically capable of correcting (via a DSP) for up to 30 ms of lost voice traffic without any noticeable impact on quality. However, Cisco voice packets use a 20 ms payload, so effectively only one packet could be lost at any point in time. In order to avoid packet loss issues, it is important that the underlying IP network is properly designed (including redundancy), and that QoS techniques are implemented effectively.

Another issue that impacts voice calls is one that you have likely already experienced, namely the sound of your own voice echoing back to you a short time after speaking. Echo occurs when part of the voice transmitted “leaks” back on to the return path of a call. To compensate for this, most codecs use built-in echo cancellation techniques. On a Cisco gateway (such as a router), echo cancellation settings are configured by default, but can be tuned in order to compensate for different degrees of echo experienced by users.

Variable Delays Associated with VoIP Traffic

The bullet points below outline each type of variable delay encountered with VoIP traffic, and how these issues can be compensated for where possible.

  • Queuing Delay. When a WAN interface is congested, traffic must be queued using any of the various methods looked at in this chapter. Although a method like LLQ can prioritize voice traffic, another issue exists. Consider a situation where a router interface currently has no priority traffic waiting to be sent, and begins to forward a large frame containing FTP data. If a voice packet then arrived, it could not be sent until the FTP frame (which is already being serialized) is completed. As such, the voice packet must wait, incurring a delay that might not be acceptable. For example, if the voice packet is stuck behind a 1500 byte frame being sent over a 64 kbps link, it would be subject to a delay of approximately 187.5 ms (12,000 bits divided by 64,000 bps), which (in conjunction with other delay factors) would put it well beyond acceptable limits. To account for this type of delay, a technique called Link Fragmentation and Interleaving (LFI) is used on links with speeds below 768 kbps. When implemented, a router will fragment larger packets (like FTP in this example) into smaller sizes, and then “interleave” the voice packets onto the link. As such, the voice packets would not need to wait for the entire single FTP packet to be sent. When choosing a fragment size, it should be one that aims for approximately a 10 ms delay but does not fragment voice packets.
  • Dejitter Delay. As mentioned earlier, jitter occurs when packets do not arrive when expected. When dejitter buffers are configured at the receiving end of a voice network, packets that arrive with timing variations are buffered, and then played out with a constant delay. The use of dejitter buffers does add some delay to the voice network, so the buffers should generally be kept small. Other QoS techniques looked at earlier in this section help to reduce the overall exposure of voice traffic to jitter issues, but dejitter buffers are a specific solution to help minimize jitter on the receiving end of a VoIP network.
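The fragment-size guidance in the Queuing Delay bullet can be sketched as a small calculation. The helper names below are hypothetical, and the 66-byte voice frame is the G.729-over-PPP example used elsewhere in this chapter; the 1500-byte frame on a 64 kbps link works out to 187.5 ms, in line with the figure quoted above.

```python
def blocking_delay_ms(frame_bytes, link_bps):
    """Time a voice packet waits behind a frame that is already being sent."""
    return frame_bytes * 8 / link_bps * 1000


def lfi_fragment_bytes(link_bps, target_delay_ms=10):
    """Largest fragment that serializes within the target delay."""
    return int(link_bps * target_delay_ms / 1000 / 8)


def fragments_voice(fragment_bytes, voice_frame_bytes):
    """True if the chosen fragment size would split a voice frame."""
    return voice_frame_bytes > fragment_bytes


print(blocking_delay_ms(1500, 64_000))  # 187.5
print(lfi_fragment_bytes(64_000))       # 80
print(lfi_fragment_bytes(768_000))      # 960

# A 66-byte G.729 frame fits inside an 80-byte fragment, so it is not split.
print(fragments_voice(lfi_fragment_bytes(64_000), 66))  # False
```

On very slow links the 10 ms target can yield a fragment smaller than the voice frame itself, in which case the fragment size would need to be raised so that voice packets stay whole.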

It’s important to keep the concept of “end-to-end” in mind when calculating delay. Don’t forget that if a packet needs to pass through 3 routers, and each router adds a 10 ms delay to the forwarding of the packet based on queuing considerations, that adds an additional 30 ms of delay to the packet between the source and destination. Calculations of end-to-end delay will be looked at in more detail shortly in the planning section.

Predictable Delays Associated with VoIP Traffic

The bullet points below outline each type of constant or “predictable” delay encountered with VoIP traffic, and how these issues can be compensated for where possible.

  • Processing/Packetization Delay. Processing and packetization delays are a function of the time that it takes to actually create, code, compress, decompress, and decode packets. This is influenced by the codec(s) used on the network, as well as the use of specialized hardware or software during the coding process. For example, a DSP (hardware) will handle these processes more quickly than specialized software.
  • Serialization Delay. Serialization delay is defined as the amount of time that it takes to physically place a frame onto a serial link. This is influenced by both the size of a frame and the speed of a link, and is calculated by dividing the length of a frame (in bits) by the bit rate of the link. The faster the link and the smaller the frame, the lower the serialization delay. For example, the serialization delay associated with sending a 160 byte (1280 bit) frame across a 128 kbps link would be 0.01 seconds, or 10 ms (1280 bit frame divided by the 128,000 bps link speed).
  • Propagation Delay. Propagation delay is the amount of time that it takes for a signal to be propagated across a network between a sender and a recipient, and is commonly estimated at a rate of approximately 0.0063 ms per km. Because physical laws and properties define it, the network designer cannot influence this type of delay.
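As a rough illustration, the fixed delay components above can be combined with per-hop queuing delay into a one-way estimate. This sketch reads the 0.0063 propagation figure as milliseconds per kilometer (an assumption), and the 1000 km path length and function names are invented for the example.

```python
# Assumption: the 0.0063 figure is interpreted as milliseconds per kilometer.
PROPAGATION_MS_PER_KM = 0.0063


def serialization_delay_ms(frame_bytes, link_bps):
    """Frame length in bits divided by the link bit rate, in milliseconds."""
    return frame_bytes * 8 * 1000 / link_bps


def propagation_delay_ms(distance_km):
    """Distance-dependent delay that the designer cannot influence."""
    return distance_km * PROPAGATION_MS_PER_KM


def one_way_delay_ms(fixed_ms, per_hop_queuing_ms):
    """Sum of the fixed components and the queuing delay added at each hop."""
    return sum(fixed_ms) + sum(per_hop_queuing_ms)


serialization = serialization_delay_ms(160, 128_000)  # 10.0 ms
propagation = propagation_delay_ms(1000)              # about 6.3 ms

# Three routers, each adding 10 ms of queuing delay (hypothetical path).
total = one_way_delay_ms([serialization, propagation], [10, 10, 10])
print(round(total, 1))  # 46.3
```

A fuller budget would also include codec processing, packetization, and dejitter buffer delay before comparing the total against the end-to-end target.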

The Impact of Delay and Jitter on VoIP Networks

While a traditional circuit-switched voice call has a dedicated circuit and bandwidth allocated to it, packet-switched voice calls are subject to two main issues that impact the perceived quality of a call, namely delay and jitter. Delay comes in various forms, impacted by everything from the speed at which a voice packet is created using various codecs to the amount of time that it takes to propagate a signal along a path between a sending and destination node. Of course, a variety of other factors, including congestion, can add to the overall delay of a packet. Recall that in order for a voice call to proceed smoothly, the end-to-end delay should not exceed 150 ms. Later in this section you’ll learn more about the two main types of delay that can impact packet-switched voice connections, namely constant delays and variable delays. Both need to be considered in order to understand how voice traffic is impacted when traversing a packet-switched network.

Although the overall delay impacts the quality of a voice call, another key consideration is the difference between when packets are expected to arrive and when they actually arrive – a concept known as “jitter”. While it may not make a big difference if traditional data packets are received with timing variations between packets, it can seriously impact the quality of a voice conversation, where timing is everything. In order to compensate for the fact that voice packets can be received with variable rather than constant timing, VoIP endpoints implement what is known as a “dejitter buffer” in order to change the variable delay back to the expected constant delay.
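The idea of turning variable arrival times back into constant playout timing can be sketched with a toy model. The code below assumes a fixed-size buffer and invented timestamps; real endpoints often size the buffer adaptively, which is not modeled here.

```python
def playout_schedule(send_times_ms, arrival_times_ms, buffer_ms):
    """Return (playout_time, is_late) for each packet.

    The first packet is held for buffer_ms; every later packet is played at
    the same fixed offset from its send time. A packet that arrives after
    its playout time is too late to be used.
    """
    offset = arrival_times_ms[0] + buffer_ms - send_times_ms[0]
    schedule = []
    for sent, arrived in zip(send_times_ms, arrival_times_ms):
        playout = sent + offset
        schedule.append((playout, arrived > playout))
    return schedule


# Packets sent every 20 ms, arriving with variable network delay.
sent = [0, 20, 40, 60]
arrived = [50, 75, 85, 135]

for playout, late in playout_schedule(sent, arrived, buffer_ms=20):
    print(playout, "late" if late else "ok")
# 70 ok
# 90 ok
# 110 ok
# 130 late
```

Enlarging the buffer would rescue the last packet, but only by adding delay to every packet in the stream, which is the tradeoff behind keeping dejitter buffers small.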

QoS Mechanisms for Improving VoIP Quality (Part 2)

In order for a queuing mechanism like LLQ or IP RTP Priority to queue voice packets into a priority queue correctly, it must be able to identify the traffic as VoIP. With IP RTP Priority, packets are matched and priority queued according to the UDP port numbers used by RTP voice traffic, which fall into the range 16384 to 32767 (even port numbers only) in Cisco implementations. Odd UDP port numbers in this range are used for call control information, and are not prioritized – they are serviced by the WFQ method like all other traffic.
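The port-based match described above amounts to a simple test on the UDP destination port. The following is a minimal sketch of that rule only, with hypothetical names; it is not how a router implements the feature internally.

```python
RTP_PORT_MIN = 16384
RTP_PORT_MAX = 32767


def is_priority_rtp_port(udp_port):
    """True for even UDP ports inside the Cisco RTP voice range."""
    in_range = RTP_PORT_MIN <= udp_port <= RTP_PORT_MAX
    return in_range and udp_port % 2 == 0


print(is_priority_rtp_port(16384))  # True  (even port, in range)
print(is_priority_rtp_port(16385))  # False (odd port, not prioritized)
print(is_priority_rtp_port(5060))   # False (outside the range)
```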

With LLQ, VoIP traffic is typically identified based on either port numbers (through the use of access lists), or through traffic classification mechanisms. If you recall from Chapter 4, IP headers include a field that can be used to designate a service “type”, known as the Type of Service (ToS) field, which carries the IP Precedence value. Based on the value configured in this field, network equipment like routers can be configured to grant certain types of traffic (like VoIP) a higher priority based on the queuing methods in use. For example, on a network that supports voice traffic, all voice packets could be tagged with an IP Precedence value of 5. Because this setting is configured in the IP header, it will stay with a packet all the way from the source to the destination, helping to ensure end-to-end QoS, again assuming an appropriate queuing mechanism that considers this information is implemented on all intermediary routing equipment. LLQ would be the logical choice in such a scenario. On most networks, VoIP traffic has its IP Precedence value configured at the edge of the network, namely on an IP phone. In some cases, however, the phone might not have this ability, and IP Precedence settings might be added to the packet at the distribution layer according to configured policies.
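To make the marking concrete: IP Precedence occupies the three high-order bits of the ToS byte, so a precedence of 5 corresponds to a ToS byte value of 160 (0xA0). The sketch below shows that conversion and, as an assumption about how a software endpoint might mark its own traffic, sets the value on a UDP socket with the standard `IP_TOS` socket option. The function names are hypothetical, and whether the operating system and network honor the marking depends on the platform and its policies.

```python
import socket


def tos_from_precedence(precedence):
    """Place a 3-bit IP Precedence value in the top bits of the ToS byte."""
    if not 0 <= precedence <= 7:
        raise ValueError("IP Precedence must be between 0 and 7")
    return precedence << 5


def precedence_from_tos(tos_byte):
    """Recover the IP Precedence value from a ToS byte."""
    return (tos_byte >> 5) & 0x7


def open_marked_udp_socket(precedence=5):
    """Create a UDP socket whose outgoing packets carry the given precedence."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos_from_precedence(precedence))
    return sock


print(tos_from_precedence(5))       # 160
print(hex(tos_from_precedence(5)))  # 0xa0
print(precedence_from_tos(160))     # 5
```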

QoS Mechanisms for Improving VoIP Quality

Implementing QoS mechanisms is another key consideration in order to ensure that VoIP traffic is forwarded across a network in a timely manner. A variety of different queuing mechanisms can be used on WAN interfaces to help prioritize voice traffic in order to ensure that it is serviced in this manner, and not delayed by other traffic that is less time-sensitive. While the four main queuing techniques typically implemented on Cisco router serial interfaces were looked at earlier in this chapter, voice traffic is typically prioritized using one of the three queuing methods listed below.

  • Class-Based Weighted Fair Queuing (CBWFQ). Class-based WFQ works in a manner somewhat similar to traditional WFQ, with the exception that “classified” traffic can be placed into reserved bandwidth queues, ensuring that certain types of traffic (such as VoIP) are allocated a guaranteed amount of bandwidth. A scheduler services the queues based on the bandwidth assigned to them, also known as the “weight”. While CBWFQ ensures that all packets are allocated appropriate bandwidth based on their weight (and that all queues are serviced), it does not implement strict priority. In other words, this queuing method can still result in delays for VoIP traffic.
  • Low Latency Queuing (LLQ). The LLQ queuing method is strongly recommended as the queuing method for use on WAN links that need to support time-sensitive traffic like VoIP. While LLQ functions in a manner very similar to CBWFQ, it does implement one very important additional feature, namely a priority queue. The priority queue is allocated a defined amount of priority bandwidth (weight), and is always serviced first as long as it does not exceed this bandwidth. Other types of traffic can be assigned to reserved queues (or a default queue) with pre-defined weights, ensuring that they are not starved of bandwidth.
  • IP RTP Priority. The IP RTP Priority queuing method presents one of the simplest methods to ensure that VoIP packets are serviced with appropriate priority. When this queuing method is implemented, RTP voice packets (only) are automatically placed into a priority queue, while all other traffic is queued according to WFQ methods. IP RTP Priority can be implemented with a single command, which makes it an easy way to prioritize voice traffic, especially in environments where all other traffic can be handled equally. IP RTP priority does not become active until a WAN interface is experiencing congestion.

LLQ and IP RTP Priority are the two most popular queuing methods for prioritizing VoIP traffic.
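The difference between a strict priority queue and weighted class queues can be illustrated with a toy scheduler. This is a deliberately simplified model and not Cisco's implementation: the class names and weights are invented, weighted round robin stands in for the CBWFQ scheduler, weights are assumed to be whole numbers of at least 1, and the bandwidth limit on the priority queue is not modeled.

```python
from collections import deque


class ToyLLQ:
    """Strict priority queue plus weighted round robin over other classes."""

    def __init__(self, weights):
        self.priority = deque()
        self.weights = dict(weights)
        self.queues = {name: deque() for name in weights}
        self._credits = dict(weights)

    def enqueue(self, packet, traffic_class=None):
        """Packets with no class go to the priority (voice) queue."""
        if traffic_class is None:
            self.priority.append(packet)
        else:
            self.queues[traffic_class].append(packet)

    def dequeue(self):
        """Return the next packet to transmit, or None if all queues are empty."""
        if self.priority:
            return self.priority.popleft()  # voice is always serviced first
        if not any(self.queues.values()):
            return None
        while True:
            for name, queue in self.queues.items():
                if queue and self._credits[name] > 0:
                    self._credits[name] -= 1
                    return queue.popleft()
            # Every backlogged class has used its share: start a new round.
            self._credits = dict(self.weights)


llq = ToyLLQ({"web": 2, "ftp": 1})
for pkt in ("w1", "w2", "w3"):
    llq.enqueue(pkt, "web")
for pkt in ("f1", "f2"):
    llq.enqueue(pkt, "ftp")

order = [llq.dequeue()]  # w1 is sent before any voice arrives
llq.enqueue("v1")        # a voice packet arrives and jumps the line
order += [llq.dequeue() for _ in range(5)]
print(order)  # ['w1', 'v1', 'w2', 'f1', 'w3', 'f2']
```

The voice packet is sent ahead of the waiting data, while the web and FTP classes still share the remaining transmissions in roughly a 2:1 ratio, so neither is starved.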

VoIP, Network Congestion, and the Importance of QoS Techniques

Network congestion is an issue that can lead to a variety of problems on any data network; when the data network is also supporting voice traffic, these issues are even more serious. For example, WAN interfaces on a router may already be at or very near to capacity, leading to queuing issues that may result in packets being delayed, or even dropped as queues fill up. While this might not be a huge issue for non-interactive and reliable traffic like an FTP transfer, it presents a much greater problem when the network needs to support highly interactive traffic like packet-switched voice. If the level of congestion is high enough, users may not be able to complete their calls, have existing calls dropped, or may experience a variety of delays that make it difficult to participate in a “smooth” conversation.

In order to properly design a network to support voice traffic, WAN links need to be provisioned correctly, and QoS mechanisms need to be implemented in order to ensure that voice traffic is prioritized. When provisioning a WAN link to support multiple services (including voice), the available bandwidth of the link should be provisioned such that total data traffic accounts for a maximum of 75% of the necessary bandwidth, while the remaining 25% is available for additional needs, such as routing protocol requirements. When provisioning or planning a WAN link that will support voice traffic, keep in mind that the codec used will have the biggest influence on the amount of bandwidth used. Multiplying the bandwidth figure associated with a codec by the number of simultaneous phone conversations that need to be supported provides a good indication of how much bandwidth will need to be dedicated to voice traffic only across WAN links. Of course data traffic will also need to be considered, but this will vary in different network environments.
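A rough sizing calculation based on the guidance above might look like the following. It assumes the 75% rule applies to the combined voice and data load on the link, and it uses the 26.4 kbps per-call figure for G.729 without header compression; the call count and data load are invented for the example.

```python
def voice_bandwidth_bps(per_call_bps, simultaneous_calls):
    """Bandwidth to dedicate to voice traffic across the WAN link."""
    return per_call_bps * simultaneous_calls


def required_link_bps(voice_bps, data_bps, max_utilization=0.75):
    """Link speed needed so planned traffic stays within the utilization cap."""
    return (voice_bps + data_bps) / max_utilization


voice = voice_bandwidth_bps(26_400, 10)  # 264000 bps for 10 calls
link = required_link_bps(voice, 120_000)

print(voice)  # 264000
print(link)   # 512000.0
```

Here 10 calls plus 120 kbps of data call for a link of at least 512 kbps, leaving a quarter of the capacity for routing protocols and other overhead.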

The Importance of Network Quality When Transporting VoIP Traffic

In order to support time-sensitive applications like VoIP, a network should currently be running at a high performance level, and have the capacity to shoulder the load that will be associated with adding an additional (and time-sensitive) service. As such, a network that already suffers from over-utilization, delay, instability, and latency issues would not be a good candidate to take on packet-switched voice traffic in its current state. In many environments, the proposed implementation of packet-switched voice traffic may necessitate upgrades of hardware or network media (including WAN links), as well as a thorough analysis of network protocols and their configuration. For example, the implementation of IP telephones may require the purchase of new or additional access layer switches. Similarly, voice network modules may need to be added to existing or new routers to support connections to a PBX or the PSTN. The network and routing protocols in use should also be considered, since legacy (and possibly unnecessary) protocols may still be in use on the network, thus negatively impacting performance.

The key to determining network quality is to perform a thorough analysis of the current environment, addressing areas that require attention, and then ensuring that the network is performing in a manner that can support the proposed voice traffic. Many of the software tools looked at in other articles can help a network designer to identify the potential problem areas that may exist. Although these recommendations may seem very generic (they are), any existing data network must be properly designed, stable, and performing at a high level if you intend to implement technologies like VoIP.

Transporting VoIP Traffic with UDP and RTP

In the H.323 article you learned that VoIP communications use a combination of both TCP and UDP at the transport layer. TCP is the transport protocol used for primary call control functions and signaling including call establishment, flow control, codec negotiation, and so forth. Call control functions rely on reliable communications facilities in order to ensure that calls are completed and maintained correctly. In contrast, actual voice data is very time-sensitive, and as such benefits from the lower latency associated with UDP. You should recall that UDP headers do not include sequence numbers or any reliability mechanisms such as acknowledgements. UDP is built for speed, and that’s really the name of the game when it comes to transporting voice traffic over a packet-switched network.

While UDP helps to reduce the delay associated with transporting VoIP traffic across a packet-switched network, the fact that UDP does not include sequence numbers or any type of timing information presents an issue. As such, UDP relies upon an upper layer protocol to provide these features, namely the Real-time Transport Protocol (RTP). RTP uses UDP at the transport layer, providing both sequencing information so that packets can be placed in the correct order, and timing information so that issues such as network delay can be accounted and compensated for. Some of the techniques used to compensate for delay and other issues on VoIP networks will be explored in more detail in the next section.

One key consideration when looking at the transport of time-sensitive traffic over a packet-switched network is the size of the actual packets transferred. The standard payload of a voice packet on a Cisco network is a 20 ms sample of voice, which is usually in the vicinity of 20 bytes (for example with the G.729 codec), although it can vary depending upon the codec used (G.711 uses a 160-byte payload, while G.726 uses anywhere from 40-60 bytes). In contrast, the combined headers of RTP, UDP, and IP add up to a total of 40 bytes, meaning that “overhead” accounts for approximately 66% of the size of a packet with a 20-byte voice payload, which is not very efficient at all.

While high-speed networks like switched Ethernet can easily facilitate the overhead associated with RTP packets, WAN links are typically much slower, and bandwidth is at a premium. Earlier in this chapter you were briefly introduced to a compression method supported on Cisco equipment, namely RTP header compression, or cRTP. In cases where VoIP traffic needs to traverse slower serial links, enabling cRTP is a great idea, since it compresses the RTP/UDP/IP header size from 40 bytes to anywhere between 2 and 5 bytes. Obviously this is a significant savings in terms of header overhead, reducing it from approximately 66% to anywhere from 9-20% assuming a 20-byte payload.
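The overhead percentages quoted above follow from dividing the header size by the combined header and payload size. A minimal sketch of that arithmetic, with a hypothetical function name and Layer 2 framing left out as in the text:

```python
def header_overhead_pct(header_bytes, payload_bytes):
    """Share of the packet (headers plus payload) taken up by headers."""
    return 100 * header_bytes / (header_bytes + payload_bytes)


print(round(header_overhead_pct(40, 20), 1))  # 66.7 (no compression)
print(round(header_overhead_pct(2, 20), 1))   # 9.1  (cRTP, 2-byte header)
print(round(header_overhead_pct(5, 20), 1))   # 20.0 (cRTP, 5-byte header)
```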

RTP header compression is enabled on a link-by-link basis on Cisco routers. It is only recommended on links with speeds up to 2 Mbps. In fact, Cisco only supports cRTP on serial interfaces using Frame Relay (Cisco encapsulation), HDLC, and PPP encapsulation, along with ISDN interfaces. RTP header compression is not used on higher-speed interfaces (like Ethernet) because of the tradeoff involved in terms of higher CPU utilization.