VoIP Codecs and Call Quality

It is also important to consider factors like sound quality and delay when choosing a codec. The standard measure of sound quality is known as a mean opinion score (MOS), a subjective measurement that ranges in value from 1 (lowest) to 5 (highest). Similarly, because different codecs use different compression schemes, they are subject to processing delays that will ultimately impact the perceived quality of the voice call. The table below outlines the MOS scores and relative delay (in milliseconds) associated with each codec.

MOS scores and delay associated with different voice codecs.

When choosing a codec, keep in mind that the two elements that it impacts most strongly are sound quality (determined via the MOS score) and required bandwidth.

It should now be clear that the network designer has a number of factors to consider when choosing a codec for a voice network. While the G.723.1 codec may result in the greatest overall bandwidth savings, it also has a higher degree of complexity, and results in a higher overall delay. Another important consideration is what happens when a single call needs to use multiple codecs. For example, a call might originate on a packet-switched network, but the destination might be on the PSTN. In this case, assuming that the packet-switched network is using the same codec throughout (say G.728), multiple encoding would need to take place: the call would first be encoded using G.728, but would then need to be transcoded to G.711 for transport over the PSTN. When multiple encoding occurs, it not only introduces additional delay, but also lowers the overall MOS of the call. As a general rule, the end-to-end delay of a phone call should not exceed 150 ms (as per ITU recommendations), so account for every encoding step when planning the call path.
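The 150 ms guideline can be treated as a simple budget. The sketch below is only an illustration of that arithmetic: the stage names and millisecond values are hypothetical placeholders, not measured figures for any codec, and the function names are invented for this example.

```python
# Guideline cited in the text: end-to-end delay should not exceed 150 ms.
MAX_END_TO_END_DELAY_MS = 150


def total_delay_ms(components):
    """Sum the per-stage delays (in milliseconds) along the call path."""
    return sum(components.values())


def within_budget(components, budget_ms=MAX_END_TO_END_DELAY_MS):
    """Return True when the summed delay does not exceed the budget."""
    return total_delay_ms(components) <= budget_ms


# Hypothetical figures for illustration only; measure real values on your network.
call_path = {
    "first_encoding": 20,
    "network_transit": 70,
    "transcoding_to_g711": 25,
    "jitter_buffer": 30,
}

print(total_delay_ms(call_path))  # 145
print(within_budget(call_path))   # True
```

Adding a second transcoding stage of 25 ms to this hypothetical path would push the total to 170 ms, which is why multiple encoding deserves attention during design.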

VoIP Codecs, Standards, and Bandwidth Requirements

The ITU-T has defined a number of different standards for the coding and compression of voice traffic. The table below outlines the main standards that you should be familiar with, including their associated data rates. As a rule, don’t worry about the codec names of the standards – focus on the ITU standards associated with the codecs instead, since this is typically how they are referred to. The codecs listed in the tables are ordered according to their data rates.

Voice Codecs and their associated data rates and complexity.

As the table above suggests, using the G.729 codec (which uses a data rate of 8 kbps) will obviously result in significant bandwidth savings over using the G.711 codec, with its 64 kbps data rate. The complexity heading in the table is used to define how many voice calls can function over a single digital signal processor (DSP). In general, a single voice network module will include many DSPs. A medium complexity codec can typically support 4 calls per DSP, while a high complexity codec will support only 2.
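As a rough sizing aid, the calls-per-DSP figures above can be turned into a small calculation. This is a minimal sketch that assumes every concurrent call uses a codec of the same complexity; the function name is made up for this example.

```python
import math

# Calls per DSP as stated in the text: medium complexity 4, high complexity 2.
CALLS_PER_DSP = {"medium": 4, "high": 2}


def dsps_required(concurrent_calls, complexity):
    """Return the number of DSPs needed to carry the given number of calls."""
    if concurrent_calls < 0:
        raise ValueError("concurrent_calls must not be negative")
    # Round up: a partly used DSP still has to be present.
    return math.ceil(concurrent_calls / CALLS_PER_DSP[complexity])


print(dsps_required(10, "medium"))  # 3
print(dsps_required(10, "high"))    # 5
```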

When codec data rates are provided, the number is associated with transmissions in a single direction only. In other words, the G.729 codec uses a data rate of 8 kbps from the sender to the receiver. A two-way VoIP conversation across the same circuit would require 16 kbps total – 8 kbps in each direction.
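The one-direction rule is easy to express as arithmetic. The sketch below counts only the codec data rate and ignores any packet header overhead; the function name is an assumption for illustration.

```python
def two_way_kbps(codec_rate_kbps, calls=1):
    """Codec bandwidth for full two-way conversations (both directions counted)."""
    return 2 * codec_rate_kbps * calls


print(two_way_kbps(8))            # 16  (G.729, one call)
print(two_way_kbps(64))           # 128 (G.711, one call)
print(two_way_kbps(8, calls=10))  # 160 (G.729, ten calls)
```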

VoIP Coding and Compression Schemes

In previous articles you learned that a traditional voice call over the PSTN uses 64 kbps of bandwidth per call over a dedicated circuit-switched connection. While packet-switching solutions like VoIP do not require a dedicated circuit to operate, they still require sufficient bandwidth in order to function in a manner acceptable to users. Recall from earlier that a significant portion of any phone conversation involves periods of silence. In the world of packet-switched voice, those periods don’t explicitly require any packets to be sent (techniques like Voice Activity Detection or VAD can be configured using software like Cisco CallManager), which can provide bandwidth savings (typically in the range of 30-40%). However, one other significant method is also used to control the amount of bandwidth required in a packet-switched voice implementation – compression.
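To get a feel for what a 30-40% saving means, the estimate below applies that range to a 64 kbps stream. It is a back-of-the-envelope sketch only: actual savings depend on the conversation, and the function name is invented for this example.

```python
def rate_with_vad_kbps(rate_kbps, savings_percent):
    """Estimate the average rate after VAD removes the given share of traffic."""
    if not 0 <= savings_percent <= 100:
        raise ValueError("savings_percent must be between 0 and 100")
    return rate_kbps * (100 - savings_percent) / 100


# Range quoted in the text: savings of roughly 30% to 40%.
print(f"{rate_with_vad_kbps(64, 30):.1f} kbps")  # 44.8 kbps
print(f"{rate_with_vad_kbps(64, 40):.1f} kbps")  # 38.4 kbps
```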

In much the same way that various compression schemes looked at earlier in this chapter can be used to reduce the size of data packets, compression schemes are also used on a voice network. Through the implementation of different codecs (coders/decoders), different levels of compression for voice traffic can be achieved, each with varying levels of perceived quality to the listener. This is a key consideration when implementing a voice networking solution – while the highest compression rate may produce the greatest bandwidth savings, it may also produce lower overall quality (this is not necessarily the case, as you’ll see shortly). Furthermore, the use of codecs that compress traffic to a higher degree can also add additional delay to the voice network.

Communications Over VoIP Networks

When terminal devices like IP phones wish to communicate, call processing software like Cisco CallManager is typically involved in the process. Some calls will be between two IP phones on the same subnet, some will be to IP phones on a remote IP network (for example, across a WAN link), and others will be to traditional phones connected to the PSTN. When two users with IP phones on the same subnet need to communicate, a router does not need to be involved, consistent with how IP operates. However, when the users are located on different subnets, a router (or Layer 3 switch) must be involved in order to route the IP-based voice traffic from one subnet to the other. In this case, the router used between the subnets does not need to be voice-enabled. Instead, it will simply route traffic across the network as it would any IP packets. Only the third situation requires a voice-enabled router: when a user on the IP network needs to communicate with users on the PSTN. In this scenario, a router that includes a voice module is needed to convert between IP and voice in one direction, and voice and IP in the other.

In order for a router to properly route voice traffic across an IP network or to an external user connected to the PSTN, dial peers need to be configured on the router. Remember that users using an IP phone will not be dialing the destination IP address that they wish to reach. Instead, they will be dialing a complete phone number or extension number associated with the user they wish to reach. The configuration of dial peers associates a phone number or extension number with an IP address or the voice port to which the call should be forwarded. For example, if a user wishes to reach another user on the IP network at extension “1234”, a dial peer (specifically, a VoIP peer) would be configured on the router mapping that extension to the destination IP address. Similarly, if a user needed to connect to someone on the PSTN, the phone number (usually a small portion of the number) could be configured in a dial peer (known as a plain old telephone service or “POTS” peer) to specify that the traffic should be forwarded out of a voice port on the router, which may be connected to a PBX or directly to a PSTN trunk link.
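Conceptually, a dial peer table is a lookup from dialed digits to a forwarding target. The sketch below models that idea with a simple longest-prefix match. It is not Cisco IOS syntax or its actual matching logic; the class, the addresses (192.0.2.10 is a documentation address) and the voice port label are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DialPeer:
    pattern: str  # leading digits this peer matches
    kind: str     # "voip" or "pots"
    target: str   # IP address for VoIP peers, voice port for POTS peers


# Hypothetical entries for illustration only.
DIAL_PEERS = [
    DialPeer("1234", "voip", "192.0.2.10"),   # extension on the IP network
    DialPeer("9", "pots", "voice-port-1"),    # small portion of a PSTN number
]


def find_dial_peer(dialed: str, peers=DIAL_PEERS) -> Optional[DialPeer]:
    """Return the peer with the longest pattern that prefixes the dialed digits."""
    matches = [p for p in peers if dialed.startswith(p.pattern)]
    return max(matches, key=lambda p: len(p.pattern), default=None)


print(find_dial_peer("1234"))      # VoIP peer pointing at 192.0.2.10
print(find_dial_peer("95551234"))  # POTS peer pointing at voice-port-1
print(find_dial_peer("5000"))      # None, no peer configured
```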

One of the advantages of configuring dial peers is that an administrator has a high degree of control over the entire call-routing process. For example, in order to reduce costs, an administrator could configure a dial peer such that when a user in the Toronto office needs to connect to a PSTN user in Frankfurt, the call is first routed over the IP WAN to the Frankfurt office, where a voice-enabled router dials the (now local) call to the Frankfurt PSTN. An obvious advantage in this scenario is that the long distance charges associated with originating the call in Toronto are reduced to those of a local call. In the same way that both POTS and VoIP peers can be configured, so can dial peers for both VoFR and VoATM.

Outside of helping to route calls along the correct path, dial peers are also used to apply different attributes to the various “call legs” that a transmission passes over between the source and destination devices. A call leg is simply the logical path between voice gateways (such as routers) or between a gateway and destination device. Examples of attributes that might be applied to a particular call leg include the codec used, QoS settings, and so forth. You will learn more about codecs and QoS settings in upcoming articles.

Ultimately, the process of routing a call from a particular source to the correct destination is somewhat similar to a traditional voice call on the PSTN. When a user picks up an IP handset, the local gateway (such as the Cisco router) provides the user with dial tone. As the user keys in the number they wish to reach, these digits are forwarded to the gateway, which collects them until the appropriate dial peer can be identified. Once identified, the call is forwarded along the call leg to the next gateway (or destination or PSTN switch). At the most basic level, this is similar to how a PSTN switch or PBX makes forwarding decisions on a traditional voice network.
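The digit-collection step can be pictured as a loop that accumulates digits until a configured pattern is recognized. The following sketch assumes exact-pattern matching and stops at the first match; real gateways apply richer rules, and the patterns and next hops shown here are hypothetical.

```python
# Hypothetical dial peer patterns mapped to next hops.
DIAL_PEERS = {
    "1234": "192.0.2.10",
    "1300": "192.0.2.20",
    "9011": "voice-port-1",
}


def collect_digits(keyed_digits, peers=DIAL_PEERS):
    """Consume digits one at a time and return (collected, next_hop).

    next_hop is None when the digits run out or when no configured
    pattern can match what has been collected so far.
    """
    collected = ""
    for digit in keyed_digits:
        collected += digit
        if collected in peers:
            return collected, peers[collected]  # dial peer identified
        if not any(pattern.startswith(collected) for pattern in peers):
            return collected, None              # no peer can ever match
    return collected, None                      # still waiting for more digits


print(collect_digits("1234"))  # ('1234', '192.0.2.10')
print(collect_digits("12"))    # ('12', None)
print(collect_digits("5"))     # ('5', None)
```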

H.323 and VoIP Components

In order to appreciate how VoIP networks function, it is important to have an awareness of the base protocols and components used to facilitate the communications process. The primary protocol used to enable multimedia applications like voice and video to function over packet-switched networks is known as H.323, an ITU-T recommendation. Prior to the development of H.323, different vendors used a variety of different standards and proprietary methods of managing multimedia applications on networks, which led to interoperability issues. Today, VoIP equipment and software support the H.323 standard to ensure the highest level of interoperability possible.

The main function of H.323 is not as a transport or network protocol, but rather to perform call control and management functions on a packet-switched network (H.323 is considered a session layer protocol). Within the H.323 specification, two additional signaling methods are required for the transport of voice traffic:

  • H.225. The H.225 specification uses the Q.931 protocol (the same one outlined in the ISDN section of Chapter 11) for call control signaling between two H.323 devices. This includes functions like call setup and termination.
  • H.245. The H.245 specification creates a reliable connection between H.323 devices that is used to exchange information about the codec to be used, the capabilities of the devices (which allows them to determine a common level of compatibility during a session), flow control information, the port numbers to be used, and so forth.

When two H.323 devices attempt to establish a session, H.225 is first used to establish the call (using TCP for reliable transport). H.245 then creates a TCP connection for the purpose of exchanging information about the capabilities of both devices, identifying the port numbers to be used, and opening a logical channel over which the VoIP traffic will ultimately be passed. Finally, the voice traffic is transferred from one endpoint to another using the appropriate upper-layer protocol (to be identified shortly), which in turn uses the connectionless UDP protocol to transport the actual voice packets across the network. Notice that in this example, TCP is the transport protocol used for call establishment and management, since it is reliable. However, UDP is used for the actual transmission of the voice traffic, since it is time-sensitive.
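The sequence just described can be summarized as an ordered list of phases, each tied to its transport protocol. This is only a descriptive model of the steps in the text, not an implementation of H.323; the class and variable names are made up for illustration.

```python
from typing import NamedTuple


class SetupPhase(NamedTuple):
    order: int
    protocol: str
    transport: str
    purpose: str


# Phases as described in the text above.
H323_CALL_PHASES = [
    SetupPhase(1, "H.225", "TCP", "call setup signaling"),
    SetupPhase(2, "H.245", "TCP", "capabilities, port numbers, logical channel"),
    SetupPhase(3, "voice media", "UDP", "time-sensitive voice packets"),
]


def transports_used(phases):
    """Return the transport protocol for each phase, in call order."""
    return [p.transport for p in sorted(phases, key=lambda p: p.order)]


for phase in H323_CALL_PHASES:
    print(f"{phase.order}. {phase.protocol} over {phase.transport}: {phase.purpose}")
```

The model makes the split visible: reliable TCP for the two signaling phases, connectionless UDP for the voice itself.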

Note: Remember that H.323 is the primary call control and management protocol used on VoIP networks, and that voice calls are initiated and managed using H.225 and H.245 respectively. H.323 allows the software and hardware of different vendors to interoperate, providing organizations with a high degree of flexibility in developing a solution appropriate to their environment.

H.323 networks consist of four main types of components, as outlined below. Not every network will require each of the components listed, depending upon the specific needs of the organization.

  • Terminals. A terminal is an H.323-compliant end-point such as an IP telephone or a PC running software such as Microsoft NetMeeting. All H.323 terminals must support voice networking, but capabilities like video and data support are optional. Two H.323 terminals can communicate directly with one another without any additional components, assuming that they know how to reach each other (via IP address, for example).
  • Gateways. A gateway is an optional component on an H.323 network that provides a variety of different services depending upon the needs of an environment. For example, a gateway can be used to allow an H.323-compatible device to communicate with another device that does not support H.323, such as a traditional phone connected to the PSTN. Similarly, a gateway can be used to translate between the codecs used on different H.323 devices if necessary. On a Cisco network, a gateway would be a voice-enabled router or switch.
  • Gatekeepers. A gatekeeper is another optional component on an H.323 network, typically found on larger networks. A gatekeeper is used to register H.323 devices and gateways, allowing them to find and establish sessions with one another as necessary. A gatekeeper also performs functions like call control, bandwidth management, and authorization for H.323 components. Gatekeepers are also capable of making decisions as to how traffic should be forwarded between devices, such as routing calls over a particular WAN link, or the PSTN if necessary. Gatekeepers can also be used to simplify the management of H.323 gateways. Configuring multiple H.323 gateways individually on a large network can become time-consuming and administratively intensive. Instead, a gatekeeper can be configured for an entire zone that includes multiple gateways, and then handle call control functions for all of those gateways in a centralized manner. On a Cisco network, a gatekeeper is typically a server running third-party software, or a Cisco IOS router.
  • Multipoint control units (MCUs). An MCU is another end-point on a LAN that allows multipoint conferences to occur on an H.323 network. For example, this might be an audio conference with three or more participants. MCUs are only required if this capability is needed on the H.323 network. On an H.323 network, MCU capabilities might be found on a terminal, gateway, or gatekeeper.

As mentioned previously, when two H.323 terminals are connected to the same network and need to communicate, a gateway is not required. However, when an H.323 terminal (such as an IP telephone) needs to communicate with a non-H.323 device (such as a phone connected to the PSTN), an H.323 gateway is required. In this case, the gateway is typically a voice-enabled Cisco router. A voice-enabled Cisco router is a model that includes a voice module, which uses a digital signal processor (DSP). A DSP is the hardware that translates voice to IP, and vice versa.

Voice Networking Issues and Goals

A traditional voice network relies upon 64 kbps circuit-switched connections between the originator and the recipient of a call. While this dedicated bandwidth helps to ensure the quality of a call, it is also somewhat wasteful. At many points in any conversation, voice traffic is not crossing the circuit, since natural silences occur in human speech (although it certainly depends on who you are talking to). Even so, the circuit is connected, and the bandwidth is not available for other users.

In contrast, when a voice conversation is passed across a network using packet switching, a dedicated circuit is not created. Instead, the voice “data” becomes the payload of a packet or frame, which is subsequently packet-switched across the network according to the technology or protocol used. For example, with VoIP the voice data is the payload of an IP datagram, which is subsequently switched or routed across a network just like any other IP data traffic. Unless explicitly configured to handle VoIP traffic differently (via mechanisms like queuing), routers will treat the packet in the same way as any other IP packet – routing it from the source node to the destination across the “best” possible path. The initial benefit of this method is clear – voice traffic only uses network resources as required, and does not necessarily require the “reservation” of bandwidth resources (or a dedicated circuit) when voice traffic is not being transferred. This is an advantage, but it also presents some challenges. For example, because voice traffic is time-sensitive, techniques like QoS and compression need to be considered and implemented to ensure that packets arrive at their destination in a timely manner.

Through the implementation of technologies like VoIP, companies can also reduce costs, using existing WAN links to transfer packet-based voice traffic between locations, rather than expensive tie trunks. While existing WAN links may have enough excess capacity to handle the added voice traffic, it is quite likely that they will need to be upgraded to support it. Although most voice networking vendors (including Cisco) are careful to remind you of the administrative benefits of managing a single converged network rather than separate voice and data networks, the reality is that moving to a single converged network usually requires significant staff training and, more importantly, proper planning.

Examples of typical organization goals associated with implementing a VoIP network solution include:

  • Reduce costs associated with traditional PSTN connections and long distance charges
  • Lower overall total cost of ownership
  • Improve user productivity
  • Reduce reliance on a single vendor for equipment or services
  • Enable new IP-based voice applications to be deployed
  • Move towards a single managed network for voice and data

Understanding VoIP

As you learned in previous articles in this series, traditional telephony relies upon connections to the PSTN via a telecommunications carrier. While companies can implement a PBX internally to reduce the need for access to the PSTN between corporate users, this is still a costly option, and usually requires a significant capital investment and ongoing maintenance expenses for an organization. Furthermore, the addition of new services or features often results in additional costs.

As companies look towards new ways of not only reducing costs but also adding new features to their voice networks, packet-switched telephony is becoming increasingly attractive. Because most companies have already made a significant investment as part of implementing their data network, they are looking for ways to leverage the network to support additional services, such as voice traffic. In the past, the ability to use a single network to provide both data and voice services to users was not practical; not only was the technology to do so still in its infancy, but the bandwidth and quality of service (QoS) techniques necessary to provide what users would consider to be acceptable service were just not available. However, given that many companies have moved to switched high-speed connections all the way to user desktops, this is rapidly changing. Not only is the bandwidth available, but the protocols and associated technologies necessary to implement a “converged” network that handles both voice and data traffic have quickly matured to make the transmission of voice over a data network not only possible, but also a practical solution for many organizations.

To begin, it is important to understand that the transmission of voice traffic over a data network is not limited to Voice over IP (VoIP). While VoIP may be the most popular method of transferring packet-switched voice traffic, it certainly isn’t the only one. Other methods include Voice over Frame Relay (VoFR) and Voice over ATM (VoATM). VoFR does not use IP, and is not an end-to-end solution. Instead, it is typically used to create a virtual or emulated tie trunk link between PBXs in remote locations, using Frame Relay PVCs for transport, rather than expensive leased lines. Similarly, voice traffic can also be encapsulated within standard 53-byte cells for transport over ATM networks. Since voice traffic is bursty, VoATM is typically implemented using ATM Adaptation Layer 2 (AAL2) encapsulation, which provides variable bit rate services.

Telephony Value-Added Services

Like any company, a telecommunications carrier is in the business of making money. Given that the market for providing PSTN connections is highly competitive, carriers typically provide a wide range of different services to better meet the needs of their customers, and in turn, generate higher revenues. While the types of services offered by carriers vary widely according to market conditions, the elements listed below are common value-added services typically offered by PSTN providers.

Centrex. While a PBX is located at the customer premises, Centrex service offers similar capabilities and features, but is maintained and managed by the telecommunications carrier at their locations. Centrex is a great solution for companies that do not want to make the investment associated with acquiring and then managing a corporate voice network. However, Centrex typically results in additional fees when services need to be added or changed.

Virtual private voice networks. While companies might choose to interconnect geographically distant PBXs using dedicated tie trunk links, this can be an expensive option, especially if multiple locations need to be interconnected. As an alternative, many service providers offer the ability to interconnect PBXs over the infrastructure of the PSTN, creating a type of virtual private voice network. This option is typically less expensive than implementing dedicated tie trunks.

Voice applications. Many telecommunications carriers offer a variety of programmed voice systems to meet the needs of different customers. For example, a company running a call center might purchase automatic call distribution (ACD) software, allowing calls to be answered automatically, and then queued for an operator. Similarly, carriers may provide a customer with interactive voice response (IVR) systems, allowing callers to provide or obtain information over the phone using voice prompts or by inputting codes as required. IVR systems are commonly implemented by government agencies to gather or distribute information from or to the public. For example, an IVR system might be implemented to allow users to pay their parking tickets by phone with a credit card.

Voice messaging. One very common additional service offered by telecommunications carriers is voice messaging. For home users, this negates the need for a separate answering machine, and for organizations, it eliminates the need to implement a costly voice mail server platform internally. Similar features offered by carriers include call display, distinctive ring, call forwarding, and so forth. Almost all of these services incur some additional monthly charge.

Voice or video conferencing. Another service typically offered by telecommunications carriers is voice and video conferencing. While basic three-way voice conferencing is available for a fee on most residential phone lines, corporations often need the ability to support much larger conferences with clients, along with video capabilities. Costs associated with these services vary greatly according to the specific requirements of the conference (voice only, both voice and video, etc.), as well as the geographic distance between participants.

Telephony Network Signalling

Earlier in this section you learned that telephone networks use two basic types of signaling: analog and digital. While this is true of the way in which voice traffic is transported across the network between users, a variety of additional signaling methods are used in the process of establishing and disconnecting a phone call. Examples of signaling include elements you are likely very familiar with, such as dial tone, a phone ringing, and so on. Two major types of signaling exist on phone networks, namely subscriber signaling and trunk signaling, as outlined below:

Subscriber signaling. This type of signaling is used between a PSTN switch and a subscriber telephone.

Trunk signaling. This type of signaling is used on trunk links, such as between PSTN switches, between a PBX and PSTN switch, or between PBXs.

These two major signaling types provide signals in four different categories. Supervisory signaling is used to carry out tasks such as initiating a phone call or placing a call on hold. Address signaling is used to forward the dialed number to the PSTN switch or PBX, using methods you are likely already familiar with, such as touch tone or pulse. Call processing signals are also easy to identify, such as the ringing or busy tones encountered as part of trying to establish a connection. Finally, network management signals are used to control how circuits and switches respond when fully loaded, such as routing traffic over another switch connection or circuit.

Depending on whether a PSTN switch is connected to a digital or analog network, different methods are used to transfer signaling information across trunk links. On an analog link, signaling information is sent across the circuit itself. On a digital trunk link (such as a T1 line), signaling information is typically sent using one of two methods, namely channel associated signaling (CAS) or common channel signaling (CCS). In CAS, signaling information (such as for initiating or terminating a call) is sent over the same channel as the voice call will ultimately use. In CCS, signaling information is sent over a dedicated and separate channel from the voice call. In general, CCS is the more popular option because it is faster and supports a wider variety of services, such as call display. A good analogy in this case would be comparing CCS to services like ISDN, where data or voice travels over one channel (a “B” channel) and signaling and call control information travels over another (a “D” channel).

Some of the common signaling systems used on voice networks include ISDN, QSIG, and SS7. Each of these is described in more detail below.

ISDN. Originally developed as an all-digital phone network capable of allowing home users to transfer voice, data, and video over a single link, ISDN BRI and PRI are still commonly used on voice networks. For example, in situations where a company has PBXs in distant locations, ISDN links are often used to create a type of virtual private voice network between locations across the PSTN.

QSIG. QSIG is a signaling system that was originally developed to allow PBXs from different vendors to interoperate across a network.

SS7. Short for Signaling System 7, this is an international signaling standard that follows the CCS method outlined earlier. In other words, signaling information is sent over a separate channel rather than the channel used for the voice call. Examples of information passed over this control channel include call control (call setup/teardown), network management functions, as well as features like call display, call forwarding, and so forth.

Telephony Phone Numbers

In much the same way that an IP address is used to identify and route traffic to a unique host on a TCP/IP network, phone numbers are used to identify and route traffic to a particular destination phone line. In the world of telephony, these numbers are designated and defined by a combination of international and national organizations. For example, the International Telecommunication Union Telecommunication Standardization Sector (ITU-T) defines the international numbering system, such as the country codes used in conjunction with a long distance call. Within a country, various governing bodies define a numbering plan based on factors like defined areas and the projected number of lines required, as outlined in the ITU E.164 specification. In North America, telecommunications providers use a system known as the North American Numbering Plan (NANP). In NANP, phone numbers are 10 digits in length, according to the properties shown below:

3-digit area code. The first 3 digits in a 10-digit phone number identify an area or geographic region. For example, a particular city will often have one or more area codes associated with it. For the sake of clarity, the area code associated with a phone number is usually displayed in parentheses, for example (416). In many regions, the area code only needs to be dialed if attempting to reach a different area, in which case the digit “1” usually precedes the number. However, in many areas, especially those with multiple area codes associated with a single “local” calling area, all ten digits must be keyed, usually without the preceding “1”, which is used to identify a long distance call.

3-digit CO code. The next 3 digits in a 10-digit phone number usually identify a particular CO switch. As the user dials these numbers, the call is forwarded to the CO switch to which the user line is connected. For example, the CO code “555” is likely associated with a particular switch. When the user keys in “555” on their handset, the call is forwarded through the various CO switches to the switch associated with that CO code. Of course, one CO switch is usually responsible for multiple CO codes. If you consider the 3-digit CO codes used by your phone number and those of close neighbors, this should become clear.

4-digit line code. The last 4 digits in a 10-digit phone number identify a specific phone line connected to the CO switch. For example, in the phone number 555-8936, the “8936” portion identifies the specific line number to which the call should be forwarded on the “555” switch. After the “8936” portion is keyed in by the user, the “555” CO switch sends a ringing tone on the “8936” line.
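The three-part structure described above can be illustrated by splitting a 10-digit number into its fields. This is a minimal sketch with invented names that only checks the digit count; it does not validate whether a given area code or CO code is actually assigned.

```python
import re
from typing import NamedTuple


class NanpNumber(NamedTuple):
    area_code: str  # first 3 digits
    co_code: str    # next 3 digits
    line_code: str  # last 4 digits


def parse_nanp(number: str) -> NanpNumber:
    """Split a 10-digit NANP number into area code, CO code, and line code."""
    digits = re.sub(r"\D", "", number)  # drop parentheses, spaces, hyphens
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]             # drop the leading long distance "1"
    if len(digits) != 10:
        raise ValueError("expected a 10-digit NANP number")
    return NanpNumber(digits[:3], digits[3:6], digits[6:])


print(parse_nanp("(416) 555-8936"))
# NanpNumber(area_code='416', co_code='555', line_code='8936')
```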

The numbering plan in use in your home country likely differs from the 10-digit plan specified for North America. However, most countries use an internal numbering system that follows the recommendations outlined in the ITU E.164 specification.