Voice over Internet Protocol

From Citizendium

Voice over Internet Protocol is a family of standards that permits carrying voice telephony not over dedicated telephony networks, but over Internet Protocol networks that handle both voice and data. In practice, VoIP also refers to service offerings and internal telephony, and the engineering and operations of them.


Voice Digitizing

When people speak to one another in person, the speech is conveyed as continuously varying (i.e., analog) sound waves. Most of the information in speech, as opposed to music, is carried in the frequency range from 300 to 4000 hertz, and a 4 kilohertz (kHz) analog channel is considered the basic unit of voice conversation bandwidth when the sound waves are converted to analog electrical signals.

From Bell's invention of the telephone to the early 1960s, the entire telephone system used analog transmission. This did not lend itself to the growing availability of computers and digital electronics, which offered a number of technical advantages. For example, whenever a weak analog signal is amplified, the amplifier adds noise to the signal. A digital signal, however, can be regenerated without adding noise, as long as the noise it has picked up is not severe enough to change its digital representation.

It had been known, since Nyquist's research in 1928, that an analog signal could be accurately converted to a digital one if it were sampled at a rate of at least twice the highest analog frequency. In the case of the 4 kHz analog signal, that meant that 8000 digital samples per second were needed to represent that signal in digital form without information loss.

Until the availability of solid-state digital electronics, this knowledge largely remained a research area, except for very specialized applications such as voice encryption, some of which stayed analog. Once the electronic technology was available, digital voice received serious engineering attention. Nyquist's sampling rate alone did not characterize the digital bit stream that could reproduce a voice channel.

Each sample records the amplitude of the analog signal at one instant, so the next question was how much precision each sample needed for an adequate digital channel. Ignoring some historical dead ends and certain overhead functions in telephony, the answer was that the appropriate number of bits in the sample was 8, which gave 256 voltage levels. 8000 samples per second, multiplied by 8 bits, produces a 64 kbps digital channel as the representation of the analog voice channel. The resulting stream is known as pulse code modulation (PCM), the codes being the bit patterns that represent the analog amplitude of each sample.
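
The arithmetic above can be sketched in a few lines of Python. This is only an illustration: it uses uniform quantization for simplicity, whereas deployed telephony codecs such as G.711 use non-uniform (companded) levels, and the function names and the 1000 Hz test tone are hypothetical choices made for the example.

```python
import math

HIGHEST_FREQ_HZ = 4000   # width of the analog voice channel, from the text
BITS_PER_SAMPLE = 8      # sample precision, from the text

sample_rate = 2 * HIGHEST_FREQ_HZ          # Nyquist rate: samples per second
levels = 2 ** BITS_PER_SAMPLE              # distinct amplitude codes
bit_rate = sample_rate * BITS_PER_SAMPLE   # bits per second for one voice channel


def quantize(amplitude: float) -> int:
    """Map an amplitude in [-1.0, 1.0] to an 8-bit code (0..255).

    Assumption: uniform steps, chosen for clarity. Real telephony PCM
    uses companding instead.
    """
    clipped = max(-1.0, min(1.0, amplitude))
    return min(levels - 1, int((clipped + 1.0) / 2.0 * levels))


def pcm_encode(tone_hz: float, duration_s: float) -> bytes:
    """Sample a sine tone at the Nyquist rate and return one byte per sample."""
    n = int(sample_rate * duration_s)
    return bytes(
        quantize(math.sin(2 * math.pi * tone_hz * i / sample_rate))
        for i in range(n)
    )


one_second = pcm_encode(1000.0, 1.0)
print(sample_rate, levels, bit_rate, len(one_second) * 8)
```

One second of encoded audio occupies as many bits as the computed bit rate, which is the 64 kbps channel described above.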

For some years, when digital telephony traveled over a dedicated digital network, the amount of bandwidth did not present a major engineering challenge. The digital streams were combined, using time division multiplexing, into faster and faster channels carrying multiple voice streams.

The Internet, however, does not offer continuous bit streams, and its bandwidth is not unlimited. The next challenge in VoIP was to determine whether there were more bandwidth-efficient ways to digitize voice, so that adequate information could fit into packets of fixed size.
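
To make the packet-size question concrete, the following sketch estimates how many bytes of voice fit into each packet and how much IP bandwidth a call then consumes. The 20 ms packet interval, the header sizes (IPv4, UDP and RTP without options) and the function names are assumptions made for the example; link-layer overhead is ignored.

```python
# Assumed per-packet overhead: IPv4 (20) + UDP (8) + RTP (12) bytes, no options.
IP_UDP_RTP_HEADER_BYTES = 20 + 8 + 12


def payload_bytes(codec_bps: int, packet_interval_ms: int) -> int:
    """Bytes of encoded voice carried in one packet."""
    return codec_bps * packet_interval_ms // (8 * 1000)


def ip_bandwidth_bps(codec_bps: int, packet_interval_ms: int) -> int:
    """Bits per second on the wire at the IP layer, headers included."""
    packet_bytes = payload_bytes(codec_bps, packet_interval_ms) + IP_UDP_RTP_HEADER_BYTES
    return packet_bytes * 8 * 1000 // packet_interval_ms


# 64 kbps PCM versus an 8 kbps low-bit-rate codec, both at an assumed 20 ms interval.
for name, rate in (("64 kbps PCM", 64_000), ("8 kbps codec", 8_000)):
    print(name, payload_bytes(rate, 20), "payload bytes,",
          ip_bandwidth_bps(rate, 20), "bps with headers")
```

Under these assumptions the headers are a small fraction of a PCM packet but outweigh the payload of the low-bit-rate codec, so the saving on the wire is smaller than the codec rates alone suggest.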

Real-time transport

For VoIP to be usable, there must be an adequate quality of service (QoS) from one edge of the network to the other. Voice is most sensitive to variability of delay (i.e., jitter) and, after that, to absolute delay. It is relatively tolerant of occasional packet loss, but extremely intolerant of packet reordering and of errored packets.
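
As an illustration of how delay variability can be measured, the sketch below applies the running interarrival jitter estimator described in the RTP specification (RFC 3550), which smooths each new delay difference with a gain of 1/16. The choice of this estimator, the function name and the sample timestamps are assumptions for the example, not something the article prescribes.

```python
def interarrival_jitter(send_times_ms, recv_times_ms):
    """Running jitter estimate over a sequence of packets.

    For each consecutive pair, D is the change in transit time; the
    estimate moves 1/16 of the way toward |D| (RFC 3550 style smoothing).
    """
    jitter = 0.0
    for i in range(1, len(send_times_ms)):
        d = (recv_times_ms[i] - recv_times_ms[i - 1]) - (
            send_times_ms[i] - send_times_ms[i - 1]
        )
        jitter += (abs(d) - jitter) / 16.0
    return jitter


# Hypothetical timestamps: packets sent every 20 ms.
sent = [0, 20, 40, 60]
steady = [50, 70, 90, 110]     # constant 50 ms transit time
variable = [50, 75, 90, 115]   # transit time alternates between 50 and 55 ms

print(interarrival_jitter(sent, steady))
print(interarrival_jitter(sent, variable))
```

A path with large but constant delay yields zero jitter, while even a small alternating delay produces a nonzero estimate, which is why the two quantities are budgeted separately.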

In practice, QoS for a call is set up by reserving bandwidth, either using the Resource Reservation Protocol (RSVP) for individual call setup or assigning the call to a Multi-Protocol Label Switching (MPLS) path that has suitable traffic engineering. Such paths are usually created with the traffic engineering extensions to RSVP.

Call control

Session Initiation Protocol

The Session Initiation Protocol (SIP), a modern version of computer networking session protocols, is key to deployed VoIP, where SIP may need to traverse a firewall-like function. Conventional firewalls make assumptions about port numbers, but the media streams that SIP sets up use ports negotiated dynamically for each call. SIP is the dominant protocol found inside the local multimedia border, and it is rapidly becoming the outside standard as well.
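
To illustrate why fixed port assumptions break down, the sketch below reads the audio port out of a session description of the kind carried in the body of a SIP INVITE. The sample body, the addresses from the documentation range and the function name are hypothetical; a real implementation would use a full SIP and SDP parser.

```python
# Hypothetical session description, as it might appear in an INVITE body.
SAMPLE_INVITE_BODY = """\
v=0
o=alice 2890844526 2890844526 IN IP4 192.0.2.10
s=-
c=IN IP4 192.0.2.10
t=0 0
m=audio 49170 RTP/AVP 0 18
a=rtpmap:0 PCMU/8000
a=rtpmap:18 G729/8000
"""


def audio_media_port(sdp: str):
    """Return the port on the first audio media line, or None if absent."""
    for line in sdp.splitlines():
        if line.startswith("m=audio "):
            fields = line.split()
            # The port field may carry a "/count" suffix; keep the base port.
            return int(fields[1].split("/")[0])
    return None


print(audio_media_port(SAMPLE_INVITE_BODY))
```

Because the port is chosen per call and announced inside the signaling message, a device that filters on static port numbers cannot know which port to open unless it also understands the session protocol.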

Session Border Controllers

A specialized class of security gateways called Session Border Controllers (SBCs) deals with this problem. SBCs are controlled violations of the end-to-end principle: they terminate the SIP session coming from "inside" and create a new session to the outside. They may have firewalling or other security capabilities optimized for a session layer protocol.


Between those two session termination points, depending on the particular SBC, quite a number of things can happen. There can be deep packet inspection for security or accounting. If the particular codec being used to convert analog voice to digitized packets on the inside is different from the one expected on the outside (e.g., high-bandwidth G.711 versus low-bandwidth G.729A), the SBC can convert, or "transcode", between them, although it is always advisable to avoid transcoding because it adds delay and may decrease quality.


Encrypted voice is a problem unless the SBC is trusted to decrypt, examine the plaintext, and re-encrypt in a new cryptosystem.

Design and engineering

One of the key aspects of designing a VoIP network is the VoIP delay budget: the end-to-end delay experienced by callers, as well as the per-hop behavior of each internal connection. Another is the VoIP dial plan, which includes telephone number mapping. Call accounting and system management are operationally critical.

Delay budget

  • Coder or transcoder delay
    • Packetization delay
  • Delay for encryption, if used
  • Queuing delay to next hop
  • Serialization delay to next hop
  • Propagation delay to next hop (6 microseconds per terrestrial kilometer)
  • Deserialization delay
  • Dejittering delay at next hop
  • Decoder delay at final hop
  • Decryption delay, if used
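
The components listed above can be added up to estimate a one-way delay. In the sketch below, only the propagation figure of 6 microseconds per kilometer comes from the list; the packet size, link speed, distance and every other per-component value are hypothetical placeholders, and a single hop is assumed for simplicity.

```python
PROPAGATION_US_PER_KM = 6  # from the list above


def serialization_delay_ms(packet_bytes: int, link_bps: int) -> float:
    """Time to clock one packet onto a link of the given speed."""
    return packet_bytes * 8 * 1000 / link_bps


def propagation_delay_ms(distance_km: float) -> float:
    """Time for a bit to travel the given terrestrial distance."""
    return distance_km * PROPAGATION_US_PER_KM / 1000


# Hypothetical single-hop call: 200-byte packets, 2 Mbps link, 1000 km path.
budget_ms = {
    "coder": 10.0,
    "packetization": 20.0,
    "encryption": 0.0,          # not used in this example
    "queuing": 2.0,
    "serialization": serialization_delay_ms(200, 2_000_000),
    "propagation": propagation_delay_ms(1000),
    "dejittering": 40.0,
    "decoder": 5.0,
    "decryption": 0.0,          # not used in this example
}

total_ms = sum(budget_ms.values())
for component, delay in budget_ms.items():
    print(f"{component:>14}: {delay:6.1f} ms")
print(f"{'total':>14}: {total_ms:6.1f} ms")
```

With several hops, the queuing, serialization and propagation entries would be repeated per hop, which is why per-hop behavior matters as much as the end-to-end total.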

Dial plan and number mapping