Multihoming
Multihoming covers a wide range of networking techniques that allow organizations to connect to internal or external destinations through more than one path. There are many reasons for using multiple paths, and perhaps more than one instance of a type of networking equipment, but the most common reasons are fault tolerance and traffic engineering. [1]
When organizations connect to the Internet, or directly to other organizations using the Internet protocol model, the ever-increasing criticality of their applications means that they cannot tolerate a single point of failure that could isolate them from the outside world. Indeed, the same concerns are present in complex internal networks, especially between different locations of the same enterprise. Multihoming also can describe connectivity between Internet service providers (ISP) and "upstream" network service providers (NSP).
Often, organizations, including Internet service providers and telephone companies, maintain multiple external connections not just for fault tolerance, but also for sensible load distribution. If a company has external connections to the Internet at Tokyo, New York, and London, it makes more sense for traffic from Yokohama to Rome to enter at Tokyo and leave at London. This sort of traffic engineering not only improves reliability, but also makes more cost-effective use of the available paths and improves network performance.
Unfortunately, multihoming covers a variety of mechanisms, including naming/directory services, routing, and physical connectivity. The "home" may be identified by a Domain Name System (DNS) name, an address from Internet Protocol version 4 or Internet Protocol version 6, a combination of IP address and Transmission Control Protocol "port number", or combinations of these and other techniques. To keep the general term "multihoming" in an appropriately broad context, do not immediately think in terms of the mechanism(s) used for connectivity, but of the requirement to have multiple ways to reach a "home" destination.
Such a requirement may have aspects of availability, quality of service, routing policy, capacity, and network security, and all of these and other factors may combine. Another real-world constraint will be the skill levels of the people who will operate the multihomed system.
Several terms have become overloaded to the point of confusion, including multihoming, virtual private networks, and load balancing. This document attempts to bring some order to the definition of multihoming. It partially overlaps definitions of virtual private networks.
Special applications for multihomed connectivity and its variants
There are other motivations for complex connectivity from enterprises to the Internet. Mergers and acquisitions, where the joined enterprises each had their own Internet access, often mean complex connectivity, at least for a transition period. Consolidation of separate divisional networks also creates this situation. A frequent case arises when a large enterprise decides that Internet access should be available corporate-wide, but its research labs have had Internet access for years -- and it works, as opposed to the new corporate connection that at best is untried.
Many discussions of multihoming focus on the details of implementation, using such techniques as the Border Gateway Protocol (BGP) [RFC number of the Applicability Statement], multiple DNS entries for a server, etc. This document suggests that it is wise to look systematically at the requirements before selecting a means of resilient connectivity.
One implementation technique is not appropriate for all requirements. There are special issues in implementing solutions in the general Internet, because poor implementations can jeopardize the proper function of global routing or DNS. An incorrect BGP route advertisement injected into the global routing system is a problem whether it originates in an ISP or in an enterprise.
Defining the goals
Requirements tend to be driven by one or more of several major goals for server availability and performance. Availability goals are realized with resiliency mechanisms, to avoid user-perceived failures caused by single failures in servers, routing systems, or media. Performance goals are realized by mechanisms that distribute the workload among multiple machines such that the load is distributed in a useful manner. Like multihoming, the terms load-balancing and load-sharing have many definitions.
In defining requirements, the servers themselves may either share or balance the load, there may be load-sharing or load-balancing routing paths to them, or the routed traffic may be carried over load-shared or load-balanced media.
Analyzing Application Requirements
Several questions need to be answered in the process of refining goals:
- the administrative model and administrative awareness of endpoints
- availability requirements
- the security model
- addressing requirements
- scope of multihoming
Administrative Model
A key question is: are endpoints predefined in the multihoming process, or will either the client or server end be arbitrary? The simplest model is an intranet, where all endpoints are known in advance and under a single administration, such as the network of a business or other enterprise. That business may have multiple physical locations, and may choose to interconnect them with physical facilities, or with simulated facilities in the public Internet.
This case becomes more complex when most endpoints are indeed inside, but some internal computers need to reach external servers on the public Internet, whose identities are not known in advance.
In an extranet, which is one form of business-to-business (B2B) communications, internal computers need to talk not only to other internal computers, but also to computers in other enterprises. Those external computers, however, are known in advance, and their administrators can pre-plan security, backup, and other operational matters. Indeed, the complexity may be such that the participants create a company just to run the extranet. For example, the VISA corporation is actually owned by its member banks, but charges service fees to the banks issuing cards and the banks processing card charges. That network, for obvious reasons, is strictly isolated from the public Internet. Even the member banks are restricted; while one bank may need to access the credit card authorization computer in another, that first bank is not authorized to connect, through VISA, to another bank's mortgage loan department.
Another kind of B2B communications might be ad hoc, where, for example, a business needs some parts or equipment, and puts out an open bid to suppliers connected to the Internet.
In business-to-consumer (B2C) communications, again, there are the same two cases of endpoints known in advance and endpoints that are arbitrary. If one is to be an author or editor on Citizendium, for example, with the "consumer" being the author and the "business" being Citizendium, one must go through the administrative process of creating an account and a login identifier before creating or modifying articles.
For the case where an arbitrary user searches the Internet for public information servers, there is no prior relationship. Many server administrators will keep a log of access, should the user deliberately cause harm to the server.
Server function availability
The first goal involves well-defined applications that appear to run on a well-known server name visible on the Internet. There are two broad classes of applications. The first consists of applications that are essentially read-only, or have other idempotent features such that the same result will be produced whether the server receives one or many copies of the same transaction. It is easiest to multihome to servers of this type. [2]
Other kinds of servers (e.g., one handling withdrawals from a bank account) may not exhibit the same behavior if they get multiple copies of a request. An application may still be made idempotent by remembering when a particular request has already run and ignoring additional copies.
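A minimal sketch of this idea, in which the request identifiers, the in-memory store, and the account itself are illustrative assumptions rather than any real banking system, might look like the following:

```python
# Minimal sketch: making a non-idempotent operation (a withdrawal) safe to
# replay by remembering which request identifiers have already been applied.
# The request format and in-memory store are illustrative assumptions.

class Account:
    def __init__(self, balance):
        self.balance = balance
        self.processed = {}  # request_id -> result already returned

    def withdraw(self, request_id, amount):
        # If this request was already applied, return the original result
        # instead of debiting the account a second time.
        if request_id in self.processed:
            return self.processed[request_id]
        if amount > self.balance:
            result = "insufficient funds"
        else:
            self.balance -= amount
            result = f"withdrew {amount}, balance {self.balance}"
        self.processed[request_id] = result
        return result

account = Account(100)
print(account.withdraw("req-42", 30))  # applied once
print(account.withdraw("req-42", 30))  # duplicate copy: same result, no second debit
```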
As long as the server function either uses read-only data, or has mechanisms that keep databases synchronized, the individual transactions can run on different server instances, even though the application appears to run on specific servers visible to the Internet at large. This will be termed "endpoint multihoming", emphasizing the need for resilience of connectivity to well-defined endpoints. Solutions here often involve extensions of DNS servers, where a server has knowledge of current workload or other factors that let it choose among several physical server addresses to return in answer to a client request. For example, en.citizendium.org is a name known to the world, but the functions provided may actually be available on the servers with the relative names en1, en2, and en3. An enhanced DNS server, doing "round robin" traffic engineering, might send the address of en3 back only for every third request.
Should any of the three servers go down, however, the DNS server can spread the load over the two remaining servers, as a form of application multihoming.
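A minimal sketch of this behavior follows; the addresses, the health set maintained by some external monitor, and the mapping to en1, en2, and en3 are illustrative assumptions, not an actual Citizendium configuration.

```python
# Minimal sketch of "round robin" answer selection with failover.
# Addresses use the RFC 5737 documentation prefix; health information
# is assumed to come from some external monitoring process.
from itertools import cycle

SERVERS = {
    "en1": "192.0.2.1",
    "en2": "192.0.2.2",
    "en3": "192.0.2.3",
}
healthy = {"en1", "en2", "en3"}   # updated by an assumed external monitor
rotation = cycle(SERVERS)

def answer_for(query_name="en.citizendium.org"):
    """Return one address per query, skipping servers known to be down."""
    for _ in range(len(SERVERS)):
        name = next(rotation)
        if name in healthy:
            return SERVERS[name]
    raise RuntimeError("no healthy servers")

# Successive queries rotate among en1, en2, and en3; once en3 is marked
# down, the load is spread over the remaining two servers.
print([answer_for() for _ in range(6)])
healthy.discard("en3")
print([answer_for() for _ in range(6)])
```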
From an application standpoint, this is either a many-to-one topology, many clients to one server, or a many-to-many topology when multiple servers are involved. It can be worthwhile to consider a many-to-few case, when the few are multiple instances of a server function, which may appear as a single server to the general Internet. The idea of many-to-few topology allows for a local optimization of inter-server communications, without affecting the global many-to-one model.
Addresses on interfaces that connect to the general Internet need to be unique in the global Internet routing system, although they may be translated, at the network address or port level, from public to internal space.
General Internet connectivity from the enterprise
The second goal is high availability of general Internet connectivity for arbitrary enterprise users reaching the outside. This will be called "internetwork multihoming". Solutions here tend to involve routing mechanisms.
This can be viewed as a few-to-many application topology.
Addresses on interfaces that connect to the general Internet need to be unique in the global Internet routing system, although they may be translated, at the network address or port level, from internal private address to public space.
Use of Internet services to interconnect "intranet" enterprise campuses
The third goal involves the growing number of situations where contracted IP services are used to interconnect parts of an enterprise with virtual private networks. This will usually involve dedicated or virtual circuits, or some sort of tunneling mechanism.
Beyond general internal campus multihoming, VPN intranet multihoming is concerned with multiple paths for the VPN tunnels that interconnect the campuses.
Use of Internet services to connect to extranet partners
A fourth category involves use of the Internet to connect with strategic partners. True, this does deal with endpoints, but the emphasis is different from the first case. In the first case, the emphasis is on connectivity from arbitrary points outside the enterprise to points within it. This fourth case deals with pairs of well-known endpoints.
These endpoints may be linked with dedicated or virtual circuits defined at the physical or data link layer. Tunneling or other virtual private networks may be relevant here as well. There will be coordination issues that do not exist for the third case, where all resources are under common control.
Addresses need to be unique in the different enterprises, but do not need to be unique in the global Internet.
Security Model
Security requirements can include various cryptographic schemes, as well as mechanisms to hinder denial of service attacks. The requirements analyst must determine whether cryptography is needed, and, if so, whether cryptographic trust must be between end hosts or between end hosts and a trusted gateway. Such gateways can be routers or multiported application servers.
Addressing Refinements and Issues
At one time, address management might have been set aside as a secondary concern. With today's issue of conserving IPv4 address space, possibly by using private address space reached through network address translators, and, even more, the transition to IPv6, this can no longer be put aside. Even with IPv4, changing ISPs, as well as mergers and acquisitions of companies with separate networks, may require renumbering. [3]
Not all application protocols are independent of the underlying addressing. Network address translation can break some application protocols, such as the File Transfer Protocol, if the translation does not go beyond the network addresses and also rewrite addresses carried in the application protocol messages.
There also may be administrative requirements for addressing; for example, a service provider that contracts to run such a network may require addresses to be registered, possibly from the provider's address space.
Consideration also needs to be given to application caches in addition to DNS caches. Firewall proxy servers are a good example where multiple addresses associated with a given destination may not be supported.
Network Address Translation (NAT) is a widespread technique undergoing significant enhancement in the IETF NAT Working Group. The traditional approaches either did a one-to-one translation from an inside to an outside address, or a many-to-one mapping from a large number of addresses on one side to a much smaller number of addresses (with a larger number of TCP/UDP ports). The traditional approaches, in practice, include:
| Inside | Outside |
|---|---|
| Private address space, static or long-term assignments | Public Internet space allocated to an ISP |
| Private address space, static or long-term assignments | Extranet space, either private or managed by the extranet operator |
| Public address space, provider assigned (PA) | Public Internet space allocated to an ISP |
| Public address space, provider independent (PI) | Public Internet space allocated to an ISP |
More powerful translation technologies such as Load-Sharing NAT [RFC2391] or Application Level Gateways (ALG) [RFC2663] may be needed.
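As an illustrative sketch of the many-to-one case, assuming an RFC 1918 private prefix inside and a single assumed public address outside, a port-translating NAT maintains a mapping table along these lines:

```python
# Minimal sketch of many-to-one NAT with port translation (NAPT).
# Inside hosts use private addresses; all share one assumed public address,
# distinguished on the outside by the translated source port.
import itertools

PUBLIC_ADDRESS = "198.51.100.1"        # assumed provider-assigned address (RFC 5737 range)
_next_port = itertools.count(40000)    # pool of outside ports

outbound = {}   # (inside_ip, inside_port) -> outside_port
inbound = {}    # outside_port -> (inside_ip, inside_port)

def translate_out(inside_ip, inside_port):
    """Map an inside source (address, port) onto the shared public address."""
    key = (inside_ip, inside_port)
    if key not in outbound:
        port = next(_next_port)
        outbound[key] = port
        inbound[port] = key
    return PUBLIC_ADDRESS, outbound[key]

def translate_in(outside_port):
    """Map a returning packet back to the inside host that opened the session."""
    return inbound[outside_port]

print(translate_out("10.0.0.5", 51000))   # -> ('198.51.100.1', 40000)
print(translate_out("10.0.0.9", 51000))   # same inside port, different outside port
print(translate_in(40001))                # -> ('10.0.0.9', 51000)
```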
Scope of multihoming
Multihoming may be defined between an end host and a router or application gateway, on an end-to-end basis possibly involving virtual servers, among routers, or among elements in a transmission system. Different multihoming scopes may support the same application requirement.
Application Goals
These goals need to be agreed to by the people or organization responsible for the applications. Failure to reach a fairly formal agreement here can lead to problems of mismatched expectations.
At the application layer, there will be expectations of connectivity. Not all applications will operate through classical NAT devices. Application designers should proceed on two fronts: following NAT-friendly application design principles [Senie 1999a] and being aware of potential application protocol interactions with NAT technologies [Holdredge 1999a].
The term "service level agreement" often refers to expectations of performance, such as throughput or response time. Ideas here extend the performance-based model to include availability.
Planning and Budgeting
In each of these scenarios, organization managers need to assign some economic cost to outages. Typically, there will be an incident cost and an incremental cost based on the length or scope of the connectivity loss.
Ideally, this cost is then weighted by the probability of outage.
A weighted exposure cost results when the outage cost is multiplied by the probability of the outage.
Resiliency measures reduce that probability, but increase the cost of operation.
Operational costs obviously include the costs of redundant mechanisms (i.e., the additional multihomed paths), but also the incremental costs of personnel to administer the more complex mechanisms -- their training and salaries.
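As a small worked example, in which every figure is an illustrative assumption rather than data from any real network, the weighted exposure can be compared with the operating cost of the added resiliency:

```python
# Worked example of weighted exposure cost; all figures are assumptions.
incident_cost = 50_000      # fixed cost per outage (lost business, recovery)
cost_per_hour = 10_000      # incremental cost for each hour of lost connectivity
outages_per_year = 2        # expected outages without a second path
hours_per_outage = 4

exposure = outages_per_year * (incident_cost + cost_per_hour * hours_per_outage)
print(f"Weighted exposure without multihoming: {exposure}")   # 180000

# Suppose a second path cuts the expected outages to 0.2 per year, but costs
# 60,000 per year in circuits, equipment, and the staff time to run it.
residual = 0.2 * (incident_cost + cost_per_hour * hours_per_outage)
resiliency_cost = 60_000
print(f"Residual exposure plus resiliency cost: {residual + resiliency_cost}")  # 78000.0
```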
Issues
Performance vs. Robustness: the Cache Conundrum
Goals of many forms of "multi-homing" conflict with goals of improving local performance. For example, DNS queries normally are cached in DNS servers, and in the requesting host. From the performance standpoint, this is a perfectly reasonable thing to do, reducing the need to send out queries.
From the multihoming standpoint, it is far less desirable, as application-level multihoming may be based on rapid changes of the DNS master files. The binding of a given IP address to a DNS name can change rapidly.
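A minimal sketch of the conflict, assuming a simple time-to-live (TTL) cache of the kind DNS resolvers use, shows that a cached answer keeps being returned even after the authoritative binding has changed; the names, addresses, and 300-second TTL are illustrative assumptions.

```python
# Minimal sketch of the cache conundrum: a TTL-based cache keeps serving an
# old answer even after the authoritative data changes.
import time

authoritative = {"www.example.com": "192.0.2.10"}   # current master-file answer
cache = {}                                          # name -> (address, expiry time)

def resolve(name, ttl=300):
    entry = cache.get(name)
    if entry and entry[1] > time.time():
        return entry[0]                    # served from cache, possibly stale
    address = authoritative[name]
    cache[name] = (address, time.time() + ttl)
    return address

print(resolve("www.example.com"))                   # 192.0.2.10, now cached
authoritative["www.example.com"] = "192.0.2.20"     # multihoming shifts the service
print(resolve("www.example.com"))                   # still 192.0.2.10 until the TTL expires
```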
Service Level Agreements
Enterprise networks, especially mainframe-based, are accustomed to building and enforcing service level agreements for application performance. A key to being able to do this is total control of the end-to-end communications path.
In the current Internet, the enterprise(s) at one or both ends control their local environments, and have contractual control over connections to their direct service providers.
If service level control is a requirement, and both ends of the path are not under control, the general Internet cannot now provide service level guarantees. The need for control should be reexamined, and, if it still exists, the underlying structure will need to be dedicated resources at the network layer or below. A network service provider may be able to engineer this so that some facilities are shared to reduce cost, but the sharing is planned and controlled. Contracting for virtual private networks is one way to have the provider take responsibility for delivering the guaranteed service level in a multihomed network.
The acid test for multihoming is whether or not it delivers the required availability and performance. Sometimes, there are demands for unnecessary features. For example, consider an enterprise that has two points of connection to a service provider. If a request to some external server goes out the first connection, but the reply comes back on the second connection, then as long as the required performance is met, the asymmetrical routing observed is not relevant to the objective and needs to be accepted as simply how the network behaves.
Security
ISPs may be reluctant to let user routing advertisements or DNS zone information flow directly into their routing or naming systems. Users should understand that BGP is not intended to be a plug-and-play mechanism; manual configuration often is considered an important part of maintaining integrity. Supplemental mechanisms may be used for additional control, such as registering policies in a routing registry[4] or using egress/ingress filtering.[5]
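One way to picture the routing-registry check is the sketch below, in which the registered prefixes and the announcements are illustrative assumptions: the provider accepts a customer's route advertisement only if it falls within the prefixes registered for that customer.

```python
# Minimal sketch of provider-side filtering of customer route advertisements
# against prefixes registered in a routing registry. All prefixes are
# illustrative (RFC 5737 documentation space), not real registry data.
from ipaddress import ip_network

registered = [ip_network("203.0.113.0/24")]      # what the registry lists for this customer

def accept(announcement):
    """Accept an announced prefix only if a registered prefix covers it."""
    prefix = ip_network(announcement)
    return any(prefix.subnet_of(reg) for reg in registered)

print(accept("203.0.113.0/24"))    # True: exactly the registered block
print(accept("203.0.113.128/25"))  # True: a more-specific of the registered block
print(accept("198.51.100.0/24"))   # False: not registered, so the advertisement is dropped
```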
Challenges may arise when client security mechanisms interact with fault tolerance mechanisms associated with servers. For example, if a server address changes to that of a backup server, a stateful packet screening firewall might not accept a valid return. Similarly, unless servers back one another up in a full mirroring mode, if one end of a TCP-based application connection fails, the user will need to reconnect. As long as another server is ready to accept that connection, there may not be major user impact, and the goal of high availability is realized. High availability and user transparent high availability are not synonymous.
Load Balancing vs. Load Sharing
These terms are often interchanged, but they really mean different things. Load balancing is deterministic, and at a finer level of control than load sharing, which is statistical. Load balancing is generally not something that can be realized in general Internet routing, other than in special and local cases between adjacent AS. A degree of load sharing is achievable in routing, but it may introduce significant resource demands and operational complexity.
Paul Ferguson defines load-balancing as "a true '50/50' sharing of equal paths.[6] This can be done by either (a) round robin per-packet transmission, (b) binding pipes at the lower layers such that bits are either 'bit-striped' across all parallel paths (like the etherchannel stuff), or binding pipes so that SAR functions are done in a method such as multilink PPP. These are fundamentally the same.
"Load-sharing is quite different. It simply implies that no link is sitting idle -- that at least all links get utilized in some fashion. Usually in closest exit routing. The equity of utilization may be massively skewed. It may also resemble something along the lines of 60/40, which is reasonable."
Application Compatibility
Some deployment mechanisms involve network address, or network address and TCP/UDP port, translation (NAT and NAPT). If the application protocols embed IP addresses in their protocol fields, NAT or NAPT may cause protocol failures. Translation mechanisms for such cases may require knowledge of the application protocol, as typified by application proxies in firewalls, or in application gateways with multiple interfaces.
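A minimal sketch of the problem uses the FTP PORT command, which carries an IP address and port as text in the application payload (RFC 959 format); the addresses and ports below are assumptions chosen for illustration, and the rewriting function stands in for what an application-aware translator would do.

```python
# Minimal sketch of why NAT can break FTP: the PORT command carries an IP
# address and port inside the application payload, so an application-aware
# translator (an ALG) must rewrite it as well. Addresses are illustrative.

def rewrite_port_command(line, public_ip, public_port):
    """Rewrite an FTP PORT command (RFC 959 format: h1,h2,h3,h4,p1,p2)."""
    if not line.startswith("PORT "):
        return line                       # nothing to do for other commands
    octets = public_ip.split(".")
    args = octets + [str(public_port // 256), str(public_port % 256)]
    return "PORT " + ",".join(args)

# The client behind the NAT announces its private address in the payload;
# network-layer translation alone would leave this untouched, and the server
# would try to connect back to an unreachable 10.0.0.5.
original = "PORT 10,0,0,5,195,80"                     # 10.0.0.5, port 195*256+80 = 50000
print(rewrite_port_command(original, "198.51.100.1", 40000))
# -> "PORT 198,51,100,1,156,64"  (40000 = 156*256 + 64)
```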
Technologies
| Protocol family | Representative mechanisms |
|---|---|
| Application | DNS load balancing, application redirection |
| End-to-end | Transaction load distribution, tunnels |
| Internetwork | Interior routing multihoming, liveness protocols with quasi-static routing, BGP multihoming, network address translation, resilient packet ring |
| "Sub-IP" | MPLS failover |
| Data link | Multilink PPP, IEEE 802.3 link aggregation |
| Physical | Autodial |
| Transmission media | SONET/SDH |
References
- Berkowitz, Howard C. (October 1999), To Be Multihomed: Requirements & Definitions, Internet Engineering Task Force, draft-berkowitz-multirqmt-02.txt
- Fielding, R. et al. (June 1999), "9.1 Safe and Idempotent Methods", Hypertext Transfer Protocol -- HTTP/1.1, Internet Engineering Task Force, RFC 2616
- Ferguson, P. & H. Berkowitz (January 1997), Network Renumbering Overview: Why would I want it and what is it anyway?, Internet Engineering Task Force, RFC 2071
- Alaettinoglu, C. et al. (January 1998), Routing Policy Specification Language (RPSL), Internet Engineering Task Force, RFC 2280
- Ferguson, P. & D. Senie (May 2000), Network Ingress Filtering: Defeating Denial of Service Attacks which employ IP Source Address Spoofing, Internet Engineering Task Force, RFC 2827 (BCP 38)
- Ferguson, P. (8 March 1998), "Re: Comments on 'What is a VPN?'", message to the IETF VPN mailing list