Talk:Anycasting/Draft

This article has a Citable Version.
Definition: A technique for increasing load distribution and fault tolerance in networks with multiple copies of a read-only server function, but with the same unicast address.
Checklist and Archives
Workgroup category: Computers [Categories OK]
Talk Archive: none
English language variant: American English

An area where review is definitely appropriate

While I've rewritten a bit, I'm now not convinced that the sinkhole case is appropriate for a first article on anycasting. Indeed, I have much to do both on denial of service and sinkhole; Sandy Harris, I know, has some interests here.

One of the reasons that the intertwined issues of sinkholes and anycasting get so complex is that while the network operator, for many reasons, is diverting attack traffic aimed at the well-known address of the target of the attack, that same operator may have very legitimate defensive management and problem recovery reasons to communicate with the device under attack. That communication might even be to an alternate, internal-only address of the interface whose well-known IP address is under attack, or perhaps through a completely separate channel. The characteristics of such a management communication are much like those of a critical message on a real-world battlefield: an order that may be short, but must get through in spite of jamming, deception, and both deliberate interference and the secondary effects of that interference.

I'm wondering if the entire sinkhole section is more proper in a subordinate article, or merely as a link. The more I tried to explain in a short note, the more nuances came to mind. Howard C. Berkowitz 16:00, 18 January 2009 (UTC)

Unclear?

It seems that I do not understand this:

"if defensive code needs to be inserted into the router, yet not diverted away from it by the anycast process."

What is "not yet diverted from" what? Peter Schmitt 23:52, 11 June 2009 (UTC)

You are very right that this is unclear, and I wrote it.
As you'll see in my musings earlier on the talk page, I was unsure how deeply to go into the specific sinkhole applications of anycasting in the main anycasting article. This text needs to be rewritten, but it is really specific to sinkholes (and defending against distributed denial of service attacks), and I'll move it to that article for rewrite. Thanks for spotting it.
This article is not intended to be about advanced applications, but simply clarifying the basic mechanism of anycasting. Howard C. Berkowitz 14:45, 12 June 2009 (UTC)

Remarks

The external links of references 3 and 4 are broken.

A short explanation of "anycast(ing)" at the beginning would probably be nice (before discussing various documents about it).

What is CZ policy about external links in the text? Maybe it would be better to have the RFC links (also) in the subpage.

Peter Schmitt 19:04, 7 July 2009 (UTC)

Introduction updated.
I'll contact the authors of references 3 and 4 or see if I can find other links.
Unfortunately, there's no good way to avoid the external links for RFCs; the Wikimedia software treats them as a special case. Howard C. Berkowitz 19:24, 7 July 2009 (UTC)
If there is no reference for the dead links: I think that the article could do without them. What do you think? Peter Schmitt 00:42, 25 August 2009 (UTC)

I converted the two-column reference list to one column, since there are only two references. The first one was spilling over into the second column on my screen, making everything look bad. --Joe Quick 01:09, 14 October 2009 (UTC)

Thanks. Howard C. Berkowitz 01:59, 14 October 2009 (UTC)

Progressing

While there have been some comments elsewhere, my preference, but not a strong one, is to continue the Approval of this version. I think some of the questions that came up are better addressed in a separate article or articles on load sharing or load distribution; I'm not sure if that should be a single article or a set with an introduction to algorithms, and subarticles on such things as routing, transaction/transport, DNS, and other specific load sharing techniques. Howard C. Berkowitz 17:01, 17 October 2009 (UTC)

No need to stop the approval of this version, and wait until another is ready. The additional work has to be done by a constable :-) But if you have some improved formulations which have nothing to do with reorganizing the material, then these could be included here. Peter Schmitt 19:50, 17 October 2009 (UTC)
I prefer the term load distribution to load sharing, and especially load balancing, which gives an unreasonable confidence that things will be deterministic or at least precise. In routing, the first step is deciding whether or not to use a single route with backups, or to have equal cost multipath.
If one does go with equal cost multipath — automatic nonequal-cost multipath simply has not worked in practice — the next question is how to distribute. The original approach was "round robin" packet distribution, cycling monotonically across the set of paths, but that had the disadvantage of requiring the router to maintain state about the last path used.
Next, as each new destination address appeared, it was assigned to an output interface, usually selected monotonically. That didn't work well in practice, because not all destinations had the same traffic.
The most successful approach has been to hash source-destination pairs and then use the hash to select paths. This is also called flow-based distribution, and the hash may also include the IP protocol type and quality of service flags; a small sketch of this appears below.
Source-destination hash also works well at the data link layer. At transport and above, there are a number of policies. I'll go get a copy of one of my books, where I took several pages to discuss this, and adapt it. Howard C. Berkowitz 20:02, 17 October 2009 (UTC)
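To make the flow-based idea above a bit more concrete, here is a minimal Python sketch of how a router might pick among equal-cost next hops by hashing the source-destination pair. The field names, hash function, and addresses are purely illustrative, not any particular vendor's implementation.

import hashlib

def select_next_hop(src_ip, dst_ip, next_hops, protocol=6, qos=0):
    """Pick an equal-cost next hop for a flow.

    Hashing the (source, destination) pair -- optionally with the IP
    protocol number and quality-of-service bits -- keeps every packet
    of a flow on the same path, without the router having to remember
    which path it used last (as round-robin distribution would require).
    """
    key = "{}|{}|{}|{}".format(src_ip, dst_ip, protocol, qos).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]

# Two equal-cost paths; each flow consistently maps to one of them.
paths = ["10.0.0.1", "10.0.1.1"]
print(select_next_hop("192.0.2.10", "198.51.100.5", paths))
print(select_next_hop("192.0.2.11", "198.51.100.5", paths))

The useful property is statelessness: the same inputs always map to the same path, so the forwarding path needs no per-flow table.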

APPROVED Version 1.0

Draft Version 2.0

This is a draft version in which I have added lots of editorial footnotes, to be deleted in the finished version. I'm following Einstein's rule here - Make things as simple as possible, but no simpler. I'm not an expert in routing. Tell me if I've made any of this too simple. --David MacQuigg 17:34, 16 October 2009 (UTC)

This is a copy of my sandbox draft, still full of editorial notes and footnotes. Pick and choose what you like. Mainly, I wanted to emphasize the simpler, more fundamental points, and put them closer to the top in the intro. --David MacQuigg 21:25, 19 October 2009 (UTC)

Let me answer some of Dave's questions in the sandbox draft. "Do we have to assume all the routers in a load-balancing setup are anycast? Should we answer this question or change the figure?"
I'm not completely clear what you mean by a router being anycast. At least in IPv4, the routers don't really know about anycasting. In practice, most router configuration software would complain if you put identical addresses on more than one of its interfaces. If you had three DNS servers with the same address, they'd probably be on three separate routers, each individually configured with the same interface address for the server. There could be a thousand other routers that receive three route advertisements, and, ignoring some very special cases, just look at the route cost and pick the one with least cost. Is that what you mean by "Isn't this how a traditional router works? Why do we need anycast for fault tolerance?"
The fault tolerance question is separate. Let me re-phrase the first question on load balancing. With reference to Figure 1, let's say we want the load from clients 1 and 2 to be balanced at servers A, B, and C. If routers 1, 2, and 3 do not support anycasting, server A will get 100% of the load from client 1, server B will get nothing, and server C will get 100% of the load from client 2.
Now let's assume router 1 supports anycasting, but router 2 does not. How can router 1 include server C in its load balancing, if every packet it sends in that direction goes through router 2, where it must be sent to server B?
The second question is - Why do we need anycasting for fault tolerance? In Figure 3, when server A goes down, doesn't a normal router try to find an alternate path, in this case to server B, which it thinks is just another route to server A? --David MacQuigg 19:36, 20 October 2009 (UTC)
To answer the second part of the immediate question, you want anycast because you don't want the hosts to care about the DNS server address. You have three DNS servers with the same address. If you are familiar with VRRP, that's a similar approach to getting to the default router on a subnet: the hosts don't know there's more than one router with the same IP address. Howard C. Berkowitz 19:07, 20 October 2009 (UTC)
Sorry, I'm not familiar with VRRP. There has to be a way we can explain this topic to students who are not expert in routing protocols. --David MacQuigg 19:36, 20 October 2009 (UTC)
I'll have to write up VRRP, which, to make the nuances worse, is a router redundancy protocol but not a routing protocol. :-( Now, the students do have to know some very basic IP principles, such as the local vs. remote assumption: go to the default router if the destination address isn't on your subnet.
Remember also that the hosts are not routing protocol aware.
One of the key points about IPv4 anycast routing is that the routers don't even know it exists. It's a configuration technique by which the same /32 host address is configured on, and advertised by, different routers. Let's say you have a domain in which you code the address of a name server, in an NS RR record, as 192.0.2.1/32. All the resolvers will try to get information from that address.
Assume standard routing, with only one instance of 192.0.2.1, but sufficient physical diversity that routers on different sides of your campus have different paths (i.e., next hop router addresses) to it. If one external path goes down, the router connected to it will probably find a route that goes intra-campus to the other router and out.
Now assume there are two instances. Routers A and B still have different next hops, but the next hop from router A takes the packet to the server 192.0.2.1 in Tucson while B goes to the 192.0.2.1 in Flagstaff (you're in Phoenix, right?).
From the standpoint of the resolvers, they don't know the difference between the two cases, but the second one protects you against the loss of a server and/or router, while the first doesn't allow for redundant nameservers--the host would have to try a different IP address. I could add additional nameserver instances without the hosts, or routers other than those directly connected to the new instance, knowing anything about it. No configuration changes required! Howard C. Berkowitz 20:21, 20 October 2009 (UTC)
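To illustrate that two-instance case, here is a toy Python sketch, not a model of any real routing protocol: each directly connected router advertises 192.0.2.1/32 at some path cost, everyone else simply picks the lowest-cost advertisement, and when one instance withdraws, traffic shifts to the survivor. The router names and costs are invented for illustration.

# Toy model: two routers advertise the same anycast /32; every other
# router simply picks the advertisement with the lowest path cost.

advertisements = {
    # advertising router -> (anycast prefix, path cost as seen locally)
    "router-A (Tucson)":    ("192.0.2.1/32", 10),
    "router-B (Flagstaff)": ("192.0.2.1/32", 20),
}

def best_route(ads):
    """Return the advertising router with the lowest cost, or None."""
    if not ads:
        return None
    return min(ads, key=lambda router: ads[router][1])

print(best_route(advertisements))   # the Tucson instance wins on cost

# If the Tucson server or its router fails, its advertisement is
# withdrawn and traffic shifts to Flagstaff; the hosts never notice.
del advertisements["router-A (Tucson)"]
print(best_route(advertisements))   # now Flagstaff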
OK, I think I understand the answer to the fault tolerance question. Fault tolerance occurs as a result of normal router action, nothing special about anycasting. That still leaves me puzzled about load balancing. Back to Figure 1, if routers 1, 2, and 3 have only the normal "simplistic routing mechanism", server A will get 100% of the load from client 1, server B will get nothing, and server C will get 100% of the load from client 2. How can we get the load from both clients balanced to all three servers? --David MacQuigg 22:40, 20 October 2009 (UTC)
Balanced? Well, distributed. Figure 1 would have to have a more complex topology and costs to load-distribute; otherwise, it's distributing over two of the servers and providing additional fault tolerance.
Yes, the individual routers operate normally, but, by the standard rule that IP addresses are unique, they are being deliberately misconfigured as a set — for good reasons, of course. There are some highly specialized cases where you'd need to pay a bit more attention to the routing protocol configuration if the path cost to two servers is equal; I think that's outside our scope here. Howard C. Berkowitz 22:49, 20 October 2009 (UTC)

(undent) oops. I already started Virtual Router Redundancy Protocol a while back. Shall I continue? :-( Howard C. Berkowitz 04:05, 21 October 2009 (UTC)

Ah, I was confusing this with VERP (no, not the definition in Urban Dictionary, but the protocol Variable Envelope Return Path). The glut of acronyms is getting to be a problem.
Seriously, though, is that how anycast load balancing works? Some kind of backchannel inter-router communication that would allow load balancing (equal distribution of the load) to all three servers in the configuration of Figure 1? Without this, all we get is an imbalanced "distribution" to two servers.
To get the load properly balanced, all three routers would have to be aware of the number of servers on each path. Router 1 would send 1/3 of the packets to server A, and 2/3 to router 2. Router 2 would send 1/2 to server B, and 1/2 to router 3. --David MacQuigg 12:07, 21 October 2009 (UTC)
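Just to check the arithmetic in that proposal, here is a small Python calculation of how one unit of traffic from client 1 would be split under those weights (client 2's traffic would presumably be split the same way with the weights mirrored). This only illustrates the proposed scheme, not how any router actually behaves.

# One unit of traffic from client 1 entering at router 1, forwarded
# with the weights proposed above (illustrative arithmetic only).

load = 1.0
to_server_a = load * (1/3)            # router 1 keeps 1/3 for server A
to_router_2 = load * (2/3)            # and passes 2/3 on to router 2
to_server_b = to_router_2 * (1/2)     # router 2 keeps half for server B
to_server_c = to_router_2 * (1/2)     # the rest reaches server C via router 3

print(to_server_a, to_server_b, to_server_c)   # each is 0.333...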
My colleague, Paul Ferguson, pointed out some time ago that "balancing" is an unrealistic term, at least at the lower layers. What you are describing works for a TCP load balancer, but it's simply too much state for routers to consider. At a different level, the best routers can do is to distribute load among the next-hop links; they can't be aware of noncontiguous link load, etc. The state of the art is to set up traffic-engineered paths using MPLS and allocate traffic per flow/forwarding equivalence class to paths. There can be failover on the paths.
Not retaining state in routers goes right to the End-to-End Principle. Now, edge devices, even middleboxes such as firewalls or TCP load balancers, certainly can retain state. Load-aware DNS servers can. I do have text for some of these.
There's only so much you can do in a router and still keep the needed speed. In fact, there were some experimental routing protocols that were load-aware, and, for reasons I can discuss, just didn't work. Admission control to paths at edge routers, yes. Per-packet in the core, no. Howard C. Berkowitz 14:19, 21 October 2009 (UTC)
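For what it is worth, here is a rough Python sketch of that per-flow allocation idea: flows are hashed onto a small set of pre-established traffic-engineered paths, each with a configured share, and the router never inspects per-packet load. The path names and the 2:1 weighting are invented; real forwarding-equivalence-class bindings come from the MPLS control plane, not from code like this.

import hashlib
import itertools

# Pre-established traffic-engineered paths and their configured shares
# (names and weights invented for illustration).
paths = {"LSP-east": 2, "LSP-west": 1}    # 2:1 split of flows

# Expand the weights into hash buckets so selection stays stateless.
buckets = list(itertools.chain.from_iterable(
    [name] * weight for name, weight in paths.items()))

def path_for_flow(src_ip, dst_ip):
    """Map a flow to one of the pre-established paths."""
    digest = hashlib.sha256("{}|{}".format(src_ip, dst_ip).encode()).digest()
    return buckets[int.from_bytes(digest[:4], "big") % len(buckets)]

print(path_for_flow("192.0.2.10", "203.0.113.7"))
print(path_for_flow("192.0.2.11", "203.0.113.8"))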
I'm thoroughly confused. I've updated footnotes 3 and 4, and I'll try to work on this some more later. --David MacQuigg 00:40, 22 October 2009 (UTC)