Distributed computing: Difference between revisions

From Citizendium
imported>Robert Tito
mNo edit summary
imported>Nick Johnson
No edit summary
Line 2: Line 2:
[[Category:Mathematics Workgroup|Mathematics]]


In [[computer science]], '''distributed computation''' refers to a strategy for improving the speed of highly [[parallel computation|parallelizable]] tasks by distributing pieces of the problem across many [[computers]] that together form a distributed computer.  Unlike [[cluster (Computer Science)|clusters]], the computers in a distributed computer may be distributed over large [[networks]], and may be owned by many people or institutions.
In [[computer science]], '''distributed computation''' is a strategy for improving the speed of highly [[parallel computation|parallelizable]] tasks by distributing pieces of the problem across many [[computers]] that together form a distributed computer.   
 
Unlike [[cluster (Computer Science)|clusters]], the computers in a distributed computer may be distributed over large [[networks]], and may be owned by many people or institutions.  The computing power may even be donated time on home computers, which communicate with a central controller via the [[internet]].


== Network Topology ==


A distributed computer system generally employs one or more ''master'' computers, and very many ''worker'' computers.  The master computer's role is to break the problem into a series of smaller problems (work loads) and to send these to participating workers.  The workers then perform the work and send the results back to the master computer.
A distributed computer system generally employs one or more ''master'' computers and many ''worker'' computers.  The master computer's role is to break the problem into a series of smaller problems (work units) and to send these to participating workers.  The workers then perform the work and send the results back to the master computer.
 
== Goals and advantages ==
There are many different types of distributed computing systems and many challenges to overcome in successfully designing one. The main goal of a distributed computing system is to connect users and resources in a [[Transparency (computing)|transparent]], open, and [[scalable]] way. Ideally this arrangement is drastically more [[fault tolerant]] and more powerful than many combinations of [[stand-alone]] computer systems.
 
=== Openness ===
Openness is the property of distributed systems such that each subsystem is continually open to interaction with other systems (see references).  [[Web Services]] protocols are standards which enable distributed systems to be extended and scaled. In general, an open system that scales has an advantage over a perfectly closed and self-contained system.
 
Consequently, open distributed systems are required to meet the following challenges:
 
; Monotonicity
: Once something is published in an open system, it cannot be taken back.
; Pluralism
: Different subsystems of an open distributed system include heterogeneous, overlapping and possibly conflicting information.  There is no central arbiter of truth in open distributed systems.
; Unbounded nondeterminism
: Subsystems of an open distributed system can come up and go down asynchronously, and the communication links between them can appear and disappear.  Therefore the time needed to complete an operation cannot be bounded in advance (see [[unbounded nondeterminism]]).
 
=== Scalability ===
{{main|Scalability}}
A scalable system is one that can easily be altered to accommodate changes in the number of users, resources and computing entities assigned to it. Scalability can be measured in three different dimensions:
 
; Load scalability
: A distributed system should make it easy for us to expand and contract its resource pool to accommodate heavier or lighter loads.
; Geographic scalability
: A geographically scalable system is one that maintains its usefulness and usability, regardless of how far apart its users or resources are.
; Administrative scalability
: No matter how many different organizations need to share a single distributed system, it should still be easy to use and manage.
 
Some loss of performance may occur in a system that allows itself to scale in one or more of these dimensions. There is also a limit to how many processors can usefully be added to a system; beyond that limit, the performance of the system degrades.
 
== Drawbacks and disadvantages ==
{{see also|Fallacies of Distributed Computing}}
If not planned properly, a distributed system can decrease the overall reliability of computations, because the unavailability of one node can disrupt the other nodes.  [[Leslie Lamport]] describes this type of distributed system fragility like this: "You know you have one when the crash of a computer you've never heard of stops you from getting any work done."{{Fact|date=January 2007}}<!-- This is widely attributed to him. I can't find a primary source - though absence of proof is not proof of absence. --~~~~ -->
 
Troubleshooting and diagnosing problems in a distributed system can also become more difficult, because the analysis may now require connecting to remote nodes or inspecting communications being sent between nodes.
 
Many types of computation are poorly suited to distributed environments, typically because of the amount of network communication or synchronization that would be required between nodes.  If bandwidth, latency, or communication requirements are too significant, the benefits of distributed computing may be negated and performance may be worse than in a non-distributed environment.


Additionally, the performance of a distributed computer does not grow linearly with the number of computational units supplied; a distributed computer does not become twice as fast when the number of computers doubles.


== Famous Examples ==


* [[SETI@Home]] is perhaps the most famous example of distributed computation. It comprises more than one million computers of varying [[computer architecture|architectures]] and [[operating systems|platforms]], and is dedicated to computing [[fourier transform|Fourier transforms]] on data received from [[radio telescope|radio telescopes]].

Revision as of 15:09, 26 February 2007


In computer science, distributed computation is a strategy for improving the speed of highly parallelizable tasks by distributing pieces of the problem across many computers that together form a distributed computer.

Unlike clusters, the computers in a distributed computer may be distributed over large networks, and may be owned by many people or institutions. The computing power may even be donated time on home computers, which communicate with a central controller via the internet.

Network Topology

A distributed computer system generally employs one or more master computers and many worker computers. The master computer's role is to break the problem into a series of smaller problems (work units) and to send these to participating workers. The workers then perform the work and send the results back to the master computer.
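The master/worker arrangement described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: threads stand in for remote worker computers, the problem (a sum of squares) is made up, and the names `split_into_work_units`, `worker`, and `master` are hypothetical rather than taken from any real distributed computing system.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_work_units(numbers, unit_size):
    """Master, step 1: break the problem into smaller work units."""
    return [numbers[i:i + unit_size] for i in range(0, len(numbers), unit_size)]


def worker(work_unit):
    """Worker: perform the work for one unit and return the result."""
    return sum(x * x for x in work_unit)


def master(numbers, unit_size=4, num_workers=3):
    """Master: send work units to workers, then combine their results."""
    work_units = split_into_work_units(numbers, unit_size)
    # Assumption: local threads play the role of remote worker computers.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial_results = list(pool.map(worker, work_units))
    return sum(partial_results)


if __name__ == "__main__":
    print(master(list(range(10))))  # prints 285
```

In a real system the call to `pool.map` would be replaced by network communication, and the master would also have to cope with workers that never reply.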

Goals and advantages

There are many different types of distributed computing systems and many challenges to overcome in successfully designing one. The main goal of a distributed computing system is to connect users and resources in a transparent, open, and scalable way. Ideally this arrangement is drastically more fault tolerant and more powerful than many combinations of stand-alone computer systems.

Openness

Openness is the property of distributed systems such that each subsystem is continually open to interaction with other systems (see references). Web Services protocols are standards which enable distributed systems to be extended and scaled. In general, an open system that scales has an advantage over a perfectly closed and self-contained system.

Consequently, open distributed systems are required to meet the following challenges:

Monotonicity
Once something is published in an open system, it cannot be taken back.
Pluralism
Different subsystems of an open distributed system include heterogeneous, overlapping and possibly conflicting information. There is no central arbiter of truth in open distributed systems.
Unbounded nondeterminism
Subsystems of an open distributed system can come up and go down asynchronously, and the communication links between them can appear and disappear. Therefore the time needed to complete an operation cannot be bounded in advance (see unbounded nondeterminism).
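Because completion time cannot be bounded in advance, a coordinator cannot simply wait for a reply. One common response, shown here as a rough sketch and not as something the article prescribes, is to treat a missed deadline as a failure and hand the work to another subsystem. The function names are hypothetical, and a missed deadline is simulated by raising Python's built-in `TimeoutError`.

```python
def run_with_reassignment(work_units, workers, max_attempts=3):
    """Try each work unit on successive workers until one replies in time."""
    results = []
    for unit_id, unit in enumerate(work_units):
        for attempt in range(max_attempts):
            # Rotate through the workers so a retry goes to a different one.
            candidate = workers[(unit_id + attempt) % len(workers)]
            try:
                results.append(candidate(unit))
                break
            except TimeoutError:
                continue  # no reply before the deadline; try the next worker
        else:
            raise RuntimeError(
                f"work unit {unit_id} got no reply after {max_attempts} attempts"
            )
    return results


def reliable_worker(unit):
    """Simulated worker that always answers."""
    return sum(unit)


def unreachable_worker(unit):
    """Simulated worker that is down or cut off from the network."""
    raise TimeoutError("no reply before the deadline")


if __name__ == "__main__":
    workers = [unreachable_worker, reliable_worker]
    print(run_with_reassignment([[1, 2], [3, 4]], workers))  # prints [3, 7]
```

Retrying caps how long the coordinator waits per attempt, but it does not bound the total time: if every worker stays unreachable, the operation still fails.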

Scalability

For more information, see: Scalability.

A scalable system is one that can easily be altered to accommodate changes in the number of users, resources and computing entities assigned to it. Scalability can be measured in three different dimensions:

Load scalability
A distributed system should make it easy for us to expand and contract its resource pool to accommodate heavier or lighter loads.
Geographic scalability
A geographically scalable system is one that maintains its usefulness and usability, regardless of how far apart its users or resources are.
Administrative scalability
No matter how many different organizations need to share a single distributed system, it should still be easy to use and manage.

Some loss of performance may occur in a system that allows itself to scale in one or more of these dimensions. There is also a limit to how many processors can usefully be added to a system; beyond that limit, the performance of the system degrades.
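The existence of such a limit can be illustrated with a toy cost model. The model is an assumption made for this sketch, not a measurement of any real system: compute time shrinks as `total_work / num_workers`, while coordination cost grows by a fixed `overhead_per_worker` for every worker added.

```python
def completion_time(num_workers, total_work=100.0, overhead_per_worker=1.0):
    """Hypothetical model: divided compute time plus growing coordination cost."""
    return total_work / num_workers + overhead_per_worker * num_workers


def best_worker_count(max_workers, total_work=100.0, overhead_per_worker=1.0):
    """Return the worker count with the lowest modelled completion time."""
    return min(
        range(1, max_workers + 1),
        key=lambda n: completion_time(n, total_work, overhead_per_worker),
    )


if __name__ == "__main__":
    for n in (1, 5, 10, 20, 50):
        print(n, completion_time(n))
    print("best:", best_worker_count(50))  # prints best: 10
```

With these default values the modelled time falls until 10 workers and rises afterwards, so adding a twentieth worker makes the system slower than it was with ten.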

Drawbacks and disadvantages

See also: Fallacies of Distributed Computing

If not planned properly, a distributed system can decrease the overall reliability of computations, because the unavailability of one node can disrupt the other nodes. Leslie Lamport describes this type of distributed system fragility like this: "You know you have one when the crash of a computer you've never heard of stops you from getting any work done." [citation needed]

Troubleshooting and diagnosing problems in a distributed system can also become more difficult, because the analysis may now require connecting to remote nodes or inspecting communications being sent between nodes.

Many types of computation are poorly suited to distributed environments, typically because of the amount of network communication or synchronization that would be required between nodes. If bandwidth, latency, or communication requirements are too significant, the benefits of distributed computing may be negated and performance may be worse than in a non-distributed environment.

Additionally, the performance of a distributed computer does not grow linearly with the number of computational units supplied; a distributed computer does not become twice as fast when the number of computers doubles.
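One standard way to quantify this is Amdahl's law, which the text above does not name but which matches its claim: if only a fraction p of a task can be parallelized, the speedup on n computers is 1 / ((1 - p) + p / n). The sketch below assumes this idealized model and ignores communication cost entirely; the function name `amdahl_speedup` is chosen for illustration.

```python
def amdahl_speedup(parallel_fraction, num_computers):
    """Idealized speedup when only part of the task can run in parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / num_computers)


if __name__ == "__main__":
    # Assumed example: 90% of the task is parallelizable.
    for n in (10, 20, 40):
        print(n, round(amdahl_speedup(0.9, n), 2))
```

Under this assumption, doubling from 10 to 20 computers raises the speedup from about 5.3 to about 6.9, well short of doubling it, and no number of computers can push it past 10.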

Famous Examples