Database

From Citizendium
imported>Howard C. Berkowitz

Revision as of 15:22, 18 October 2008

This article is a stub and thus not approved.

A database is a collection of computer-processable records, organized in some manner beyond that of a simple sequential file, in which records cannot be deleted and new records can only be added at the end. The simplest possible database organization, indexed sequential, stores records in the order given by some collating rule, but has a mechanism for adding records whose sequence places them between existing records, and a mechanism for deleting records inside the database. More advanced databases have more complex ways of organizing records.
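
The indexed sequential behavior described above can be sketched in a few lines of Python. This is only an in-memory illustration, not how any particular product stores its files; the class name `IndexedSequentialFile` and its methods are hypothetical, and the collating rule is assumed to be ordinary string ordering of the keys.

```python
import bisect


class IndexedSequentialFile:
    """Toy in-memory stand-in for an indexed sequential file (hypothetical)."""

    def __init__(self):
        self._keys = []     # keys kept in collating (sorted) order
        self._records = {}  # key -> record payload

    def insert(self, key, record):
        # A new key is placed at its collating position, which may fall
        # between two existing records rather than at the end.
        if key not in self._records:
            bisect.insort(self._keys, key)
        self._records[key] = record

    def delete(self, key):
        # Remove a record from anywhere inside the file.
        if key in self._records:
            del self._records[key]
            del self._keys[bisect.bisect_left(self._keys, key)]

    def scan(self):
        # Sequential read in collating order.
        return [(key, self._records[key]) for key in self._keys]


f = IndexedSequentialFile()
f.insert("adams", 1)
f.insert("cole", 3)
f.insert("baker", 2)   # lands between "adams" and "cole"
f.delete("cole")       # deletion inside the file
```

A plain append-only sequential file would support neither the insertion of "baker" in the middle nor the deletion of "cole".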

Advanced databases can have capabilities beyond complex organization. They can comply with rules for transaction processing, which require (see ACID properties) that a unit of work either run to completion before the database is updated or, if the work cannot be completed, that its incomplete results be "rolled back" without changing the state of the database. Both at their logical level of organization and through physical mechanisms such as Redundant Arrays of Inexpensive Disks (RAID), they can be engineered to tolerate damage to storage media, or even destruction of an entire copy.
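
As a rough illustration of the roll-back rule, the sketch below uses Python's standard `sqlite3` module with an in-memory database. The `account` table, the `transfer` function, and the balances are assumptions made up for the example; the point is only that a unit of work which fails partway leaves the database state unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one unit of work (hypothetical helper)."""
    try:
        # Used as a context manager, the connection commits if the block
        # completes and rolls back if an exception escapes it.
        with conn:
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit to dst
        # made earlier in the same unit has been rolled back as well.
        return False


ok = transfer(conn, "a", "b", 30)       # completes and is committed
failed = transfer(conn, "a", "b", 500)  # would overdraw "a", so it is undone
balances = dict(conn.execute("SELECT name, balance FROM account"))
```

After both calls, only the first transfer is reflected in `balances`; the partial credit from the second never becomes visible.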

To protect against loss of a physical copy, separate copies clearly need to exist at multiple locations. In the simplest approach, a copy remote from the main site may simply be a real-time mirror of the data, or even a sequential backup file. More complex mechanisms either have complete copies at multiple sites, or have parts of the data at different sites.

Organization

Indexed sequential

Hierarchical

Relational

Object-oriented

Physical location

Traditional databases keep all content, other than backups, at one site. This does not preclude having users at multiple locations; only the main copy of the data is centralized. There can be good reasons to break away from the centralized model, ranging from faster access to local copies (i.e., distributed data) to the economy of not needing all data at all sites.

Once the data are no longer centralized, administration becomes a greater challenge. Synchronization also becomes a significant issue: every instance of a data item must be identical in each storage system that holds it.

There are useful tradeoffs among the cost of transmission bandwidth, the cost of storage, and the cost of transmission delay in retrieving remote data. Multicasting and peer-to-peer techniques should be evaluated for each design.

Caching

Especially when the data does not change frequently but will be accessed multiple times, a caching architecture may be attractive. This is quite common in applications such as entertainment content distribution.

In a caching system, the first time data is needed, it is requested from the remote site, but a copy is retained locally. That local copy may have an explicit time for which it remains valid (e.g., as in the Domain Name System), or there may be a finite cache size, from which the least recently used data is first deleted to make room for more information.
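
The two retention policies just described, an explicit validity time and least-recently-used eviction from a finite cache, can be combined in one small sketch. Everything named here (`LocalCache`, `fetch_remote`, the injected `clock`) is hypothetical; the remote site is simulated by a function that records each request, and the clock is a plain variable so the expiry is easy to follow.

```python
import time
from collections import OrderedDict


class LocalCache:
    """Local cache with a validity time and LRU eviction (illustrative only)."""

    def __init__(self, fetch_remote, max_entries, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch_remote
        self._max = max_entries
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = OrderedDict()  # key -> (value, expires_at); least recent first

    def get(self, key):
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[1] > now:
            self._entries.move_to_end(key)      # mark as most recently used
            return hit[0]
        value = self._fetch(key)                # first use, or local copy expired
        self._entries[key] = (value, now + self._ttl)
        self._entries.move_to_end(key)
        while len(self._entries) > self._max:
            self._entries.popitem(last=False)   # drop the least recently used
        return value


remote_calls = []

def fetch_remote(key):
    # Stand-in for a request to the remote site.
    remote_calls.append(key)
    return key.upper()

now = [0.0]
cache = LocalCache(fetch_remote, max_entries=2, ttl_seconds=10.0, clock=lambda: now[0])
cache.get("a")
cache.get("b")
cache.get("a")   # served from the local copy
cache.get("c")   # cache is full, so "b" (least recently used) is dropped
cache.get("b")   # must be requested from the remote site again
now[0] = 11.0
cache.get("c")   # local copy has outlived its validity time
```

In this run the remote site is contacted five times for six lookups; with data that changes rarely and is read often, the ratio of local hits would be much higher.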

Distributed

Distributed databases put a full copy of information at each location that uses it. As long as the data are kept synchronized, this is extremely fault-tolerant. Delays in accessing the data will be minimal, but when the data change rapidly, the cost and complexity of update and synchronization increase.
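
A minimal sketch of the synchronization problem, assuming a naive scheme in which every update is pushed to every site: the `Site` class and the `apply_update` and `resync` helpers are hypothetical, and real systems use considerably more careful protocols.

```python
class Site:
    """One location holding a full copy of the data (illustrative only)."""

    def __init__(self, name):
        self.name = name
        self.data = {}    # full local copy
        self.version = 0  # number of updates applied at this site


def apply_update(sites, key, value, unreachable=()):
    """Push one update to every site; return the names of sites that missed it."""
    missed = []
    for site in sites:
        if site.name in unreachable:
            missed.append(site.name)
            continue
        site.data[key] = value
        site.version += 1
    return missed


def resync(source, target):
    """Bring a stale site back in line by copying from an up-to-date one."""
    target.data = dict(source.data)
    target.version = source.version


sites = [Site("east"), Site("west"), Site("north")]
apply_update(sites, "title", "v1")
missed = apply_update(sites, "title", "v2", unreachable={"north"})
stale_before = sites[2].data["title"]   # "north" still serves the old value
resync(sites[0], sites[2])
in_sync = all(site.data == sites[0].data for site in sites)
```

Each update costs one message per site, and every missed update needs a repair step, which is why rapidly changing data make this arrangement expensive.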

Note that a distributed file system does not necessarily have the complex data organization of a true database, but such filesystems often can provide the infrastructure for a distributed database.

Bibliographic information systems that are updated at periodic intervals, such as MEDLINE, lend themselves to distributed storage.

Federated

In a federated database, different parts of the database reside in different locations, which does not preclude having redundant copies of those parts. Some type of directory service is necessary to locate data, and this adds complexity beyond simple distribution. Wherever redundant copies exist, federated databases also retain the synchronization challenges of distributed databases.

A common commercial application would be for regional divisions of an enterprise to retain the data for their region, even distributing it within the region. When a region needs data that belongs to another region, it will treat the other region's database as a member of the federation.
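
The regional arrangement can be sketched as follows, assuming a directory that simply maps each region name to the member database holding that region's data. The `Federation` and `RegionalDatabase` classes, the region names, and the sample records are all hypothetical.

```python
class Federation:
    """Directory service: maps a region name to the member that owns its data."""

    def __init__(self):
        self._directory = {}

    def register(self, region, database):
        self._directory[region] = database

    def locate(self, region):
        return self._directory[region]


class RegionalDatabase:
    """Holds only its own region's records (illustrative only)."""

    def __init__(self, region, federation):
        self.region = region
        self.records = {}
        self.federation = federation
        federation.register(region, self)

    def get(self, owner_region, key):
        if owner_region == self.region:
            return self.records[key], "local"
        # Data belongs to another region: find it through the directory
        # and read it from that member of the federation.
        member = self.federation.locate(owner_region)
        return member.records[key], "remote"


federation = Federation()
europe = RegionalDatabase("europe", federation)
asia = RegionalDatabase("asia", federation)
europe.records["cust-1"] = "Acme GmbH"
asia.records["cust-9"] = "Example KK"

local_result = asia.get("asia", "cust-9")
remote_result = asia.get("europe", "cust-1")
```

The directory lookup is the extra moving part that a fully replicated design does not need, since there every site already holds all of the data.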

While it is not a true database, the content of the World Wide Web has many of the characteristics of a federated database.

References