Document Object Model (XML)

From Citizendium
This article is developing and not approved.

The Document Object Model (XML), sometimes just called the XML DOM, is a high-level interface used by programming language parsers for manipulating documents in eXtensible Markup Language (XML).[1] The World Wide Web Consortium (W3C) has concluded its work on it, but work on the application programming interface (API) continues in other groups.

Formally,

The Document Object Model is a platform- and language-neutral interface that will allow programs and scripts to dynamically access and update the content, structure and style of documents. The document can be further processed and the results of that processing can be incorporated back into the presented page.[1]

What does DOM do?

At the cost of relatively high demands on memory and processing, the DOM makes it much easier for programmers to go directly to parts of a complex XML data structure, without having to traverse, step by step, the tree that makes up an XML document.

The DOM aims to make it easy for programmers to access components and to delete, add, or edit their content, attributes and style. In essence, the DOM makes it possible for programmers to write applications that work properly on all browsers and servers and on all platforms. While programmers may need to use different programming languages, they do not need to change their programming model.[2]

Once a program has reached the node it wants, it can add or delete child nodes, modify the remaining nodes, and so on, without the programmer having to keep track of every detail of state.
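The operations described above can be sketched with Python's standard-library `xml.dom.minidom`, which implements a subset of the DOM interfaces. This is a minimal illustration, not a definitive usage pattern; the sample document and its element names (`library`, `book`, `title`) are invented for the example.

```python
from xml.dom.minidom import parseString

# Hypothetical sample document, invented for illustration.
XML = (
    "<library>"
    "<book id='b1'><title>Old Title</title></book>"
    "<book id='b2'><title>Second</title></book>"
    "</library>"
)

doc = parseString(XML)
root = doc.documentElement

# Go directly to the elements of interest by tag name.
books = root.getElementsByTagName("book")

# Edit content: change the text inside the first book's title.
title = books[0].getElementsByTagName("title")[0]
title.firstChild.data = "New Title"

# Edit an attribute on the first book.
books[0].setAttribute("lang", "en")

# Delete a child node (the second book).
root.removeChild(books[1])

# Add a new child node.
new_book = doc.createElement("book")
new_book.setAttribute("id", "b3")
root.appendChild(new_book)

result = root.toxml()
print(result)
```

The program never tracks parser state itself; it asks the tree for the nodes it needs and changes them in place.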

When a document is in DOM structure, it becomes a hierarchy of nodes:

  • parent nodes with child nodes
  • leaf nodes, which have no children and are the terminal points in the hierarchy.
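The parent/leaf distinction can be made visible by walking a small tree. The sketch below again assumes Python's `xml.dom.minidom`; the `describe` helper and the sample document are hypothetical and exist only for this illustration.

```python
from xml.dom.minidom import parseString

# Hypothetical sample document, invented for illustration.
XML = "<order><item>pen</item><item>ink</item><note/></order>"


def describe(node, depth=0, lines=None):
    """Collect one indented line per node, marking parents and leaves."""
    if lines is None:
        lines = []
    kind = "parent" if node.hasChildNodes() else "leaf"
    # Elements are labelled by name; text nodes by their character data.
    if node.nodeType == node.ELEMENT_NODE:
        label = node.nodeName
    else:
        label = repr(node.data)
    lines.append("  " * depth + f"{label} ({kind})")
    for child in node.childNodes:
        describe(child, depth + 1, lines)
    return lines


doc = parseString(XML)
lines = describe(doc.documentElement)
print("\n".join(lines))
```

Note that in the DOM the text inside an element is itself a node, so `item` is a parent whose single child is a text leaf, while the empty `note` element is a leaf.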

If, during processing, a parent node gains or loses child nodes, the memory-resident DOM automatically updates the document structure it represents. If the document is stored in a database, these changes would, of course, need to be committed at some point.
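A short sketch, assuming Python's `xml.dom.minidom`, shows the in-memory structure changing as a child is added, while the original text is untouched until the tree is serialized again. Serializing here stands in, loosely, for the commit step mentioned above; the sample document is invented.

```python
from xml.dom.minidom import parseString

# Hypothetical source text, invented for illustration.
SOURCE = "<cart><item>apple</item></cart>"

doc = parseString(SOURCE)
cart = doc.documentElement
before = len(cart.childNodes)

# The parent gains a child; the in-memory tree reflects it immediately.
item = doc.createElement("item")
item.appendChild(doc.createTextNode("pear"))
cart.appendChild(item)
after = len(cart.childNodes)

# The source string is unchanged; the new structure only leaves memory
# when the tree is written out again.
serialized = cart.toxml()
print(before, after)
print(serialized)
```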

Alternatives

DOM's model is to load the entire XML document into memory, and then traverse the tree as needed. DOM is best when there will be multiple operations against the document, since the cost of loading the document is amortized over the number of manipulations required. It is far more efficient than SAX for incorporating results of processing back into the document.
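As a rough illustration of this trade-off, the following sketch (assuming Python's `xml.dom.minidom` and an invented invoice document) parses once, runs several operations against the same tree, and writes a computed result back into it.

```python
from xml.dom.minidom import parseString

# Hypothetical sample document, invented for illustration.
XML = (
    "<invoice>"
    "<line price='2.50' qty='4'/>"
    "<line price='1.25' qty='2'/>"
    "</invoice>"
)

# The cost of parsing and building the tree is paid once...
doc = parseString(XML)
invoice = doc.documentElement
lines = invoice.getElementsByTagName("line")

# ...and then shared by every later operation.
# Operation 1: count the lines.
count = len(lines)

# Operation 2: compute a total from the attributes.
total = sum(
    float(line.getAttribute("price")) * int(line.getAttribute("qty"))
    for line in lines
)

# Operation 3: incorporate the result back into the document.
summary = doc.createElement("total")
summary.setAttribute("lines", str(count))
summary.appendChild(doc.createTextNode(f"{total:.2f}"))
invoice.appendChild(summary)

print(invoice.toxml())
```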

The Simple API for XML (SAX) may outperform DOM when the requirement is to make a single pass through an XML document, looking for specific content and, perhaps, updating from an ordered source. Its low-level behavior makes it economical of processor and memory resources, but it puts considerable responsibility on the programmer to maintain state.

SAX does not hold the entire document in memory. It reads the document as a stream of events in a single pass, which means it cannot backtrack as DOM can.
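The state-keeping burden that SAX places on the programmer can be sketched with Python's standard-library `xml.sax`. The `TitleCollector` class and the sample document are hypothetical; the point is only that the handler must remember for itself where it is in the document.

```python
import xml.sax

# Hypothetical sample document, invented for illustration.
XML = (
    b"<library>"
    b"<book><title>A</title></book>"
    b"<book><title>B</title></book>"
    b"</library>"
)


class TitleCollector(xml.sax.ContentHandler):
    """Collects the text of every <title> element in one pass."""

    def __init__(self):
        super().__init__()
        self.in_title = False  # state the programmer must maintain
        self.buffer = []
        self.titles = []

    def startElement(self, name, attrs):
        if name == "title":
            self.in_title = True
            self.buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so it is buffered.
        if self.in_title:
            self.buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self.buffer))
            self.in_title = False


handler = TitleCollector()
xml.sax.parseString(XML, handler)
print(handler.titles)
```

Once an event has been delivered it is gone; anything the program might need later has to be saved by the handler, as `titles` is here.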

Application programming interfaces

To maintain the goal of DOM API compatibility across end-user scripting languages and professional programming languages (of various vintages), the APIs need to work with a wide range of memory management systems, ranging from those with automatic garbage collection (Java)[3] to those with explicit memory management (C, C++).

DOM APIs are more likely to be true interfaces, in a procedural programming sense, than object classes. APIs can hide the details of memory organization in either fully object-oriented applications with their own class structure, or legacy applications that are not OO.

To be sure that the high-level APIs will be available, the language bindings developed by the DOM Working Group (ECMAScript/JavaScript and Java) do not deal with memory management, but groups dealing with other language bindings will need to find their own solutions.

C and C++ have a model of memory that is too different from that of DOM for there to be direct bindings. One approach is that of the Apache Software Foundation's Xerces-C libraries, which support the DOM approach to XML parsing.[4]

References