Document Tree

Central Data Structure: Document Tree

The central data structure is the document tree. It is the same data structure commonly found in digital document systems (HTML calls it a "DOM"), with the standard set of navigation and tree management functions (adding, removing, querying children, and so on). Moreso than most such data structures, the document tree aims to remain simple and lightweight, as free as possible of special cases for efficiency or particular document formats.

Similar to other systems but perhaps with more militancy, the document tree directly represents the structure of a document. In the canonical example of a structured book, the tree root is labeled "book" and has children labeled "chapter", chapters are divided into sections, which in turn are divided into subsections, and so on. This is a straight parse tree for XML, a normalized parse tree for SGML, and a corrected parse tree for HTML (which adheres to the DTD, if possible given the errors in the page). For scanned paper, for which extraction of semantic information is an open research problem, the most structured representation might be physically based, with the page divided into regions (of type text or image), with text regions divided into paragraphs, paragraphs into lines, and lines into words.

Internal nodes, which have child nodes, capture structural hierarchy. The usual root of the tree is an internal node type called Document, which has a URL, a style sheet, scrollable content, and holds the list of document-specific behaviors, if any.

Medium-specific qualities of a document format are encapsulated in leaves: depending on its subtype a leaf can paint an image, a word from a manual page, or a scanned paper word (clipped image). Formats that exercise more typographical may need more work to present leaves based on logical units. For example, T_EX DVI uses subscripts and superscripts and kerning frequently, as in "L^AT_EX2e" and mathematics. While the DVI format has no notion of baseline or word, and simply moves to a place and paints, all the parts of such words should be grouped together, with whatever spans and special leaf types are necessary. With logical units as the bedrock, behaviors such as search, full-text search, general cursor movement , select and paste, and so on, do not each have to check for special cases. (Presently, each word of text is given its own leaf node. This simplifies linebreaking and aligning words of a line flush to both margins, and aids hit detection; however, very long unpaginated documents use a lot of memory.)

The division of document content into medium-independent internal nodes and medium-dependent leaves allows developers to write behaviors against an idealized abstract document tree and have the behavior operate on any concrete document format.

Node Properties
name (tag name in XML)
links to parent and (internal node only) children
bounding box, baseline
horizontal and vertical alignment
float side (left/right/none)
observers
Special Cases for Efficiency
dirty bit
span summary

Node Properties
name (tag name in XML)
links to parent and (internal node only) children
bounding box, baseline
horizontal and vertical alignment
float side (left/right/none)
observers
Special Cases for Efficiency
dirty bit
span summary

All nodes have the following properties, as summarized at right: name (the tag name in XML), baseline, horizontal alignment (top/bottom/middle/none), vertical alignment (left/right/center/none), and float side (left/right/none). Nodes carry a list of "observer" behaviors, which as described in the sections on high- and low-level communications notify those behaviors of any activity in the subtree. Two exceptions to the policy of no special cases are made for the sake of efficiency, the formatting dirty bit and the span summary.

In addition to representing the document's structure, tree nodes contain physical layout information, namely the bounding box of their contents. This allows behaviors to easily move between structural and physical representations. A search behavior could, say, exploit structure to examine only captions, and then display the results physically, scrolling and highlighting the hits. Another behavior could convert from a format expressed in semantic markup, such as XML, into a physically-based format, such as PDF.

Thus, in many respects the document tree is quite ordinary. But as the examples below demonstrate it is powerful, and since developers can introduce new node types as easily as behaviors, which is a level of extension seldom found in scripting languages, this versatility can be expanded.

Example: GUI.

A typical editor's graphical user interface, whether for text or CAD or other document type, has a menubar, a set of buttons on various toolbars and palettes, various other controls, and finally a large "canvas" area for display and direct manipulation of the document. This canvas is highly specialized, similar to other widgets only in that it is a rectangle that can be drawn upon.

The Multivalent Browser reverses this relationship. All widgets are simply specialized nodes, and everything is displayed in a master document tree — the graphical user interface (GUI) as well as the content of documents per se — both descending from a common absolute root node. Some "words" happen to have specialized behavior: click on some words and they display a different document (that's a hyperlink), click on other words and they fire a semantic event (that's a button). Rather than a parallel set of widget layout functions, the GUI uses the same layout already developed for documents, such as HTML's table. Because GUI widgets are ordinary nodes, they are controlled by style sheets.

We have implemented as node types a set of essentials widgets: button, checkbutton, radiobutton, scrollbar, scrolled pane, menu, type-in box, and dialog box. For HTML forms, where content and user interface are mixed, this approach results in an especially clean representation.

Example: embedded documents.

Since document content and widgets use the same uniform model, one type can easily embed the other. In other widget toolkits, button labels and dialog box text is usually displayed by a simple text layout, probably with an images allowed in fixed places, but it is limited.

Multivalent widgets can embed full HTML documents (or manual pages or scanned paper, should that ever prove useful), giving multifont text with images, even video and annotations. This same embedding is used in editable notes, which by simple reuse of existing node types are scrolled, multifont, annotatable. Likewise, dialog boxes that ask for input are HTML forms, whose values are processed internally rather than being sent to a server. Preferences setting will be implemented analogously. Furthermore, all buttons can be embedded in menus as opposed to requiring a parallel set of menu-specific versions as in other GUI toolkits. Through simple node reuse, long menus are scrollable.

Example: visual layers.

Internal nodes control the display of their subtrees. Ordinarily, nodes simply transform the coordinate space to be relative to itself. The scrolled pane node further adjusts for the current settings of vertical and horizontal scrollbars.

Visual layers are implemented simply by adding an internal node under a document's content root, and all of its children are drawn on top of document content. Dialog boxes and menus use this method. The simplest visual layer makes no further coordinate transformations, leaving its children at absolute locations on the document. Another type of node maintains its children at a fixed location on the screen by reversing the scrollbar settings. Portals and zooming could be implemented similarly.