As described in "Behaviors",
a
Media adaptors treat their data in one of three ways: as a byte stream, as a character stream, or as a random-access file. New media adaptors should subclass, respectively, multivalent.MediaAdaptorByte, multivalent.MediaAdaptorReader, or multivalent.MediaAdaptorFile.
The essential method for media adaptors is parse(INode).
In fact, a media adaptor is often little more than a subclass
that implements this method.
A media adaptor can assume that its input has been established externally,
as either a java.io.InputStream byte stream, a java.io.Reader character
stream, or a java.io.File, which is created when a MediaAdaptorFile first
invokes getFile(). Then the parse(INode) method is invoked with a node of
the document tree to which to attach the content of the document being read.
parse(INode) is usually a big loop that reads from the document format
and creates a runtime representation in multivalent.Nodes.
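The shape of such a loop can be sketched as follows. This is only an illustration under stated assumptions: SketchNode and PlainTextSketch are hypothetical stand-ins written for this example, not the multivalent.Node classes, and the "format" is simply blank-line-separated paragraphs of words. Structure becomes internal nodes, content becomes leaves, and everything hangs off a format-named top node under the node that was passed in.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a tree node; NOT the multivalent.Node API.
class SketchNode {
    final String name;
    final List<SketchNode> children = new ArrayList<>();

    SketchNode(String name, SketchNode parent) {
        this.name = name;
        if (parent != null) parent.children.add(this);
    }
}

class PlainTextSketch {
    // Mirrors the role of parse(INode): read the format, attach content under 'parent'.
    static SketchNode parse(Reader in, SketchNode parent) throws IOException {
        SketchNode root = new SketchNode("text", parent);  // top node named after the format
        BufferedReader reader = new BufferedReader(in);
        SketchNode para = null;
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.trim().isEmpty()) {   // a blank line ends the current paragraph
                para = null;
                continue;
            }
            if (para == null) para = new SketchNode("para", root);  // structure: internal node
            for (String word : line.trim().split("\\s+")) {
                new SketchNode(word, para);                         // content: leaf
            }
        }
        return root;
    }

    public static void main(String[] args) throws IOException {
        SketchNode doc = new SketchNode("document", null);
        SketchNode root = parse(new StringReader("one two\n\nthree\n"), doc);
        System.out.println(root.children.size() + " paragraphs");
    }
}
```

A real media adaptor would build multivalent.Node objects instead and would read from the stream or file it was given rather than a string.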
As far as possible, the runtime document tree should be slavishly
faithful, preserving all information found in the file format.
Structure or hierarchy should be represented as internal nodes;
content such as text or images should be represented as leaves.
Presentation / appearance should be captured as a stylesheet if possible,
or as multivalent.Spans if not.
If a span's start and end points can be separated by an arbitrary distance,
as in HTML, multivalent.Span's open(Node) and close(Node) can be a
convenient way to attach spans to content.
Metadata, such as author and dates, should be stored in the closest
containing multivalent.Document.
Most media adaptors make a "top node" or "document root"
of their own, underneath the passed node,
and give it a tag/name that is the same as the document format;
this is a convenient way for the associated stylesheet
to affect the entire document.
Documents such as HTML that produce a long scroll should be created
in their entirety. Paginated documents, such as DVI and PDF, should
supply the page count to the prevailing document under the
Document.ATTR_PAGECNT attribute, and should produce the single page
specified by the Document.ATTR_PAGE attribute.
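The paginated contract can be sketched as below. The sketch makes several assumptions that the text does not confirm: document attributes are modelled as a plain Map, the two key strings are made up (the real values of Document.ATTR_PAGECNT and Document.ATTR_PAGE are not assumed), pages are numbered from 1, and an absent or out-of-range request falls back to the nearest valid page. PagedSketch is a hypothetical name.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PagedSketch {
    // Stand-in keys; not the real values of Document.ATTR_PAGECNT / Document.ATTR_PAGE.
    static final String ATTR_PAGECNT = "sketch-pagecount";
    static final String ATTR_PAGE = "sketch-page";

    // Publishes the page count, then returns only the requested page (assumed 1-based).
    static String parsePage(List<String> pages, Map<String, String> docAttrs) {
        if (pages.isEmpty()) throw new IllegalArgumentException("document has no pages");
        docAttrs.put(ATTR_PAGECNT, Integer.toString(pages.size()));

        int page = 1;  // assumed default when no page was requested
        String requested = docAttrs.get(ATTR_PAGE);
        if (requested != null) {
            try {
                page = Integer.parseInt(requested);
            } catch (NumberFormatException e) {
                page = 1;  // unparsable request: fall back to the first page
            }
        }
        page = Math.max(1, Math.min(page, pages.size()));  // clamp into range
        return pages.get(page - 1);
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new HashMap<>();
        attrs.put(ATTR_PAGE, "2");
        String body = parsePage(Arrays.asList("first", "second", "third"), attrs);
        System.out.println(body + " of " + attrs.get(ATTR_PAGECNT));
    }
}
```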
On encountering an unfixable/unrecoverable parsing error,
usually due to an invalid data format, throw a multivalent.ParseException.
java.io.IOExceptions are not parsing errors, and
should be reported as I/O errors.
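The distinction between the two kinds of failure might look like this in code. SketchParseException is a hypothetical stand-in for multivalent.ParseException (whose actual constructors are not assumed here), and the two-byte signature belongs to an imaginary format invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical stand-in for multivalent.ParseException.
class SketchParseException extends Exception {
    SketchParseException(String message) {
        super(message);
    }
}

class HeaderCheckSketch {
    // Made-up signature bytes for an imaginary format.
    static final int MAGIC_0 = 'M', MAGIC_1 = 'V';

    static void checkHeader(InputStream in) throws IOException, SketchParseException {
        // If read() fails, that is an I/O error: the IOException propagates unchanged.
        int b0 = in.read();
        int b1 = in.read();
        if (b0 != MAGIC_0 || b1 != MAGIC_1) {
            // The bytes arrived (or the stream ended) but are invalid for the format:
            // this is a parsing error, not an I/O error.
            throw new SketchParseException("bad signature: " + b0 + "," + b1);
        }
    }

    public static void main(String[] args) throws IOException {
        try {
            checkHeader(new ByteArrayInputStream(new byte[] {'X', 'Y'}));
        } catch (SketchParseException e) {
            System.out.println("parse error: " + e.getMessage());
        }
    }
}
```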
The node passed as a parameter to parse(INode) can be used to obtain the
prevailing/enclosing multivalent.Document and multivalent.Browser, via
Node.getDocument() and Node.getBrowser(). However, media adaptors can be
used outside of a browser environment, for example to supply parsed text
for full-text indexing, and so media adaptors should not rely on the node
being connected to a larger tree.
It is recommended that media adaptors construct document trees that
directly and fully represent the document format. However, it can be
expedient to write a quick-and-dirty converter into another document
format, such as Perl POD to HTML. In that case, the converter can
generate the target format and pass it to MediaAdaptor.parseHelper()
to convert that to a document tree.
Media adaptors are packaged like other behaviors, in JARs. Media adaptors usually hook into the system in a different way, however. Media adaptors want to be invoked when a document of the right type is encountered. The relationship between a document MIME type and/or file suffix and a media adaptor is established in sys/Preferences.txt. At startup the system will read its own sys/Preferences.txt startup file, then those in all JARs in the same directory in an undefined order, then the one found in a user's home directory.
In sys/Preferences.txt, the mediaadaptor command maps from MIME type
and/or suffix to genre. The remap command maps from genre to Java class.
For instance, this is DVI.jar's sys/Preferences.txt:

    mediaadaptor dvi DVI
    mediaadaptor application/x-dvi DVI
    remap DVI tex.dvi.DVI
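The two-step lookup these commands describe (suffix or MIME type to genre, then genre to class) can be sketched as a pair of maps. This is an illustration, not the system's actual preferences reader: PreferencesSketch is a hypothetical class, it understands only three-field mediaadaptor and remap lines, and letting a later read overwrite an earlier entry is an assumption about how multiple Preferences.txt files combine.

```java
import java.util.HashMap;
import java.util.Map;

class PreferencesSketch {
    final Map<String, String> genreByType = new HashMap<>();   // suffix or MIME type -> genre
    final Map<String, String> classByGenre = new HashMap<>();  // genre -> Java class name

    // Reads commands, one per line; lines this sketch does not understand are skipped.
    void read(String text) {
        for (String line : text.split("\n")) {
            String[] fields = line.trim().split("\\s+");
            if (fields.length != 3) continue;
            if (fields[0].equals("mediaadaptor")) {
                genreByType.put(fields[1], fields[2]);
            } else if (fields[0].equals("remap")) {
                classByGenre.put(fields[1], fields[2]);
            }
        }
    }

    // Returns the class name for a suffix or MIME type, or null if either step has no mapping.
    String resolve(String suffixOrMimeType) {
        String genre = genreByType.get(suffixOrMimeType);
        return genre == null ? null : classByGenre.get(genre);
    }

    public static void main(String[] args) {
        PreferencesSketch prefs = new PreferencesSketch();
        prefs.read("mediaadaptor dvi DVI\n"
                 + "mediaadaptor application/x-dvi DVI\n"
                 + "remap DVI tex.dvi.DVI\n");
        System.out.println(prefs.resolve("application/x-dvi"));
    }
}
```

Keeping genre as an intermediate key means a suffix and a MIME type for the same format share one remap entry, which is what the DVI.jar example relies on.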
Most document formats also use a stylesheet.
At this time, stylesheets that are automatically instantiated
are written in CSS.
The stylesheet is stored in the sys/stylesheet directory
under the same name as the genre, plus the suffix .css.
For example, DVI.jar has its stylesheet at
sys/stylesheet/DVI.css.
Stylesheets, especially those for media adaptors that translate
to another format and use parseHelper(),
can import other stylesheets using the CSS @import statement.
Stylesheets in Multivalent.jar can be retrieved
with the systemresource protocol,
as in systemresource:/sys/stylesheet/HTML.css.
The hub for the media adaptor is stored in sys/hub under the genre name, as in sys/hub/HTML.hub.
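The naming conventions for per-genre resources can be summarized in a small helper. GenreResourcesSketch and its methods are hypothetical names introduced for this example; only the path patterns themselves come from the text above.

```java
class GenreResourcesSketch {
    // Stylesheet: sys/stylesheet/<genre>.css
    static String stylesheetPath(String genre) {
        return "sys/stylesheet/" + genre + ".css";
    }

    // Hub: sys/hub/<genre>.hub
    static String hubPath(String genre) {
        return "sys/hub/" + genre + ".hub";
    }

    // URL form for a resource bundled in Multivalent.jar.
    static String systemResourceUrl(String path) {
        return "systemresource:/" + path;
    }

    public static void main(String[] args) {
        System.out.println(stylesheetPath("DVI"));
        System.out.println(hubPath("HTML"));
        System.out.println(systemResourceUrl(stylesheetPath("HTML")));
    }
}
```

A stylesheet for a converter-style media adaptor could then name the URL form in a CSS @import statement to reuse the target format's stylesheet.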