Generic document structure objects

Two parts:

data structures representing the document structure
text formatting

It's important to keep the two distinct: the representation of the first is independent of the representation of the second.

I'd like the first (data structures) to translate easily and directly between languages, container formats, databases, etc. The second (formatting) should primarily be unified across all data structure types and formats, though there might be alternative representations for XHTML.

We can address the formatting first:

bold text: tags for XHTML
italic text: tags for XHTML
underline text: tags for XHTML
strikethrough text: tags for XHTML
monospaced text: <m> tags for XHTML?
enumerated lists: tags in XHTML
itemized lists: tags in XHTML
description lists: tags in XHTML
TeX inline math: $ "tag" in TeX

references (both internal and external)

Ensure that we allow Unicode characters, and maybe XHTML character entities?
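One possible way to reconcile the two, sketched here in Python as an assumption rather than a decision: store text as plain Unicode and convert character entities only at the edges. The function names `to_unicode` and `to_ascii_xhtml` are made up for this illustration. `html.unescape` follows the HTML5 entity table, which includes the XHTML 1.0 named entities.

```python
import html


def to_unicode(text):
    """Resolve named and numeric character entities into Unicode characters."""
    return html.unescape(text)


def to_ascii_xhtml(text):
    """Escape markup characters, then emit non-ASCII characters as numeric references."""
    escaped = html.escape(text, quote=False)
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


stored = to_unicode("caf&eacute; na&#239;ve &amp; co")
emitted = to_ascii_xhtml(stored)
```

With this split, the data structure only ever holds Unicode, and entities become an output detail of the XHTML representation.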

The data structure should be easy to put into YAML, JSON, or XML. It should be directly representable by nested hashes and arrays in JavaScript and Perl.

Object types:

Document: contains information about the entire document and points to the Sections it contains (as pre-matter, main body, and appendices).
Section: contains content in the form of Paragraphs, Figures, Tables, Code fragments, and Equations.
Paragraph: contains text content.
Figure: contains image content.
Table: contains tabular data.
Code: contains code listings.
Equation: contains non-inline TeX equations (or MathML, I suppose).

Each item could contain verbatim text, formatted text, data, image blobs, URIs pointing to the information, etc., as appropriate for its type.
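As a rough illustration of the object types above, here is a minimal Python sketch of a document built from nested dicts and lists, the Python counterpart of nested hashes and arrays. The key names (`kind`, `title`, `pre_matter`, `body`, `appendices`, `content`, `text`, `tex`) are assumptions made for this example, not a settled schema.

```python
import json

# Hypothetical key names; nothing here is a settled schema.
document = {
    "kind": "Document",
    "title": "Example document",
    "pre_matter": [],
    "body": [
        {
            "kind": "Section",
            "title": "Introduction",
            "content": [
                {"kind": "Paragraph", "text": "Some text content."},
                {"kind": "Equation", "tex": "E = mc^2"},
            ],
        }
    ],
    "appendices": [],
}

# Plain dicts, lists, and strings survive a JSON round trip unchanged.
encoded = json.dumps(document, indent=2)
decoded = json.loads(encoded)
round_trip_ok = decoded == document
```

Because the structure uses only maps, sequences, and strings, the same data should carry over to YAML, or to a Perl or JavaScript structure, without a custom mapping layer.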

Most of these objects have id fields, specifying a way to refer to the given section, figure, etc.
Most have title fields, which would be used to identify the section, etc., in tables of contents, figures, bibliographies, etc., as well as for titles in the section text itself.
A url field may be present which would be used to get the data for the section.
Alternatively, a blob field could be present, where the section's data would be directly included in the data structure.
If either the url or blob field is present, a type field would specify how to interpret the data at the URL, or the data in the blob. The default type depends on the type of object and on the field that holds the data.
Some objects may have caption fields, to be shown alongside the figure, table, etc.
Some objects may contain other fields to represent the data in a different way.
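
The url, blob, and type rules above could be sketched in Python as follows. This is only an illustration under stated assumptions: the `kind` key (naming the object type), the `DEFAULT_TYPES` table, the media-type strings, and the choice to check url before blob when both are present are all hypothetical and not taken from the notes.

```python
# Hypothetical defaults, keyed by (object kind, field holding the data).
DEFAULT_TYPES = {
    ("Figure", "url"): "image/png",
    ("Figure", "blob"): "image/png",
    ("Code", "url"): "text/plain",
    ("Code", "blob"): "text/plain",
    ("Equation", "blob"): "application/x-tex",
}


def data_source(obj):
    """Return (field, value, data_type) for an object's external or inline data.

    Returns None when the object has neither a url nor a blob field.
    """
    for field in ("url", "blob"):  # Assumed precedence: url before blob.
        if field in obj:
            # An explicit type field wins; otherwise fall back to the default
            # for this kind of object and this field (None if no default).
            data_type = obj.get("type") or DEFAULT_TYPES.get((obj["kind"], field))
            return field, obj[field], data_type
    return None


figure = {
    "kind": "Figure",
    "id": "fig-overview",
    "title": "Overview",
    "caption": "How the object types nest.",
    "url": "figures/overview.png",
}
listing = {"kind": "Code", "id": "lst-1", "blob": "print('hi')", "type": "text/x-python"}
```

Here `data_source(figure)` falls back to the assumed default for a Figure loaded from a URL, while `data_source(listing)` uses the explicit type field.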
