
The Donatus Morphology Document Type

Morphology documents have a root element <morphology>. Their namespace is http://archimedes.fas.harvard.edu/ns/morphology/2. (The final portion of the URI path indicates the version of the Morphology Document grammar.)
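
As a rough illustration of how a client might check the root element and read the grammar version out of the namespace URI, here is a minimal Python sketch using the standard library's ElementTree. The function name morphology_version and the sample document are hypothetical, and the sketch assumes the namespace is declared as the default namespace on the root element.

```python
import xml.etree.ElementTree as ET

# Namespace URI from the text above, minus its final (version) path segment.
MORPH_NS_PREFIX = "http://archimedes.fas.harvard.edu/ns/morphology/"


def morphology_version(xml_text):
    """Return the grammar version taken from the root element's namespace.

    Hypothetical helper: raises ValueError if the root element is not
    <morphology> in a namespace that starts with MORPH_NS_PREFIX.
    """
    root = ET.fromstring(xml_text)
    # ElementTree spells a namespaced tag as "{namespace-uri}localname".
    if not root.tag.startswith("{"):
        raise ValueError("root element has no namespace")
    uri, _, local = root.tag[1:].partition("}")
    if local != "morphology" or not uri.startswith(MORPH_NS_PREFIX):
        raise ValueError("not a Morphology document: " + root.tag)
    return uri[len(MORPH_NS_PREFIX):]


# Assumed minimal document; real files returned by Donatus contain entries.
sample = '<morphology xmlns="http://archimedes.fas.harvard.edu/ns/morphology/2"/>'
print(morphology_version(sample))  # prints "2"
```

Comparing against the URI prefix rather than the full URI lets the same check report whichever version of the grammar a file declares.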

Two types of entries appear below <morphology>; these correspond to context-free and context-sensitive morphological analyses.

  1. Context-free: the element is <lemma>, which takes two attributes, form (the standard citation form) and lang (the ISO 639 language specifier). E.g.:
    <lemma form="actio" lang="la">
      <definition>a putting in motion</definition>
      <variant form="actio">
        <analysis desc="N fem nom/voc sg" xlink:type="simple"/>
      </variant>
      <variant form="actionem">
        <analysis desc="N fem acc sg" xlink:type="simple"/>
      </variant>
      <variant form="actiones">
        <analysis desc="N fem acc pl" xlink:type="simple"/>
        <analysis desc="N fem nom/voc pl" xlink:type="simple"/>
      </variant>
      <variant form="actionibus">
        <analysis desc="N fem abl pl" xlink:type="simple"/>
        <analysis desc="N fem dat pl" xlink:type="simple"/>
      </variant>
      <variant form="actionis">
        <analysis desc="N fem gen sg" xlink:type="simple"/>
      </variant>
    </lemma>

  2. Context-sensitive: the element is <context-form>. For example:
    <context-form lang="de" xlink:href="dex.xml#s2">
    <tokens>
    <token count="1" form="baut"/>
    <token count="2" form="auf"/>
    </tokens>
    <analysis xlink:href="dex.morph.xml#de000029"/>
    </context-form>
    This format allows not just for context-sensitive morphology, but also for multiwords and lexical constituents that are realized discontinuously.

    The entry contains two XLinks: (1) a link to the container of the source text in which the morphological form appears in context; (2) a link to the morphological analysis (either in this file or in another). To make the second link possible, Donatus assigns a document-unique identifier to each <analysis> element. Linking to the <analysis> uniquely specifies both the <variant> and the <lemma>, which are the parent and grandparent, respectively, of the <analysis> element.

    Words (and their parts) are referred to by a token-of-a-type counting scheme: each <token> names a word form and, through its count attribute, which occurrence of that form is meant. Thus the XML fragment above refers to the first occurrence of baut and the second occurrence of auf in the text under the container with the ID s2: Die Technik setzt die Natur vielfach voraus und baut auf ihr auf. The analysis ID de000029 links to the <analysis> element in the following XML fragment:

    <lemma form="aufbauen" lang="de">
    <definition>to build</definition>
    <variant form="baut...auf">
    <analysis desc="3SIE,2PIE" id="de000029"/>
    </variant>
    </lemma>
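
The two lookups described above (following the analysis link to its <variant> and <lemma>, and resolving tokens by occurrence count) can be sketched in Python with the standard library's ElementTree. This is only an illustration under stated assumptions: the helper names resolve_analysis and locate_tokens are hypothetical, both entries are assumed to sit in the same file with the morphology and XLink namespaces declared on the root, and the source text is assumed to be split into words on non-word characters, a tokenization that this document does not specify.

```python
import re
import xml.etree.ElementTree as ET

NS = "http://archimedes.fas.harvard.edu/ns/morphology/2"
XLINK = "http://www.w3.org/1999/xlink"

# Assumed single-file document combining the two fragments shown above.
DOC = f"""<morphology xmlns="{NS}" xmlns:xlink="{XLINK}">
  <lemma form="aufbauen" lang="de">
    <definition>to build</definition>
    <variant form="baut...auf">
      <analysis desc="3SIE,2PIE" id="de000029"/>
    </variant>
  </lemma>
  <context-form lang="de" xlink:href="dex.xml#s2">
    <tokens>
      <token count="1" form="baut"/>
      <token count="2" form="auf"/>
    </tokens>
    <analysis xlink:href="dex.morph.xml#de000029"/>
  </context-form>
</morphology>"""

SENTENCE = "Die Technik setzt die Natur vielfach voraus und baut auf ihr auf."


def q(name):
    """Qualify a local element name with the morphology namespace."""
    return f"{{{NS}}}{name}"


def resolve_analysis(root, href):
    """Return (lemma form, variant form, desc) for the analysis named by href.

    Only the fragment after '#' is used; fetching another file is not shown.
    """
    fragment = href.rpartition("#")[2]
    # ElementTree keeps no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    for analysis in root.iter(q("analysis")):
        if analysis.get("id") == fragment:
            variant = parents[analysis]   # parent of <analysis>
            lemma = parents[variant]      # grandparent of <analysis>
            return lemma.get("form"), variant.get("form"), analysis.get("desc")
    return None


def locate_tokens(sentence, tokens):
    """Map (form, count) pairs to 0-based word positions in the sentence."""
    words = re.findall(r"\w+", sentence)  # assumed tokenization
    positions = []
    for form, count in tokens:
        seen = 0
        for index, word in enumerate(words):
            if word == form:
                seen += 1
                if seen == count:
                    positions.append(index)
                    break
        else:
            raise ValueError(f"occurrence {count} of {form!r} not found")
    return positions


root = ET.fromstring(DOC)
entry = root.find(q("context-form"))
href = entry.find(q("analysis")).get(f"{{{XLINK}}}href")
tokens = [(t.get("form"), int(t.get("count"))) for t in entry.iter(q("token"))]

print(resolve_analysis(root, href))     # ('aufbauen', 'baut...auf', '3SIE,2PIE')
print(locate_tokens(SENTENCE, tokens))  # [8, 11]
```

Positions 8 and 11 are baut and the final auf; the first auf (position 9) is skipped because its token carries count="2".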

Morphology files returned by Donatus will include an internal DTD that is the canonical definition of this document type.


Malcolm D. Hyman 2004-04-07