Next: The Donatus WTAG Document Up: The Donatus XML-RPC Interface Previous: XML-RPC


API Methods

The current version of the Donatus XML-RPC API defines two methods:

donatus.analyze(wtag-data)

Here wtag-data is data in a simplified XML format used to abstract the linguistic content of a document from its structure. Words in this format are pretokenized and normalized. The format is described in the next section of this document. The wtag-data must be base64-encoded and passed to Donatus as a parameter of the XML-RPC datatype base64.
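As an illustration of the parameter type, the following minimal sketch uses Python's standard xmlrpc.client module to marshal a donatus.analyze request without contacting a server. The helper name build_analyze_request and the placeholder payload are assumptions made for this example; the actual WTAG format is the one described in the next section.

```python
import xmlrpc.client


def build_analyze_request(wtag_xml: bytes) -> str:
    """Marshal a donatus.analyze call (hypothetical helper, not part of Donatus).

    Wrapping the bytes in xmlrpc.client.Binary makes the library emit the
    XML-RPC base64 datatype and perform the base64 encoding itself.
    """
    params = (xmlrpc.client.Binary(wtag_xml),)
    return xmlrpc.client.dumps(params, methodname="donatus.analyze")


# Placeholder payload only; a real call would pass a WTAG document.
request_body = build_analyze_request(b"<placeholder/>")
```

With a live server, the same Binary object could instead be passed directly to a method call on an xmlrpc.client.ServerProxy; the server URL would depend on the installation.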

This method returns a struct with three members:

  1. morphData (type: base64): morphological data of the Donatus morphology document type (described below), base64-encoded.

  2. unparsedURL (type: string): a URL from which unanalyzed forms may be obtained. These data will be of the Arboreal termlist document type. They should be retrieved immediately, because they are considered transient data; after a short time, the URL may no longer be valid, as the data will be flushed from the server to conserve disk space. (The expiration time for these data is set by the server administrator.)

  3. code (type: i4/int): result code for the operation, in the range 0..3, with the following semantics:

    0 -- success
    1 -- invalid parameters (the method was not called correctly)
    2 -- server misconfiguration (the Donatus server is not correctly configured)
    3 -- transient server error

    A client may wish to retry the call on result 3, which may occur if Donatus times out while accessing certain resources. Result 2 should be considered irrecoverable, and result 1 indicates a fault on the part of the client.
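The result-code semantics above suggest a simple client-side policy: retry only on code 3. The sketch below is one possible way to do this; the function name call_with_retry, the attempt limit, and the delay are assumptions chosen for illustration and are not prescribed by Donatus.

```python
import time

# Result codes as documented for the Donatus XML-RPC API.
SUCCESS, INVALID_PARAMS, MISCONFIGURED, TRANSIENT = 0, 1, 2, 3


def call_with_retry(call, max_attempts=3, delay_seconds=1.0):
    """Invoke call() until it returns a result whose code is not TRANSIENT.

    `call` is any zero-argument callable returning the result struct as a
    dict with a "code" member. Codes 1 and 2 are returned immediately,
    since retrying cannot fix a client fault or a server misconfiguration.
    The last result is returned if every attempt yields code 3.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    result = None
    for attempt in range(max_attempts):
        result = call()
        if result["code"] != TRANSIENT:
            break
        if attempt < max_attempts - 1:
            time.sleep(delay_seconds)
    return result
```

A caller would typically pass a lambda that performs the actual XML-RPC call. When the call is made through xmlrpc.client, the morphData member arrives as a Binary object whose data attribute holds the decoded bytes.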

donatus.addEntries(lang, user-id, {lemma, infl-form, morph-label} [...])

Here lang is a language identifier, as assigned in ISO 639-1/639-2 "Codes for the Representation of Names of Languages." The user-id parameter is used to generate metadata that track revisions to the system; no validation is performed on this ID. The remaining data consist of one or more morphological triples of the form lemma, infl-form, morph-label; that is, a lemma (or "basic form", "citation form", or "headword"), an inflected form, and a morphological labeling (or analysis) of the inflected form. (More abstractly, the morphological triple may be considered a mapping between an inflected form and the pair {lemma, morph-label}, where morph-label specifies the set of morphosyntactic/morphosemantic features that are realized on infl-form.) An example of a morphological triple for Latin is (muscipulum, muscipulo, N neut abl/dat sg); i.e. the Latin form muscipulo is the ablative or dative singular form of the neuter noun muscipulum 'mousetrap'. Donatus makes no assumptions about the form of morphological labels (morph-label) and allows their structure to vary arbitrarily across languages and backends.

In order to avoid the overhead of multiple method calls (a problem that is exacerbated for remote network calls), Donatus allows entries to be bundled: multiple morphological triples may be submitted in a single call. Note that, since a morphological triple comprises three distinct parameters, a call to donatus.addEntries must always pass 3n + 2 parameters, where n ≥ 1 is the number of triples.
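The 3n + 2 parameter layout can be sketched as follows. The helper name flatten_entries and the user ID "editor1" are hypothetical; "la" is the ISO 639-1 code for Latin, and the triple is the example given above. The request is only marshalled here, not sent.

```python
import xmlrpc.client


def flatten_entries(lang, user_id, triples):
    """Build the flat parameter list for donatus.addEntries (hypothetical helper).

    Each triple is (lemma, infl_form, morph_label); the result has
    3n + 2 items for n triples, and at least one triple is required.
    """
    if not triples:
        raise ValueError("at least one morphological triple is required")
    params = [lang, user_id]
    for lemma, infl_form, morph_label in triples:
        params.extend((lemma, infl_form, morph_label))
    return params


entry_params = flatten_entries(
    "la",        # ISO 639-1 code for Latin
    "editor1",   # hypothetical user ID; Donatus does not validate it
    [("muscipulum", "muscipulo", "N neut abl/dat sg")],
)
add_request = xmlrpc.client.dumps(
    tuple(entry_params), methodname="donatus.addEntries"
)
```

Unpacking a triple inside the loop also rejects malformed entries early, since a tuple of the wrong length raises ValueError before any request is built.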

This method returns a struct with the members code (see the explanation of result codes above) and message (a string describing the action taken by Donatus, suitable for display in an end-user application, e.g. as a message dialog).


Malcolm D. Hyman 2004-04-07