
Software Platform

We describe a software platform designed as a first step towards meeting these challenges. The platform has been developed in the context of the Archimedes Project, an international initiative, funded by the National Science Foundation in the United States, to create a digital research library for the history of mechanics. Although the software was developed to solve problems arising in the course of work on this project, it is not tied to the requirements of any particular area of historical scholarship. The platform has three principal components, all of which are freely available on the Internet at http://archimedes.fas.harvard.edu:

  1. The Pollux system provides a unified means of access to dictionaries, or any other reference work that is organized by alphabetized headwords, in any natural language. The software is designed to make it possible for users to add new lexica with a minimum of effort.

  2. The Donatus system provides a unified frontend to a variety of morphological analysis software and databases.4 Morphological services are provided both through a Remote Procedure Call (RPC) interface that can be utilized by specialized user applications and through a CGI interface that is accessible in any web browser. Morphological data can be represented in XML, allowing them to be cached on client systems and to be processed by a wide range of software. Backend systems that have already been incorporated include Morpheus, a morphological analyzer for ancient Greek, Latin, and Italian developed by the Perseus Project;5 the CELEX Linguistic Database for Dutch, English, and German developed by the Center for Linguistic Information of the Max Planck Institute for Psycholinguistics in Nijmegen;6 and the Xerox finite-state morphological analyzer for Arabic developed by the Xerox Research Centre Europe (XRCE) in Grenoble.7 Work is currently underway to integrate a morphological analyzer for Sanskrit that is being developed at Brown University and one for Sumerian being developed at the University of Pennsylvania. In addition to providing access to pre-existing linguistic data, Donatus allows for the dynamic extension of morphological datasets by a user.

  3. The Arboreal user agent is a flexible tool for content-based access to and annotation of XML texts. It includes special features for working with parallel versions of texts, morphology and terminology, and linked images. Integrated language support is currently provided for Latin, Greek, Arabic, Chinese, languages written in cuneiform, and the major western European languages. Arboreal supports many standards and is designed as a cross-platform tool; distributions are currently available for Mac OS X, Windows, and GNU/Linux. We envision Arboreal as a prototype for the next-generation web user agent, which closely integrates content browsing and content creation.

     Arboreal allows for highly flexible navigation of any XML document, using two document views presented side by side. One pane displays a tree view of the document; the user controls the level of detail shown in the tree by expanding and collapsing nodes and sets of nodes. The other pane offers a detail view of the portions selected in the tree. Both views are customizable through a document description language (DDL), which we aim to extend in the next phase of the project.

     The search capabilities include regular expression searching, lemmatized searching (which takes advantage of morphological data generated by the Donatus system), XPath queries, and searching in an orthographically normalized representation of the text (e.g. a query for the Latin word `uectis' will also find `vectis').

     Arboreal can be customized to work with any natural language by supplying a description of the language in an XML langspec. This description makes possible language-specific features such as word detection and allows various language-specific views to be defined (e.g. Romanization of a non-Roman script, or fully-voweled vs. non-voweled Arabic script).

     Arboreal is closely integrated with the linguistic services provided by Donatus and with the Pollux reference system, so that the user can access morphological analyses and dictionary entries for any word in a text.
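The search modes described for Arboreal can be illustrated with a minimal Python sketch. It is not Arboreal's implementation: the sample document, its element names, the `LEMMAS` table (a stand-in for morphological data of the kind a service such as Donatus would supply), and the helper functions are all assumptions made for illustration. The XPath query uses only the limited XPath subset offered by Python's standard `xml.etree.ElementTree` module.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample document; the element names are illustrative only.
SAMPLE = """<text>
  <s id="1">Vectis est machina simplex.</s>
  <s id="2">Pondus uecte mouetur.</s>
  <s id="3">Iam libra quiescit.</s>
</text>"""

# Hypothetical stand-in for morphological data: normalized form -> lemma.
LEMMAS = {"uectis": "uectis", "uecte": "uectis", "mouetur": "moueo"}

# Fold the Latin spelling variants v -> u and j -> i.
_FOLD = str.maketrans({"v": "u", "j": "i"})


def normalize(word):
    """Return a lower-cased, orthographically normalized form of a word."""
    return word.lower().translate(_FOLD)


def words(text):
    """Split text into runs of letters (a crude form of word detection)."""
    return re.findall(r"[^\W\d_]+", text)


def search_normalized(root, query):
    """Ids of <s> elements containing the query word in normalized spelling."""
    target = normalize(query)
    return [s.get("id") for s in root.iter("s")
            if any(normalize(w) == target for w in words(s.text or ""))]


def search_lemma(root, lemma):
    """Ids of <s> elements containing any known form of the given lemma."""
    target = normalize(lemma)
    return [s.get("id") for s in root.iter("s")
            if any(LEMMAS.get(normalize(w)) == target
                   for w in words(s.text or ""))]


root = ET.fromstring(SAMPLE)

# Normalized search: the query `uectis' also finds `Vectis'.
print(search_normalized(root, "uectis"))   # ['1']

# Lemmatized search: the inflected form `uecte' is found as well.
print(search_lemma(root, "vectis"))        # ['1', '2']

# XPath-style query selecting a sentence by attribute value.
print([s.text for s in root.findall(".//s[@id='2']")])
```

In this sketch the normalization is applied to both the query and the text, so either spelling can be used on either side. A language description such as a langspec would presumably replace the hard-coded folding table and word pattern, but that is a guess about the design rather than something stated here.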


Malcolm D. Hyman 2004-03-12