Class ExplicitWordTokenizer

java.lang.Object
  |
  +--ExplicitWordTokenizer

public class ExplicitWordTokenizer
extends java.lang.Object

Adds explicit tags around word tokens in a source document by employing language-specific knowledge of word segmentation. The normalized form of each word is added as an attribute.

Version:
$2004-06-26 04:04:58 mdh$
Author:
Malcolm D. Hyman

Constructor Summary
ExplicitWordTokenizer(ContentOwner owner)
           
 
Method Summary
 void flushOutputProgram(org.w3c.dom.Node p, org.w3c.dom.Document outDoc)
          Executes stored instructions that generate new nodes in a target DOM.
 org.w3c.dom.Document getGeneratedDoc()
          Returns the generated document, with explicitly tagged words.
 java.lang.StringBuffer grabText(org.w3c.dom.Node n, java.lang.StringBuffer buf)
          Grabs all text below an element and stores it in a working buffer.
 void processContainer(org.w3c.dom.Node n, org.w3c.dom.Node p, org.w3c.dom.Document outDoc)
          Recursively processes nodes beneath a container, adding explicit word tags.
 void recursivelyProcess(org.w3c.dom.Node n, org.w3c.dom.Node p, org.w3c.dom.Document outDoc)
          Recursively processes nodes, adding explicit word tags.
 org.w3c.dom.Document tagWords(org.w3c.dom.Document inDoc)
          Explicitly tags words in containers, using language-specific word tokenization behaviors.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

ExplicitWordTokenizer

public ExplicitWordTokenizer(ContentOwner owner)

Method Detail

getGeneratedDoc

public org.w3c.dom.Document getGeneratedDoc()
Returns the generated document, with explicitly tagged words.
Returns:
result document

tagWords

public org.w3c.dom.Document tagWords(org.w3c.dom.Document inDoc)
Explicitly tags words in containers, using language-specific word tokenization behaviors.
Parameters:
inDoc - source DOM

recursivelyProcess

public void recursivelyProcess(org.w3c.dom.Node n,
                               org.w3c.dom.Node p,
                               org.w3c.dom.Document outDoc)
Recursively processes nodes, adding explicit word tags.
Parameters:
n - node to process
p - parent of n
outDoc - generated document
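
The recursion described above can be sketched with the standard DOM API alone. This is an illustration under stated assumptions, not the class's actual implementation: the tag name w, the attribute name norm, whitespace-based segmentation, and lowercasing as the normalization are all hypothetical stand-ins for the language-specific behavior, and punctuation handling is omitted.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class WordTagSketch {

    // Hypothetical names: the real tag and attribute names are not documented here.
    static final String WORD_TAG = "w";
    static final String NORM_ATTR = "norm";

    // Group 1 is a run of whitespace, group 2 a run of non-whitespace (a "word").
    private static final Pattern TOKEN = Pattern.compile("(\\s+)|(\\S+)");

    /** Builds a new document in which every whitespace-delimited token is tagged. */
    static Document tagWords(Document inDoc) {
        Document outDoc = newBuilder().newDocument();
        recursivelyProcess(inDoc, outDoc, outDoc);
        return outDoc;
    }

    /** Copies n beneath p in outDoc, tagging the words found in text nodes. */
    static void recursivelyProcess(Node n, Node p, Document outDoc) {
        switch (n.getNodeType()) {
            case Node.DOCUMENT_NODE: {
                for (Node c = n.getFirstChild(); c != null; c = c.getNextSibling()) {
                    recursivelyProcess(c, p, outDoc);
                }
                break;
            }
            case Node.ELEMENT_NODE: {
                // Shallow import: copies the element and its attributes, not its children.
                Node copy = outDoc.importNode(n, false);
                p.appendChild(copy);
                for (Node c = n.getFirstChild(); c != null; c = c.getNextSibling()) {
                    recursivelyProcess(c, copy, outDoc);
                }
                break;
            }
            case Node.TEXT_NODE:
            case Node.CDATA_SECTION_NODE:
                tagText(n.getNodeValue(), p, outDoc);
                break;
            default:
                // Comments, processing instructions, etc. are dropped in this sketch.
                break;
        }
    }

    /** Appends whitespace as plain text and every other token as a word element. */
    static void tagText(String text, Node p, Document outDoc) {
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            if (m.group(1) != null) {
                p.appendChild(outDoc.createTextNode(m.group(1)));
            } else {
                String form = m.group(2);
                Element w = outDoc.createElement(WORD_TAG);
                w.setAttribute(NORM_ATTR, form.toLowerCase(Locale.ROOT));
                w.appendChild(outDoc.createTextNode(form));
                p.appendChild(w);
            }
        }
    }

    static DocumentBuilder newBuilder() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    static Document parse(String xml) {
        try {
            return newBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (SAXException | IOException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        Document out = tagWords(parse("<p>Arma virumque <b>Cano</b></p>"));
        System.out.println(out.getElementsByTagName(WORD_TAG).getLength());
    }
}
```

A real tokenizer would replace tagText with segmentation and normalization rules appropriate to the language of the text.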

processContainer

public void processContainer(org.w3c.dom.Node n,
                             org.w3c.dom.Node p,
                             org.w3c.dom.Document outDoc)
Recursively processes nodes beneath a container, adding explicit word tags.
Parameters:
n - node to process
p - parent of n
outDoc - generated document

flushOutputProgram

public void flushOutputProgram(org.w3c.dom.Node p,
                               org.w3c.dom.Document outDoc)
Executes stored instructions that generate new nodes in a target DOM.
Parameters:
p - node under which generated nodes will be inserted
outDoc - generated document
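
One way to picture "stored instructions" is a queue of deferred node-building steps that is replayed against the target document when flushed. The sketch below assumes exactly that design; the Instruction interface, the emitText and emitWord helpers, and the w / norm names are hypothetical and are not taken from the class itself.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class OutputProgramSketch {

    /** One deferred step that builds a node owned by outDoc (hypothetical interface). */
    interface Instruction {
        Node build(Document outDoc);
    }

    private final List<Instruction> program = new ArrayList<>();

    /** Queues a plain text node. */
    void emitText(String text) {
        program.add(doc -> doc.createTextNode(text));
    }

    /** Queues a word element carrying its normalized form as an attribute. */
    void emitWord(String form, String norm) {
        program.add(doc -> {
            Element w = doc.createElement("w"); // assumed tag name
            w.setAttribute("norm", norm);       // assumed attribute name
            w.appendChild(doc.createTextNode(form));
            return w;
        });
    }

    /** Runs the queued steps in order, appends each result under p, then clears the queue. */
    void flushOutputProgram(Node p, Document outDoc) {
        for (Instruction step : program) {
            p.appendChild(step.build(outDoc));
        }
        program.clear();
    }

    /** Number of instructions waiting to be flushed. */
    int pending() {
        return program.size();
    }

    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = newDocument();
        Element root = doc.createElement("s");
        doc.appendChild(root);

        OutputProgramSketch prog = new OutputProgramSketch();
        prog.emitWord("Arma", "arma");
        prog.emitText(" ");
        prog.emitWord("Cano", "cano");
        prog.flushOutputProgram(root, doc);

        System.out.println(root.getChildNodes().getLength());
    }
}
```

Deferring node creation this way lets a tokenizer finish segmenting a whole container before any output nodes are attached under p.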

grabText

public java.lang.StringBuffer grabText(org.w3c.dom.Node n,
                                       java.lang.StringBuffer buf)
Grabs all text below an element and stores it in a working buffer.
Parameters:
n - ancestor node of text
buf - working buffer
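
Collecting all text below a node is a short depth-first walk. The following is a minimal sketch of that idea using only org.w3c.dom; it assumes that text and CDATA content are appended in document order and that comments and other node types are skipped, which may differ from what the class actually does.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class GrabTextSketch {

    /** Appends all text found beneath n to buf, in document order, and returns buf. */
    static StringBuffer grabText(Node n, StringBuffer buf) {
        for (Node c = n.getFirstChild(); c != null; c = c.getNextSibling()) {
            short type = c.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                buf.append(c.getNodeValue());
            } else if (type == Node.ELEMENT_NODE) {
                grabText(c, buf); // descend into nested markup
            }
            // Comments and processing instructions are ignored in this sketch.
        }
        return buf;
    }

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<l>ar<hi>ma</hi> vi<!-- note -->rum</l>");
        // Inline markup and the comment do not interrupt the collected text.
        System.out.println(grabText(doc.getDocumentElement(), new StringBuffer()));
    }
}
```

Gathering the text first means a word that is split by inline markup, such as ar<hi>ma</hi>, can still be seen as one contiguous string before segmentation.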