Class ExplicitWordTokenizer
java.lang.Object
|
+--ExplicitWordTokenizer
- public class ExplicitWordTokenizer
- extends java.lang.Object
Adds explicit tags around word tokens in a source document by employing
language-specific knowledge of word segmentation. The normalized form
of each word is added as an attribute.
- Version:
- $2004-06-26 04:04:58 mdh$
- Author:
- Malcolm D. Hyman
Method Summary
void flushOutputProgram(org.w3c.dom.Node p, org.w3c.dom.Document outDoc)
- Executes stored instructions that generate new nodes in a target DOM.
org.w3c.dom.Document getGeneratedDoc()
- Returns the generated document, with explicitly tagged words.
java.lang.StringBuffer grabText(org.w3c.dom.Node n, java.lang.StringBuffer buf)
- Grabs all text below an element and stores it in a working buffer.
void processContainer(org.w3c.dom.Node n, org.w3c.dom.Node p, org.w3c.dom.Document outDoc)
- Recursively processes nodes beneath a container, adding explicit word tags.
void recursivelyProcess(org.w3c.dom.Node n, org.w3c.dom.Node p, org.w3c.dom.Document outDoc)
- Recursively processes nodes, adding explicit word tags.
org.w3c.dom.Document tagWords(org.w3c.dom.Document inDoc)
- Explicitly tags words in containers, using language-specific word tokenization behaviors.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
ExplicitWordTokenizer
public ExplicitWordTokenizer(ContentOwner owner)
getGeneratedDoc
public org.w3c.dom.Document getGeneratedDoc()
- Returns the generated document, with explicitly tagged words.
- Returns:
- result document
tagWords
public org.w3c.dom.Document tagWords(org.w3c.dom.Document inDoc)
- Explicitly tags words in containers, using language-specific word
tokenization behaviors.
- Parameters:
inDoc - source DOM
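The word-tagging step described above can be illustrated with a minimal sketch. It is not the class's actual implementation: the element name `w`, the attribute name `norm`, the helper names, and the use of `java.text.BreakIterator` with lowercasing as a stand-in for language-specific segmentation and normalization are all assumptions made for illustration.

```java
import java.text.BreakIterator;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

final class WordTagSketch {

    /** Creates an empty DOM document, wrapping the checked parser exception. */
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /**
     * Appends the content of text under parent. Each word is wrapped in a
     * hypothetical <w norm="..."> element; spaces and punctuation are
     * copied as plain text nodes so that no characters are lost.
     */
    static void tagText(String text, Node parent, Document outDoc, Locale locale) {
        BreakIterator words = BreakIterator.getWordInstance(locale);
        words.setText(text);
        int start = words.first();
        for (int end = words.next(); end != BreakIterator.DONE; start = end, end = words.next()) {
            String piece = text.substring(start, end);
            if (Character.isLetterOrDigit(piece.codePointAt(0))) {
                Element w = outDoc.createElement("w");
                // Lowercasing stands in for a real normalization step.
                w.setAttribute("norm", piece.toLowerCase(locale));
                w.appendChild(outDoc.createTextNode(piece));
                parent.appendChild(w);
            } else {
                parent.appendChild(outDoc.createTextNode(piece));
            }
        }
    }

    public static void main(String[] args) {
        Document out = newDocument();
        Element p = out.createElement("p");
        out.appendChild(p);
        tagText("Arma virumque cano.", p, out, Locale.ROOT);
        System.out.println(p.getElementsByTagName("w").getLength() + " words tagged");
    }
}
```

Passing a `Locale` keeps the segmentation rules pluggable, which is one plausible reading of "language-specific" here; the real class may obtain its rules differently, for example through its `ContentOwner`.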
recursivelyProcess
public void recursivelyProcess(org.w3c.dom.Node n,
org.w3c.dom.Node p,
org.w3c.dom.Document outDoc)
- Recursively processes nodes, adding explicit word tags.
- Parameters:
n - node to process
p - parent of n
outDoc - generated document
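The (n, p, outDoc) parameter pattern suggests a traversal that rebuilds the source tree inside the generated document. The following is a minimal sketch of such a traversal under that assumption; the class and method names are hypothetical, and text is copied unchanged where a tokenizer would tag words.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

final class RecursiveCopySketch {

    /** Creates an empty DOM document, wrapping the checked parser exception. */
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Rebuilds source node n and its subtree in outDoc as a new child of p. */
    static void copyInto(Node n, Node p, Document outDoc) {
        if (n.getNodeType() == Node.ELEMENT_NODE) {
            Element copy = outDoc.createElement(n.getNodeName());
            NamedNodeMap attrs = n.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                Node a = attrs.item(i);
                copy.setAttribute(a.getNodeName(), a.getNodeValue());
            }
            p.appendChild(copy);
            // The new element becomes the parent for the copied children.
            for (Node c = n.getFirstChild(); c != null; c = c.getNextSibling()) {
                copyInto(c, copy, outDoc);
            }
        } else if (n.getNodeType() == Node.TEXT_NODE) {
            // A word tokenizer would split and tag this text at this point.
            p.appendChild(outDoc.createTextNode(n.getNodeValue()));
        }
        // Other node types (comments, processing instructions) are skipped in this sketch.
    }

    public static void main(String[] args) {
        Document src = newDocument();
        Element root = src.createElement("text");
        root.setAttribute("lang", "la");
        root.appendChild(src.createTextNode("Arma virumque"));
        src.appendChild(root);

        Document out = newDocument();
        copyInto(src.getDocumentElement(), out, out);
        System.out.println(out.getDocumentElement().getTextContent());
    }
}
```

Nodes are created through `outDoc` rather than moved, because a DOM node belongs to the document that created it and cannot simply be appended to another document.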
processContainer
public void processContainer(org.w3c.dom.Node n,
org.w3c.dom.Node p,
org.w3c.dom.Document outDoc)
- Recursively processes nodes beneath a container, adding explicit
word tags.
- Parameters:
n - node to process
p - parent of n
outDoc - generated document
flushOutputProgram
public void flushOutputProgram(org.w3c.dom.Node p,
org.w3c.dom.Document outDoc)
- Executes stored instructions that generate new nodes in a target
DOM.
- Parameters:
p - node under which generated nodes will be inserted
outDoc - generated document
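One way to read "stored instructions" is as a queue of deferred node-building steps that are replayed against the target document. The sketch below shows that idea only; the `Instruction` interface, the `emitText` and `emitWord` helpers, and the `w`/`norm` names are hypothetical and are not taken from the class itself.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

final class OutputProgramSketch {

    /** One deferred step: builds the node to insert, given the target document. */
    interface Instruction {
        Node build(Document outDoc);
    }

    private final List<Instruction> program = new ArrayList<>();

    /** Creates an empty DOM document, wrapping the checked parser exception. */
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Queues a plain text node (spaces, punctuation). */
    void emitText(String s) {
        program.add(doc -> doc.createTextNode(s));
    }

    /** Queues a tagged word with its normalized form as an attribute. */
    void emitWord(String form, String norm) {
        program.add(doc -> {
            Element w = doc.createElement("w");
            w.setAttribute("norm", norm);
            w.appendChild(doc.createTextNode(form));
            return w;
        });
    }

    /** Number of instructions waiting to be executed. */
    int pending() {
        return program.size();
    }

    /** Runs the stored instructions in order, appends each result under p, then clears the queue. */
    void flush(Node p, Document outDoc) {
        for (Instruction step : program) {
            p.appendChild(step.build(outDoc));
        }
        program.clear();
    }

    public static void main(String[] args) {
        Document out = newDocument();
        Element p = out.createElement("p");
        out.appendChild(p);
        OutputProgramSketch prog = new OutputProgramSketch();
        prog.emitWord("Arma", "arma");
        prog.emitText(" ");
        prog.emitWord("virumque", "virumque");
        prog.flush(p, out);
        System.out.println(p.getChildNodes().getLength() + " nodes generated");
    }
}
```

Deferring node creation in this way lets a tokenizer look at the whole text of a container before deciding where word boundaries fall, and only then write to the output tree.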
grabText
public java.lang.StringBuffer grabText(org.w3c.dom.Node n,
java.lang.StringBuffer buf)
- Grabs all text below an element and stores it in a working buffer.
- Parameters:
n - ancestor node of text
buf - working buffer
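Gathering the text beneath a node into a buffer can be sketched as a short depth-first walk. This is an illustration under the assumption that text and CDATA nodes are concatenated in document order; the class and method names are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

final class GrabTextSketch {

    /** Creates an empty DOM document, wrapping the checked parser exception. */
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Appends all text found at or below n to buf, in document order, and returns buf. */
    static StringBuffer collectText(Node n, StringBuffer buf) {
        short type = n.getNodeType();
        if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            buf.append(n.getNodeValue());
        } else {
            // Elements are descended into; childless nodes such as comments add nothing.
            for (Node c = n.getFirstChild(); c != null; c = c.getNextSibling()) {
                collectText(c, buf);
            }
        }
        return buf;
    }

    public static void main(String[] args) {
        Document doc = newDocument();
        Element p = doc.createElement("p");
        doc.appendChild(p);
        p.appendChild(doc.createTextNode("Arma "));
        Element hi = doc.createElement("hi");
        hi.appendChild(doc.createTextNode("virumque"));
        p.appendChild(hi);
        p.appendChild(doc.createTextNode(" cano."));
        System.out.println(collectText(p, new StringBuffer()));
    }
}
```

Returning the same buffer that was passed in allows calls to be chained and lets one buffer accumulate text across several nodes. For simple cases, the DOM Level 3 method `Node.getTextContent()` yields a similar concatenation without a custom walk.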