Class Amalgamation

java.lang.Object
  |
  +--Amalgamation
All Implemented Interfaces:
java.lang.Cloneable

public class Amalgamation
extends java.lang.Object
implements java.lang.Cloneable

An Amalgamation is the text formed by concatenating all TEXT_NODEs that are descendants of a node for which spec.isContainer() or spec.isSubcontainer() returns true. The Amalgamation of a container, however, excludes any text that lies inside subcontainers of that container. Each Amalgamation contains text in one language only.
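The rule above can be sketched with the standard org.w3c.dom API. This is only an illustration, not the class's actual implementation: the SUBCONTAINERS set is a hypothetical stand-in for spec.isSubcontainer(), and the element names (p, hi, note) are invented for the example.

```java
import java.io.StringReader;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class AmalgamationSketch {
    // Assumption: element names treated as subcontainers (stand-in for DocSpec).
    static final Set<String> SUBCONTAINERS = Set.of("note");

    // Concatenates descendant TEXT_NODEs, skipping text inside subcontainers.
    static String amalgamate(Node container) {
        StringBuilder sb = new StringBuilder();
        collect(container, sb);
        return sb.toString();
    }

    private static void collect(Node node, StringBuilder sb) {
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.TEXT_NODE) {
                sb.append(c.getNodeValue());
            } else if (c.getNodeType() == Node.ELEMENT_NODE
                    && !SUBCONTAINERS.contains(c.getNodeName())) {
                collect(c, sb); // descend only into non-subcontainer elements
            }
        }
    }

    // Parses an XML string and returns its root element.
    static Node parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        Node p = parse("<p>arma <hi>virumque</hi> cano<note>Aen. 1.1</note></p>");
        System.out.println(amalgamate(p)); // prints "arma virumque cano"
    }
}
```

The text inside note is left out because note is treated as a subcontainer; it would form an Amalgamation of its own.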

Version:
$03/02/22 06:44:12 mdh$
Author:
Malcolm D. Hyman

Field Summary
 org.w3c.dom.Node aNode
           
 java.lang.String lang
           
 boolean markGrapheme
           
 boolean notFullyPopulated
           
 int[] offsets
           
 DocSpec spec
           
 java.lang.String text
           
 
Constructor Summary
Amalgamation(org.w3c.dom.Node node, DocSpec spec, java.lang.String lang, boolean forceRecursive)
           
 
Method Summary
 java.lang.Object clone()
          Creates a clone (deep copy) of this Amalgamation.
 void filterText()
          Translates text using a display filter.
 java.lang.String getLanguage()
          Returns the language code for the Amalgamation.
 org.w3c.dom.Node getNode()
          Returns the node that this Amalgamation is built on.
 java.lang.String getNodeText(org.w3c.dom.Node node, boolean recursive)
          Returns the contents of all TEXT_NODEs in the subtree under node.
 int[] getOffsetTable()
          Returns the offset table that maps offsets in a source string onto offsets in the text of this Amalgamation.
 java.lang.String getOriginalText()
          Returns the original (unnormalized, unfiltered) text for this Amalgamation.
 Span getSpan(int start, int end)
          Returns a span on a source string corresponding to the range start..end in text.
 int length()
          Returns the length of text.
 void normalizeOrthography()
          Convenience method for doing orthographic normalization with no rule suffix.
 void normalizeOrthography(java.lang.String suffix)
          Rewrites text using a set of orthographic normalization rules.
 void setOffset(int tOfs, int sOfs)
          Adds a reference that maps an offset sOfs in a source string onto an offset tOfs in the text of this Amalgamation.
 java.lang.String toString()
          Returns the text of this Amalgamation.
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

text

public java.lang.String text

lang

public java.lang.String lang

offsets

public int[] offsets

notFullyPopulated

public boolean notFullyPopulated

markGrapheme

public boolean markGrapheme

aNode

public org.w3c.dom.Node aNode

spec

public DocSpec spec

Constructor Detail

Amalgamation

public Amalgamation(org.w3c.dom.Node node,
                    DocSpec spec,
                    java.lang.String lang,
                    boolean forceRecursive)

Method Detail

getNodeText

public java.lang.String getNodeText(org.w3c.dom.Node node,
                                    boolean recursive)
Returns the contents of all TEXT_NODEs in the subtree under node.
Parameters:
node - root of subtree
recursive - work recursively on subtree below given node
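A minimal sketch of what the recursive flag might mean, assuming that recursive = false collects only the TEXT_NODEs that are direct children of node. That reading is an assumption; the documentation does not spell it out. The class and helper names are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class NodeTextSketch {
    // Collects text under node; descends into child nodes only if recursive.
    static String nodeText(Node node, boolean recursive) {
        StringBuilder sb = new StringBuilder();
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.TEXT_NODE) {
                sb.append(c.getNodeValue());
            } else if (recursive) {
                sb.append(nodeText(c, true));
            }
        }
        return sb.toString();
    }

    // Builds <l>gallia <w>est</w> omnis</l> for the demonstration.
    static Element sample() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element l = doc.createElement("l");
            l.appendChild(doc.createTextNode("gallia "));
            Element w = doc.createElement("w");
            w.appendChild(doc.createTextNode("est"));
            l.appendChild(w);
            l.appendChild(doc.createTextNode(" omnis"));
            return l;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(nodeText(sample(), true));  // prints "gallia est omnis"
        System.out.println(nodeText(sample(), false)); // prints "gallia  omnis"
    }
}
```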

setOffset

public void setOffset(int tOfs,
                      int sOfs)
Adds a reference that maps an offset sOfs in a source string onto an offset tOfs in the text of this Amalgamation.
Parameters:
tOfs - offset in text
sOfs - offset in source string

getOffsetTable

public int[] getOffsetTable()
Returns the offset table that maps offsets in a source string onto offsets in the text of this Amalgamation.
Returns:
offset table
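The layout of the int[] offset table is not documented here. The sketch below assumes one plausible layout, in which the array is indexed by text offset and holds the matching source offset, with -1 marking entries that have not been set. The class name and the -1 convention are assumptions made for illustration.

```java
import java.util.Arrays;

class OffsetTableSketch {
    // Assumed layout: offsets[tOfs] == sOfs, or -1 when no mapping was set.
    private int[] offsets = new int[0];

    // Records that text offset tOfs corresponds to source offset sOfs.
    void setOffset(int tOfs, int sOfs) {
        if (tOfs >= offsets.length) {
            int oldLength = offsets.length;
            offsets = Arrays.copyOf(offsets, tOfs + 1);
            Arrays.fill(offsets, oldLength, offsets.length, -1);
        }
        offsets[tOfs] = sOfs;
    }

    int[] getOffsetTable() {
        return offsets;
    }

    public static void main(String[] args) {
        // Source "a  b" (two spaces) reduced to the text "a b".
        OffsetTableSketch table = new OffsetTableSketch();
        table.setOffset(0, 0); // 'a'
        table.setOffset(1, 1); // the single remaining space
        table.setOffset(2, 3); // 'b' sits at source offset 3
        System.out.println(Arrays.toString(table.getOffsetTable())); // prints [0, 1, 3]
    }
}
```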

toString

public java.lang.String toString()
Returns the text of this Amalgamation.
Overrides:
toString in class java.lang.Object
Returns:
text

getSpan

public Span getSpan(int start,
                    int end)
Returns a span on a source string corresponding to the range start..end in text.
Parameters:
start - start offset in text
end - end offset in text
Returns:
span on source string
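As a rough illustration of the lookup, the sketch below maps a text range back to a source range. It assumes an offset table indexed by text offset that holds source offsets, and it treats start..end as half-open; neither detail is confirmed by this page. SourceSpan is a hypothetical stand-in for the Span class, whose fields are not shown here.

```java
class SpanSketch {
    // Hypothetical stand-in for Span: a half-open range [start, end) in the source.
    static final class SourceSpan {
        final int start;
        final int end;

        SourceSpan(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    // Maps the text range [start, end) to the source range that covers it.
    static SourceSpan span(int[] offsets, int start, int end) {
        if (start < 0 || end > offsets.length || start >= end) {
            throw new IllegalArgumentException("bad range " + start + ".." + end);
        }
        // The source span ends one past the source offset of the last text character.
        return new SourceSpan(offsets[start], offsets[end - 1] + 1);
    }

    public static void main(String[] args) {
        // Source "a  b" (two spaces), text "a b".
        int[] offsets = {0, 1, 3};
        SourceSpan s = span(offsets, 1, 3); // text " b"
        System.out.println(s.start + ".." + s.end); // prints 1..4
    }
}
```

Here the two-character text range expands to three source characters, because the source held a second space that the text no longer contains.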

length

public int length()
Returns the length of text.
Returns:
length of text

normalizeOrthography

public void normalizeOrthography(java.lang.String suffix)
Rewrites text using a set of orthographic normalization rules.

WARNING: Do not call normalizeOrthography() and then set offsets with setOffset(). Doing so will likely corrupt the offset table!

Parameters:
suffix - orthographic rule suffix

normalizeOrthography

public void normalizeOrthography()
Convenience method for doing orthographic normalization with no rule suffix.

filterText

public void filterText()
Translates text using a display filter.

getOriginalText

public java.lang.String getOriginalText()
Returns the original (unnormalized, unfiltered) text for this Amalgamation.
Returns:
original text

getLanguage

public java.lang.String getLanguage()
Returns the language code for the Amalgamation.
Returns:
language code as per ISO 639

getNode

public org.w3c.dom.Node getNode()
Returns the node that this Amalgamation is built on.
Returns:
aNode

clone

public java.lang.Object clone()
Creates a clone (deep copy) of this Amalgamation.
Overrides:
clone in class java.lang.Object
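For readers unfamiliar with the idiom, a deep copy of a Cloneable class with an array field usually looks like the sketch below. DeepCloneSketch is a hypothetical class with just two fields; which fields the real clone() copies deeply is not stated on this page, and the real method is declared to return java.lang.Object rather than a covariant type.

```java
class DeepCloneSketch implements Cloneable {
    String text;
    int[] offsets;

    DeepCloneSketch(String text, int[] offsets) {
        this.text = text;
        this.offsets = offsets;
    }

    @Override
    public DeepCloneSketch clone() {
        try {
            // super.clone() copies fields shallowly, so the array is still shared.
            DeepCloneSketch copy = (DeepCloneSketch) super.clone();
            // Copy the mutable array; String is immutable and safe to share.
            copy.offsets = offsets.clone();
            return copy;
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // cannot happen: the class is Cloneable
        }
    }

    public static void main(String[] args) {
        DeepCloneSketch a = new DeepCloneSketch("a b", new int[] {0, 1, 3});
        DeepCloneSketch b = a.clone();
        b.offsets[2] = 99;
        System.out.println(a.offsets[2]); // prints 3: the original is unaffected
    }
}
```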