Class OrthographicRules

java.lang.Object
  |
  +--OrthographicRules

public class OrthographicRules
extends java.lang.Object

Provides pluggable rules for orthographic normalization, with optional offset tracking.

Version:
$2007-04-30 03:30:17 mdh$
Author:
Malcolm D. Hyman

Field Summary
static java.lang.String IT_CONS
           
static java.lang.String IT_VOWELS
           
 int[] offsets
           
 
Constructor Summary
OrthographicRules(java.lang.String ruleset)
          Creates an instance that applies the named rule set.
 
Method Summary
 int[] getOffsetTable()
          Returns the offset table.
static void main(java.lang.String[] argv)
          Entry point that makes normalization available outside Java, so the class can run as a Unix-style filter.
 java.lang.String normalize(java.lang.String s)
          Applies the normalization rules in ruleset to s, without offset tracking.
 java.lang.String normalize(java.lang.String s, int[] offsets)
          Applies the normalization rules in ruleset to s, with offset tracking.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

offsets

public int[] offsets

IT_VOWELS

public static final java.lang.String IT_VOWELS

IT_CONS

public static final java.lang.String IT_CONS

Constructor Detail

OrthographicRules

public OrthographicRules(java.lang.String ruleset)
Creates an instance that applies the named rule set.
Parameters:
ruleset - name of rule set to apply

Method Detail

normalize

public java.lang.String normalize(java.lang.String s,
                                  int[] offsets)
Applies the normalization rules in ruleset to s, with offset tracking.

WARNING: Arboreal will not work properly if a normalization substitution replaces a source character with more than two target characters. This is a known bug and should be fixed; in practice, such replacements are rarely needed.

FIXME: If the orthographic rules eliminate every character in a word, word counting in ContentRenderPane will not work correctly.

Parameters:
s - source string
offsets - character offset table
Returns:
normalized string
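
The source does not specify what the offset table contains. A minimal sketch of one plausible reading, in which entry i of the table holds the index of the source character that produced character i of the normalized string, might look like the following. The class OffsetTrackingSketch, its RULES map, and the sample substitutions are hypothetical and are not part of Arboreal. The table is sized for at most two target characters per source character, mirroring the limit in the warning above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; NOT the Arboreal implementation of OrthographicRules.
final class OffsetTrackingSketch {
    // Assumed rule set: each source character maps to 0, 1, or 2 target characters.
    static final Map<Character, String> RULES = new LinkedHashMap<>();
    static {
        RULES.put('j', "i");   // one-to-one substitution
        RULES.put('x', "ks");  // one-to-two substitution
        RULES.put('-', "");    // deletion
    }

    /**
     * Normalizes s and records, for each character of the result, the index
     * of the source character that produced it (an assumed table layout).
     * The caller must supply a table with at least 2 * s.length() entries.
     */
    static String normalize(String s, int[] offsets) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            // Characters without a rule pass through unchanged.
            String target = RULES.getOrDefault(c, String.valueOf(c));
            for (int k = 0; k < target.length(); k++) {
                offsets[out.length()] = i;  // output position -> source position
                out.append(target.charAt(k));
            }
        }
        return out.toString();
    }
}
```

Under these assumptions, normalizing "x-ja" with a table of length 8 yields "ksia", and the first four table entries are 0, 0, 2, 3: both k and s point back to the x, and the deleted hyphen leaves no entry of its own.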

normalize

public java.lang.String normalize(java.lang.String s)
Applies the normalization rules in ruleset to s, without offset tracking.
Parameters:
s - source string
Returns:
normalized string

getOffsetTable

public int[] getOffsetTable()
Returns the offset table.
Returns:
offset table

main

public static void main(java.lang.String[] argv)
We provide main() so that our services will be available outside Java (i.e., so we can run as a Un*x-style filter).
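
The source does not document how main() reads its input or interprets argv. As an illustration of the filter pattern only, a line-oriented filter could be sketched as below. The class FilterSketch and its methods are hypothetical, and the normalizer is passed in as a function so the sketch does not depend on any particular rule set.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.function.UnaryOperator;

// Hypothetical sketch of a Unix-style filter; NOT the body of OrthographicRules.main().
final class FilterSketch {
    /** Reads lines from in, applies normalizer to each, and writes them to out. */
    static void filter(Reader in, Writer out, UnaryOperator<String> normalizer)
            throws IOException {
        BufferedReader reader = new BufferedReader(in);
        PrintWriter writer = new PrintWriter(out);
        String line;
        while ((line = reader.readLine()) != null) {
            writer.print(normalizer.apply(line));
            writer.print('\n');  // emit a fixed newline regardless of platform
        }
        writer.flush();
    }

    /** Convenience wrapper that runs the filter over an in-memory string. */
    static String filterString(String input, UnaryOperator<String> normalizer) {
        StringWriter sink = new StringWriter();
        try {
            filter(new StringReader(input), sink, normalizer);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toString();
    }
}
```

A real entry point would presumably wrap System.in and System.out in a Reader and a Writer and pass a method reference to normalize(String) on an OrthographicRules instance as the normalizer.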