org.apache.uima.internal.util
Class XMLUtils

java.lang.Object
  extended by org.apache.uima.internal.util.XMLUtils

public abstract class XMLUtils
extends java.lang.Object

Some utilities for working with XML.


Constructor Summary
XMLUtils()
           
 
Method Summary
static int checkForNonXmlCharacters(char[] ch, int start, int length, boolean xml11)
          Check the input character array for non-XML characters.
static int checkForNonXmlCharacters(java.lang.String s)
          Check the input string for non-XML 1.0 characters.
static int checkForNonXmlCharacters(java.lang.String s, boolean xml11)
          Check the input string for non-XML characters.
static org.w3c.dom.Element getChildByTagName(org.w3c.dom.Element aElem, java.lang.String aName)
          Gets the first child of the given Element with the given tag name.
static org.w3c.dom.Element getFirstChildElement(org.w3c.dom.Element aElem)
          Gets the first child of the given Element.
static java.lang.String getText(org.w3c.dom.Element aElem)
          Gets the text of this Element.
static java.lang.String getText(org.w3c.dom.Element aElem, boolean aExpandEnvVarRefs)
          Gets the text of this Element.
static void normalize(java.lang.String aStr, java.lang.StringBuffer aResultBuf)
          Normalizes the given string for output to XML.
static void normalize(java.lang.String aStr, java.lang.StringBuffer aResultBuf, boolean aNewlinesToSpaces)
          Normalizes the given string for output to XML.
static java.lang.Object readPrimitiveValue(org.w3c.dom.Element aElem)
          Reads a primitive value from its standard DOM representation.
static void writeNormalizedString(java.lang.String aStr, java.io.Writer aWriter, boolean aNewlinesToSpaces)
          Normalizes the given string for output to XML, and writes the normalized string to the given Writer.
static void writePrimitiveValue(java.lang.Object aObj, org.xml.sax.ContentHandler aContentHandler)
          Writes a standard XML representation of the specified Object, in the form:
<className>string value</className>
static void writePrimitiveValue(java.lang.Object aObj, java.io.Writer aWriter)
          Writes a standard XML representation of the specified Object, in the form:
<className>string value</className>
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

XMLUtils

public XMLUtils()
Method Detail

normalize

public static void normalize(java.lang.String aStr,
                             java.lang.StringBuffer aResultBuf)
Normalizes the given string for output to XML. This converts all special characters, e.g. <, >, &, to their XML representations, e.g. &lt;, &gt;, &amp;. The normalized string is appended to the specified StringBuffer.

Parameters:
aStr - input string
aResultBuf - the StringBuffer to which the normalized string will be appended

normalize

public static void normalize(java.lang.String aStr,
                             java.lang.StringBuffer aResultBuf,
                             boolean aNewlinesToSpaces)
Normalizes the given string for output to XML. This converts all special characters, e.g. <, >, &, to their XML representations, e.g. &lt;, &gt;, &amp;. It may also convert newlines to spaces, depending on the aNewlinesToSpaces parameter. The normalized string is appended to the specified StringBuffer.

Parameters:
aStr - input string
aNewlinesToSpaces - iff true, newlines (\r and \n) will be converted to spaces
aResultBuf - the StringBuffer to which the normalized string will be appended
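
The escaping described above can be illustrated with a minimal sketch. This is not the UIMA source: the class name XmlEscapeSketch and the method escape are hypothetical, and the sketch assumes that only <, >, and & are escaped, since the description does not list the full set of special characters.

```java
final class XmlEscapeSketch {

    /**
     * Appends an XML-escaped copy of aStr to aResultBuf.
     * Hypothetical illustration; the real method may escape more characters.
     */
    static void escape(String aStr, StringBuffer aResultBuf, boolean aNewlinesToSpaces) {
        for (int i = 0; i < aStr.length(); i++) {
            char c = aStr.charAt(i);
            switch (c) {
                case '<':
                    aResultBuf.append("&lt;");
                    break;
                case '>':
                    aResultBuf.append("&gt;");
                    break;
                case '&':
                    aResultBuf.append("&amp;");
                    break;
                case '\r':
                case '\n':
                    // Each CR or LF becomes one space only when requested.
                    aResultBuf.append(aNewlinesToSpaces ? ' ' : c);
                    break;
                default:
                    aResultBuf.append(c);
            }
        }
    }

    public static void main(String[] args) {
        // The result is appended, so existing buffer content is preserved.
        StringBuffer buf = new StringBuffer("text=");
        escape("a < b && c > d\n", buf, true);
        System.out.println(buf);
    }
}
```

Appending to a caller-supplied buffer, rather than returning a new String, lets a serializer build a whole document in one buffer without intermediate copies.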

writeNormalizedString

public static void writeNormalizedString(java.lang.String aStr,
                                         java.io.Writer aWriter,
                                         boolean aNewlinesToSpaces)
                                  throws java.io.IOException
Normalizes the given string for output to XML, and writes the normalized string to the given Writer. Normalization converts all special characters, e.g. <, >, &, to their XML representations, e.g. &lt;, &gt;, &amp;. It may also convert newlines to spaces, depending on the aNewlinesToSpaces parameter.

Parameters:
aStr - input string
aWriter - a Writer to which the normalized string will be written
aNewlinesToSpaces - iff true, newlines (\r and \n) will be converted to spaces
Throws:
java.io.IOException - if an I/O failure occurs when writing to aWriter

writePrimitiveValue

public static void writePrimitiveValue(java.lang.Object aObj,
                                       java.io.Writer aWriter)
                                throws java.io.IOException
Writes a standard XML representation of the specified Object, in the form:
<className>string value</className>

where className is the object's Java class name without the package and made lowercase, e.g. "string", "integer", "boolean", and string value is the result of Object.toString().

This is intended to be used for Java Strings and wrappers for primitive value classes (e.g. Integer, Boolean).

Parameters:
aObj - the object to write
aWriter - a Writer to which the XML will be written
Throws:
java.io.IOException - if an I/O failure occurs when writing to aWriter
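
The element form described above can be sketched as follows. The names PrimitiveXmlWriterSketch, writeValue, and toXml are hypothetical, and escaping the string value is an assumption; the description does not say whether the real method escapes it.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Locale;

final class PrimitiveXmlWriterSketch {

    /** Writes aObj as <className>string value</className> to aWriter. */
    static void writeValue(Object aObj, Writer aWriter) throws IOException {
        // java.lang.Integer -> "Integer" -> "integer"
        String tag = aObj.getClass().getSimpleName().toLowerCase(Locale.ROOT);
        // Assumption: the value is escaped so the output stays well-formed.
        String value = aObj.toString()
                .replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;");
        aWriter.write("<" + tag + ">" + value + "</" + tag + ">");
    }

    /** Convenience wrapper that returns the XML as a String. */
    static String toXml(Object aObj) {
        StringWriter out = new StringWriter();
        try {
            writeValue(aObj, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toXml(Integer.valueOf(42)));
        System.out.println(toXml(Boolean.TRUE));
        System.out.println(toXml("a < b"));
    }
}
```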

writePrimitiveValue

public static void writePrimitiveValue(java.lang.Object aObj,
                                       org.xml.sax.ContentHandler aContentHandler)
                                throws org.xml.sax.SAXException
Writes a standard XML representation of the specified Object, in the form:
<className>string value</className>

where className is the object's Java class name without the package and made lowercase, e.g. "string", "integer", "boolean", and string value is the result of Object.toString().

This is intended to be used for Java Strings and wrappers for primitive value classes (e.g. Integer, Boolean).

Parameters:
aObj - the object to write
aContentHandler - the SAX ContentHandler to which events will be sent
Throws:
org.xml.sax.SAXException - if the ContentHandler throws an exception
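
With a ContentHandler, the same representation is a sequence of SAX events rather than text. The sketch below shows one plausible event sequence; PrimitiveSaxWriterSketch, writeValue, Recorder, and render are hypothetical names, and the empty namespace URI and empty attribute list are assumptions.

```java
import java.util.Locale;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

final class PrimitiveSaxWriterSketch {

    /** Sends startElement, characters, endElement for aObj. */
    static void writeValue(Object aObj, ContentHandler aHandler) throws SAXException {
        String tag = aObj.getClass().getSimpleName().toLowerCase(Locale.ROOT);
        char[] text = aObj.toString().toCharArray();
        aHandler.startElement("", tag, tag, new AttributesImpl());
        aHandler.characters(text, 0, text.length);
        aHandler.endElement("", tag, tag);
    }

    /** Records events as text so the event order can be inspected. */
    static final class Recorder extends DefaultHandler {
        final StringBuilder log = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            log.append('<').append(qName).append('>');
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            log.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            log.append("</").append(qName).append('>');
        }
    }

    /** Runs writeValue against a Recorder and returns the recorded events. */
    static String render(Object aObj) {
        Recorder recorder = new Recorder();
        try {
            writeValue(aObj, recorder);
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return recorder.log.toString();
    }

    public static void main(String[] args) {
        System.out.println(render(Integer.valueOf(7)));
    }
}
```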

getChildByTagName

public static org.w3c.dom.Element getChildByTagName(org.w3c.dom.Element aElem,
                                                    java.lang.String aName)
Gets the first child of the given Element with the given tag name.

Parameters:
aElem - the parent element
aName - tag name of the child to retrieve
Returns:
the first child of aElem with tag name aName, null if there is no such child.
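
A lookup of this kind can be sketched with the standard DOM API. The sketch assumes that "child" means a direct child of aElem; by contrast, Element.getElementsByTagName searches all descendants. ChildLookupSketch, firstChildNamed, and parseRoot are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class ChildLookupSketch {

    /** Returns the first direct child element named aName, or null if none. */
    static Element firstChildNamed(Element aElem, String aName) {
        for (Node n = aElem.getFirstChild(); n != null; n = n.getNextSibling()) {
            // Skip text, comment, and other non-element nodes.
            if (n instanceof Element && aName.equals(((Element) n).getTagName())) {
                return (Element) n;
            }
        }
        return null;
    }

    /** Helper: parses a small XML string and returns its root element. */
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element root = parseRoot("<a><x><b>deep</b></x><b>one</b><b>two</b></a>");
        // The nested <b> inside <x> is not a direct child, so it is skipped.
        System.out.println(firstChildNamed(root, "b").getTextContent());
    }
}
```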

getFirstChildElement

public static org.w3c.dom.Element getFirstChildElement(org.w3c.dom.Element aElem)
Gets the first child of the given Element.

Parameters:
aElem - the parent element
Returns:
the first child of aElem, null if it has no children.

readPrimitiveValue

public static java.lang.Object readPrimitiveValue(org.w3c.dom.Element aElem)
Reads a primitive value from its standard DOM representation. (This is the representation produced by writePrimitiveValue(Object, ContentHandler).)

This is intended to be used for Java Strings and wrappers for primitive value classes (e.g. Integer, Boolean).

Parameters:
aElem - the element representing the value
Returns:
the value that was read, null if a primitive value could not be constructed from the element
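
Reading is the inverse of writing: the tag name selects the type and the element text supplies the value. The sketch below is an illustration only. The set of supported tags (string, integer, float, boolean) is an assumption, and PrimitiveReaderSketch, readValue, and parseRoot are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class PrimitiveReaderSketch {

    /** Returns the value encoded by aElem, or null if it cannot be built. */
    static Object readValue(Element aElem) {
        String text = aElem.getTextContent().trim();
        try {
            switch (aElem.getTagName()) {
                case "string":
                    return text;
                case "integer":
                    return Integer.valueOf(text);
                case "float":
                    return Float.valueOf(text);
                case "boolean":
                    return Boolean.valueOf(text);
                default:
                    return null; // unknown tag name
            }
        } catch (NumberFormatException e) {
            return null; // text is not a valid number
        }
    }

    /** Helper: parses a small XML string and returns its root element. */
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readValue(parseRoot("<integer>42</integer>")));
        System.out.println(readValue(parseRoot("<widget>x</widget>")));
    }
}
```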

getText

public static java.lang.String getText(org.w3c.dom.Element aElem)
Gets the text of this Element. Leading and trailing whitespace is removed.

Parameters:
aElem - the element
Returns:
the text of aElem

getText

public static java.lang.String getText(org.w3c.dom.Element aElem,
                                       boolean aExpandEnvVarRefs)
Gets the text of this Element. Leading and trailing whitespace is removed. Environment variable references of the form <envVarRef>PARAM_NAME</envVarRef> may be expanded.

Parameters:
aElem - the element
aExpandEnvVarRefs - whether to expand environment variable references. Defaults to false.
Returns:
the text of aElem
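
The following sketch shows one way text extraction with reference expansion could work. It is not the UIMA implementation. How the real method resolves a variable name is not stated here, so the sketch takes a caller-supplied resolver function; passing null disables expansion. XmlTextSketch, text, and parseRoot are hypothetical names.

```java
import java.io.StringReader;
import java.util.function.Function;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

final class XmlTextSketch {

    /**
     * Concatenates the text children of aElem and trims the result.
     * If aResolver is non-null, each envVarRef child is replaced by the
     * value the resolver returns for the referenced name.
     */
    static String text(Element aElem, Function<String, String> aResolver) {
        StringBuilder sb = new StringBuilder();
        for (Node n = aElem.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Text) {
                sb.append(((Text) n).getData());
            } else if (aResolver != null && n instanceof Element
                    && "envVarRef".equals(((Element) n).getTagName())) {
                String value = aResolver.apply(n.getTextContent().trim());
                if (value != null) {
                    sb.append(value);
                }
            }
        }
        return sb.toString().trim();
    }

    /** Helper: parses a small XML string and returns its root element. */
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element p = parseRoot("<p>  /opt/<envVarRef>HOME_DIR</envVarRef>/data </p>");
        // Hypothetical resolver that knows a single name.
        System.out.println(text(p, name -> "HOME_DIR".equals(name) ? "uima" : null));
    }
}
```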

checkForNonXmlCharacters

public static final int checkForNonXmlCharacters(java.lang.String s)
Checks the input string for non-XML 1.0 characters. If any are found, returns the position of the first offending character; otherwise returns -1.

From the XML 1.0 spec:

   Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] // any Unicode
    character, excluding the surrogate blocks, FFFE, and FFFF.  
 

And from the UTF-16 spec:

Characters with values between 0x10000 and 0x10FFFF are represented by a 16-bit integer with a value between 0xD800 and 0xDBFF (within the so-called high-half zone or high surrogate area) followed by a 16-bit integer with a value between 0xDC00 and 0xDFFF (within the so-called low-half zone or low surrogate area).

Parameters:
s - Input string
Returns:
The position of the first invalid XML character encountered. -1 if no invalid XML characters found.

checkForNonXmlCharacters

public static final int checkForNonXmlCharacters(java.lang.String s,
                                                 boolean xml11)
Checks the input string for non-XML characters. If any are found, returns the position of the first offending character; otherwise returns -1.

The definition of an XML character is different for XML 1.0 and 1.1. This method will check either version, depending on the value of the xml11 argument.

From the XML 1.0 spec:

   Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] // any Unicode
    character, excluding the surrogate blocks, FFFE, and FFFF.  
 

From the XML 1.1 spec:

  Char     ::=    [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
 

And from the UTF-16 spec:

Characters with values between 0x10000 and 0x10FFFF are represented by a 16-bit integer with a value between 0xD800 and 0xDBFF (within the so-called high-half zone or high surrogate area) followed by a 16-bit integer with a value between 0xDC00 and 0xDFFF (within the so-called low-half zone or low surrogate area).

Parameters:
s - Input string
xml11 - true to check for invalid XML 1.1 characters, false to check for invalid XML 1.0 characters. The single-argument overload checkForNonXmlCharacters(String) always checks against XML 1.0.
Returns:
The position of the first invalid XML character encountered. -1 if no invalid XML characters found.
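
The two Char productions and the UTF-16 surrogate rule quoted above translate into a short scan over the string. The sketch below is an illustration under stated assumptions, not the UIMA source: XmlCharCheckSketch, firstInvalid, and isValidBmpChar are hypothetical names, and treating an unpaired surrogate as invalid at its own position is an assumption.

```java
final class XmlCharCheckSketch {

    /** Returns the index of the first invalid char in s, or -1 if none. */
    static int firstInvalid(String s, boolean xml11) {
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                // A valid pair encodes a code point in [#x10000-#x10FFFF].
                if (i + 1 < s.length() && Character.isLowSurrogate(s.charAt(i + 1))) {
                    i += 2;
                    continue;
                }
                return i; // high surrogate without a following low surrogate
            }
            if (!isValidBmpChar(c, xml11)) {
                return i;
            }
            i++;
        }
        return -1;
    }

    /** Checks a single non-paired UTF-16 unit against the Char production. */
    static boolean isValidBmpChar(char c, boolean xml11) {
        if (c >= 0xD800 && c <= 0xDFFF) {
            return false; // lone surrogate
        }
        if (c > 0xFFFD) {
            return false; // FFFE and FFFF
        }
        if (xml11) {
            return c >= 0x1; // XML 1.1: [#x1-#xD7FF] | [#xE000-#xFFFD]
        }
        // XML 1.0: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD]
        return c >= 0x20 || c == 0x9 || c == 0xA || c == 0xD;
    }

    public static void main(String[] args) {
        String s = "ok" + (char) 0x1 + "rest";
        System.out.println(firstInvalid(s, false)); // control char rejected by XML 1.0
        System.out.println(firstInvalid(s, true));  // same char allowed by XML 1.1
    }
}
```

Note that XML 1.1 permits the control characters in [#x1-#x1F] as Chars, but the NUL character #x0 is invalid in both versions.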

checkForNonXmlCharacters

public static final int checkForNonXmlCharacters(char[] ch,
                                                 int start,
                                                 int length,
                                                 boolean xml11)
Checks the input character array for non-XML characters. If any are found, returns the position of the first offending character; otherwise returns -1.

Parameters:
ch - Input character array
start - offset of first char to check
length - number of chars to check
xml11 - true to check for invalid XML 1.1 characters, false to check for invalid XML 1.0 characters.
Returns:
The position of the first invalid XML character encountered. -1 if no invalid XML characters found.
See Also:
checkForNonXmlCharacters(String, boolean)


Copyright © 2011. All Rights Reserved.