net.sf.saxon.dotnet
Class DotNetRegexTranslator

java.lang.Object
  extended by net.sf.saxon.regex.RegexTranslator
      extended by net.sf.saxon.regex.SurrogateRegexTranslator
          extended by net.sf.saxon.dotnet.DotNetRegexTranslator

public class DotNetRegexTranslator
extends SurrogateRegexTranslator

This class translates XML Schema regex syntax into .NET regex syntax. Author: James Clark, Thai Open Source Software Center Ltd. (see the statement at the end of the source file). Modified by Michael Kay (a) to integrate the code into Saxon, (b) to support the XPath additions to the XML Schema regex syntax, and (c) to target .NET regex syntax instead of JDK 1.4.

This version of the regular expression translator treats each half of a surrogate pair as a separate character. Any construct in an XPath regex that can match a non-BMP character is translated into a target regex that matches the two halves of the surrogate pair independently. This approach does not work under JDK 1.5, whose regex engine treats a surrogate pair as a single character.
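To make the surrogate-pair approach concrete: a character outside the BMP, such as U+1D11E (MUSICAL SYMBOL G CLEF), occupies two UTF-16 code units in a Java string, so a translator working at the code-unit level has to emit both halves. The sketch below is a hypothetical helper, not part of Saxon; the names SurrogateSplitDemo and toSurrogatePairRegex are illustrative only. It uses the standard Character.toChars method to split a code point and renders the halves as \uXXXX escapes, a notation accepted by both java.util.regex and .NET regex.

```java
// Hypothetical helper, not part of Saxon's API: shows how a translator that
// treats each surrogate half as a separate character could render one
// non-BMP code point as a two-character regex sequence.
public class SurrogateSplitDemo {

    /** Returns a regex fragment matching the UTF-16 surrogate pair for codePoint. */
    static String toSurrogatePairRegex(int codePoint) {
        if (!Character.isSupplementaryCodePoint(codePoint)) {
            throw new IllegalArgumentException("not a non-BMP code point: " + codePoint);
        }
        char[] halves = Character.toChars(codePoint); // [high surrogate, low surrogate]
        return String.format("\\u%04X\\u%04X", (int) halves[0], (int) halves[1]);
    }

    public static void main(String[] args) {
        int gClef = 0x1D11E; // outside the BMP
        String s = new String(Character.toChars(gClef));
        System.out.println(s.length());                      // 2 UTF-16 code units
        System.out.println(s.codePointCount(0, s.length())); // 1 code point
        System.out.println(toSurrogatePairRegex(gClef));     // escaped pair D834, DD1E
    }
}
```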

This translator is currently used for Saxon on .NET 1.1. It is almost the same as the JDK 1.4 version, except that it avoids the "&&" (character class intersection) operator, which .NET regex syntax does not support.
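In Java regex syntax, "&&" intersects character classes; for example, [a-z&&[^aeiou]] matches lowercase ASCII consonants. One way to express the same set without "&&" is to enumerate the surviving ranges explicitly. The sketch below does this for a single range minus a set of excluded characters. The class and method names are hypothetical, and this is not necessarily how Saxon itself avoids the operator. Because the intersection form cannot be used on .NET, the equivalence check is done with java.util.regex.

```java
import java.util.regex.Pattern;

// Hypothetical illustration, not Saxon's code: rewrite a Java-style
// intersection such as [a-z&&[^aeiou]] as a plain class of explicit ranges,
// a form that needs no "&&" operator.
public class IntersectionFreeClass {

    /** Builds a character class covering lo..hi minus the chars in excluded. */
    static String rangeMinus(char lo, char hi, String excluded) {
        StringBuilder sb = new StringBuilder("[");
        int c = lo;
        while (c <= hi) {
            if (excluded.indexOf(c) >= 0) { c++; continue; }
            int start = c;
            // extend the run while the next character is also kept
            while (c + 1 <= hi && excluded.indexOf(c + 1) < 0) c++;
            sb.append(escape(start));
            if (c > start) sb.append('-').append(escape(c));
            c++;
        }
        if (sb.length() == 1) {
            throw new IllegalArgumentException("every character was excluded");
        }
        return sb.append(']').toString();
    }

    // Escape each character numerically so the output is safe for any input range
    private static String escape(int c) {
        return String.format("\\u%04X", c);
    }

    public static void main(String[] args) {
        String withAnd = "[a-z&&[^aeiou]]";                // Java-only intersection
        String withoutAnd = rangeMinus('a', 'z', "aeiou"); // ranges b-d, f-h, j-n, p-t, v-z
        Pattern p1 = Pattern.compile(withAnd);
        Pattern p2 = Pattern.compile(withoutAnd);
        for (char ch = 0; ch < 128; ch++) {
            String s = String.valueOf(ch);
            if (p1.matcher(s).matches() != p2.matcher(s).matches()) {
                throw new AssertionError("mismatch at " + (int) ch);
            }
        }
        System.out.println(withoutAnd);
    }
}
```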


Nested Class Summary
 
Nested classes/interfaces inherited from class net.sf.saxon.regex.SurrogateRegexTranslator
SurrogateRegexTranslator.BackReference, SurrogateRegexTranslator.CharClass, SurrogateRegexTranslator.CharRange, SurrogateRegexTranslator.Complement, SurrogateRegexTranslator.Dot, SurrogateRegexTranslator.Empty, SurrogateRegexTranslator.Property, SurrogateRegexTranslator.SimpleCharClass, SurrogateRegexTranslator.SingleChar, SurrogateRegexTranslator.WideSingleChar
 
Nested classes/interfaces inherited from class net.sf.saxon.regex.RegexTranslator
RegexTranslator.Range
 
Field Summary
 
Fields inherited from class net.sf.saxon.regex.SurrogateRegexTranslator
categoryCharClasses, subCategoryCharClasses
 
Fields inherited from class net.sf.saxon.regex.RegexTranslator
ALL, captures, caseBlind, curChar, currentCapture, eos, ignoreWhitespace, inCharClassExpr, isXPath, length, NONE, NOT_ALLOWED_CLASS, pos, regExp, result, SOME, SURROGATES1_CLASS, SURROGATES2_CLASS, xmlVersion
 
Constructor Summary
DotNetRegexTranslator()
          Create a regular expression translator for the .NET platform
 
Method Summary
 int getNumberOfCapturedGroups()
          Get the number of captured groups for this regular expression
static void main(java.lang.String[] args)
          Convenience main method for testing purposes.
 java.lang.String translate(java.lang.CharSequence regExp, int xmlVersion, boolean xpath, boolean ignoreWhitespace, boolean caseBlind)
          Translates a regular expression in the syntax of XML Schema Part 2 into a regular expression in .NET regex syntax.
protected  boolean translateAtom()
           
 
Methods inherited from class net.sf.saxon.regex.RegexTranslator
absorbSurrogatePair, advance, copyCurChar, expect, highSurrogateRanges, isAsciiAlnum, isBlock, isJavaMetaChar, lowSurrogateRanges, makeException, makeException, parseQuantExact, recede, sortRangeList, translateBranch, translateQuantifier, translateQuantity, translateRegExp, translateTop
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

DotNetRegexTranslator

public DotNetRegexTranslator()
Create a regular expression translator for the .NET platform

Method Detail

translate

public java.lang.String translate(java.lang.CharSequence regExp,
                                  int xmlVersion,
                                  boolean xpath,
                                  boolean ignoreWhitespace,
                                  boolean caseBlind)
                           throws RegexSyntaxException
Translates a regular expression in the syntax of XML Schema Part 2 into a regular expression in .NET regex syntax. The translation assumes that the string to be matched against the regex uses surrogate pairs correctly. If the string comes from XML content, a conforming XML parser will automatically check this; if it comes from elsewhere, it may be necessary to check surrogate usage before matching.

Parameters:
regExp - a String containing a regular expression in the syntax of XML Schema Part 2
xmlVersion - the version of XML in use; this affects the meaning of the \i and \c character class escapes
xpath - a boolean indicating whether the XPath 2.0 Functions and Operators (F&O) extensions to the schema regex syntax are permitted
ignoreWhitespace - true if the x flag is set, allowing ignorable whitespace in the regex
caseBlind - true if the i flag is set, requesting case-blind (case-insensitive) matching
Returns:
a String containing a regular expression in .NET regex syntax
Throws:
RegexSyntaxException - if regExp is not a valid regular expression in the syntax of XML Schema Part 2 or XPath 2.0, as appropriate
See Also:
Pattern, XML Schema Part 2
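The translate documentation notes that a string not taken from XML content may need its surrogate usage checked before matching. A minimal check, assuming that "correct" means every high surrogate is immediately followed by a low surrogate and no low surrogate appears on its own, might look like this. SurrogateCheck and hasWellFormedSurrogates are hypothetical names, not Saxon API; the check relies only on the standard Character.isHighSurrogate and Character.isLowSurrogate methods.

```java
// Hypothetical pre-check (not a Saxon method): verifies that every surrogate
// in the input belongs to a correctly ordered high/low pair, which is the
// assumption the translated regex relies on.
public class SurrogateCheck {

    static boolean hasWellFormedSurrogates(CharSequence s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                // a high surrogate must be followed immediately by a low surrogate
                if (i + 1 >= s.length() || !Character.isLowSurrogate(s.charAt(i + 1))) {
                    return false;
                }
                i++; // skip the low half just validated
            } else if (Character.isLowSurrogate(c)) {
                return false; // low surrogate with no preceding high surrogate
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String ok = "a" + new String(Character.toChars(0x1D11E)) + "b";
        String loneHigh = "x" + (char) 0xD834;
        String reversed = "" + (char) 0xDD1E + (char) 0xD834;
        System.out.println(hasWellFormedSurrogates(ok));       // true
        System.out.println(hasWellFormedSurrogates(loneHigh)); // false
        System.out.println(hasWellFormedSurrogates(reversed)); // false
    }
}
```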

getNumberOfCapturedGroups

public int getNumberOfCapturedGroups()
Get the number of captured groups for this regular expression

Returns:
the number of captured groups
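getNumberOfCapturedGroups returns the number of capturing groups in the regular expression. The sketch below approximates that count directly from the source regex by scanning for unescaped "(" characters outside character class expressions, and cross-checks the result with Matcher.groupCount() from java.util.regex. It is a hypothetical illustration, not Saxon's implementation; it assumes that "(?" introduces a non-capturing construct, as it does in Java regex syntax, and it does not attempt to cover every dialect difference.

```java
import java.util.regex.Pattern;

// Hypothetical sketch, not Saxon's implementation: count capturing groups
// by scanning the regex source text.
public class CaptureGroupCounter {

    static int countCapturingGroups(String regex) {
        int count = 0;
        int classDepth = 0; // > 0 while inside [...] (XSD allows nested class subtraction)
        for (int i = 0; i < regex.length(); i++) {
            char c = regex.charAt(i);
            if (c == '\\') {
                i++; // skip the escaped character
            } else if (c == '[') {
                classDepth++;
            } else if (c == ']' && classDepth > 0) {
                classDepth--;
            } else if (c == '(' && classDepth == 0) {
                // treat "(?" as non-capturing, as Java regex syntax does
                boolean nonCapturing = i + 1 < regex.length() && regex.charAt(i + 1) == '?';
                if (!nonCapturing) count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String[] samples = { "abc", "(a)(b(c))", "\\((x)\\)", "[()](y)", "(?:a)(b)" };
        for (String r : samples) {
            int ours = countCapturingGroups(r);
            int javas = Pattern.compile(r).matcher("").groupCount();
            System.out.println(r + " -> " + ours + " (java.util.regex: " + javas + ")");
        }
    }
}
```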

translateAtom

protected boolean translateAtom()
                         throws RegexSyntaxException
Specified by:
translateAtom in class RegexTranslator
Throws:
RegexSyntaxException

main

public static void main(java.lang.String[] args)
                 throws RegexSyntaxException
Convenience main method for testing purposes. Note that the actual testing is done using the Java regex engine.

Parameters:
args - (1) the regex, (2) "xpath" or "schema", (3) the target string to be matched
Throws:
RegexSyntaxException
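The main method takes three arguments: the regex, "xpath" or "schema", and the target string. A hypothetical harness following that convention is sketched below. It skips the translation step entirely and passes the regex straight to java.util.regex, so it is only meaningful for expressions that mean the same thing in both dialects. The one behavioural difference it does model is anchoring: an XML Schema pattern must match the whole string, whereas XPath's fn:matches succeeds if any substring matches. Whether Saxon's own main method distinguishes the two modes in this way is an assumption; RegexHarnessSketch and run are illustrative names.

```java
import java.util.regex.Pattern;

// Hypothetical harness mirroring the documented argument convention
// (regex, "xpath" or "schema", target string). Translation is omitted:
// the regex goes straight to java.util.regex.
public class RegexHarnessSketch {

    static boolean run(String[] args) {
        if (args.length != 3) {
            throw new IllegalArgumentException("usage: <regex> xpath|schema <target>");
        }
        Pattern p = Pattern.compile(args[0]);
        switch (args[1]) {
            case "schema":
                // XML Schema patterns are implicitly anchored: the whole value must match
                return p.matcher(args[2]).matches();
            case "xpath":
                // fn:matches succeeds if any substring matches
                return p.matcher(args[2]).find();
            default:
                throw new IllegalArgumentException("second argument must be xpath or schema");
        }
    }

    public static void main(String[] args) {
        System.out.println(run(args));
    }
}
```

For example, with the arguments b+ xpath abbc the sketch prints true, while b+ schema abbc prints false, because "b+" matches a substring of "abbc" but not the whole string.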