org.apache.uima.examples.cpe
Class PersonTitleDBWriterCasConsumer
java.lang.Object
org.apache.uima.resource.Resource_ImplBase
org.apache.uima.resource.ConfigurableResource_ImplBase
org.apache.uima.collection.CasConsumer_ImplBase
org.apache.uima.examples.cpe.PersonTitleDBWriterCasConsumer
- All Implemented Interfaces:
- CasObjectProcessor, CasProcessor, CasConsumer, ConfigurableResource, Resource
public class PersonTitleDBWriterCasConsumer
- extends CasConsumer_ImplBase
A simple CAS consumer that creates a Derby (Cloudscape) database in the file system. You can
obtain Derby from http://incubator.apache.org/derby/
This CAS Consumer takes one parameter:
OutputDirectory
- path to the directory that serves as the "System" directory for the
Derby DB.
It performs the following steps:
- Deletes all the databases at the system location (!!!).
- Creates a new database (this takes the most time, on the order of 10+ seconds).
- Creates a table in the database to hold instances of the PersonTitle annotation.
- Adds entries for each PersonTitle annotation in each CAS to the database.
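Because the first step is destructive, it helps to see what "deleting everything at the system location" can look like. The following is a minimal sketch using only the JDK, not this class's actual code. It assumes each database lives in its own subdirectory of the system directory; the class name `SystemDirCleaner` and method `clear` are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of PersonTitleDBWriterCasConsumer.
final class SystemDirCleaner {

    // Deletes everything inside systemDir but keeps systemDir itself.
    // Assumption: each database is stored in a subdirectory of systemDir.
    static void clear(Path systemDir) throws IOException {
        if (!Files.isDirectory(systemDir)) {
            // Nothing to delete yet; just make sure the directory exists.
            Files.createDirectories(systemDir);
            return;
        }
        List<Path> toDelete;
        try (Stream<Path> walk = Files.walk(systemDir)) {
            // Reverse order lists children before their parent directories,
            // so every directory is already empty when it is deleted.
            toDelete = walk.filter(p -> !p.equals(systemDir))
                           .sorted(Comparator.reverseOrder())
                           .collect(Collectors.toList());
        }
        for (Path p : toDelete) {
            Files.delete(p);
        }
    }
}
```

Calling `SystemDirCleaner.clear(Paths.get("/some/output/dir"))` would remove all content under that directory, so the directory should be one dedicated to this consumer.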
To use:
- Add derby.jar to the classpath when you start the CPE GUI.
- Run the CPE GUI and select the Name Recognizer and Person Title Annotator aggregate.
- A good sample collection reader is the FileSystemCollectionReader.
- Good sample data can be found in /examples/data.
The processing is set up to handle multiple CASes. The end of the collection is signaled by a
call to collectionProcessComplete.
Updates to the database are batched, with the batch size set to 50. A larger batch size takes
more Java heap space but may run more efficiently.
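The batching described here, together with the end-of-collection call, can be sketched as a small buffer that writes whenever it fills up and writes the remainder at the end. This is an illustrative sketch only, not the class's actual implementation. `BatchBuffer` and its `sink` callback are hypothetical names. With JDBC, the sink would typically correspond to `PreparedStatement.addBatch()` followed by `executeBatch()`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper illustrating size-based batching.
final class BatchBuffer<T> {
    private final int batchSize;
    private final Consumer<List<T>> sink;   // stands in for the database write
    private final List<T> pending = new ArrayList<>();

    BatchBuffer(int batchSize, Consumer<List<T>> sink) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    // Called once per record, for example while processing a CAS.
    void add(T row) {
        pending.add(row);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    // Called when the collection is complete, to write any leftover rows.
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(pending));  // hand over a copy
        pending.clear();
    }
}
```

With a batch size of 50 and 120 records, the sink receives two batches of 50 while records are added, and a final batch of 20 when `flush()` is called at the end. Only the pending rows are held in memory, which is why a larger batch size costs more heap.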
The table is populated with a slightly denormalized form of the data: the URI of the document is
included with every record.
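To make the denormalization concrete, the sketch below builds one row per title and copies the document URI into each row. It is an assumption-laden illustration: the `PersonTitleRow` type, its field names, the numeric limits, and the truncation behavior are all hypothetical and are not taken from this class. The real limits are the MAX_URI_LENGTH and MAX_TITLE_LENGTH constants, whose values are listed under Constant Field Values.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical row type; names, limits, and truncation are illustrative assumptions.
final class PersonTitleRow {
    static final int MAX_URI_LENGTH = 80;    // assumed value, not taken from the class
    static final int MAX_TITLE_LENGTH = 20;  // assumed value, not taken from the class

    final String uri;
    final String title;

    PersonTitleRow(String uri, String title) {
        this.uri = truncate(uri, MAX_URI_LENGTH);
        this.title = truncate(title, MAX_TITLE_LENGTH);
    }

    // Cuts a value down to the column width so an insert would not overflow.
    static String truncate(String s, int max) {
        return s.length() <= max ? s : s.substring(0, max);
    }

    // Builds one row per title; the document URI is repeated in every row.
    static List<PersonTitleRow> rowsFor(String documentUri, List<String> titles) {
        List<PersonTitleRow> rows = new ArrayList<>();
        for (String t : titles) {
            rows.add(new PersonTitleRow(documentUri, t));
        }
        return rows;
    }
}
```

Repeating the URI costs some storage but lets each record be queried on its own, without a join against a separate documents table.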
Method Summary
void collectionProcessComplete(ProcessTrace arg0)
- Completes the processing of an entire collection.
void initialize()
- This method is called during initialization, and does nothing by default.
void processCas(CAS aCAS)
- Processes the CasContainer which was populated by the TextAnalysisEngines.
Methods inherited from class java.lang.Object
- clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
PARAM_OUTPUTDIR
public static final java.lang.String PARAM_OUTPUTDIR
- Name of configuration parameter that must be set to the path of a directory into which the
Derby Database will be written.
- See Also:
- Constant Field Values
MAX_URI_LENGTH
public static final int MAX_URI_LENGTH
- See Also:
- Constant Field Values
MAX_TITLE_LENGTH
public static final int MAX_TITLE_LENGTH
- See Also:
- Constant Field Values
DB_LOAD_BATCH_SIZE
public static final int DB_LOAD_BATCH_SIZE
- See Also:
- Constant Field Values
PersonTitleDBWriterCasConsumer
public PersonTitleDBWriterCasConsumer()
initialize
public void initialize()
throws ResourceInitializationException
- Description copied from class:
CasConsumer_ImplBase
- This method is called during initialization, and does nothing by default. Subclasses should
override it to perform one-time startup logic.
- Overrides:
initialize
in class CasConsumer_ImplBase
- Throws:
ResourceInitializationException
- if a failure occurs during initialization.
processCas
public void processCas(CAS aCAS)
throws ResourceProcessException
- Processes the CasContainer which was populated by the TextAnalysisEngines.
In this case, the CAS is assumed to contain annotations of type PersonTitle, created with the
PersonTitleAnnotator. These Annotations are stored in a database table called PersonTitle.
- Parameters:
aCAS
- CasContainer which has been populated by the TAEs
- Throws:
ResourceProcessException
- if there is an error in processing the Resource
- See Also:
CasObjectProcessor.processCas(org.apache.uima.cas.CAS)
collectionProcessComplete
public void collectionProcessComplete(ProcessTrace arg0)
throws ResourceProcessException,
java.io.IOException
- Description copied from interface:
CasProcessor
- Completes the processing of an entire collection.
- Specified by:
collectionProcessComplete
in interface CasProcessor
- Overrides:
collectionProcessComplete
in class CasConsumer_ImplBase
- Parameters:
arg0
- an object that records information, such as timing, about this method's execution.
- Throws:
ResourceProcessException
- if an exception occurs during processing
java.io.IOException
- if an I/O failure occurs
- See Also:
CasProcessor.collectionProcessComplete(org.apache.uima.util.ProcessTrace)
Copyright © 2011. All Rights Reserved.