java.lang.Object
  org.apache.commons.csv.CSVParser
public class CSVParser
Parses CSV files according to the specified configuration.
Because CSV appears in many different dialects, the parser supports many
configuration settings by allowing the specification of a CSVStrategy.
Parsing of a csv-string having tabs as separators, '"' as an optional value encapsulator, and comments starting with '#':
String[][] data = (new CSVParser(new StringReader("a\tb\nc\td"), new CSVStrategy('\t','"','#'))).getAllValues();
Parsing of a csv-string in Excel CSV format:
String[][] data = (new CSVParser(new StringReader("a;b\nc;d"), CSVStrategy.EXCEL_STRATEGY)).getAllValues();
Internal parser state is completely covered by the strategy and the reader-state.
See the package documentation for more details.
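To make the two examples above concrete, the following is a minimal, self-contained sketch of what parsing with a delimiter, an encapsulator, and a comment marker involves. It does not use CSVParser or CSVStrategy: the class name `MiniCsvSketch` and its `parse` method are hypothetical, and the treatment of doubled encapsulators and of comment lines is an assumption about common CSV behaviour, not a description of this class's implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not the CSVParser implementation.
final class MiniCsvSketch {

    /** Splits text into records of values using the given delimiter, encapsulator and comment marker. */
    static String[][] parse(String text, char delimiter, char encapsulator, char commentStart) {
        List<String[]> records = new ArrayList<>();
        List<String> record = new ArrayList<>();
        StringBuilder value = new StringBuilder();
        boolean quoted = false;    // currently inside an encapsulated value
        boolean lineStart = true;  // nothing consumed yet on the current line
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (quoted) {
                if (c == encapsulator) {
                    if (i + 1 < text.length() && text.charAt(i + 1) == encapsulator) {
                        value.append(encapsulator); // assumption: doubled encapsulator is a literal one
                        i++;
                    } else {
                        quoted = false;             // closing encapsulator
                    }
                } else {
                    value.append(c);                // delimiters and line breaks are kept verbatim
                }
            } else if (lineStart && c == commentStart) {
                // Skip the whole comment line, including its line feed.
                while (i < text.length() && text.charAt(i) != '\n') {
                    i++;
                }
                i++;
                continue;
            } else if (c == encapsulator) {
                quoted = true;
            } else if (c == delimiter) {
                record.add(value.toString());
                value.setLength(0);
            } else if (c == '\n' || (c == '\r' && i + 1 < text.length() && text.charAt(i + 1) == '\n')) {
                if (c == '\r') {
                    i++;                            // also consume the line feed of a CR LF pair
                }
                record.add(value.toString());
                value.setLength(0);
                records.add(record.toArray(new String[0]));
                record.clear();
                lineStart = true;
                i++;
                continue;
            } else {
                value.append(c);
            }
            lineStart = false;
            i++;
        }
        if (!lineStart) {                           // last record had no trailing line break
            record.add(value.toString());
            records.add(record.toArray(new String[0]));
        }
        return records.toArray(new String[0][]);
    }
}
```

Calling `MiniCsvSketch.parse("a\tb\nc\td", '\t', '"', '#')` on the tab-separated input from the first example yields two records of two values each.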
Nested Class Summary | |
---|---|
(package private) static class |
CSVParser.Token
Token is an internal token representation. |
Field Summary | |
---|---|
private CharBuffer |
code
|
private static java.lang.String[] |
EMPTY_STRING_ARRAY
Immutable empty String array. |
private ExtendedBufferedReader |
in
|
private static int |
INITIAL_TOKEN_LENGTH
length of the initial token (content-)buffer |
private java.util.ArrayList |
record
A record buffer for getLine(). |
private CSVParser.Token |
reusableToken
|
private CSVStrategy |
strategy
|
protected static int |
TT_EOF
Token (which can have content) when end of file is reached. |
protected static int |
TT_EORECORD
Token with content when end of a line is reached. |
protected static int |
TT_INVALID
Token has no valid content, i.e. |
protected static int |
TT_TOKEN
Token with content, at beginning or in the middle of a line. |
private CharBuffer |
wsBuf
|
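The TT_TOKEN, TT_EORECORD and TT_EOF descriptions above suggest that each token carries its content together with a marker for what ended it. The sketch below illustrates that idea for unencapsulated values only. It is an assumption-laden illustration: `TokenSketch`, its `Type` enum and `readRecord` are hypothetical names, and an enum is used because the numeric values of the TT_* constants are not given on this page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of token types; not the CSVParser.Token implementation.
final class TokenSketch {
    enum Type { TOKEN, EORECORD, EOF }

    static final class Token {
        final Type type;
        final String content;

        Token(Type type, String content) {
            this.type = type;
            this.content = content;
        }
    }

    private final String input;
    private final char delimiter;
    private int pos = 0;

    TokenSketch(String input, char delimiter) {
        this.input = input;
        this.delimiter = delimiter;
    }

    /** Reads one unencapsulated value and reports what terminated it. */
    Token nextToken() {
        StringBuilder content = new StringBuilder();
        while (pos < input.length()) {
            char c = input.charAt(pos++);
            if (c == delimiter) {
                return new Token(Type.TOKEN, content.toString());    // more values follow on this line
            }
            if (c == '\n') {
                return new Token(Type.EORECORD, content.toString()); // last value of the record
            }
            content.append(c);
        }
        return new Token(Type.EOF, content.toString());              // input exhausted, may still have content
    }

    /** Collects tokens into one record; returns null once no input is left. */
    String[] readRecord() {
        if (pos >= input.length()) {
            return null;
        }
        List<String> record = new ArrayList<>();
        while (true) {
            Token t = nextToken();
            record.add(t.content);
            if (t.type != Type.TOKEN) {
                return record.toArray(new String[0]);
            }
        }
    }
}
```

A record-level method in the style of getLine() can then be a loop over tokens that stops at the first token that is not a plain TOKEN, as `readRecord` does here.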
Constructor Summary | |
---|---|
CSVParser(java.io.InputStream input)
Deprecated. Use CSVParser(Reader). |
|
CSVParser(java.io.Reader input)
CSV parser using the default CSVStrategy. |
|
CSVParser(java.io.Reader input,
char delimiter)
Deprecated. Use CSVParser(Reader,CSVStrategy). |
|
CSVParser(java.io.Reader input,
char delimiter,
char encapsulator,
char commentStart)
Deprecated. Use CSVParser(Reader,CSVStrategy). |
|
CSVParser(java.io.Reader input,
CSVStrategy strategy)
Customized CSV parser using the given CSVStrategy |
Method Summary | |
---|---|
private CSVParser.Token |
encapsulatedTokenLexer(CSVParser.Token tkn,
int c)
An encapsulated token lexer. Encapsulated tokens are surrounded by the given encapsulating-string. |
java.lang.String[][] |
getAllValues()
Parses the CSV according to the given strategy and returns the content as an array of records (where each record is an array of single values). |
java.lang.String[] |
getLine()
Parses from the current point in the stream until the end of the current line. |
int |
getLineNumber()
Returns the current line number in the input stream. |
CSVStrategy |
getStrategy()
Obtain the specified CSV Strategy |
private boolean |
isEndOfFile(int c)
|
private boolean |
isEndOfLine(int c)
Greedy: accepts \n and \r\n. This checker silently consumes the second control character... |
private boolean |
isWhitespace(int c)
|
protected CSVParser.Token |
nextToken()
Convenience method for nextToken(null). |
protected CSVParser.Token |
nextToken(CSVParser.Token tkn)
Returns the next token. |
java.lang.String |
nextValue()
Parses the CSV according to the given strategy and returns the next csv-value as string. |
private int |
readEscape(int c)
|
CSVParser |
setStrategy(CSVStrategy strategy)
Deprecated. The strategy should be set in the constructor CSVParser(Reader,CSVStrategy). |
private CSVParser.Token |
simpleTokenLexer(CSVParser.Token tkn,
int c)
A simple token lexer. Simple tokens are tokens that are not surrounded by encapsulators. |
protected int |
unicodeEscapeLexer(int c)
Decodes Unicode escapes. |
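unicodeEscapeLexer is summarized above only as decoding Unicode escapes. As a rough illustration of what such decoding involves, the sketch below replaces each backslash-u sequence followed by four hexadecimal digits with the character it names. The class `UnicodeEscapeSketch` and its `decode` method are hypothetical, it works on a whole String rather than on a character stream, and it throws IllegalArgumentException on malformed input, which is a simplification chosen for this example.

```java
// Hypothetical illustration only; not the unicodeEscapeLexer implementation.
final class UnicodeEscapeSketch {

    /** Replaces each backslash-u escape (four hex digits) with the character it denotes. */
    static String decode(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length() && s.charAt(i + 1) == 'u') {
                if (i + 6 > s.length()) {
                    throw new IllegalArgumentException("truncated unicode escape at index " + i);
                }
                int code = 0;
                for (int k = i + 2; k < i + 6; k++) {
                    int digit = Character.digit(s.charAt(k), 16);
                    if (digit < 0) {
                        throw new IllegalArgumentException("invalid hex digit at index " + k);
                    }
                    code = code * 16 + digit;  // accumulate the four hex digits
                }
                out.append((char) code);
                i += 6;                        // skip backslash, 'u' and the four digits
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }
}
```

For example, `UnicodeEscapeSketch.decode("a\\u0041b")` returns `"aAb"`, since hexadecimal 0041 is the code of the letter A.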
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
private static final int INITIAL_TOKEN_LENGTH
protected static final int TT_INVALID
protected static final int TT_TOKEN
protected static final int TT_EOF
protected static final int TT_EORECORD
private static final java.lang.String[] EMPTY_STRING_ARRAY
private final ExtendedBufferedReader in
private CSVStrategy strategy
private final java.util.ArrayList record
private final CSVParser.Token reusableToken
private final CharBuffer wsBuf
private final CharBuffer code
Constructor Detail |
---|
public CSVParser(java.io.InputStream input)
Deprecated. Use CSVParser(Reader).
CSV parser using the default CSVStrategy.
Parameters:
input - an InputStream containing "csv-formatted" stream

public CSVParser(java.io.Reader input)
CSV parser using the default CSVStrategy.
Parameters:
input - a Reader containing "csv-formatted" input

public CSVParser(java.io.Reader input, char delimiter)
Deprecated. Use CSVParser(Reader,CSVStrategy).
Uses the default CSVStrategy except for the delimiter setting.
Parameters:
input - a Reader based on "csv-formatted" input
delimiter - a char used for value separation

public CSVParser(java.io.Reader input, char delimiter, char encapsulator, char commentStart)
Deprecated. Use CSVParser(Reader,CSVStrategy).
Parameters:
input - a Reader based on "csv-formatted" input
delimiter - a char used for value separation
encapsulator - a char used as value encapsulation marker
commentStart - a char used for comment identification

public CSVParser(java.io.Reader input, CSVStrategy strategy)
Customized CSV parser using the given CSVStrategy.
Parameters:
input - a Reader containing "csv-formatted" input
strategy - the CSVStrategy used for CSV parsing

Method Detail |
---|
public java.lang.String[][] getAllValues() throws java.io.IOException
The returned content starts at the current parse-position in the stream.
Throws:
java.io.IOException - on parse error or input read-failure

public java.lang.String nextValue() throws java.io.IOException
Throws:
java.io.IOException - on parse error or input read-failure

public java.lang.String[] getLine() throws java.io.IOException
Throws:
java.io.IOException - on parse error or input read-failure

public int getLineNumber()

protected CSVParser.Token nextToken() throws java.io.IOException
Convenience method for nextToken(null).
Throws:
java.io.IOException

protected CSVParser.Token nextToken(CSVParser.Token tkn) throws java.io.IOException
Parameters:
tkn - an existing Token object to reuse. The caller is responsible for initializing the Token.
Throws:
java.io.IOException - on stream access error

private CSVParser.Token simpleTokenLexer(CSVParser.Token tkn, int c) throws java.io.IOException
Parameters:
tkn - the current token
c - the current character
Throws:
java.io.IOException - on stream access error

private CSVParser.Token encapsulatedTokenLexer(CSVParser.Token tkn, int c) throws java.io.IOException
Parameters:
tkn - the current token
c - the current character
Throws:
java.io.IOException - on invalid state

protected int unicodeEscapeLexer(int c) throws java.io.IOException
Parameters:
c - current char which is discarded because it's the "\\" of "\\uXXXX"
Throws:
java.io.IOException - on wrong unicode escape sequence or read error

private int readEscape(int c) throws java.io.IOException
Throws:
java.io.IOException

public CSVParser setStrategy(CSVStrategy strategy)
Deprecated. The strategy should be set in the constructor CSVParser(Reader,CSVStrategy).

public CSVStrategy getStrategy()

private boolean isWhitespace(int c)

private boolean isEndOfLine(int c) throws java.io.IOException
Throws:
java.io.IOException

private boolean isEndOfFile(int c)
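
The Method Summary describes isEndOfLine as greedy: it accepts \n and \r\n and silently consumes the second control character. One way to picture that behaviour is a check that reads one character ahead and pushes it back when it is not a line feed. The sketch below assumes java.io.PushbackReader as a stand-in for ExtendedBufferedReader, whose API is not shown on this page; `EndOfLineSketch` and `splitLines` are hypothetical names.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the CSVParser.isEndOfLine implementation.
final class EndOfLineSketch {

    /**
     * Returns true for a line feed, or for a carriage return that is directly
     * followed by a line feed. In the second case the line feed is consumed too.
     */
    static boolean isEndOfLine(int c, PushbackReader in) throws IOException {
        if (c == '\r') {
            int next = in.read();
            if (next == '\n') {
                return true;         // CR LF: the LF has now been consumed
            }
            if (next != -1) {
                in.unread(next);     // not a line ending, put the character back
            }
            return false;            // a lone CR is treated as ordinary content here
        }
        return c == '\n';
    }

    /** Splits text into lines using isEndOfLine; wraps I/O errors for convenience. */
    static List<String> splitLines(String text) {
        List<String> lines = new ArrayList<>();
        StringBuilder line = new StringBuilder();
        try (PushbackReader in = new PushbackReader(new StringReader(text))) {
            int c;
            while ((c = in.read()) != -1) {
                if (isEndOfLine(c, in)) {
                    lines.add(line.toString());
                    line.setLength(0);
                } else {
                    line.append((char) c);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (line.length() > 0) {
            lines.add(line.toString()); // final line without a terminator
        }
        return lines;
    }
}
```

Because the check consumes the line feed of a CR LF pair itself, the caller sees a single end-of-line event for both Unix and Windows line endings.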