java.lang.Object
  com.ricebridge.util.PropSpec
    com.ricebridge.csvman.CsvSpec
Stores settings for reading and writing CSV files.
This class describes the format of a CSV file and is used by the CSV parser to control the interpretation of the CSV data. The default settings are for the Microsoft Excel CSV format.
Predefined settings for various uses are available from the
CsvManager class. For example, CsvManager.makeUnixSpec()
returns a CsvSpec object that understands UNIX style backslash
escapes (\n for newline and so on).
To use a predefined CsvSpec or to specify your own,
use the CsvManager.setCsvSpec(com.ricebridge.csvman.CsvSpec) method.
CsvSpec objects allow you to control the interpretation of your
CSV file in a number of ways:
setSeparator(java.lang.String)
setQuote(char)
setEscape(char)
setEndOfLine(java.lang.String)
setUseComment(boolean)
setUseQuote(boolean)
setDoubleQuote(boolean)
setNumFields(int)
setTrimType(com.ricebridge.csvman.TrimType)
setIgnoreBadLines(boolean)
setIgnoreEmptyLines(boolean)
setStartLine(long)
setEscapeMap(java.util.HashMap)
setMergeSeparators(boolean)
setEncoding(java.lang.String)
setCollectBadLines(boolean)
setDataFieldMaxLength(int)
setAllowQuotedLineEnds(boolean)
Close the InputStream - setCloseInputStream(boolean)
When outputting CSV data, the following options are also provided:
setSeparator(java.lang.String)
setQuoteType(com.ricebridge.csvman.QuoteType)
setStartLine(long)
setEncoding(java.lang.String)
Flush the OutputStream after each line - setFlushEachLine(boolean)
Close the OutputStream - setCloseOutputStream(boolean)
This class is designed to use sensible defaults where necessary. This means that setting an option may alter other options so that the settings remain consistent. The most common use cases are used as the basis for deciding which settings have precedence. The method documentation for each setting explains any side effects.
Convenience methods are available in the CsvManager class so that it is not always
necessary to use the CsvSpec class. Each CsvManager instance contains
a CsvSpec object with the default Excel settings.
The toString() method provides a human-readable description of the settings and
the validate() method can be called to validate the current settings by checking their
consistency. A CsvManagerException is thrown if the settings are not consistent.
See Also:
CsvManager, CsvManagerException, Serialized Form
| Constructor Summary |
| CsvSpec() - Create a new CsvSpec object with Excel defaults. |
| Method Summary | |
| boolean | getAllowQuotedLineEnds() - Get the allow quoted line ends setting. |
| boolean | getCloseInputStream() - Get the close input stream setting. |
| boolean | getCloseOutputStream() - Get the close output stream setting. |
| boolean | getCollectBadLines() - Get the collect bad lines setting. |
| char | getComment() - Get the character used to start comments. |
| boolean | getCommentWithinLine() - Get the comment within line setting. |
| int | getDataFieldMaxLength() - Get the data field maximum length setting. |
| boolean | getDoubleQuote() - Get the value of the double quote setting. |
| String | getEncoding() - Get the encoding setting. |
| long | getEndLine() - Get the end line (the last line to load) value. |
| String | getEndOfLine() - Get the end-of-line characters. |
| char | getEscape() - Get the escape character. |
| HashMap | getEscapeMap() - Get a copy of the current escape mappings. |
| boolean | getFlushEachLine() - Get the flush each line setting. |
| boolean | getIgnoreBadLines() - Get the status of the ignore bad lines setting. |
| boolean | getIgnoreEmptyLines() - Get the status of the ignore empty lines setting. |
| boolean | getMergeSeparators() - Get the status of the merge separators setting. |
| int | getNumFields() - Get the expected number of data fields per line. |
| char | getQuote() - Get the quote character. |
| QuoteType | getQuoteType() - Get the quote type. |
| String | getSeparator() - Get the separator characters. |
| long | getStartLine() - Get the start line (the first line to load) value. |
| String | getTrim() - Get the trim characters. |
| TrimType | getTrimType() - Get the trim type setting. |
| boolean | getUseComment() - Get the use comments setting. |
| boolean | getUseEscape() - Get the value of the use escape setting. |
| boolean | getUseEscapeMap() - Get the value of the use escape map setting. |
| boolean | getUseQuote() - Get the value of the use quote setting. |
| boolean | getVerbatimEndOfLine() - Get the verbatim end-of-line setting. |
| void | setAllowQuotedLineEnds(boolean pAllowQuotedLineEnds) - Allow line end characters inside quoted fields (default: true). |
| void | setCloseInputStream(boolean pCloseInputStream) - Close the InputStream after reading the CSV data (default: true). |
| void | setCloseOutputStream(boolean pCloseOutputStream) - Close the OutputStream after saving the CSV data (default: true). |
| void | setCollectBadLines(boolean pCollectBadLines) - Collect BadLine objects for later inspection (default: true). |
| void | setComment(char pComment) - Set the character to use at the start of comments (default: #). |
| void | setCommentWithinLine(boolean pCommentWithinLine) - Allow comments within lines. |
| void | setDataFieldMaxLength(int pDataFieldMaxLength) - Set the maximum allowed length in characters of a data field, or set to zero for no limit (default: 0). |
| void | setDoubleQuote(boolean pDoubleQuote) - Enable double quotes to escape a quote character (default: true). |
| void | setEncoding(String pEncoding) - Specify the character encoding to use for input and output (default: System.getProperty("file.encoding")). |
| void | setEndLine(long pEndLine) - Set the end line (the last line to load) value. |
| void | setEndOfLine(String pEndOfLine) - Set the end-of-line characters (default: \r\n). |
| void | setEscape(char pEscape) - Set the escape character (default: \ (back slash)). |
| void | setEscapeMap(HashMap pEscapeMap) - Set the mapping of escaped characters to other characters. |
| void | setFlushEachLine(boolean pFlushEachLine) - Flush the OutputStream after each line of CSV data (default: false). |
| void | setIgnoreBadLines(boolean pIgnoreBadLines) - Ignore lines with syntax errors (default: false). |
| void | setIgnoreEmptyLines(boolean pIgnoreEmptyLines) - Ignore lines that have no data (default: false). |
| void | setMergeSeparators(boolean pMergeSeparators) - Treat adjacent separator characters as one (default: false). |
| void | setNumFields(int pNumFields) - Set the expected number of data fields per line. |
| void | setNumLines(long pNumLines) - Set the number of lines to load. |
| void | setQuote(char pQuote) - Set quote character (default: " (double quote)). |
| void | setQuoteType(QuoteType pQuoteType) - This setting controls the use of quotes when saving a data file (default: AsNeeded). |
| void | setSeparator(String pSeparator) - Set the separator characters (default: , (comma)). |
| void | setStartLine(long pStartLine) - Set the start line (the first line to load) value. |
| void | setTrim(String pTrim) - Set the trim characters (default: space and tab). |
| void | setTrimType(TrimType pTrimType) - Set the trim type (default: Full; both start and end of data field). |
| void | setUseComment(boolean pUseComment) - Ignore lines that are comments. |
| void | setUseEscape(boolean pUseEscape) - Enable escaping of quote characters in data fields (default: false). |
| void | setUseEscapeMap(boolean pUseEscapeMap) - Enable use of the escape mappings (default: false). |
| void | setUseQuote(boolean pUseQuote) - Enable quoting of data fields (default: true). |
| void | setVerbatimEndOfLine(boolean pVerbatimEndOfLine) - Output the end-of-line characters exactly as specified, without platform-specific conversions (default: false). |
| String | toString() - Generate a human-friendly description of the current settings in a properties style format. |
| void | validate() - Check that settings are consistent. |
| Methods inherited from class com.ricebridge.util.PropSpec |
copyPropSpec, getBooleanProperty, getProperty, setProperty, setProperty |
| Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
| Constructor Detail |
public CsvSpec()
Create a new CsvSpec object with Excel defaults.
| Method Detail |
public void setSeparator(String pSeparator)
Set the separator characters (default: , (comma)).
The separator character separates data fields on a single line.
You can set more than one separator character - just put them
all into the pSeparator argument. Each separator character
is then recognised as a data field separator. For example ,;\t
will separate on comma, semi-colon and tab.
By default, each individual separator character defines a new data field,
so that repeated separators, like ,, will create two data fields.
Use setMergeSeparators(boolean) to change this behaviour and create only one data field.
Note: trim characters may not be the same as separator characters. Separator characters will
be automatically removed from the set of trim characters, both when calling this method and when calling setTrim(java.lang.String).
Parameters:
pSeparator - separator characters
See Also:
getSeparator(), setTrim(java.lang.String)
public String getSeparator()
Get the separator characters.
See Also:
setSeparator(java.lang.String)
public void setQuote(char pQuote)
Set the quote character (default: " (double quote)).
The quote character defines the start and end of single data fields. Quoting data fields is usually optional, but data fields must be quoted when they contain the separator character or end of line characters as part of their value.
Quotes can be escaped by using two quote characters together.
This method of escaping the quote character is enabled by default,
but you can disable it by calling setDoubleQuote(false).
Alternatively, the quote character can be escaped by preceding it with
the escape character (default: \ (back slash)). Use
setEscape(char) to set the escape character. The escape character may not be the same as the quote character.
This method of escaping is not enabled by default, but a call to either setEscape or
setUseEscape(boolean) will enable it.
You can also disable quoting completely. In this case data files are assumed to contain data which does not require escaping, that is, data which does not contain the separator or end-of-line characters.
When saving data, you can specify how data fields are to be quoted by
using setQuoteType(com.ricebridge.csvman.QuoteType). The quote type setting is not used when loading data files.
Note: setting a quote character will also set the use quote setting to true.
The escape character cannot be the same as the quote character and separator characters may not be quote characters.
Parameters:
pQuote - quote character
See Also:
getQuote(), setUseQuote(boolean), setDoubleQuote(boolean), setQuoteType(com.ricebridge.csvman.QuoteType), setEscape(char)
public char getQuote()
Get the quote character.
See Also:
setQuote(char)
public void setUseQuote(boolean pUseQuote)
Enable quoting of data fields (default: true).
When true the data file may contain quoted data fields.
When false the data file is assumed not to require quoted data fields. In this case the
quote character is treated as a normal character, and no data fields should contain separator characters,
unless they are escaped.
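Parameters:
pUseQuote - use quote setting
See Also:
setQuote(char), getUseQuote(), setUseEscape(boolean), setDoubleQuote(boolean), setUseEscapeMap(boolean)
public boolean getUseQuote()
Get the value of the use quote setting.
See Also:
setUseQuote(boolean)
public void setDoubleQuote(boolean pDoubleQuote)
Enable double quotes to escape a quote character (default: true).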
When true the quote character can be escaped by preceding it with another quote character,
so that two quote characters are interpreted as one quote character. Thus
"" becomes " for example.
When false two adjacent quote characters inside a quoted field will
trigger an error in loading the data file. Normally you would use setUseEscape(true)
or setEscape(char) when disabling the double quote setting, so that
quotes can be escaped by the escape character instead.
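The doubled-quote rule can be illustrated with a short, self-contained Java sketch. It is not the parser used by this library: the class DoubleQuoteSketch and its unquote method are hypothetical names introduced only for illustration, and the sketch assumes that the quoted field has already been isolated from the rest of the line.

```java
final class DoubleQuoteSketch {

    /**
     * Strip the surrounding quotes from a quoted field and collapse each
     * doubled quote into a single quote. Hypothetical helper, not CsvSpec API.
     */
    static String unquote(String field, char quote) {
        if (field.length() < 2 || field.charAt(0) != quote
                || field.charAt(field.length() - 1) != quote) {
            return field; // not a quoted field: return it unchanged
        }
        StringBuilder out = new StringBuilder();
        int end = field.length() - 1; // index of the closing quote
        for (int i = 1; i < end; i++) {
            char c = field.charAt(i);
            if (c != quote) {
                out.append(c);
            } else if (i + 1 < end && field.charAt(i + 1) == quote) {
                out.append(quote); // two quotes are read as one
                i++;               // skip the second quote of the pair
            } else {
                // a lone quote inside the field is a syntax error in this sketch
                throw new IllegalArgumentException("unescaped quote at index " + i);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The input field is "say ""hi""" including its surrounding quotes.
        System.out.println(unquote("\"say \"\"hi\"\"\"", '"'));
    }
}
```

The sketch models only the case where the double quote setting is true; escaping with a separate escape character is a different mechanism.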
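Parameters:
pDoubleQuote - double quote setting
See Also:
setQuote(char), getDoubleQuote(), setUseEscape(boolean), setEscape(char)
public boolean getDoubleQuote()
Get the value of the double quote setting.
See Also:
setDoubleQuote(boolean)
public void setQuoteType(QuoteType pQuoteType)
This setting controls the use of quotes when saving a data file (default: AsNeeded).
See the QuoteType class for a description of the quoting types available.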
The default is AsNeeded; quotes are only placed
around data fields that contain quote or separator characters.
Parameters:
pQuoteType - quote type
See Also:
QuoteType, getQuoteType()
public QuoteType getQuoteType()
Get the quote type.
See Also:
setQuoteType(com.ricebridge.csvman.QuoteType)
public void setEscape(char pEscape)
Set the escape character (default: \ (back slash)).
Escaping is not active by default; see setQuote(char).
The escape character escapes quote characters inside quoted data fields.
When a data field contains a quote character as part of its value
then this quote character can be escaped by placing the escape
character immediately before it. For example, \" becomes ".
The escape character can also be used to escape other characters. In fact the
default action for the escape character is simply to allow the following character
to be literally part of the data field. Thus separator characters can also be escaped, like so: \,,
and the escape character itself can be escaped, like so: \\. This means
that you do not necessarily need to use the quote character for fields that contain separator
characters.
You can also define your own escape characters, so that the character following the
escape character can be mapped into a different character. A default set of mappings is
provided: \r maps to return, \n maps to newline, \t maps to tab,
and \b maps to bell. To enable these mappings, call setUseEscapeMap(true).
To define your own escape map using setEscapeMap(java.util.HashMap), simply pass in a HashMap of
Character to Character mappings.
Note: setting an escape character will also set use escape to true.
The escape character cannot be the same as the quote character.
Also, separator characters may not be escape characters.
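Parameters:
pEscape - escape character
See Also:
getEscape(), setUseEscape(boolean), setUseEscapeMap(boolean), setEscapeMap(java.util.HashMap), setDoubleQuote(boolean)
public char getEscape()
Get the escape character.
See Also:
setEscape(char)
public void setUseEscape(boolean pUseEscape)
Enable escaping of quote characters in data fields (default: false).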
When true any characters in a data field preceded by the escape character are inserted
literally into the data field value.
When false the escape character has no effect on the data field value.
See setEscape(char) for more details.
Note: setting an escape character will also set the use escape setting to true.
Note: when set to true, the use escape map setting is also set to true.
Parameters:
pUseEscape - use escape setting
See Also:
setEscape(char), getUseEscape()
public boolean getUseEscape()
Get the value of the use escape setting.
See Also:
setUseEscape(boolean)
public void setEscapeMap(HashMap pEscapeMap)
Set the mapping of escaped characters to other characters.
When the escape character is enabled with setUseEscape(boolean) or setEscape(char)
then all characters following the escape character are interpreted literally.
For example, \t becomes t, and so on.
It is possible to specify alternative mappings for specific characters, so that
\t can become tab instead, for example. This is achieved by providing a
HashMap of Character to Character mappings.
Any characters not in the map continue to be interpreted literally.
This feature allows you to handle UNIX style escape sequences in your
data. When you call setUseEscapeMap(true), the following
default sequences are provided:
\r maps to return
\n maps to newline
\t maps to tab
\b maps to bell
You can specify your own by using this method.
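The following self-contained Java sketch shows the idea of literal escaping combined with an escape map. It is an illustration under stated assumptions rather than the library's own code: EscapeMapSketch and unescape are hypothetical names, the map holds Character to Character entries as described above, and an escape character at the very end of the text is simply kept as-is.

```java
import java.util.HashMap;
import java.util.Map;

final class EscapeMapSketch {

    /**
     * Resolve escape sequences in a data field. A character that follows the
     * escape character is looked up in the map; if it has no mapping it is
     * inserted literally. Hypothetical helper, not CsvSpec API.
     */
    static String unescape(String text, char escape, Map<Character, Character> escapeMap) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == escape && i + 1 < text.length()) {
                char next = text.charAt(++i); // the escaped character
                Character mapped = escapeMap.get(next);
                out.append(mapped != null ? mapped.charValue() : next);
            } else {
                out.append(c); // ordinary character, or a trailing escape
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<Character, Character> map = new HashMap<>();
        map.put('t', '\t'); // \t maps to tab
        map.put('n', '\n'); // \n maps to newline
        // \t is mapped to a tab, while \, has no mapping and stays a literal comma
        System.out.println(unescape("a\\tb\\,c", '\\', map));
    }
}
```

With an empty map every escaped character is taken literally, which corresponds to the behaviour described for the escape character without an escape map.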
Note: calling this method sets use escape map to true.
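Parameters:
pEscapeMap - mapping of escaped characters
See Also:
getEscapeMap(), setEscape(char), setUseEscape(boolean)
public HashMap getEscapeMap()
Get a copy of the current escape mappings.
See Also:
setEscapeMap(java.util.HashMap)
public void setUseEscapeMap(boolean pUseEscapeMap)
Enable use of the escape mappings (default: false).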
When true escape mappings are performed on data fields.
When false escape mappings are not performed on data fields.
Note: when set to true, the use escape setting is also set to true.
Parameters:
pUseEscapeMap - use escape map setting
See Also:
setEscapeMap(java.util.HashMap), getUseEscapeMap()
public boolean getUseEscapeMap()
Get the value of the use escape map setting.
See Also:
setUseEscape(boolean)
public void setEndOfLine(String pEndOfLine)
Set the end-of-line characters (default: \r\n).
The end-of-line characters mark the end of a data line. In order to enable multiple end-of-line markers for various platforms, the end-of-line characters are interpreted as follows: any combination and any number of non-repeating adjacent end-of-line characters mark the end of a line. As soon as any one end-of-line character is repeated, this is interpreted as a new line.
This means that with the default set of characters (\r\n),
Windows (\r\n), UNIX (\n), and Mac (\r) line ends are all recognised.
You can set your own end-of-line characters for special cases.
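The interpretation rule for end-of-line characters can be sketched in plain Java as follows. This is one possible reading of the rule, not the parser's actual implementation: EndOfLineSketch and splitLines are hypothetical names, quoting is ignored, and a final line without a terminator is kept only if it is non-empty.

```java
import java.util.ArrayList;
import java.util.List;

final class EndOfLineSketch {

    /**
     * Split data into lines. A run of adjacent, non-repeating end-of-line
     * characters counts as one line end; a repeated end-of-line character
     * starts a new line end. Hypothetical helper, not CsvSpec API.
     */
    static List<String> splitLines(String data, String eolChars) {
        List<String> lines = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        String seen = "";            // end-of-line characters in the current terminator
        boolean inTerminator = false;
        for (int i = 0; i < data.length(); i++) {
            char c = data.charAt(i);
            if (eolChars.indexOf(c) < 0) {
                // ordinary data character
                inTerminator = false;
                seen = "";
                current.append(c);
            } else if (inTerminator && seen.indexOf(c) < 0) {
                // a different end-of-line character: same terminator, e.g. \n after \r
                seen += c;
            } else {
                // first end-of-line character, or a repeated one: the line ends here
                lines.add(current.toString());
                current.setLength(0);
                inTerminator = true;
                seen = String.valueOf(c);
            }
        }
        if (current.length() > 0) {
            lines.add(current.toString()); // last line without a terminator
        }
        return lines;
    }

    public static void main(String[] args) {
        // Windows, UNIX and Mac line ends mixed in one input
        System.out.println(splitLines("a\r\nb\nc\rd", "\r\n"));
    }
}
```

In this reading, the input a\n\nb yields three lines, the middle one empty, because the second \n repeats a character already used by the first terminator.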
When data is output, the end-of-line setting is placed verbatim at the end of each line. This means that by default
\r\n is output at the end of each line. Use the CsvManager.makeUnixSpec() and CsvManager.makeMacSpec()
factory methods to get CsvSpec objects with appropriate line feeds for your platform, or simply set your own:
setEndOfLine("\n");, for example.
Parameters:
pEndOfLine - end-of-line character sequence
See Also:
getEndOfLine()
public String getEndOfLine()
Get the end-of-line characters.
See Also:
setEndOfLine(java.lang.String)
public void setMergeSeparators(boolean pMergeSeparators)
Treat adjacent separator characters as one (default: false).
When true, separator characters that are next to each other are merged
into one separator character. Thus ,, will only create one data field.
When false, separator characters that are next to each other remain separate,
so that ,, will create two data fields.
This setting operates on the entire list of separator characters so
that ,; will also only create one data field if the separators
are set to ,;.
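A minimal Java sketch of the difference between the two settings is shown below. It is an illustration only: MergeSeparatorsSketch and split are hypothetical names, and the sketch ignores quoting, escaping and trimming, which the real parser also has to handle.

```java
import java.util.ArrayList;
import java.util.List;

final class MergeSeparatorsSketch {

    /**
     * Split a line on any of the given separator characters. When merge is
     * true, a run of adjacent separators counts as a single separator.
     * Hypothetical helper, not CsvSpec API.
     */
    static List<String> split(String line, String separators, boolean merge) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean lastWasSeparator = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (separators.indexOf(c) >= 0) {
                // start a new field unless this separator is merged into the previous one
                if (!(merge && lastWasSeparator)) {
                    fields.add(current.toString());
                    current.setLength(0);
                }
                lastWasSeparator = true;
            } else {
                current.append(c);
                lastWasSeparator = false;
            }
        }
        fields.add(current.toString()); // the field after the last separator
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(split("a,,b", ",", false)); // the empty field between the commas is kept
        System.out.println(split("a,;b", ",;", true)); // the adjacent , and ; act as one separator
    }
}
```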
Parameters:
pMergeSeparators - set true to merge separators
See Also:
setSeparator(java.lang.String), getMergeSeparators()
public boolean getMergeSeparators()
Get the status of the merge separators setting.
See Also:
setMergeSeparators(boolean)
public void setNumFields(int pNumFields)
Set the expected number of data fields per line.
When the number of data fields is less than the expected number, the remaining fields are returned empty. When the number of data fields is greater than the expected number, the extra data fields are appended to the list of data fields, so that no data is lost. In this case the expected number of data fields is increased to the greater number found, and remains so for the rest of the loading of the file.
If the number of data fields is zero (the default), the expected number is set to the number of fields found on the first line. If longer lines are subsequently encountered, the number of fields is set to the longest line found so far.
Empty lines are returned with the expected number of empty data fields, not zero data fields.
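The padding and growth rules above can be sketched as a small Java class. This is a hedged illustration, not the library's code: NumFieldsSketch, normalize and getExpected are hypothetical names, and each line is assumed to have been split into fields already.

```java
import java.util.ArrayList;
import java.util.List;

final class NumFieldsSketch {

    private int expected; // 0 means "take the count from the first line"

    NumFieldsSketch(int expected) {
        this.expected = expected;
    }

    /**
     * Pad a line to the expected number of fields. A longer line raises the
     * expected number for all following lines. Hypothetical helper.
     */
    List<String> normalize(List<String> fields) {
        if (fields.size() > expected) {
            expected = fields.size(); // grows and stays grown
        }
        List<String> out = new ArrayList<>(fields);
        while (out.size() < expected) {
            out.add(""); // missing fields are returned empty
        }
        return out;
    }

    int getExpected() {
        return expected;
    }

    public static void main(String[] args) {
        NumFieldsSketch spec = new NumFieldsSketch(0);
        List<String> first = new ArrayList<>();
        first.add("a");
        first.add("b");
        first.add("c");
        System.out.println(spec.normalize(first));             // sets the expected number to 3
        System.out.println(spec.normalize(new ArrayList<>())); // empty line padded to 3 empty fields
    }
}
```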
Parameters:
pNumFields - number of expected data fields
See Also:
getNumFields(), setIgnoreEmptyLines(boolean)
public int getNumFields()
Get the expected number of data fields per line.
See Also:
setNumFields(int)
public void setStartLine(long pStartLine)
Set the start line (the first line to load) value.
By default, the start line is set to 1, the first line. The start line value starts from 1, not 0, and identifies the line from which to start loading data.
For example, if the CSV file contains a header line with field names, and you wish to ignore this line, then set the start line value to 2, which means that data will be loaded from the second line.
Use the setEndLine(long) method to stop loading data at a certain line. Use the setNumLines(long)
method to load an exact number of lines.
The start line setting also affects saved data. When you save, data lines will only be written from the start line.
Note: if you have previously called setNumLines then the end line is adjusted to
account for this. See setNumLines(long) for more detail.
Parameters:
pStartLine - start line value (from 1, not 0)
See Also:
setEndLine(long), setNumLines(long)
public long getStartLine()
Get the start line (the first line to load) value.
See Also:
setStartLine(long)
public void setEndLine(long pEndLine)
Set the end line (the last line to load) value.
By default the end line is set to -1. This means that there is no end line and all lines will be loaded. You can reset the end line to -1 for this default behaviour.
To restrict the number of lines to load, use this method to specify the last line to load. For example, in a CSV file containing 10 lines, you can specify the end line as 5 and then only lines 1 to 5 will be loaded.
The end line value is inclusive. This means that setting an end line value of, say, 10, means that the end line will be the tenth line found. The end value (and also the start value) starts from 1, not 0. This also means that the number of lines read is equal to end line - start line + 1; that is, if the start line is 1 and the end line is 3, then lines 1, 2 and 3 are read: 3 - 1 + 1 = 3.
The start line can also be specified with setStartLine(long), so that only a subset of the lines is loaded.
Instead of setEndLine you can also use setNumLines(long). This is just a convenience method
to set the end line relative to the start line.
The end line setting also affects saved data. When you save, data lines will only be written until the end line (inclusive).
Note: calling this method will reset the number of lines to load setting to 0; see setNumLines(long) for more details.
Parameters:
pEndLine - end line value (inclusive)
public long getEndLine()
Get the end line (the last line to load) value.
See Also:
setEndLine(long)
public void setNumLines(long pNumLines)
Set the number of lines to load.
This is a convenience method that can be used instead of setEndLine(long).
By default, the number of lines to load is 0 and the start and end line values are used instead.
When using this method, the end line is set from the number of lines specified.
For example, if the start line is 1 (the default), and the number of lines (pNumLines) is 3,
then 3 and only 3 lines are loaded (lines 1, 2 and 3) and the end line is 3.
Thus, calling this method has the same effect as calling setEndLine with pStartLine+pNumLines-1.
Note: subsequent calls to setStartLine(long) will automatically adjust the end line so that the
specified number of lines are loaded. But a call to setEndLine will override this behaviour and reset the
number of lines to load to 0.
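The interplay between the start line, end line and number of lines settings can be modelled with a few lines of Java. This sketch only mirrors the rules described above: LineWindowSketch and linesToLoad are hypothetical names, the real class may store these values differently, and the sketch assumes a positive argument to setNumLines.

```java
final class LineWindowSketch {

    private long startLine = 1; // first line to load, counted from 1
    private long endLine = -1;  // last line to load (inclusive); -1 means no end line
    private long numLines = 0;  // 0 means the start and end line values are used instead

    void setStartLine(long pStartLine) {
        startLine = pStartLine;
        if (numLines > 0) {
            // keep loading the requested number of lines from the new start
            endLine = startLine + numLines - 1;
        }
    }

    void setEndLine(long pEndLine) {
        endLine = pEndLine;
        numLines = 0; // an explicit end line overrides the number of lines
    }

    void setNumLines(long pNumLines) {
        // assumption of this sketch: pNumLines is positive
        numLines = pNumLines;
        endLine = startLine + pNumLines - 1;
    }

    long getEndLine() {
        return endLine;
    }

    /** Number of lines in the window, or -1 when there is no end line. */
    long linesToLoad() {
        return endLine < 0 ? -1 : endLine - startLine + 1;
    }

    public static void main(String[] args) {
        LineWindowSketch w = new LineWindowSketch();
        w.setNumLines(3);                    // lines 1, 2 and 3
        System.out.println(w.getEndLine());
        w.setStartLine(2);                   // window moves to lines 2, 3 and 4
        System.out.println(w.getEndLine());
    }
}
```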
Parameters:
pNumLines - number of lines
See Also:
setEndLine(long), setStartLine(long)
public void setIgnoreBadLines(boolean pIgnoreBadLines)
Ignore lines with syntax errors (default: false).
When true any lines that cannot be parsed are ignored and
processing continues. You can access the list of failed lines with the
CsvManager.getBadLines() method, and get the number of bad lines
with the CsvManager.getBadLineCount() method.
When false, processing is terminated if a bad line
is encountered, and a CsvManagerException
with code bad_line is thrown. The method CsvManagerException.getBadLine()
returns the offending line.
Parameters:
pIgnoreBadLines - ignore bad lines setting
Throws:
CsvManagerException
See Also:
getIgnoreBadLines(), CsvManager.getBadLines(), CsvManager.getBadLineCount(), CsvManagerException
public boolean getIgnoreBadLines()
Get the status of the ignore bad lines setting.
See Also:
setIgnoreBadLines(boolean)
public void setIgnoreEmptyLines(boolean pIgnoreEmptyLines)
Ignore lines that have no data (default: false).
When true any lines that contain no data fields are
ignored. No data structures with empty data fields are created.
When false any lines that contain no data fields are
recognised. A data structure with the default number of empty data fields is created.
You can set the default number of data fields using the setNumFields(int) method.
Note that the default number of data fields may be larger than initially set if data lines
are encountered that contain more than the default number of data fields. In this case the
default number of fields is permanently increased to handle these larger lines.
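Parameters:
pIgnoreEmptyLines - ignore empty lines setting
See Also:
getIgnoreEmptyLines(), setNumFields(int)
public boolean getIgnoreEmptyLines()
Get the status of the ignore empty lines setting.
See Also:
setIgnoreEmptyLines(boolean)
public void setTrim(String pTrim)
Set the trim characters (default: space and tab).
Unwanted whitespace or other characters can be trimmed from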
the start and end of a data field. Use the setTrimType(com.ricebridge.csvman.TrimType)
method to control the trimming operation.
Note: trim characters may not be the same as separator characters.
Both this method and the setSeparator(java.lang.String) method remove any separator
characters from the trim characters.
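public String getTrim()
Get the trim characters.
See Also:
setTrim(java.lang.String)
public void setTrimType(TrimType pTrimType)
Set the trim type (default: Full; both start and end of data field).
The trim type is set using an instance of the value object TrimType.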
This setting controls the trimming of whitespace or other characters from the
start and end of the data field. The following trim types are available:
Parameters:
pTrimType - trim type setting
See Also:
getTrimType(), setTrim(java.lang.String)
public TrimType getTrimType()
Get the trim type setting.
See Also:
setTrimType(com.ricebridge.csvman.TrimType)
public void setCloseInputStream(boolean pCloseInputStream)
Close the InputStream after reading the CSV data (default: true).
When true, the InputStream attached to
the original source of the data (such as a file) is closed when all of the
CSV data has been read.
When false, the InputStream is not closed.
Parameters:
pCloseInputStream - close input stream setting
See Also:
setCloseOutputStream(boolean)
public boolean getCloseInputStream()
Get the close input stream setting.
See Also:
setCloseInputStream(boolean)
public void setCloseOutputStream(boolean pCloseOutputStream)
Close the OutputStream after saving the CSV data (default: true).
When true, the OutputStream to which the CSV data is written
is closed after all of the data is written.
When false, the OutputStream is not closed.
Parameters:
pCloseOutputStream - close output stream setting
See Also:
setCloseInputStream(boolean)
public boolean getCloseOutputStream()
Get the close output stream setting.
See Also:
setCloseOutputStream(boolean)
public void setFlushEachLine(boolean pFlushEachLine)
Flush the OutputStream after each line of CSV data (default: false).
When true, the OutputStream to which the CSV data is written
is flushed after each line of data.
When false, the OutputStream is not flushed.
Parameters:
pFlushEachLine - flush each line setting
See Also:
getFlushEachLine()
public boolean getFlushEachLine()
Get the flush each line setting.
See Also:
setFlushEachLine(boolean)
public void setVerbatimEndOfLine(boolean pVerbatimEndOfLine)
Output the end-of-line characters exactly as specified, without platform-specific conversions (default: false).
When true, the end-of-line characters, as set with setEndOfLine(java.lang.String),
are output verbatim at the end of each line, exactly as specified.
When false, the end-of-line character string is converted to the correct platform-specific
end-of-line character sequence. This only applies to the case where the end-of-line characters are
specified as \r\n. In all other cases, the end-of-line characters are output exactly as specified.
For the special case \r\n, if the runtime platform is an Apple Mac, then \r is output,
if the output platform is UNIX (or non-Windows) then \n is output, and if the output platform is Windows,
then \r\n is output.
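Parameters:
pVerbatimEndOfLine - verbatim end-of-line setting
See Also:
getVerbatimEndOfLine()
public boolean getVerbatimEndOfLine()
Get the verbatim end-of-line setting.
See Also:
setVerbatimEndOfLine(boolean)
public void setEncoding(String pEncoding)
Specify the character encoding to use for input and output (default: System.getProperty("file.encoding")).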
CSV files can be encoded in a number of different formats. Normally, the default platform format is assumed, but you may have to handle CSV files containing international characters and symbols. In this case, you need to specify the character encoding of the file.
The pEncoding parameter is the same as the charset name parameter to
String.getBytes(String).
The most common case you will encounter will probably be loading or saving CSV data as Unicode,
in either UTF-8 or UTF-16 format. For these cases, simply use setEncoding("UTF8") or
setEncoding("UTF-16"). Note that the encoding String is the code used by the java.io
package, as defined on the list of
Java supported encodings.
Parameters:
pEncoding - encoding setting
See Also:
getEncoding()
public String getEncoding()
Get the encoding setting.
See Also:
setEncoding(java.lang.String)
public void setCollectBadLines(boolean pCollectBadLines)
Collect BadLine objects for later inspection (default: true).
When true, BadLine objects created during parsing are stored
internally by the CsvManager object, and can be obtained from the CsvManager.getBadLines() method.
When false, BadLine objects are not stored, although they are still counted in the statistics.
Set this setting to false when you expect very large numbers of bad lines in the CSV file, in order to
avoid running out of memory. In this case, you are better off saving the bad lines directly to a log file using a custom
LineListener.
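Parameters:
pCollectBadLines - collect bad lines setting
See Also:
getCollectBadLines(), setIgnoreBadLines(boolean)
public boolean getCollectBadLines()
Get the collect bad lines setting.
See Also:
setCollectBadLines(boolean)
public void setDataFieldMaxLength(int pDataFieldMaxLength)
Set the maximum allowed length in characters of a data field, or set to zero for no limit (default: 0).
If you are processing very large files that may contain very long data fields, you can use this setting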
as a safety limit to avoid running out of memory. When an individual data field is too long, a
CsvManagerException is thrown and processing halts (unless setIgnoreBadLines(boolean) is true).
If you do not wish to limit the length of data fields, set this setting to zero. Data fields will then be as long as your memory allocation allows. This is the default setting.
WARNING: a non-zero value may lead to data loss, as any characters after the length limit are not saved. You are therefore advised to use a large enough value for this setting, based on some multiple of the expected size of your data, and your memory constraints.
This setting also sets the maximum length of the original line recorded for BadLine error objects.
When the number of fields is set before parsing with setNumFields(int), then the number of fields plus one is multiplied by
this setting, otherwise the value of this setting is used (there is a large internal minimum, however,
so this only happens with large values).
Parameters:
pDataFieldMaxLength - data field maximum length setting
See Also:
getDataFieldMaxLength()
public int getDataFieldMaxLength()
Get the data field maximum length setting.
See Also:
setDataFieldMaxLength(int)
public void setAllowQuotedLineEnds(boolean pAllowQuotedLineEnds)
Allow line end characters inside quoted fields (default: true).
Most variations of the CSV format allow you to insert newline characters (we call them line end characters here to cover non-standard line separators) verbatim inside quoted fields. For well-formed CSV files this is not a problem, but if you expect syntax errors then these quoted newlines can become a serious issue as they cause data corruption.
If you set this setting to false, then line end characters are not allowed inside quoted fields.
This means that the common syntax error where there is no closing quote is captured. In the normal case, the following line
is swallowed by the opening quote as the parser assumes that it is still part of the last data field of the preceding line,
and will not stop until another quote is found.
When to use this setting: when you have to support quoted fields that contain commas but that should never contain newline characters.
WARNING: if you use this setting with a normal CSV file that contains syntax errors, then quoted fields containing line end characters will cause the next line of data to be corrupt, as it will start in the middle of the bad line, directly following the disallowed line end character. There is no way to programmatically detect the correct end of the line, so you will have to check the data yourself. Do not use this setting without writing code to deal with this case.
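Parameters:
pAllowQuotedLineEnds - allow quoted line ends setting
See Also:
getAllowQuotedLineEnds()
public boolean getAllowQuotedLineEnds()
Get the allow quoted line ends setting.
See Also:
setAllowQuotedLineEnds(boolean)
public void setUseComment(boolean pUseComment)
Ignore lines that are comments (default: false).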
By enabling this option, you can ignore lines in the CSV file
that start with a # character. You can also change this character with
the setComment method.
Example:
# header comment
# this line and line above are considered empty
real,data,here
When the # character occurs part way through the line, you can also treat it
as a comment by enabling the setCommentWithinLine setting.
IMPORTANT: commented lines are returned as empty lines by
default. To completely ignore commented lines, you should also set
setIgnoreEmptyLines to true.
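Parameters:
pUseComment - use comments setting
See Also:
setComment(char), setCommentWithinLine(boolean)
public boolean getUseComment()
Get the use comments setting.
See Also:
setUseComment(boolean)
public void setComment(char pComment)
Set the character to use at the start of comments (default: #).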
In order to use comments, you will need to activate them using the
setUseComment method.
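Parameters:
pComment - comment character
See Also:
setUseComment(boolean)
public char getComment()
Get the character used to start comments.
See Also:
setComment(char), setUseComment(boolean)
public void setCommentWithinLine(boolean pCommentWithinLine)
Allow comments within lines (default: false).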
By enabling this option, you can ignore text in the CSV file that follows a # character that occurs part way through the line. Everything after the # is ignored. However, if the # is inside a quoted field, then it is treated as a normal data character, and not as the start of a comment.
Example:
this,data,returned # this data ignored
To use this setting, the setUseComment setting
must also be true.
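The comment rules can be illustrated with the following self-contained Java sketch. It is not the library's parser: CommentSketch and stripComment are hypothetical names, a comment line is assumed to have the comment character as its very first character, and quote handling is reduced to tracking whether the current position is inside a quoted field.

```java
final class CommentSketch {

    /**
     * Remove comments from a single line. A line that starts with the comment
     * character becomes an empty line. When withinLine is true, text after a
     * comment character outside quotes is dropped. Hypothetical helper.
     */
    static String stripComment(String line, char comment, char quote, boolean withinLine) {
        if (!line.isEmpty() && line.charAt(0) == comment) {
            return ""; // whole-line comment is returned as an empty line
        }
        if (!withinLine) {
            return line;
        }
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == quote) {
                inQuotes = !inQuotes; // entering or leaving a quoted field
            } else if (c == comment && !inQuotes) {
                return line.substring(0, i); // everything after the comment character is ignored
            }
        }
        return line;
    }

    public static void main(String[] args) {
        System.out.println(stripComment("this,data,returned # this data ignored", '#', '"', true));
        System.out.println(stripComment("a,\"b # c\",d", '#', '"', true)); // # inside quotes is data
    }
}
```

Because a commented line comes back as an empty line in this sketch, a caller that wants to skip such lines entirely would also have to drop empty lines, which matches the note about setIgnoreEmptyLines.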
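Parameters:
pCommentWithinLine - comment within line setting
See Also:
setUseComment(boolean)
public boolean getCommentWithinLine()
Get the comment within line setting.
See Also:
setCommentWithinLine(boolean)
public String toString()
Generate a human-friendly description of the current settings in a properties style format.
Overrides:
toString in class PropSpec
public void validate()
Check that settings are consistent.
A CsvManagerException describing the problem is thrown if settings are inconsistent.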