com.ricebridge.csvman
Class CsvSpec

java.lang.Object
  extended by com.ricebridge.util.PropSpec
      extended by com.ricebridge.csvman.CsvSpec
All Implemented Interfaces:
Serializable

public class CsvSpec
extends PropSpec
implements Serializable

Stores settings for reading and writing CSV files.

This class describes the format of a CSV file and is used by the CSV parser to control the interpretation of the CSV data. The default settings are for the Microsoft Excel CSV format.

Predefined settings for various uses are available from the CsvManager class. For example, CsvManager.makeUnixSpec() returns a CsvSpec object that understands UNIX style backslash escapes (\n for newline and so on).

To use a predefined CsvSpec or to specify your own, use the CsvManager.setCsvSpec(com.ricebridge.csvman.CsvSpec) method.

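CsvSpec objects allow you to control the interpretation of your CSV file in a number of ways, including the separator, quote and escape characters, escape mappings, comments, trimming, the expected number of data fields, the range of lines to load, and the handling of empty and bad lines.

When outputting CSV data, further options are provided, including the quote type, the end-of-line characters and their platform conversion, flushing after each line, and closing the output stream.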

This class is designed to use sensible defaults where necessary. This means that setting an option may alter other options so that the settings remain consistent. The most common use-cases are used as the basis for deciding which settings have precedence. The method documentation for each setting explains any side effects.

Convenience methods are available in the CsvManager class so that it is not always necessary to use the CsvSpec class. Each CsvManager instance contains a CsvSpec object with the default Excel settings.

The toString() method provides a human-readable description of the settings and the validate() method can be called to validate the current settings by checking their consistency. A CsvManagerException is thrown if the settings are not consistent.

See Also:
CsvManager, CsvManagerException, Serialized Form

Constructor Summary
CsvSpec()
          Create a new CsvSpec object with Excel defaults.
 
Method Summary
 boolean getAllowQuotedLineEnds()
          Get the allow quoted line ends setting.
 boolean getCloseInputStream()
          Get the close input stream setting.
 boolean getCloseOutputStream()
          Get the close output stream setting.
 boolean getCollectBadLines()
          Get the collect bad lines setting.
 char getComment()
          Get the character used to start comments.
 boolean getCommentWithinLine()
          Get the comment within line setting.
 int getDataFieldMaxLength()
          Get the data field maximum length setting.
 boolean getDoubleQuote()
          Get the value of the double quote setting.
 String getEncoding()
          Get the encoding setting.
 long getEndLine()
          Get the end line (the last line to load) value.
 String getEndOfLine()
          Get the end-of-line characters.
 char getEscape()
          Get the escape character.
 HashMap getEscapeMap()
          Get a copy of the current escape mappings.
 boolean getFlushEachLine()
          Get the flush each line setting.
 boolean getIgnoreBadLines()
          Get the status of the ignore bad lines setting.
 boolean getIgnoreEmptyLines()
          Get the status of the ignore empty lines setting.
 boolean getMergeSeparators()
          Get the status of the merge separators setting.
 int getNumFields()
          Get the expected number of data fields per line.
 char getQuote()
          Get the quote character.
 QuoteType getQuoteType()
          Get the quote type.
 String getSeparator()
          Get the separator characters.
 long getStartLine()
          Get the start line (the first line to load) value.
 String getTrim()
          Get the trim characters.
 TrimType getTrimType()
          Get the trim type setting.
 boolean getUseComment()
          Get the use comments setting.
 boolean getUseEscape()
          Get the value of the use escape setting.
 boolean getUseEscapeMap()
          Get the value of the use escape map setting.
 boolean getUseQuote()
          Get the value of the use quote setting.
 boolean getVerbatimEndOfLine()
          Get the verbatim end-of-line setting.
 void setAllowQuotedLineEnds(boolean pAllowQuotedLineEnds)
          Allow line end characters inside quoted fields (default: true).
 void setCloseInputStream(boolean pCloseInputStream)
          Close the InputStream after reading the CSV data (default: true).
 void setCloseOutputStream(boolean pCloseOutputStream)
          Close the OutputStream after saving the CSV data (default: true).
 void setCollectBadLines(boolean pCollectBadLines)
          Collect BadLine objects for later inspection (default: true).
 void setComment(char pComment)
          Set the character to use at the start of comments (default: #).
 void setCommentWithinLine(boolean pCommentWithinLine)
          Allow comments within lines.
 void setDataFieldMaxLength(int pDataFieldMaxLength)
          Set the maximum allowed length in characters of a data field, or set to zero for no limit (default: 0).
 void setDoubleQuote(boolean pDoubleQuote)
          Enable double quotes to escape a quote character (default: true).
 void setEncoding(String pEncoding)
          Specify the character encoding to use for input and output (default: System.getProperty("file.encoding")).
 void setEndLine(long pEndLine)
          Set the end line (the last line to load) value.
 void setEndOfLine(String pEndOfLine)
          Set the end-of-line characters (default: \r\n).
 void setEscape(char pEscape)
          Set the escape character (default: \ (back slash)).
 void setEscapeMap(HashMap pEscapeMap)
          Set the mapping of escaped characters to other characters.
 void setFlushEachLine(boolean pFlushEachLine)
          Flush the OutputStream after each line of CSV data (default: false).
 void setIgnoreBadLines(boolean pIgnoreBadLines)
          Ignore lines with syntax errors (default: false).
 void setIgnoreEmptyLines(boolean pIgnoreEmptyLines)
          Ignore lines that have no data (default: false).
 void setMergeSeparators(boolean pMergeSeparators)
          Treat adjacent separator characters as one (default: false).
 void setNumFields(int pNumFields)
          Set the expected number of data fields per line.
 void setNumLines(long pNumLines)
          Set the number of lines to load.
 void setQuote(char pQuote)
          Set the quote character (default: " (double quote)).
 void setQuoteType(QuoteType pQuoteType)
          This setting controls the use of quotes when saving a data file (default: AsNeeded).
 void setSeparator(String pSeparator)
          Set the separator characters (default: , (comma)).
 void setStartLine(long pStartLine)
          Set the start line (the first line to load) value.
 void setTrim(String pTrim)
          Set the trim characters (default: space and tab).
 void setTrimType(TrimType pTrimType)
          Set the trim type (default: Full; both start and end of data field).
 void setUseComment(boolean pUseComment)
          Ignore lines that are comments.
 void setUseEscape(boolean pUseEscape)
          Enable escaping of quote characters in data fields (default: false).
 void setUseEscapeMap(boolean pUseEscapeMap)
          Enable use of the escape mappings (default: false).
 void setUseQuote(boolean pUseQuote)
          Enable quoting of data fields (default: true).
 void setVerbatimEndOfLine(boolean pVerbatimEndOfLine)
          Output the end-of-line characters exactly as specified, without platform-specific conversions (default: false).
 String toString()
          Generate a human-friendly description of the current settings in a properties style format.
 void validate()
          Check that settings are consistent.
 
Methods inherited from class com.ricebridge.util.PropSpec
copyPropSpec, getBooleanProperty, getProperty, setProperty, setProperty
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

CsvSpec

public CsvSpec()
Create a new CsvSpec object with Excel defaults.

Method Detail

setSeparator

public void setSeparator(String pSeparator)
Set the separator characters (default: , (comma)).

The separator character separates data fields on a single line. You can set more than one separator character: put them all into the pSeparator argument, and each one is then recognised as a data field separator. For example, ,;\t will separate on comma, semicolon and tab.

By default, each individual separator character defines a new data field, so that repeated separators, like ,, will create two data fields. Use setMergeSeparators(boolean) to change this behaviour and create only one data field.

Note: trim characters cannot be the same as separator characters. Separator characters are automatically removed from the set of trim characters, both when calling this method and when calling setTrim(java.lang.String).
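The splitting rule described above can be modelled in a few lines. This is a minimal, self-contained sketch and not CSV Manager's parser: the class SeparatorSketch and its split method are hypothetical, and quoting, escaping and trimming are deliberately left out so that only the separator set and the merge separators behaviour are shown.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the documented separator rules; not CSV Manager's parser.
// Quoting, escaping and trimming are deliberately left out.
final class SeparatorSketch {

    /** Splits a line on any character in 'separators', optionally merging adjacent separators. */
    static List<String> split(String line, String separators, boolean mergeSeparators) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean previousWasSeparator = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (separators.indexOf(c) >= 0) {
                if (mergeSeparators && previousWasSeparator) {
                    continue; // treat this separator as part of the previous one
                }
                fields.add(current.toString()); // close the current data field
                current.setLength(0);
                previousWasSeparator = true;
            } else {
                current.append(c);
                previousWasSeparator = false;
            }
        }
        fields.add(current.toString()); // the last data field on the line
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(split("a,;b\tc", ",;\t", false)); // [a, , b, c]
        System.out.println(split("a,;b\tc", ",;\t", true));  // [a, b, c]
    }
}
```

With the separators ,;\t the adjacent pair ,; yields an empty data field unless merging is enabled, which is the difference controlled by setMergeSeparators(boolean).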

Parameters:
pSeparator - separator characters
See Also:
getSeparator(), setTrim(java.lang.String)

getSeparator

public String getSeparator()
Get the separator characters.

See Also:
setSeparator(java.lang.String)

setQuote

public void setQuote(char pQuote)
Set the quote character (default: " (double quote)).

The quote character defines the start and end of single data fields. Quoting data fields is usually optional, but data fields must be quoted when they contain the separator character or end of line characters as part of their value.

Quotes can be escaped by using two quote characters together. This method of escaping the quote character is enabled by default, but you can disable it by calling setDoubleQuote(false).

Alternatively, the quote character can be escaped by preceding it with the escape character (default: \ (back slash)). Use setEscape(char) to set the escape character, which cannot be the same as the quote character. This method of escaping is not enabled by default, but a call to either setEscape(char) or setUseEscape(true) will enable it.

You can also disable quoting completely. In this case data files are assumed to contain data which does not require escaping, that is, data which does not contain the separator or end-of-line characters.

When saving data, you can specify how data fields are to be quoted by using setQuoteType(com.ricebridge.csvman.QuoteType). The quote type setting is not used when loading data files.

Note: setting a quote character will also set the use quote setting to true. The escape character cannot be the same as the quote character and separator characters may not be quote characters.

Parameters:
pQuote - quote character
See Also:
getQuote(), setUseQuote(boolean), setDoubleQuote(boolean), setQuoteType(com.ricebridge.csvman.QuoteType), setEscape(char)

getQuote

public char getQuote()
Get the quote character.

See Also:
setQuote(char)

setUseQuote

public void setUseQuote(boolean pUseQuote)
Enable quoting of data fields (default: true).

When true the data file may contain quoted data fields.

When false the data file is assumed not to require quoted data fields. In this case the quote character is treated as a normal character, and no data fields should contain separator characters, unless they are escaped.

Parameters:
pUseQuote - use quote setting
See Also:
setQuote(char), getUseQuote(), setUseEscape(boolean), setDoubleQuote(boolean), setUseEscapeMap(boolean)

getUseQuote

public boolean getUseQuote()
Get the value of the use quote setting.

See Also:
setUseQuote(boolean)

setDoubleQuote

public void setDoubleQuote(boolean pDoubleQuote)
Enable double quotes to escape a quote character (default: true).

When true, the quote character can be escaped by preceding it with another quote character, so that two quote characters are interpreted as one. For example, "" becomes ".

When false two adjacent quote characters inside a quoted field will trigger an error in loading the data file. Normally you would use setUseEscape(true) or setEscape(char) when disabling the double quote setting, so that quotes can be escaped by the escape character instead.
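The effect of the double quote setting on a single quoted field can be sketched as follows. This is a hypothetical model, not CSV Manager's parser: DoubleQuoteSketch and unquote are illustrative names, and the sketch assumes the whole field, including its surrounding quotes, has already been isolated from the line.

```java
// Hypothetical model of reading one quoted field; not CSV Manager's parser.
final class DoubleQuoteSketch {

    /**
     * Returns the value of a field that starts and ends with the quote character.
     * With doubleQuote true, each pair of quote characters inside the field becomes one
     * literal quote; with doubleQuote false, a quote inside the field is a syntax error.
     */
    static String unquote(String field, char quote, boolean doubleQuote) {
        int last = field.length() - 1;
        if (last < 1 || field.charAt(0) != quote || field.charAt(last) != quote) {
            throw new IllegalArgumentException("not a quoted field: " + field);
        }
        StringBuilder value = new StringBuilder();
        for (int i = 1; i < last; i++) {
            char c = field.charAt(i);
            if (c != quote) {
                value.append(c);
            } else if (doubleQuote && i + 1 < last && field.charAt(i + 1) == quote) {
                value.append(quote); // a doubled quote stands for one literal quote
                i++;                 // skip the second quote of the pair
            } else {
                throw new IllegalArgumentException("unescaped quote at offset " + i);
            }
        }
        return value.toString();
    }

    public static void main(String[] args) {
        System.out.println(unquote("\"say \"\"hi\"\"\"", '"', true)); // say "hi"
    }
}
```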

Parameters:
pDoubleQuote - double quote setting
See Also:
setQuote(char), getDoubleQuote(), setUseEscape(boolean), setEscape(char)

getDoubleQuote

public boolean getDoubleQuote()
Get the value of the double quote setting.

See Also:
setDoubleQuote(boolean)

setQuoteType

public void setQuoteType(QuoteType pQuoteType)
This setting controls the use of quotes when saving a data file (default: AsNeeded).

See the QuoteType class for a description of the quoting types available. The default is AsNeeded; quotes are only placed around data fields that contain quote or separator characters.
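As an illustration of AsNeeded quoting on output, here is a minimal sketch. It is not CSV Manager's writer: AsNeededQuoteSketch and format are hypothetical names. It assumes embedded quotes are doubled (the default double quote setting), and it also quotes fields containing end-of-line characters, since setQuote(char) notes that such fields must be quoted.

```java
// Hypothetical model of "AsNeeded" quoting on output; not CSV Manager's writer.
final class AsNeededQuoteSketch {

    /**
     * Quotes a value only if it contains the quote character, a separator character or an
     * end-of-line character. Embedded quotes are doubled, matching the default double quote setting.
     */
    static String format(String value, char quote, String separators, String endOfLineChars) {
        boolean needsQuotes = value.indexOf(quote) >= 0;
        for (int i = 0; i < value.length() && !needsQuotes; i++) {
            char c = value.charAt(i);
            needsQuotes = separators.indexOf(c) >= 0 || endOfLineChars.indexOf(c) >= 0;
        }
        if (!needsQuotes) {
            return value; // plain values are written as-is
        }
        String q = String.valueOf(quote);
        return q + value.replace(q, q + q) + q;
    }

    public static void main(String[] args) {
        System.out.println(format("plain", '"', ",", "\r\n"));
        System.out.println(format("a,b", '"', ",", "\r\n"));
        System.out.println(format("say \"hi\"", '"', ",", "\r\n"));
    }
}
```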

Parameters:
pQuoteType - quote type
See Also:
QuoteType, getQuoteType()

getQuoteType

public QuoteType getQuoteType()
Get the quote type.

See Also:
setQuoteType(com.ricebridge.csvman.QuoteType)

setEscape

public void setEscape(char pEscape)
Set the escape character (default: \ (back slash)).

Escaping is not active by default; see setQuote(char).

The escape character escapes quote characters inside quoted data fields. When a data field contains a quote character as part of its value then this quote character can be escaped by placing the escape character immediately before it. For example, \" becomes ".

The escape character can also be used to escape other characters. In fact, the default action for the escape character is simply to make the following character a literal part of the data field. Thus separator characters can also be escaped (\,), and the escape character itself can be escaped (\\). This means that you do not necessarily need to quote fields that contain separator characters.
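The last point above, that escaping can replace quoting, can be sketched from the writing side. This is a hypothetical helper, not part of CSV Manager: it prefixes the escape character to separators, quotes and the escape character itself, so that a reader applying the literal escape rule recovers the original value without any quotes.

```java
// Hypothetical model of writing a field with escapes instead of quotes; not CSV Manager's writer.
final class EscapeWriteSketch {

    /** Prefixes the escape character to separators, quotes and the escape character itself. */
    static String escape(String value, char escape, char quote, String separators) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c == escape || c == quote || separators.indexOf(c) >= 0) {
                out.append(escape); // the reader takes the next character literally
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The field a,b\c is written without quotes as a\,b\\c
        System.out.println(escape("a,b\\c", '\\', '"', ","));
    }
}
```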

You can also define your own escape mappings, so that the character following the escape character is mapped to a different character. A default set of mappings is provided: \r maps to return, \n maps to newline, \t maps to tab, and \b maps to bell. To enable these mappings, call setUseEscapeMap(true).

To define your own escape map, pass a HashMap of Character to Character mappings to setEscapeMap(java.util.HashMap).

Note: setting an escape character will also set use escape to true. The escape character cannot be the same as the quote character. Also, separator characters may not be escape characters.

Parameters:
pEscape - escape character
See Also:
getEscape(), setUseEscape(boolean), setUseEscapeMap(boolean), setEscapeMap(java.util.HashMap), setDoubleQuote(boolean)

getEscape

public char getEscape()
Get the escape character.

See Also:
setEscape(char)

setUseEscape

public void setUseEscape(boolean pUseEscape)
Enable escaping of quote characters in data fields (default: false).

When true any characters in a data field preceded by the escape character are inserted literally into the data field value.

When false the escape character has no effect on the data field value.

See setEscape(char) for more details.

Note: setting an escape character will also set the use escape setting to true.

Note: when set to true, the use escape map setting is also set to true.

Parameters:
pUseEscape - use escape setting
See Also:
setEscape(char), getUseEscape()

getUseEscape

public boolean getUseEscape()
Get the value of the use escape setting.

See Also:
setUseEscape(boolean)

setEscapeMap

public void setEscapeMap(HashMap pEscapeMap)
Set the mapping of escaped characters to other characters.

When escaping is enabled with setUseEscape(boolean) or setEscape(char), the character following the escape character is interpreted literally. For example, \t becomes t, and so on.

It is possible to specify alternative mappings for specific characters, so that \t can become tab instead, for example. This is achieved by providing a HashMap of Character to Character mappings. Any characters not in the map continue to be interpreted literally.

This feature allows you to handle UNIX style escape sequences in your data. When you call setUseEscapeMap(true), the following default sequences are provided:

  • \r maps to return
  • \n maps to newline
  • \t maps to tab
  • \b maps to bell

You can specify your own by using this method.
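The interaction between literal escapes and the escape map can be sketched as below. This is a hypothetical model, not CSV Manager's parser: EscapeMapSketch, unescape and unixStyleMap are illustrative names, the map uses only the \r, \n and \t entries from the list above, and a trailing escape character with nothing after it is simply kept as-is (an assumption).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of escape handling with an optional escape map; not CSV Manager's parser.
final class EscapeMapSketch {

    /** Resolves escape sequences: mapped characters are translated, all others are taken literally. */
    static String unescape(String raw, char escape, Map<Character, Character> escapeMap, boolean useEscapeMap) {
        StringBuilder value = new StringBuilder();
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c == escape && i + 1 < raw.length()) {
                char next = raw.charAt(++i);
                Character mapped = useEscapeMap ? escapeMap.get(next) : null;
                value.append(mapped != null ? mapped.charValue() : next);
            } else {
                value.append(c); // ordinary character, or a trailing escape character
            }
        }
        return value.toString();
    }

    /** UNIX-style mappings, limited here to \r, \n and \t. */
    static Map<Character, Character> unixStyleMap() {
        Map<Character, Character> map = new HashMap<>();
        map.put('r', '\r');
        map.put('n', '\n');
        map.put('t', '\t');
        return map;
    }

    public static void main(String[] args) {
        Map<Character, Character> map = unixStyleMap();
        System.out.println(unescape("col1\\tcol2", '\\', map, false)); // col1tcol2
        System.out.println(unescape("col1\\tcol2", '\\', map, true));  // col1, a tab, then col2
    }
}
```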

Note: calling this method sets use escape map to true.

Parameters:
pEscapeMap - mapping of escaped characters
See Also:
getEscapeMap(), setEscape(char), setUseEscape(boolean)

getEscapeMap

public HashMap getEscapeMap()
Get a copy of the current escape mappings.

See Also:
setEscapeMap(java.util.HashMap)

setUseEscapeMap

public void setUseEscapeMap(boolean pUseEscapeMap)
Enable use of the escape mappings (default: false).

When true escape mappings are performed on data fields.

When false escape mappings are not performed on data fields.

Note: when set to true, the use escape setting is also set to true.

Parameters:
pUseEscapeMap - use escape map setting
See Also:
setEscapeMap(java.util.HashMap), getUseEscapeMap()

getUseEscapeMap

public boolean getUseEscapeMap()
Get the value of the use escape map setting.

See Also:
setUseEscapeMap(boolean)

setEndOfLine

public void setEndOfLine(String pEndOfLine)
Set the end-of-line characters (default: \r\n).

The end-of-line characters mark the end of a data line. To recognise the line ends of various platforms, the end-of-line characters are interpreted as follows: any run of adjacent end-of-line characters, in any order, marks a single line end as long as no character in the run is repeated. As soon as an end-of-line character is repeated, the repeat starts a new line end, so \n\n marks two line ends.

This means that with the default set of characters (\r\n), Windows (\r\n), UNIX (\n) and Mac (\r) line ends are all recognised.

You can set your own end-of-line characters for special cases.

When data is output, the end-of-line setting is placed at the end of each line. By default this is \r\n, which is converted to the platform line ending unless setVerbatimEndOfLine(boolean) is true. Use the CsvManager.makeUnixSpec() and CsvManager.makeMacSpec() factory methods to get CsvSpec objects with appropriate line endings for a specific platform, or simply set your own: setEndOfLine("\n"), for example.
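The line end rule above can be modelled as follows. This is a hypothetical sketch, not CSV Manager's parser: EndOfLineSketch and splitLines are illustrative names, and line ends inside quoted fields (see setAllowQuotedLineEnds(boolean)) are ignored here. The sketch tracks which end-of-line characters have been seen in the current run; a character already seen in the run ends another line.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the documented end-of-line rule; quoted fields are ignored here.
final class EndOfLineSketch {

    static List<String> splitLines(String text, String endOfLineChars) {
        List<String> lines = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        Set<Character> seenInRun = new HashSet<>(); // end-of-line chars in the current line end
        boolean inLineEnd = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (endOfLineChars.indexOf(c) < 0) {
                current.append(c); // ordinary data character
                inLineEnd = false;
                continue;
            }
            if (inLineEnd && !seenInRun.contains(c)) {
                seenInRun.add(c); // a different end-of-line char: same line end
                continue;
            }
            // First end-of-line char after data, or a repeated one: a line ends here.
            lines.add(current.toString());
            current.setLength(0);
            seenInRun.clear();
            seenInRun.add(c);
            inLineEnd = true;
        }
        if (current.length() > 0) {
            lines.add(current.toString()); // final line without a trailing line end
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println(splitLines("a\r\nb\n\nc", "\r\n")); // [a, b, , c]
    }
}
```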

Parameters:
pEndOfLine - end-of-line character sequence
See Also:
getEndOfLine()

getEndOfLine

public String getEndOfLine()
Get the end-of-line characters.

See Also:
setEndOfLine(java.lang.String)

setMergeSeparators

public void setMergeSeparators(boolean pMergeSeparators)
Treat adjacent separator characters as one (default: false).

When true, separator characters that are next to each other are merged into one separator character. Thus ,, will only create one data field.

When false, separator characters that are next to each other remain separate, so that ,, will create two data fields.

This setting operates on the entire list of separator characters so that ,; will also only create one data field if the separators are set to ,;.

Parameters:
pMergeSeparators - set true to merge separators
See Also:
setSeparator(java.lang.String), getMergeSeparators()

getMergeSeparators

public boolean getMergeSeparators()
Get the status of the merge separators setting.

See Also:
setMergeSeparators(boolean)

setNumFields

public void setNumFields(int pNumFields)
Set the expected number of data fields per line.

When the number of data fields is less than the expected number, the remaining fields are returned empty. When the number of data fields is greater than the expected number, the extra data fields are appended to the list of data fields, so that no data is lost. In this case the expected number of data fields is increased to the greater number found, and remains so for the rest of the loading of the file.

If the number of data fields is zero (the default), the expected number is set to the number of fields found on the first line. If longer lines are subsequently encountered, the expected number is increased to the length of the longest line found so far.

Empty lines are returned with the expected number of empty data fields, not zero data fields.
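The padding and growth rules above can be sketched as a small stateful helper. This is a hypothetical model, not CSV Manager's code: NumFieldsSketch and normalise are illustrative names, and an empty line is represented as an empty list of fields.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the documented numFields behaviour; not CSV Manager's code.
final class NumFieldsSketch {
    private int expected; // 0 means "take the count from the first line"

    NumFieldsSketch(int numFields) {
        this.expected = numFields;
    }

    /** Pads short lines with empty fields; long lines keep all fields and raise the expected count. */
    List<String> normalise(List<String> fields) {
        if (fields.size() > expected) {
            expected = fields.size(); // grows for the rest of the file, never shrinks
        }
        List<String> out = new ArrayList<>(fields);
        while (out.size() < expected) {
            out.add("");
        }
        return out;
    }

    int expected() {
        return expected;
    }

    public static void main(String[] args) {
        NumFieldsSketch sketch = new NumFieldsSketch(0);
        System.out.println(sketch.normalise(Arrays.asList("a", "b")));      // [a, b]
        System.out.println(sketch.normalise(new ArrayList<String>()));      // [, ]
        System.out.println(sketch.normalise(Arrays.asList("x", "y", "z"))); // [x, y, z]
        System.out.println(sketch.normalise(Arrays.asList("q")));           // [q, , ]
    }
}
```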

Parameters:
pNumFields - number of expected data fields
See Also:
getNumFields(), setIgnoreEmptyLines(boolean)

getNumFields

public int getNumFields()
Get the expected number of data fields per line.

See Also:
setNumFields(int)

setStartLine

public void setStartLine(long pStartLine)
Set the start line (the first line to load) value.

By default, the start line is set to 1, the first line. The start line value starts from 1, not 0, and identifies the line from which to start loading data.

For example, if the CSV file contains a header line with field names and you wish to ignore this line, set the start line value to 2, so that data is loaded from the second line.

Use the setEndLine(long) method to stop loading data at a certain line. Use the setNumLines(long) method to load an exact number of lines.

The start line setting also affects saved data. When you save, data lines will only be written from the start line.

Note: if you have previously called setNumLines then the end line is adjusted to account for this. See setNumLines(long) for more detail.

Parameters:
pStartLine - start line value (from 1, not 0)
See Also:
setEndLine(long), setNumLines(long)

getStartLine

public long getStartLine()
Get the start line (the first line to load) value.

See Also:
setStartLine(long)

setEndLine

public void setEndLine(long pEndLine)
Set the end line (the last line to load) value.

By default the end line is set to -1. This means that there is no end line and all lines will be loaded. You can reset the end line to -1 for this default behaviour.

To restrict the number of lines to load, use this method to specify the last line to load. For example, in a CSV file containing 10 lines, you can specify the end line as 5 and then only lines 1 to 5 will be loaded.

The end line value is inclusive: setting an end line value of, say, 10 means that the end line will be the tenth line found. The end value, like the start value, starts from 1, not 0. The number of lines read is therefore end line - start line + 1; for example, if the start line is 1 and the end line is 3, then lines 1, 2 and 3 are read: 3 - 1 + 1 = 3.

The start line can also be specified with setStartLine(long), so that only a subset of the lines is loaded.

Instead of setEndLine you can also use setNumLines(long). This is just a convenience method to set the end line relative to the start line.

The end line setting also affects saved data. When you save, data lines will only be written until the end line (inclusive).

Note: calling this method will reset the number of lines to load setting to 0, see setNumLines(long) for more details.

Parameters:
pEndLine - end line value (inclusive)

getEndLine

public long getEndLine()
Get the end line (the last line to load) value.

See Also:
setEndLine(long)

setNumLines

public void setNumLines(long pNumLines)
Set the number of lines to load.

This is a convenience method that can be used instead of setEndLine(long). By default, the number of lines to load is 0 and the start and end line values are used instead. When using this method, the end line is set from the number of lines specified. For example, if the start line is 1 (the default) and the number of lines (pNumLines) is 3, then exactly 3 lines are loaded (lines 1, 2 and 3) and the end line is 3. Thus, calling this method has the same effect as calling setEndLine with start line + pNumLines - 1.

Note: subsequent calls to setStartLine(long) will automatically adjust the end line so that the specified number of lines are loaded. But a call to setEndLine will override this behaviour and reset the number of lines to load to 0.
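The interplay between the start line, end line and number of lines settings can be modelled as a small state object. This is a hypothetical sketch, not the CsvSpec implementation: LineRangeSketch mirrors the documented setter names but is not the real class. It also assumes that only a positive number of lines adjusts the end line, since the behaviour for other values is not described here.

```java
// Hypothetical model of the start line, end line and number of lines settings.
final class LineRangeSketch {
    private long startLine = 1; // first line to load, counted from 1
    private long endLine = -1;  // -1 means there is no end line
    private long numLines = 0;  // 0 means the start and end line values are used directly

    void setStartLine(long pStartLine) {
        startLine = pStartLine;
        if (numLines > 0) {
            endLine = startLine + numLines - 1; // keep the requested number of lines
        }
    }

    void setEndLine(long pEndLine) {
        endLine = pEndLine;
        numLines = 0; // an explicit end line overrides setNumLines
    }

    void setNumLines(long pNumLines) {
        numLines = pNumLines;
        if (numLines > 0) {
            endLine = startLine + numLines - 1; // same as setEndLine(start + n - 1)
        }
    }

    long getEndLine() {
        return endLine;
    }

    /** True if the 1-based line number falls inside the inclusive range. */
    boolean isLoaded(long line) {
        return line >= startLine && (endLine == -1 || line <= endLine);
    }

    public static void main(String[] args) {
        LineRangeSketch range = new LineRangeSketch();
        range.setNumLines(3);
        range.setStartLine(2); // skip a header line
        System.out.println(range.getEndLine()); // 4
    }
}
```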

Parameters:
pNumLines - number of lines
See Also:
setEndLine(long), setStartLine(long)

setIgnoreBadLines

public void setIgnoreBadLines(boolean pIgnoreBadLines)
Ignore lines with syntax errors (default: false).

When true any lines that cannot be parsed are ignored and processing continues. You can access the list of failed lines with the CsvManager.getBadLines() method, and get the number of bad lines with the CsvManager.getBadLineCount() method.

When false, processing is terminated if a bad line is encountered, and a CsvManagerException with code bad_line is thrown. The method CsvManagerException.getBadLine() returns the offending line.

Parameters:
pIgnoreBadLines - ignore bad lines setting
Throws:
CsvManagerException
See Also:
getIgnoreBadLines(), CsvManager.getBadLines(), CsvManager.getBadLineCount(), CsvManagerException

getIgnoreBadLines

public boolean getIgnoreBadLines()
Get the status of the ignore bad lines setting.

See Also:
setIgnoreBadLines(boolean)

setIgnoreEmptyLines

public void setIgnoreEmptyLines(boolean pIgnoreEmptyLines)
Ignore lines that have no data (default: false).

When true any lines that contain no data fields are ignored. No data structures with empty data fields are created.

When false any lines that contain no data fields are recognised. A data structure with the default number of empty data fields is created. You can set the default number of data fields using the setNumFields(int) method. Note that the default number of data fields may be larger than initially set if data lines are encountered that contain more than the default number of data fields. In this case the default number of fields is permanently increased to handle these larger lines.

Parameters:
pIgnoreEmptyLines - ignore empty lines setting
See Also:
getIgnoreEmptyLines(), setNumFields(int)

getIgnoreEmptyLines

public boolean getIgnoreEmptyLines()
Get the status of the ignore empty lines setting.

See Also:
setIgnoreEmptyLines(boolean)

setTrim

public void setTrim(String pTrim)
Set the trim characters (default: space and tab).

Unwanted whitespace or other characters can be trimmed from the start and end of a data field. Use the setTrimType(com.ricebridge.csvman.TrimType) method to control the trimming operation.

Note: trim characters cannot be the same as separator characters. Both this method and the setSeparator(java.lang.String) method remove any separator characters from the trim characters.


getTrim

public String getTrim()
Get the trim characters.

See Also:
setTrim(java.lang.String)

setTrimType

public void setTrimType(TrimType pTrimType)
Set the trim type (default: Full; both start and end of data field).

The trim type is set using an instance of the value object TrimType. This setting controls the trimming of whitespace or other characters from the start and end of the data field. The following trim types are available:

  • None - no trimming is done
  • Start - only the start of the data field is trimmed
  • End - only the end of the data field is trimmed
  • Full - both start and end of the data field are trimmed
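The trim types listed above, together with the rule that separator characters are removed from the trim characters, can be sketched as follows. This is a hypothetical model: the nested Mode enum stands in for the TrimType value object and is not part of CSV Manager.

```java
// Hypothetical model of trimming; Mode stands in for TrimType and is not the library's class.
final class TrimSketch {
    enum Mode { NONE, START, END, FULL }

    /** Removes separator characters from the trim characters, as setTrim and setSeparator do. */
    static String effectiveTrimChars(String trim, String separators) {
        StringBuilder kept = new StringBuilder();
        for (int i = 0; i < trim.length(); i++) {
            char c = trim.charAt(i);
            if (separators.indexOf(c) < 0) {
                kept.append(c);
            }
        }
        return kept.toString();
    }

    /** Trims any of the trim characters from the start and/or end of the field. */
    static String trim(String field, String trimChars, Mode mode) {
        int start = 0;
        int end = field.length();
        if (mode == Mode.START || mode == Mode.FULL) {
            while (start < end && trimChars.indexOf(field.charAt(start)) >= 0) start++;
        }
        if (mode == Mode.END || mode == Mode.FULL) {
            while (end > start && trimChars.indexOf(field.charAt(end - 1)) >= 0) end--;
        }
        return field.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println("[" + trim(" \tab c \t", " \t", Mode.FULL) + "]"); // [ab c]
    }
}
```

Characters inside the field, such as the space between b and c, are never trimmed.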

Parameters:
pTrimType - trim type setting
See Also:
getTrimType(), setTrim(java.lang.String)

getTrimType

public TrimType getTrimType()
Get the trim type setting.

See Also:
setTrimType(com.ricebridge.csvman.TrimType)

setCloseInputStream

public void setCloseInputStream(boolean pCloseInputStream)
Close the InputStream after reading the CSV data (default: true).

When true, the InputStream attached to the original source of the data (such as a file) is closed when all of the CSV data has been read.

When false, the InputStream is not closed.

Parameters:
pCloseInputStream - close input stream setting
See Also:
setCloseOutputStream(boolean)

getCloseInputStream

public boolean getCloseInputStream()
Get the close input stream setting.

See Also:
setCloseInputStream(boolean)

setCloseOutputStream

public void setCloseOutputStream(boolean pCloseOutputStream)
Close the OutputStream after saving the CSV data (default: true).

When true, the OutputStream to which the CSV data is written is closed after all of the data is written.

When false, the OutputStream is not closed.

Parameters:
pCloseOutputStream - close output stream setting
See Also:
setCloseInputStream(boolean)

getCloseOutputStream

public boolean getCloseOutputStream()
Get the close output stream setting.

See Also:
setCloseOutputStream(boolean)

setFlushEachLine

public void setFlushEachLine(boolean pFlushEachLine)
Flush the OutputStream after each line of CSV data (default: false).

When true, the OutputStream to which the CSV data is written is flushed after each line of data.

When false, the OutputStream is not flushed.

Parameters:
pFlushEachLine - flush each line setting
See Also:
getFlushEachLine()

getFlushEachLine

public boolean getFlushEachLine()
Get the flush each line setting.

See Also:
setFlushEachLine(boolean)

setVerbatimEndOfLine

public void setVerbatimEndOfLine(boolean pVerbatimEndOfLine)
Output the end-of-line characters exactly as specified, without platform-specific conversions (default: false).

When true, the end-of-line characters, as set with setEndOfLine(java.lang.String) are output verbatim at the end of each line, exactly as specified.

When false, the end-of-line character string is converted to the correct platform-specific end-of-line sequence. This only applies when the end-of-line characters are specified as \r\n; in all other cases, the end-of-line characters are output exactly as specified. For the special case \r\n, \r is output if the runtime platform is an Apple Mac, \n is output if it is UNIX (or any other non-Windows platform), and \r\n is output if it is Windows.
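The conversion rule above can be written as a small function. This is a hypothetical sketch: EolOutputSketch and its Platform enum are illustrative, and the platform is passed in explicitly rather than detected, because how CSV Manager identifies the runtime platform is not described here.

```java
// Hypothetical model of the verbatim end-of-line rule; platform detection is left out.
final class EolOutputSketch {
    enum Platform { WINDOWS, MAC, UNIX }

    /** Returns the characters written at the end of each output line. */
    static String outputEol(String endOfLine, boolean verbatim, Platform platform) {
        if (verbatim || !"\r\n".equals(endOfLine)) {
            return endOfLine; // only the default \r\n is ever converted
        }
        switch (platform) {
            case MAC:
                return "\r";
            case WINDOWS:
                return "\r\n";
            default:
                return "\n"; // UNIX and other non-Windows platforms
        }
    }

    public static void main(String[] args) {
        System.out.println(outputEol("\r\n", false, Platform.UNIX).equals("\n")); // true
    }
}
```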

Parameters:
pVerbatimEndOfLine - verbatim end-of-line setting
See Also:
getVerbatimEndOfLine()

getVerbatimEndOfLine

public boolean getVerbatimEndOfLine()
Get the verbatim end-of-line setting.

See Also:
setVerbatimEndOfLine(boolean)

setEncoding

public void setEncoding(String pEncoding)
Specify the character encoding to use for input and output (default: System.getProperty("file.encoding")).

CSV files can be encoded in a number of different formats. Normally, the default platform format is assumed, but you may have to handle CSV files containing international characters and symbols. In this case, you need to specify the character encoding of the file.

The pEncoding parameter takes the same values as the charset name parameter of String.getBytes(String).

The subject of character encodings is beyond the scope of this documentation. However Sun maintains a web page on Java internationalization which you will find very useful.

The most common case you will encounter will probably be loading or saving CSV data as Unicode, in either UTF-8 or UTF-16 format. For these cases, simply use setEncoding("UTF8") or setEncoding("UTF-16"). Note that the encoding String is the code used by the java.io package, as defined on the list of Java supported encodings.
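The encoding names mentioned above are ordinary java.io charset names, so their effect can be checked with the standard library alone. This sketch does not use CSV Manager: EncodingSketch is a hypothetical helper that writes a CSV line through an OutputStreamWriter and reads it back through an InputStreamReader with the same encoding name, showing that the name must match on both sides for a lossless round trip.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical helper using only java.io; not part of CSV Manager.
final class EncodingSketch {

    /** Encodes CSV text with the named java.io encoding, e.g. "UTF8" or "UTF-16". */
    static byte[] encode(String csv, String encoding) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(bytes, encoding)) {
            out.write(csv);
        } catch (IOException e) { // includes UnsupportedEncodingException
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Decodes bytes with the named encoding; it must match the one used for writing. */
    static String decode(byte[] data, String encoding) {
        StringBuilder text = new StringBuilder();
        try (Reader in = new InputStreamReader(new ByteArrayInputStream(data), encoding)) {
            char[] buffer = new char[256];
            int n;
            while ((n = in.read(buffer)) != -1) {
                text.append(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String line = "caf\u00e9,na\u00efve\r\n";
        System.out.println(encode(line, "UTF8").length); // 14
        System.out.println(decode(encode(line, "UTF-16"), "UTF-16").equals(line)); // true
    }
}
```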

Parameters:
pEncoding - encoding setting
See Also:
getEncoding()

getEncoding

public String getEncoding()
Get the encoding setting.

See Also:
setEncoding(java.lang.String)

setCollectBadLines

public void setCollectBadLines(boolean pCollectBadLines)
Collect BadLine objects for later inspection (default: true).

When true, BadLine objects created during parsing are stored internally by the CsvManager object, and can be obtained from the CsvManager.getBadLines() method.

When false, BadLine objects are not stored, although they are still counted in the statistics.

Set this setting to false when you expect very large numbers of bad lines in the CSV file, in order to avoid running out of memory. In this case, you are better off saving the bad lines directly to a log file using a custom LineListener.

Parameters:
pCollectBadLines - collect bad lines setting
See Also:
getCollectBadLines(), setIgnoreBadLines(boolean)

getCollectBadLines

public boolean getCollectBadLines()
Get the collect bad lines setting.

See Also:
setCollectBadLines(boolean)

setDataFieldMaxLength

public void setDataFieldMaxLength(int pDataFieldMaxLength)
Set the maximum allowed length in characters of a data field, or set to zero for no limit (default: 0).

If you are processing very large files that may contain very long data fields, you can use this setting as a safety limit to avoid running out of memory. When an individual data field is too long, a CsvManagerException is thrown and processing halts (unless setIgnoreBadLines(boolean) is true).

If you do not wish to limit the length of data fields, set this to zero, which is the default. Data fields can then be as long as your memory allows.
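A minimal sketch of the length check, assuming the limit is enforced character by character as the field is read, so that an oversized field is rejected without first being buffered in full. FieldLengthLimit, readField and FieldTooLongException are hypothetical; the exception stands in for CsvManagerException. A limit of zero disables the check.

```java
public class FieldLengthLimit {

    // Hypothetical stand-in for CsvManagerException.
    static class FieldTooLongException extends RuntimeException {
        FieldTooLongException(String message) { super(message); }
    }

    // Copies the raw characters of one data field, failing as soon as the
    // field would grow past maxLength characters. A maxLength of 0 means no limit.
    static String readField(CharSequence raw, int maxLength) {
        if (maxLength < 0) {
            throw new IllegalArgumentException("maxLength must be >= 0");
        }
        StringBuilder field = new StringBuilder();
        for (int i = 0; i < raw.length(); i++) {
            if (maxLength > 0 && field.length() == maxLength) {
                throw new FieldTooLongException(
                        "data field longer than " + maxLength + " characters");
            }
            field.append(raw.charAt(i));
        }
        return field.toString();
    }

    public static void main(String[] args) {
        System.out.println(readField("short", 0));   // no limit
        System.out.println(readField("short", 5));   // exactly at the limit
        try {
            readField("much too long", 5);
        } catch (FieldTooLongException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```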

WARNING: a non-zero value may lead to data loss, as any characters after the length limit are not saved. You are therefore advised to choose a value large enough for your data, such as some multiple of the expected field size, within your memory constraints.

This setting also determines the maximum length of the original line text recorded in BadLine error objects. If the number of fields is set with setNumFields(int) before parsing, the limit is this setting multiplied by the number of fields plus one; otherwise, the limit is the value of this setting. However, a large internal minimum applies, so this limit only takes effect for large values.
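The rule for the BadLine original-line limit can be written as a small function. badLineTextLimit is a hypothetical helper, and the large internal minimum mentioned above is not modeled, because its value is not documented here. For example, with setDataFieldMaxLength(1000) and setNumFields(4), the limit is (4 + 1) * 1000 = 5000 characters.

```java
public class BadLineTextLimit {

    // Maximum length of the original line text kept in a BadLine, following the
    // rule described above. numFields <= 0 means setNumFields was not called.
    // The library's large internal minimum is deliberately not modeled here.
    static long badLineTextLimit(int dataFieldMaxLength, int numFields) {
        if (numFields > 0) {
            return (long) (numFields + 1) * dataFieldMaxLength;
        }
        return dataFieldMaxLength;
    }

    public static void main(String[] args) {
        System.out.println(badLineTextLimit(1000, 4));  // fields known: (4 + 1) * 1000
        System.out.println(badLineTextLimit(1000, 0));  // fields unknown: the setting itself
    }
}
```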

Parameters:
pDataFieldMaxLength - data field maximum length setting
See Also:
getDataFieldMaxLength()

getDataFieldMaxLength

public int getDataFieldMaxLength()
Get the data field maximum length setting.

See Also:
setDataFieldMaxLength(int)

setAllowQuotedLineEnds

public void setAllowQuotedLineEnds(boolean pAllowQuotedLineEnds)
Allow line end characters inside quoted fields (default: true).

Most variations of the CSV format allow you to insert newline characters (we call them line end characters here to cover non-standard line separators) verbatim inside quoted fields. For well-formed CSV files this is not a problem, but if you expect syntax errors, these quoted newlines can become a serious issue, as they can cause data corruption.

If you set this to false, line end characters are not allowed inside quoted fields, so the common syntax error of a missing closing quote is caught. Otherwise, the following line is swallowed by the opening quote: the parser assumes that it is still part of the last data field of the preceding line, and does not stop until another quote is found.

When to use this setting: when your quoted fields may contain commas but should never contain line end characters.

WARNING: if you use this setting with a normal CSV file (for example, to catch syntax errors), any quoted field that legitimately contains a line end character will cause the next line of data to be corrupt, as it will start in the middle of the bad line, directly following the disallowed line end character. There is no way to detect the correct end of the line programmatically, so you will have to check the data yourself. Do not use this setting without writing code to deal with this case.
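To make the two behaviors concrete, the following sketch implements a tiny quote-aware splitter. It is not the CSV Manager parser: it handles only commas, double quotes (with "" as an escaped quote) and \n line ends, and the QuotedLineEnds and Row names are hypothetical. With quoted line ends allowed, an unclosed quote swallows the rest of the input; with them disallowed, the bad record is cut off at the line end and the following lines parse normally.

```java
import java.util.ArrayList;
import java.util.List;

public class QuotedLineEnds {

    // One parsed record: its fields, and whether it was rejected as bad.
    static final class Row {
        final List<String> fields;
        final boolean bad;
        Row(List<String> fields, boolean bad) { this.fields = fields; this.bad = bad; }
        @Override public String toString() { return (bad ? "BAD " : "OK  ") + fields; }
    }

    static List<Row> parse(String text, boolean allowQuotedLineEnds) {
        List<Row> rows = new ArrayList<>();
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < text.length() && text.charAt(i + 1) == '"') {
                        field.append('"');   // "" is an escaped quote
                        i++;
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else if (c == '\n' && !allowQuotedLineEnds) {
                    // Disallowed line end: end the record here and mark it bad.
                    // Parsing resumes directly after the line end.
                    inQuotes = false;
                    fields.add(field.toString());
                    field.setLength(0);
                    rows.add(new Row(fields, true));
                    fields = new ArrayList<>();
                } else {
                    field.append(c);         // with quoted line ends allowed, '\n' is data
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == ',') {
                fields.add(field.toString());
                field.setLength(0);
            } else if (c == '\n') {
                fields.add(field.toString());
                field.setLength(0);
                rows.add(new Row(fields, false));
                fields = new ArrayList<>();
            } else {
                field.append(c);
            }
        }
        if (inQuotes || field.length() > 0 || !fields.isEmpty()) {
            fields.add(field.toString());
            rows.add(new Row(fields, inQuotes)); // a quote still open at the end is bad
        }
        return rows;
    }

    public static void main(String[] args) {
        String input = "1,\"unclosed\n2,ok\n3,fine\n";
        System.out.println("allowed:    " + parse(input, true));
        System.out.println("disallowed: " + parse(input, false));
    }
}
```

If the input instead contains a legitimate quoted line end, for example a,"two\nlines" followed by b,c, disallowing quoted line ends breaks that one record into two bad records before parsing recovers at b,c. This is the kind of corruption the warning above describes.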

Parameters:
pAllowQuotedLineEnds - allow quoted line ends setting
See Also:
getAllowQuotedLineEnds()

getAllowQuotedLineEnds

public boolean getAllowQuotedLineEnds()
Get the allow quoted line ends setting.

See Also:
setAllowQuotedLineEnds(boolean)

setUseComment

public void setUseComment(boolean pUseComment)
Ignore lines that are comments (default: false).

By enabling this option, you can ignore lines in the CSV file that start with a # character. You can also change this character with the setComment method.

Example:

  # header comment
  # this line and line above are considered empty
  real,data,here
  

When the # character occurs part way through the line, you can also treat it as a comment by enabling the setCommentWithinLine setting.

IMPORTANT: commented lines are returned as empty lines by default. To ignore commented lines completely, you should also set the setIgnoreEmptyLines option to true.
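The following sketch shows how these settings interact, assuming a comment line is one whose first character is the comment character. CommentLines and filter are hypothetical names, not part of CSV Manager; the comment parameter plays the role of setComment(char) and ignoreEmptyLines the role of setIgnoreEmptyLines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CommentLines {

    // Applies comment handling to raw lines. A line whose first character is the
    // comment character comes back as an empty line; empty lines are dropped
    // only when ignoreEmptyLines is true.
    static List<String> filter(List<String> lines, boolean useComment,
                               char comment, boolean ignoreEmptyLines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            String value = line;
            if (useComment && !line.isEmpty() && line.charAt(0) == comment) {
                value = "";
            }
            if (ignoreEmptyLines && value.isEmpty()) {
                continue;
            }
            out.add(value);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "# header comment",
                "# this line and line above are considered empty",
                "real,data,here");
        System.out.println(filter(lines, true, '#', false)); // comments become empty lines
        System.out.println(filter(lines, true, '#', true));  // empty lines dropped as well
    }
}
```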

Parameters:
pUseComment - use comments setting
See Also:
setComment(char), setCommentWithinLine(boolean)

getUseComment

public boolean getUseComment()
Get the use comments setting.

See Also:
setUseComment(boolean)

setComment

public void setComment(char pComment)
Set the character to use at the start of comments (default: #).

In order to use comments, you will need to activate them using the setUseComment method.

Parameters:
pComment - comment character
See Also:
setUseComment(boolean)

getComment

public char getComment()
Get the character used to start comments.

See Also:
setComment(char), setUseComment(boolean)

setCommentWithinLine

public void setCommentWithinLine(boolean pCommentWithinLine)
Allow comments within lines (default: false).

By enabling this option, you can ignore text in the CSV file that follows a # character that occurs part way through the line. Everything after the # is ignored. However, if the # is inside a quoted field, then it is treated as a normal data character, and not as the start of a comment.

Example:

  this,data,returned # this data ignored
  

To use this setting, the setUseComment setting must also be true.
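A minimal sketch of within-line comment stripping, in which a comment character inside double quotes is kept as data. InlineComments and stripComment are hypothetical. The sketch keeps any whitespace before the comment character, so the example line above becomes "this,data,returned " with a trailing space; whether CSV Manager trims that space is not stated here.

```java
public class InlineComments {

    // Removes a comment that starts part way through a line. A comment character
    // inside a double-quoted field is treated as ordinary data. Whitespace before
    // the comment character is kept.
    static String stripComment(String line, char comment) {
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                // An escaped quote ("") toggles twice, leaving the state unchanged.
                inQuotes = !inQuotes;
            } else if (c == comment && !inQuotes) {
                return line.substring(0, i);
            }
        }
        return line;
    }

    public static void main(String[] args) {
        System.out.println("[" + stripComment("this,data,returned # this data ignored", '#') + "]");
        System.out.println("[" + stripComment("a,\"b # kept\",c", '#') + "]");
    }
}
```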

Parameters:
pCommentWithinLine - comment within line setting
See Also:
setUseComment(boolean)

getCommentWithinLine

public boolean getCommentWithinLine()
Get the comment within line setting.

See Also:
setCommentWithinLine(boolean)

toString

public String toString()
Generate a human-friendly description of the current settings in a properties style format.

Overrides:
toString in class PropSpec

validate

public void validate()
Check that settings are consistent.

A CsvManagerException describing the problem is thrown if settings are inconsistent.
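To illustrate the pattern, here is a hypothetical MiniSpec class with a properties-style toString() and a validate() method. It is not the CsvSpec implementation: the property names and the two consistency rules (within-line comments require setUseComment, and a data field limit cannot be negative) are assumptions chosen for the example, and it throws IllegalStateException rather than CsvManagerException.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MiniSpec {
    private boolean useComment = false;
    private char comment = '#';
    private boolean commentWithinLine = false;
    private int dataFieldMaxLength = 0;

    public void setUseComment(boolean v) { useComment = v; }
    public void setComment(char c) { comment = c; }
    public void setCommentWithinLine(boolean v) { commentWithinLine = v; }
    public void setDataFieldMaxLength(int n) { dataFieldMaxLength = n; }

    // Properties-style description: one name=value pair per line, in a fixed order.
    @Override
    public String toString() {
        Map<String, Object> props = new LinkedHashMap<>();
        props.put("useComment", useComment);
        props.put("comment", comment);
        props.put("commentWithinLine", commentWithinLine);
        props.put("dataFieldMaxLength", dataFieldMaxLength);
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Object> e : props.entrySet()) {
            sb.append(e.getKey()).append('=').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    // Throws if the settings are inconsistent. Both rules are illustrative assumptions.
    public void validate() {
        if (commentWithinLine && !useComment) {
            throw new IllegalStateException("commentWithinLine requires useComment");
        }
        if (dataFieldMaxLength < 0) {
            throw new IllegalStateException("dataFieldMaxLength must not be negative");
        }
    }

    public static void main(String[] args) {
        MiniSpec spec = new MiniSpec();
        spec.setCommentWithinLine(true);
        System.out.print(spec);
        try {
            spec.validate();
        } catch (IllegalStateException e) {
            System.out.println("invalid: " + e.getMessage());
        }
    }
}
```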



Copyright © 2003-2006 Ricebridge