|
$15 Gift Certificate for every bug you find.

This IP address to country database is provided as a CSV file that can easily be read by CSV Manager.
Got a question for us?
Just Ask!
|
Uploading and Downloading CSV Files from a Website Database using CSV Manager
This article comes with the complete source code to implement a
CSV
upload and
download website database using CSV Manager. The code can easily be extended to handle more than
one table or custom data types.
A Fashion Disaster
Last week you finished Laura's new website. Laura runs an exclusive boutique
in a fashionable part of town. Now she can display and sell all of her fashion pieces
online.
That was last week. This week, she's got you on the phone...
Laura: "I'm getting rid of last season. I need to reduce all the prices on that stuff by 50%".
Oops. That discounts system you built only handles categories, not "seasons".
You: "Um..."
Laura: "And I need to put up the prices on the John Rochas, except for the jewellery, which is out..."
Your mind is maze of twistly little SQL queries, all different...
You: "Um..."
Laura: "... by Friday?"
Why can't she just do it herself, you think. Oh wait...
You: "Listen, what if we export the product list to a spreadsheet and you can update it there? Then you can
make all sorts of changes and upload them all when you're done."
Laura: "Great! I still need it by Friday though..."
What Exactly Are We Trying To Do Here?
We're letting people work in Excel. Sometimes it's just easier to edit
a big long list of product descriptions using a spreadsheet. Imagine
your traditional web shop administration area. You get a list of
products, click on one, and then edit that product. It gets pretty
tedious pretty quickly. Global changes to pricing, codes or
descriptions are basically impossible. That's where the spreadsheet
comes in. If you have a big long list of products, making these types
of changes is easy. Especially for the business managers who
need to make these changes. They may not be techies, but you better
believe that they know their way around Excel — their jobs depend on it.
What's the alternative? You're going to have to come up with an
editable table view of some kind, using dynamic HTML. You'll have to
go searching for a component or write it yourself. More work for
you. And it's still just a bad imitation of a spreadsheet at the end
of the day. Why not just export the data as a CSV file, and let Excel
do the heavy lifting?
You solve more than one problem this way. You can also deal with data
from external sources. So long as it's in a spreadsheet
or a database somewhere, you can get it into CSV format and you're away!
Otherwise you have to enter the data by hand, or write some sort of a
custom import function.
What we're trying to do is give people easy access to large volumes of
data. Spreadsheets are the way to do that. And CSV is the way to get
data into and out of spreadsheets.
The Practicalities
Let's look at the workflow in detail.
- Laura logs on to the administration section of her site. She pulls up the products page
and clicks on the export button.
- A "Save As" dialog window pops up, asking her to save a CSV file.
- She opens Excel, and imports the CSV data.
- Laura modifies the list as she sees fit. She makes many
individual changes to prices and descriptions. She also makes a few
global changes using some Excel formulas that she's put together
herself.
- She saves the data as a CSV file.
- She returns to the site, and uploads the CSV file. The product data in the database is updated immediately.
That's pretty much it. And it's much easier for the client. Of course,
we'll have to report errors and be careful not to corrupt data. If
you're fancy you could build in an undo feature — you get one
nearly for free as the exported CSV file contains the unmodified data.
There's another case we can also handle. Let's say Laura takes on a
new supplier. That supplier can provide her with a list of products in
a spreadsheet and she can then upload that direct to the database
instead of tediously entering in all the new products by hand.
Getting the Data Out of the Database
Let's start at the beginning. How do we get the data we want to export?
Let's go straight for the kill with the simplest SQL query there is:
We just grab all the columns from the table we want to export, in the order
they are defined in the database. This means that the column headers of the CSV file will
take the names of the database table columns, and also appear in the same order.
Where do we get the name of the database table from? Well, in our example
code we're just going to hard-code the name of the table into a member
variable called mTableName. In the real world you'll
probably want to grab it from some configuration parameter or let the user select a table.
One thing to avoid is to get the name directly from a HTML form parameter. You'll
expose yourself to a
SQL injection security hole in that case.
How Long is a Piece of String?
There's one other little detail. CSV is a text file format. We need to
convert all the data from the database table into text strings. To do this
we'll use a special converter object that takes in a
ResultSet
and returns a string array.
And don't forget we have the same problem in reverse when someone uploads
a CSV file. In that case we have to convert from text to a Java object.
If this all sounds like too much hard work, don't worry! CSV Manager comes with
a utility API that does the job for us. All the grisly details are covered in the
documentation of the
StringConverter
interface.
So how do we know which Java objects to convert to and from strings? We'll use
the
JDBC API
to find out the database table column types. Then we just match them
up with the right StringConverter object. We'll look at the details
later on.
Show Me the Code, Already!
CsvDownloader.javaCsvManager csvman = new CsvManager();
CsvSaver csvsvr
= csvman.makeSaver( outputStream );
PreparedStatement selectsql
= connection.prepareStatement(
"SELECT * FROM "+mTableName+";" );
ResultSet rs = null;
rs = selectsql.executeQuery();
Converter conv = new Converter(mTableName,rs);
csvsvr.begin();
// headers
csvsvr.next( conv.getColumnNames() );
// get each row and output it as a CSV line
while( rs.next() ) {
String[] data = conv.getData(rs);
csvsvr.next(data);
}
csvsvr.end();
See the Getting Started Guide
for more details on the CSV Manager API. There's also a page with more details on the
streaming interface used in this code sample.
Here's the code to pull the data out of the database table, and export it as a CSV file.
What's happening here? Well first we create a
CsvManager
object to actually
do the work of creating the CSV file. We'll be outputting the CSV directly to an
OutputStream.
We're going to use a
CsvSaver
object to handle the data. This is a very handy way of providing the data one row
at a time. We just pass in a string array for each row of data. This allows us to
"stream" the data out as a download, instead of creating the entire CSV file first.
This saves a lot of memory and processing power on our web server.
Then we just use some standard JDBC calls to execute a SELECT query on
our database table. You can also see from the code above how we use a
Converter utility object to create the string array for
each row from the query ResultSet.
One very important thing to notice is that we output the column names as the
first row of the CSV file, in the order they are used by the database. These names
form the header row of the CSV file and
will be used later on for uploading the changing data and for finding the correct
database table column to put each CSV data field into.
Some Real Data
Database Data
| SKU | Name | Price | Shown |
| 1001 | n1 | 150 | 1/3/07 |
| ... | ... | ... | ... |
CSV File
SKU,Name,Price,Shown
1111,Apple,10,1/3/07
2222,Pear,20,2/3/07
What sort of data is Laura, our client, going to be working with? Here's some of the data in the products
table of the database you built for her.
The SKU is a unique identifier for each
clothing item product — the "Stock Keeping Unit". The table also has the name of the product, the current price,
and the date that the product was first shown in the shop.
The fields shown here are just for example purposes. In the real world there would probably be a lot more. You can see,
however, how the CSV file just mirrors what is in the database.
Making it Happen
Now we need a servlet to actually make the download happen. Let's keep
our web application very simple — we'll just have an download page and an upload page.
Each page has a web form that submits to our servlet. The servlet looks for
a hidden parameter that tells it to do an upload or a download.
Downloading files from a website can be a bit tricky. We need to specify the
correct
HTTP headers
so that the user's web browser will pop up a "Save As..." dialog
window, rather than just displaying the CSV data. We also need to specify the file name
of the CSV file. The database table name seems the best choice for this, but you can always use
a different name.
Let's put our downloading code (from above) into a CsvDownloader object.
The servlet will provide a database connection to this object, and the HTTP output stream to write the CSV data to.
The servlet will then ask the CsvDownloader to commence the download. And that's that.
One more thing: to allow for user options (like using a semi-colon instead of a comma), we pass in a
Properties
object. CSV Manager has lots of options for handling different
kinds of CSV files and this way we can provide access to them.
Processing the Download Request
First we need a database connection. In the example code the connection is obtained
in classic JDBC style using a
DriverManager.
In your own application you'll probably use the more
modern DataSource approach.
We pass the database connection to our
CsvDownloader object so that
it can load the data from the database.
Next, we set up the
HttpServletResponse
object with some HTTP headers that will cause the user's browser to save our data as
a CSV file. These are really just magic incantations — there are plenty of web development
websites that explain them in excruciating detail.
For our purposes, we just plug and play.
Finally, we call the process method of our CsvDownloader.
This executes the exporting code we saw already. We pass in the output stream of the
HTTP response, and a properties object to control the CSV Manager settings. Using the
HTTP response output stream allows us to stream the CSV file down to the user. This is much
faster than creating the entire CSV file in memory first.
Here's the code:
UpDownServlet.java
Connection con = getConnection();
csvDownloader.setDatabaseContext( con, mTableName );
response.setContentType("text/csv");
response.setHeader( "Content-disposition", "attachment; filename="+mTableName+".csv");
csvDownloader.process( pResponse.getOutputStream(), properties);
And that's pretty much it for downloading. We'll tie it all together and
look at the whole application in a moment. But first: uploading!
So What About Uploading a CSV file?
This is a little bit more involved. The process we've imagined so
far is that Laura, our customer, has downloaded the CSV file, imported it
into Excel, made some changes, and then saved it again. Now she wants to
upload it to her website.
It's very important to remember at this point that the order of data fields in the
CSV file must match the columns that we exported.
This ensures that the right data goes into the right database table column. Of course,
you could extend the code to allow for a different field order, or missing fields, but we'll
assume for the moment that we need all the fields.
To make our lives easier, let's use a third-party component to handle the
file uploading. We'll use the venerable
FileUpload from
the Apache Commons project. We'll plug this into our servlet to handle
upload requests. And there are more magic incantations — we need to
use a special syntax for the HTML form element:
<form name="uploadform" action="updown" enctype="multipart/form-data" method="post">
The encoding type enctype attribute causes the browser to send the
uploaded file to our servlet in MIME format
(also used for email attachments).
The FileUpload component know how to handle this.
Some Practical Servlet Code
UpDownServlet.javaServletFileUpload upload
= new ServletFileUpload();
FileItemIterator itemI
= upload.getItemIterator( pRequest );
String action = null;
InputStream in = null;
Properties props = new Properties();
while( itemI.hasNext() ) {
FileItemStream item = itemI.next();
String name = item.getFieldName();
if( item.isFormField() ) {
String value
= Streams.asString( item.openStream() );
if( "action".equals(name) ) {
action = value;
}
else {
props.setProperty( name, value );
}
}
else {
in = item.openStream();
break;
}
}
if( "upload".equals(action) ) {
doUpload( in, props, pRequest, pResponse );
}
else if( "download".equals(action) ) {
doDownload( props, pRequest, pResponse );
}
This servlet code is the start point of any file upload or download.
Since we're only using one servlet, you might be wondering how it can handle both uploads and
downloads. Well, here's the code.
What's going on here? First, we use the FileUpload API to create
a
FileItemIterator so that
we can stream through the items submitted to us. Each form field and each uploaded file is
a separate item.
We're looking for a hidden form field called action that tells us to upload or download. With the rest of the form fields, we
just put them into a Properties object, which we'll pass on to the CsvDownloader.
If a file is being uploaded, then eventually we come to an item that is
not a form field, and we open an input stream so we can read the
contents of the file. After handling the input fields, we then either
upload or download depending on the action, passing the
appropriate objects onwards.
There's an important "gotcha" to be aware of here. The FileUpload component deals with the form fields in the order
they appear in the HTML code. It also deals with them as a stream. That is, you have to read them when you get them, you can't
go back and look at a previous value. This ensures the highest performance when uploading files, as they then don't have to be
stored in memory, and you can work with the data as you get it, rather than waiting for the whole file to upload.
So what's the "gotcha"? Well, if the file upload form field (<input type="file" ... >) appears in the
HTML code before any other form fields, you will never see the value of those form fields. Since the CSV file has to be dealt with
first, we immediately have to stop and go perform the upload. That's why there's a break statement just after we get the
file input stream. So we never get a change to read the rest of the form fields.
This isn't usually a problem — you can use CSS to layout your HTML anyway you want, while still ensuring that the
file field is the last one before the closing form tag. Still, be careful out there!
Onwards and Upwards
Ok. Let's look at the upload code:
UpDownServlet.java
Connection con = getConnection();
csvUploader.setDatabaseContext(con, mTableName, mIdentityColumn);
csvUploader.process(pIn, pProperties);
pResponse.sendRedirect("upload.htm");
It looks pretty much the same as the download code, except that we use a CsvUploader object (we'll
look at that in the next section)
We also use another variable, mIdentityColumn. This is the name of the database table primary key column.
This is how we will identity which row of data we are updating. Again, for security reasons, this variable is
hard-coded in our example.
The CsvUploader works in the same was as the CsvDownloader. It has a process method
that does all the hard work for us. Once the file has been uploaded and processed, the only thing left to do is to redirect the
user back to the upload HTML page, and display some statistics about the upload.
Processing the Upload
CsvUploader.javamNumLines = 0;
CsvSpec csvspec = makeCsvSpec( properties );
CsvManager csvman = new CsvManager( csvspec );
CsvLoader csvldr = csvman.makeLoader( inputStream );
String[] colnames = null;
csvldr.begin();
if( csvldr.hasNext() ) {
colnames = csvldr.next();
}
int idcol = -1;
for( int colI = 0; colI < colnames.length; colI++ ) {
if( mIdentityColumn.equals(colnames[colI]) ) {
idcol = colI;
break;
}
}
Converter conv
= new Converter( colnames, mTableName,
mIdentityColumn, connection );
ResultSet rs = null;
PreparedStatement selectsql
= makeSelect( mTableName,
mIdentityColumn, connection );
PreparedStatement insertsql
= makeInsert( colnames,
mTableName, mIdentityColumn, connection );
PreparedStatement updatesql
= makeUpdate( colnames,
mTableName, mIdentityColumn, connection );
while( csvldr.hasNext() ) {
String[] data = csvldr.next();
conv.setData( 1, idcol, data[idcol], selectsql );
rs = selectsql.executeQuery();
boolean exists = rs.next();
rs.close();
if( exists ) {
conv.setData( data, updatesql );
conv.setData( data.length+1, idcol,
data[idcol], updatesql );
updatesql.executeUpdate();
}
else {
conv.setData( data, insertsql );
insertsql.executeUpdate();
}
}
csvldr.end();
mNumLines = csvman.getLineCount();
Here's the code to process the uploaded CSV file. It's a bit more involved than the downloading code we saw first.
We need a few more SQL statements because we need to decide whether a data row already exists or not. If it does, we
update, if it does not, we insert.
Notice that we also use a mNumLines member variable, so that we report back the number of lines we found in the CSV file after
we've finished processing it.
Let's step through this code. First we create the CsvManager object that will process the upload.
We use the properties from the HTML form to create a
CsvSpec
object that controls how the CSV file is parsed — our CSV file might use semi-colons instead of colons, for example (the source code of the makeCsvSpec method is available
in the download).
We create a
CsvLoader object to do the actual loading. Like
CsvSaver, this is a streaming interface. We can read one line of data from it at a time. This means we can
start inserting rows into the database as we get them, without waiting for the whole CSV file to load. And this means that
we can handle really big CSV files just as easily as small ones.
The first thing we have to with the CSV that we are reading in is get the column headers. We assume
that these are in the first line of the file. The CsvLoader interface returns each row as a String array
so accessing the CSV data fields is quite easy. We search through these fields looking for the identity column.
That way we'll know which data field to use as the identity value when we are looking for existing fields.
Once we've found the identity column, we store its column index in the idcol variable.
Next we create a Converter to handle our CSV file. The converter will use these settings to set the
correct parameters for the PreparedStatements that we will use for the update. Speaking of which...
Preparing Statements
You can see from the upload processing code that we use three
PreparedStatements;
one to see if a record already exists;
one to update the record if it does; and one to insert a record if it is new. The SQL for these statements is constructed by three
utility methods. We'll look at one of them quickly, just to give you a feel for the code.
CsvUploader.javapublic PreparedStatement makeUpdate( String[] pColNames, String pTableName,
String pIdentityColumn, Connection pConnection )
throws Exception
{
StringBuffer sql = new StringBuffer( "UPDATE "+pTableName+" SET ");
for( int colI = 0; colI < pColNames.length; colI++ ) {
sql.append( 0 == colI ? "":", " );
sql.append( pColNames[colI]+" = ?" );
}
sql.append(" WHERE "+pIdentityColumn+" = ?;");
return pConnection.prepareStatement( sql.toString() );
}
We use the
question mark syntax
to mark the location of the data field values. The PreparedStatement
will insert the data correctly for us, so that we don't have to worry
about escaping apostrophes.
For the UPDATE statement we loop through each column and add it to the SET clause. We use
the identity column to identify the row that we want to update.
The Upload Loop
Let's go back to the main upload loop. Here it is again:
CsvUploader.javawhile( csvldr.hasNext() ) {
String[] data = csvldr.next();
conv.setData( 1, idcol, data[idcol], selectsql );
rs = selectsql.executeQuery();
boolean exists = rs.next();
rs.close();
if( exists ) {
conv.setData( data, updatesql );
conv.setData( data.length+1, idcol, data[idcol], updatesql );
updatesql.executeUpdate();
}
else {
conv.setData( data, insertsql );
insertsql.executeUpdate();
}
}
csvldr.end();
mNumLines = csvman.getLineCount();
We keep asking the CsvLoader if there are any more lines in the CSV file. For each line,
we get the data fields from the line as the data string array.
Then we check to see if this record already exists in the database. We use the Converter to
set the value of the identity field for this row in the SELECT query. If there is at least one
row, then we know that the record already exists. In that case, we use an UPDATE query to update
the database row with the new data from the CSV file. If it does not exist, we INSERT the data
into a new database table row.
Finally, when all the lines of the CSV file have been read, we let the CsvLoader know that we're done by calling
its end method.
We also record the final total of lines in the CSV file so that we can report it to the user.
Putting It All Together
Let's look at the user interface we're going to ask Laura, our shop owner, to use. There are
two pages, one for downloads and one for uploads.
The download page is pretty straightforward. All that we specify is the field separator. By default this is
a comma. The separator is passed as a form field to our servlet and placed into the properties object
that is used to create the CsvSpec.
The upload page is pretty much the same, except that we also have a file input dialog for the CSV file to upload. When the
file has finished uploading, we also show the number of lines found in the CSV file.
Handling CSV Format Variations
We handle variations in the CSV format by creating a set of properties from the HTML form fields, and using these
properties to create a
CsvSpec
object, which tells CSV Manager what sort of
CSV format to expect.
Here's the code that sets up the CsvSpec object:
CsvSpecHandler.javaCsvSpec csvspec = new CsvSpec();
csvspec.setIgnoreEmptyLines(true);
String sep
= pProperties.getProperty("Separator");
if( null != sep && !"".equals(sep) ) {
csvspec.setSeparator(sep);
}
The first thing is that we want to ignore blank lines. These often occur in CSV files at the end of the
file or between the column header line and the data. Ignoring them just prevents spurious data from entering the
database.
Then we look at the properties object that contains the HTML form fields. We just look for certain predefined
properties and use these to set the corresponding setting on the CsvSpec object. The example code only handles
the field separator setting, but you can easily add more settings.
Handling String Conversions
Converting the string values in a CSV file into proper data, and converting back proper data from the
database into string values for CSV data fields is the trickiest part of this example. Luckily, CSV Manager makes
this easy. CSV Manager includes a set of utility classes for exactly this purpose. These
StringConverter
utility classes can convert all the common Java data types into and out of strings, so you don't have to worry about doing it yourself.
Here's part of the Converter object code, so you can see how StringConverters are used:
Converter.java
if( Types.INTEGER == pColType
|| Types.TINYINT == pColType
|| Types.SMALLINT == pColType
|| Types.BIGINT == pColType
|| Types.BIT == pColType ) {
mSetMethod[pColIndex]
= psc.getMethod("setInt", new Class[]{Integer.TYPE,Integer.TYPE});
mGetMethod[pColIndex]
= rs.getMethod("getInt", new Class[]{Integer.TYPE});
mStringConverter[pColIndex] = new IntegerStringConverter();
}
This code analyses the database table column types before loading begins, and sets up the
correct get and set methods to call on the ResultSet and PreparedStatement objects. It also
creates the correct StringConverter utility object for each field.
In the code above, if the database table column data type, as provided by the
ResultSetMetaData
object,
is an integer of some sort, then we use the
IntegerStringConverter
class to do the conversion to and
from ints and Strings. You can also see how we set up references to the appropriate set and get
methods for ints.
What if you want to convert custom data? Here's how to write a custom StringConverter that converts between
java.sql.Date
objects and
strings
SqlDateStringConverter.java
public class SqlDateStringConverter extends DefaultStringConverter {
private DateFormat mFormat;
public SqlDateStringConverter( DateFormat pFormat ) {
mFormat = pFormat;
}
protected Object makeObjectImpl( String pString ) throws Exception {
return new java.sql.Date( mFormat.parse( pString ).getTime() );
}
protected String makeStringImpl( Object pObject ) {
return mFormat.format( new Date( ((java.sql.Date)pObject).getTime() ) );
}
}
As you can see, we just have to provide the methods that turn a String into
an Object, and an Object into a String. We use Java's
DateFormat
class to do the
actual conversion work. You can apply this model to any sort of string representation of data.
A Happy Camper
Laura: "So did you make those pricing changes?"
You: "Actually, I've set it up so that you can do them in Excel, whenever you need to."
Laura: "Really?"
You: "Oh yes, You just download a spreadsheet from the site, and make whatever changes you
need to make to the prices, all at the same time, and then upload them again, and they're live."
Laura: "Great! Can you pop round and show me how later this week? Oh, and I think I have a check for you..."
The Source Code
The source code to fully implement all the functionality described in
this article is made freely available under a BSD-style
license. This means that you can freely use it for commercial or
private use, without any restrictions.
All the required files are included in this zip file:
updown.zip.
| |