Ricebridge
Search This Site
Sep 02 2010 23:03 UTC


$15 Gift Certificate for every bug you find.

MaxMind GeoIP
This IP address to country database is provided as a CSV file that can easily be read by CSV Manager.

Got a question for us?
Just Ask!

Bookmark Uploading and Downloading CSV Files from a Website using CSV Manager at del.icio.us Digg Uploading and Downloading CSV Files from a Website using CSV Manager at Digg.com Bookmark Uploading and Downloading CSV Files from a Website using CSV Manager at reddit.com Bookmark Uploading and Downloading CSV Files from a Website using CSV Manager at YahooMyWeb Bookmark Uploading and Downloading CSV Files from a Website using CSV Manager at Spurl.net Bookmark Uploading and Downloading CSV Files from a Website using CSV Manager at Simpy.com Bookmark Polyphasic Mutants at NewsVine Blink this Uploading and Downloading CSV Files from a Website using CSV Manager at blinklist.com Bookmark Uploading and Downloading CSV Files from a Website using CSV Manager at Furl.net Fark Uploading and Downloading CSV Files from a Website using CSV Manager at Fark.com

Uploading and Downloading CSV Files from a Website Database using CSV Manager

This article comes with the complete source code to implement a CSV upload and download website database using CSV Manager. The code can easily be extended to handle more than one table or custom data types.

A Fashion Disaster

MaxMind CSV Data

Last week you finished Laura's new website. Laura runs an exclusive boutique in a fashionable part of town. Now she can display and sell all of her fashion pieces online.

That was last week. This week, she's got you on the phone...

Laura: "I'm getting rid of last season. I need to reduce all the prices on that stuff by 50%".

Oops. That discounts system you built only handles categories, not "seasons".

You: "Um..."

Laura: "And I need to put up the prices on the John Rochas, except for the jewellery, which is out..."

Your mind is maze of twistly little SQL queries, all different...

You: "Um..."

Laura: "... by Friday?"

Why can't she just do it herself, you think. Oh wait...

You: "Listen, what if we export the product list to a spreadsheet and you can update it there? Then you can make all sorts of changes and upload them all when you're done."

Laura: "Great! I still need it by Friday though..."

What Exactly Are We Trying To Do Here?

Stock

We're letting people work in Excel. Sometimes it's just easier to edit a big long list of product descriptions using a spreadsheet. Imagine your traditional web shop administration area. You get a list of products, click on one, and then edit that product. It gets pretty tedious pretty quickly. Global changes to pricing, codes or descriptions are basically impossible. That's where the spreadsheet comes in. If you have a big long list of products, making these types of changes is easy. Especially for the business managers who need to make these changes. They may not be techies, but you better believe that they know their way around Excel — their jobs depend on it.

What's the alternative? You're going to have to come up with an editable table view of some kind, using dynamic HTML. You'll have to go searching for a component or write it yourself. More work for you. And it's still just a bad imitation of a spreadsheet at the end of the day. Why not just export the data as a CSV file, and let Excel do the heavy lifting?

You solve more than one problem this way. You can also deal with data from external sources. So long as it's in a spreadsheet or a database somewhere, you can get it into CSV format and you're away! Otherwise you have to enter the data by hand, or write some sort of a custom import function.

What we're trying to do is give people easy access to large volumes of data. Spreadsheets are the way to do that. And CSV is the way to get data into and out of spreadsheets.

The Practicalities

CSV Data Flow

Let's look at the workflow in detail.

  • Laura logs on to the administration section of her site. She pulls up the products page and clicks on the export button.
  • A "Save As" dialog window pops up, asking her to save a CSV file.
  • She opens Excel, and imports the CSV data.
  • Laura modifies the list as she sees fit. She makes many individual changes to prices and descriptions. She also makes a few global changes using some Excel formulas that she's put together herself.
  • She saves the data as a CSV file.
  • She returns to the site, and uploads the CSV file. The product data in the database is updated immediately.

That's pretty much it. And it's much easier for the client. Of course, we'll have to report errors and be careful not to corrupt data. If you're fancy you could build in an undo feature — you get one nearly for free as the exported CSV file contains the unmodified data.

There's another case we can also handle. Let's say Laura takes on a new supplier. That supplier can provide her with a list of products in a spreadsheet and she can then upload that direct to the database instead of tediously entering in all the new products by hand.

Getting the Data Out of the Database

Let's start at the beginning. How do we get the data we want to export? Let's go straight for the kill with the simplest SQL query there is:

SELECT * FROM TableName

We just grab all the columns from the table we want to export, in the order they are defined in the database. This means that the column headers of the CSV file will take the names of the database table columns, and also appear in the same order.

Where do we get the name of the database table from? Well, in our example code we're just going to hard-code the name of the table into a member variable called mTableName. In the real world you'll probably want to grab it from some configuration parameter or let the user select a table.

One thing to avoid is to get the name directly from a HTML form parameter. You'll expose yourself to a SQL injection security hole in that case.

How Long is a Piece of String?

There's one other little detail. CSV is a text file format. We need to convert all the data from the database table into text strings. To do this we'll use a special converter object that takes in a ResultSet and returns a string array.

And don't forget we have the same problem in reverse when someone uploads a CSV file. In that case we have to convert from text to a Java object.

If this all sounds like too much hard work, don't worry! CSV Manager comes with a utility API that does the job for us. All the grisly details are covered in the documentation of the StringConverter interface.

So how do we know which Java objects to convert to and from strings? We'll use the JDBC API to find out the database table column types. Then we just match them up with the right StringConverter object. We'll look at the details later on.

Show Me the Code, Already!

CsvDownloader.java
CsvManager csvman  = new CsvManager();
CsvSaver    csvsvr  
  = csvman.makeSaver( outputStream );

PreparedStatement selectsql 
  = connection.prepareStatement( 
    "SELECT * FROM "+mTableName+";" );
ResultSet rs = null;

rs = selectsql.executeQuery();
Converter conv = new Converter(mTableName,rs);
  
csvsvr.begin();

// headers
csvsvr.next( conv.getColumnNames() );

// get each row and output it as a CSV line
while( rs.next() ) {
  String[] data = conv.getData(rs);
  csvsvr.next(data);
}

csvsvr.end();

See the Getting Started Guide for more details on the CSV Manager API. There's also a page with more details on the streaming interface used in this code sample.

Here's the code to pull the data out of the database table, and export it as a CSV file.

What's happening here? Well first we create a CsvManager object to actually do the work of creating the CSV file. We'll be outputting the CSV directly to an OutputStream.

We're going to use a CsvSaver object to handle the data. This is a very handy way of providing the data one row at a time. We just pass in a string array for each row of data. This allows us to "stream" the data out as a download, instead of creating the entire CSV file first. This saves a lot of memory and processing power on our web server.

Then we just use some standard JDBC calls to execute a SELECT query on our database table. You can also see from the code above how we use a Converter utility object to create the string array for each row from the query ResultSet.

One very important thing to notice is that we output the column names as the first row of the CSV file, in the order they are used by the database. These names form the header row of the CSV file and will be used later on for uploading the changing data and for finding the correct database table column to put each CSV data field into.

Some Real Data

Database Data

SKUNamePriceShown
1001n11501/3/07
............

CSV File

SKU,Name,Price,Shown
1111,Apple,10,1/3/07
2222,Pear,20,2/3/07

What sort of data is Laura, our client, going to be working with? Here's some of the data in the products table of the database you built for her. The SKU is a unique identifier for each clothing item product — the "Stock Keeping Unit". The table also has the name of the product, the current price, and the date that the product was first shown in the shop.

The fields shown here are just for example purposes. In the real world there would probably be a lot more. You can see, however, how the CSV file just mirrors what is in the database.

Making it Happen

Bags

Now we need a servlet to actually make the download happen. Let's keep our web application very simple — we'll just have an download page and an upload page. Each page has a web form that submits to our servlet. The servlet looks for a hidden parameter that tells it to do an upload or a download.

Downloading files from a website can be a bit tricky. We need to specify the correct HTTP headers so that the user's web browser will pop up a "Save As..." dialog window, rather than just displaying the CSV data. We also need to specify the file name of the CSV file. The database table name seems the best choice for this, but you can always use a different name.

Let's put our downloading code (from above) into a CsvDownloader object. The servlet will provide a database connection to this object, and the HTTP output stream to write the CSV data to. The servlet will then ask the CsvDownloader to commence the download. And that's that.

One more thing: to allow for user options (like using a semi-colon instead of a comma), we pass in a Properties object. CSV Manager has lots of options for handling different kinds of CSV files and this way we can provide access to them.

Processing the Download Request

First we need a database connection. In the example code the connection is obtained in classic JDBC style using a DriverManager. In your own application you'll probably use the more modern DataSource approach. We pass the database connection to our CsvDownloader object so that it can load the data from the database.

Next, we set up the HttpServletResponse object with some HTTP headers that will cause the user's browser to save our data as a CSV file. These are really just magic incantations — there are plenty of web development websites that explain them in excruciating detail. For our purposes, we just plug and play.

Finally, we call the process method of our CsvDownloader. This executes the exporting code we saw already. We pass in the output stream of the HTTP response, and a properties object to control the CSV Manager settings. Using the HTTP response output stream allows us to stream the CSV file down to the user. This is much faster than creating the entire CSV file in memory first.

Here's the code:

UpDownServlet.java

Connection con = getConnection();
csvDownloader.setDatabaseContext( con, mTableName );

response.setContentType("text/csv");
response.setHeader( "Content-disposition", "attachment; filename="+mTableName+".csv");

csvDownloader.process( pResponse.getOutputStream(), properties);

And that's pretty much it for downloading. We'll tie it all together and look at the whole application in a moment. But first: uploading!

So What About Uploading a CSV file?

This is a little bit more involved. The process we've imagined so far is that Laura, our customer, has downloaded the CSV file, imported it into Excel, made some changes, and then saved it again. Now she wants to upload it to her website.

It's very important to remember at this point that the order of data fields in the CSV file must match the columns that we exported. This ensures that the right data goes into the right database table column. Of course, you could extend the code to allow for a different field order, or missing fields, but we'll assume for the moment that we need all the fields.

To make our lives easier, let's use a third-party component to handle the file uploading. We'll use the venerable FileUpload from the Apache Commons project. We'll plug this into our servlet to handle upload requests. And there are more magic incantations — we need to use a special syntax for the HTML form element:

<form name="uploadform" action="updown" enctype="multipart/form-data" method="post">

The encoding type enctype attribute causes the browser to send the uploaded file to our servlet in MIME format (also used for email attachments). The FileUpload component know how to handle this.

Some Practical Servlet Code

UpDownServlet.java
ServletFileUpload upload 
  = new ServletFileUpload();

FileItemIterator  itemI  
  = upload.getItemIterator( pRequest );

String            action = null;
InputStream       in     = null;
Properties        props  = new Properties();

while( itemI.hasNext() ) {
  FileItemStream item = itemI.next();
  String name = item.getFieldName();
  
  if( item.isFormField() ) {
    String value 
      = Streams.asString( item.openStream() );

    if( "action".equals(name) ) {
      action = value;
    }
    else {
      props.setProperty( name, value );
    }
  } 
  else {
    in = item.openStream();
    break;
  }
}


if( "upload".equals(action) ) {
  doUpload( in, props, pRequest, pResponse );
}
else if( "download".equals(action) ) {
  doDownload( props, pRequest, pResponse );
}

This servlet code is the start point of any file upload or download.

Since we're only using one servlet, you might be wondering how it can handle both uploads and downloads. Well, here's the code.

What's going on here? First, we use the FileUpload API to create a FileItemIterator so that we can stream through the items submitted to us. Each form field and each uploaded file is a separate item.

We're looking for a hidden form field called action that tells us to upload or download. With the rest of the form fields, we just put them into a Properties object, which we'll pass on to the CsvDownloader.

If a file is being uploaded, then eventually we come to an item that is not a form field, and we open an input stream so we can read the contents of the file. After handling the input fields, we then either upload or download depending on the action, passing the appropriate objects onwards.

There's an important "gotcha" to be aware of here. The FileUpload component deals with the form fields in the order they appear in the HTML code. It also deals with them as a stream. That is, you have to read them when you get them, you can't go back and look at a previous value. This ensures the highest performance when uploading files, as they then don't have to be stored in memory, and you can work with the data as you get it, rather than waiting for the whole file to upload.

So what's the "gotcha"? Well, if the file upload form field (<input type="file" ... >) appears in the HTML code before any other form fields, you will never see the value of those form fields. Since the CSV file has to be dealt with first, we immediately have to stop and go perform the upload. That's why there's a break statement just after we get the file input stream. So we never get a change to read the rest of the form fields.

This isn't usually a problem — you can use CSS to layout your HTML anyway you want, while still ensuring that the file field is the last one before the closing form tag. Still, be careful out there!

Onwards and Upwards

Ok. Let's look at the upload code:

UpDownServlet.java

Connection con = getConnection();
csvUploader.setDatabaseContext(con, mTableName, mIdentityColumn);

csvUploader.process(pIn, pProperties);

pResponse.sendRedirect("upload.htm");

It looks pretty much the same as the download code, except that we use a CsvUploader object (we'll look at that in the next section)

We also use another variable, mIdentityColumn. This is the name of the database table primary key column. This is how we will identity which row of data we are updating. Again, for security reasons, this variable is hard-coded in our example.

The CsvUploader works in the same was as the CsvDownloader. It has a process method that does all the hard work for us. Once the file has been uploaded and processed, the only thing left to do is to redirect the user back to the upload HTML page, and display some statistics about the upload.

Processing the Upload

CsvUploader.java
mNumLines = 0;

CsvSpec    csvspec = makeCsvSpec( properties );
CsvManager csvman  = new CsvManager( csvspec );
CsvLoader  csvldr  = csvman.makeLoader( inputStream );

String[] colnames = null;
csvldr.begin();

if( csvldr.hasNext() ) {
  colnames = csvldr.next();
}

int idcol = -1;
for( int colI = 0; colI < colnames.length; colI++ ) {
  if( mIdentityColumn.equals(colnames[colI]) ) {
    idcol = colI;
    break;
  }
} 

Converter conv 
  = new Converter( colnames, mTableName, 
      mIdentityColumn, connection );

ResultSet rs = null;
PreparedStatement selectsql 
  = makeSelect( mTableName, 
      mIdentityColumn, connection );
PreparedStatement insertsql 
  = makeInsert( colnames, 
      mTableName, mIdentityColumn, connection );
PreparedStatement updatesql 
  = makeUpdate( colnames, 
      mTableName, mIdentityColumn, connection );

while( csvldr.hasNext() ) {
  String[] data = csvldr.next();

  conv.setData( 1, idcol, data[idcol], selectsql );
  rs = selectsql.executeQuery();
  
  boolean exists = rs.next();
  rs.close();
  
  if( exists ) {
    conv.setData( data, updatesql );
    conv.setData( data.length+1, idcol, 
                  data[idcol], updatesql );
    updatesql.executeUpdate();
  }
  else {
    conv.setData( data, insertsql );
    insertsql.executeUpdate();
  }
}

csvldr.end();
mNumLines = csvman.getLineCount();

Here's the code to process the uploaded CSV file. It's a bit more involved than the downloading code we saw first. We need a few more SQL statements because we need to decide whether a data row already exists or not. If it does, we update, if it does not, we insert.

Notice that we also use a mNumLines member variable, so that we report back the number of lines we found in the CSV file after we've finished processing it.

Let's step through this code. First we create the CsvManager object that will process the upload. We use the properties from the HTML form to create a CsvSpec object that controls how the CSV file is parsed — our CSV file might use semi-colons instead of colons, for example (the source code of the makeCsvSpec method is available in the download). We create a CsvLoader object to do the actual loading. Like CsvSaver, this is a streaming interface. We can read one line of data from it at a time. This means we can start inserting rows into the database as we get them, without waiting for the whole CSV file to load. And this means that we can handle really big CSV files just as easily as small ones.

The first thing we have to with the CSV that we are reading in is get the column headers. We assume that these are in the first line of the file. The CsvLoader interface returns each row as a String array so accessing the CSV data fields is quite easy. We search through these fields looking for the identity column. That way we'll know which data field to use as the identity value when we are looking for existing fields. Once we've found the identity column, we store its column index in the idcol variable.

Next we create a Converter to handle our CSV file. The converter will use these settings to set the correct parameters for the PreparedStatements that we will use for the update. Speaking of which...

Preparing Statements

You can see from the upload processing code that we use three PreparedStatements; one to see if a record already exists; one to update the record if it does; and one to insert a record if it is new. The SQL for these statements is constructed by three utility methods. We'll look at one of them quickly, just to give you a feel for the code.

CsvUploader.java
public PreparedStatement makeUpdate( String[] pColNames, String pTableName, 
  String pIdentityColumn, Connection pConnection ) 
throws Exception 
{
  StringBuffer sql = new StringBuffer( "UPDATE "+pTableName+" SET ");
  for( int colI = 0; colI < pColNames.length; colI++ ) {
    sql.append( 0 == colI ? "":", " );
    sql.append( pColNames[colI]+" = ?" );
  }
  sql.append(" WHERE "+pIdentityColumn+" = ?;");
  return pConnection.prepareStatement( sql.toString() );
}

We use the question mark syntax to mark the location of the data field values. The PreparedStatement will insert the data correctly for us, so that we don't have to worry about escaping apostrophes.

For the UPDATE statement we loop through each column and add it to the SET clause. We use the identity column to identify the row that we want to update.

The Upload Loop

Let's go back to the main upload loop. Here it is again:

CsvUploader.java
while( csvldr.hasNext() ) {
  String[] data = csvldr.next();

  conv.setData( 1, idcol, data[idcol], selectsql );
  rs = selectsql.executeQuery();
  
  boolean exists = rs.next();
  rs.close();
  
  if( exists ) {
    conv.setData( data, updatesql );
    conv.setData( data.length+1, idcol, data[idcol], updatesql );
    updatesql.executeUpdate();
  }
  else {
    conv.setData( data, insertsql );
    insertsql.executeUpdate();
  }
}

csvldr.end();
mNumLines = csvman.getLineCount();

We keep asking the CsvLoader if there are any more lines in the CSV file. For each line, we get the data fields from the line as the data string array.

Then we check to see if this record already exists in the database. We use the Converter to set the value of the identity field for this row in the SELECT query. If there is at least one row, then we know that the record already exists. In that case, we use an UPDATE query to update the database row with the new data from the CSV file. If it does not exist, we INSERT the data into a new database table row.

Finally, when all the lines of the CSV file have been read, we let the CsvLoader know that we're done by calling its end method. We also record the final total of lines in the CSV file so that we can report it to the user.

Putting It All Together

Let's look at the user interface we're going to ask Laura, our shop owner, to use. There are two pages, one for downloads and one for uploads.

The download page is pretty straightforward. All that we specify is the field separator. By default this is a comma. The separator is passed as a form field to our servlet and placed into the properties object that is used to create the CsvSpec.

The upload page is pretty much the same, except that we also have a file input dialog for the CSV file to upload. When the file has finished uploading, we also show the number of lines found in the CSV file.

Handling CSV Format Variations

Shoes

We handle variations in the CSV format by creating a set of properties from the HTML form fields, and using these properties to create a CsvSpec object, which tells CSV Manager what sort of CSV format to expect. Here's the code that sets up the CsvSpec object:

CsvSpecHandler.java
CsvSpec csvspec = new CsvSpec();

csvspec.setIgnoreEmptyLines(true);

String sep 
  = pProperties.getProperty("Separator");

if( null != sep && !"".equals(sep) ) {
  csvspec.setSeparator(sep);
}

The first thing is that we want to ignore blank lines. These often occur in CSV files at the end of the file or between the column header line and the data. Ignoring them just prevents spurious data from entering the database.

Then we look at the properties object that contains the HTML form fields. We just look for certain predefined properties and use these to set the corresponding setting on the CsvSpec object. The example code only handles the field separator setting, but you can easily add more settings.

Handling String Conversions

Converting the string values in a CSV file into proper data, and converting back proper data from the database into string values for CSV data fields is the trickiest part of this example. Luckily, CSV Manager makes this easy. CSV Manager includes a set of utility classes for exactly this purpose. These StringConverter utility classes can convert all the common Java data types into and out of strings, so you don't have to worry about doing it yourself.

Here's part of the Converter object code, so you can see how StringConverters are used:

Converter.java

if( Types.INTEGER == pColType 
    || Types.TINYINT == pColType 
    || Types.SMALLINT == pColType 
    || Types.BIGINT == pColType
    || Types.BIT == pColType ) {
  mSetMethod[pColIndex] 
    = psc.getMethod("setInt", new Class[]{Integer.TYPE,Integer.TYPE});
  mGetMethod[pColIndex] 
    = rs.getMethod("getInt", new Class[]{Integer.TYPE});
  mStringConverter[pColIndex] = new IntegerStringConverter();
}

This code analyses the database table column types before loading begins, and sets up the correct get and set methods to call on the ResultSet and PreparedStatement objects. It also creates the correct StringConverter utility object for each field.

In the code above, if the database table column data type, as provided by the ResultSetMetaData object, is an integer of some sort, then we use the IntegerStringConverter class to do the conversion to and from ints and Strings. You can also see how we set up references to the appropriate set and get methods for ints.

What if you want to convert custom data? Here's how to write a custom StringConverter that converts between java.sql.Date objects and strings

SqlDateStringConverter.java

public class SqlDateStringConverter extends DefaultStringConverter {
  private DateFormat mFormat;

  public SqlDateStringConverter( DateFormat pFormat ) {
    mFormat = pFormat;
  } 
  protected Object makeObjectImpl( String pString ) throws Exception { 
    return new java.sql.Date( mFormat.parse( pString ).getTime() );
  }
  protected String makeStringImpl( Object pObject ) {
    return mFormat.format( new Date( ((java.sql.Date)pObject).getTime() ) );
  }
}

As you can see, we just have to provide the methods that turn a String into an Object, and an Object into a String. We use Java's DateFormat class to do the actual conversion work. You can apply this model to any sort of string representation of data.

A Happy Camper

Laura

Laura: "So did you make those pricing changes?"

You: "Actually, I've set it up so that you can do them in Excel, whenever you need to."

Laura: "Really?"

You: "Oh yes, You just download a spreadsheet from the site, and make whatever changes you need to make to the prices, all at the same time, and then upload them again, and they're live."

Laura: "Great! Can you pop round and show me how later this week? Oh, and I think I have a check for you..."

The Source Code

The source code to fully implement all the functionality described in this article is made freely available under a BSD-style license. This means that you can freely use it for commercial or private use, without any restrictions.

All the required files are included in this zip file: updown.zip.

Comments and Suggestions

Contact Us

Thanks for taking the time to read this article. We hope you've found something of use here. We'd love to hear what you think, and we'd definitely like to hear about anything that we got wrong, so please send us your comments and suggestions.

Finally, if you'd like to get an email whenever we publish a new article, sign up here:

Name:

Email:



Best Regards,
Richard Rodger
Ricebridge

PS. If you liked this article, please bookmark it. Thanks!
Bookmark Uploading and Downloading CSV Files from a Website using CSV Manager at del.icio.us Digg Uploading and Downloading CSV Files from a Website using CSV Manager at Digg.com Bookmark Uploading and Downloading CSV Files from a Website using CSV Manager at reddit.com Bookmark Uploading and Downloading CSV Files from a Website using CSV Manager at YahooMyWeb Bookmark Uploading and Downloading CSV Files from a Website using CSV Manager at Spurl.net Bookmark Uploading and Downloading CSV Files from a Website using CSV Manager at Simpy.com Bookmark Polyphasic Mutants at NewsVine Blink this Uploading and Downloading CSV Files from a Website using CSV Manager at blinklist.com Bookmark Uploading and Downloading CSV Files from a Website using CSV Manager at Furl.net Fark Uploading and Downloading CSV Files from a Website using CSV Manager at Fark.com




comment on this page Home | Search | About Us | Contact Us | Our Products | Documentation | Resources | Login
Copyright © 2004-2010 Ricebridge. All Rights Reserved.