Ricebridge
Search This Site
Jul 09 2008 03:54 UTC

XML Manager

Standard Edition

• Try the Demo!
• Download the FREE Trial!
• Product Home Page
• Getting Started Guide
• How You Can Save Money
• All Your Purchase Options

Database Example


Summary

Learn how to get CSV data into your database, and back out again:

A Note on Compiling and Running

To compile and run the Java programs in this example, either import them into your IDE (say, Eclipse or NetBeans), or follow the standard manual compilation instructions. When running the programs, run them in the same folder as this README.htm file, so that they can find their input files.

The Data

We will use a small CSV file as our initial example data, before we look at working with large files. This file contains a list of products, with data fields for name, price and code.

products.csv

id,code,name,price
1,A01,Apple,99
2,O01,Orange,149
3,O02,Satsuma,219
4,B01,Banana,29
5,P01,Pear,129

We want to be able to insert this data into a database table, and to extract it from that table. The data contains string fields and number fields, so we'll have to make sure to store those properly in the database.

First, let's create our database table. In this example we are assuming that we only have to load and save data into one table. For your own projects, you will have to modify the code in this example to fit your own database. So here is our table:

product-table.sql

CREATE TABLE Product ( 
id    integer, 
code  varchar(20), 
name  varchar(255), 
price integer 
);

The Direct Approach

The easiest way to handle CSV data with CSV Manager is just to load up all the data at once into memory and then use the data as need. Let's look at loading the data into the database using this approach.

Here's the code:


public void doLoadFull() throws Exception {
  Connection        con = getConnection();
  PreparedStatement ps  
    = con.prepareStatement( "INSERT INTO Product VALUES (?,?,?,?)" ); 

  CsvManager csvman = new CsvManager();
  List data = csvman.load( "products.csv" );    
  
  for( int line = 1; line < data.size(); line++ ) {
    String[] fields = (String[]) data.get(line);
    insertRow( fields, ps );
  }    

  con.close();
}


public void insertRow( String[] pFields, 
                       PreparedStatement pPS ) 
  throws Exception 
{
  pPS.setInt(    1, Integer.parseInt( pFields[0] ) );
  pPS.setString( 2, pFields[1] );
  pPS.setString( 3, pFields[2] );
  pPS.setInt(    4, Integer.parseInt( pFields[3] ) );
  pPS.executeUpdate();
}

(The full code for this example is shown in the ImportExport.java file)

So what's happening here? The doLoadFull method is the main one. First, we get a database connection to the database where we'll be INSERTing the data (the code for the getConnection method is not shown, but you can see it in the ImportExport.java file).

Second, we load up all the data at once using the CsvManager.load method. Now we have all the data lines from the CSV file in an ArrayList as String[] arrays. The last thing to do is to loop throug this list, and insert each line into the database. The insertRow method converts the data for each row into the right type of data for the database table. Notice that we start at line 1, not line 0. Line 0 contains the data field header names, and we don't want to put those in the database.

This method of loading data from a CSV file into a database is simple, quick and effective. But there is one snag. What if you have lots of data? What if you are trying to insert a really big gigabyte-sized file into the database? This method won't work as you'll run out of RAM and get a dreaded java.lang.OutOfMemoryError. We'll look at the answer in a moment — let's look at how to save data from a database first.

Saving data using the direct approach is pretty much the same as loading data. Here's the code:


public static void doSaveFull() throws Exception {
  Connection        con = getConnection();
  PreparedStatement ps  
    = con.prepareStatement( "SELECT * FROM Product" ); 

  CsvManager csvman = new CsvManager();
  csvman.getCsvSpec().setProperty("ResultSet.saveHeaders", true);
  ResultSet rs = ps.executeQuery();

  csvman.saveResultSet( "products-export.csv", rs );

  con.close();
}

This looks even easier than loading — just call CsvManager.saveResultSet and CSV Manager does all the work for you. You don't even need a for loop this time. Notice that we're using the special property ResultSet.saveHeaders to output the data field headers as the first line of the CSV file.

The Event-based Approach

So what about those really big files. Well CSV Manager provides two ways of dealing with them. One is event-based and one is stream-based. Let's look at the event-based approach first.

The idea with the event-based approach is that you implement an interface, and CSV Manager calls the methods of that interface whenever a line of CSV data is loaded. That way you can handle lines as they are loaded, rather than waiting until the end when they are all in memory. Using this method, you can load a line of data, immediately save it to the database, and free up the memory for the next line. There is no limit to the amount of data you can process in this manner. (There may be a limit to how long you'll wait for all that data, but don't worry CSV Manager is also pretty fast!). So here's the code:


public void doLoadEvent() throws Exception {
  CsvManager csvman = new CsvManager();
  csvman.getCsvSpec().setStartLine(2);
  csvman.load( "products.csv", new DatabaseListener() );
}


public final class DatabaseListener extends CustomLineListener {
  private Connection con = null;
  private PreparedStatement ps = null;

  protected void startProcessImpl() throws Exception {
    con = getConnection();
    ps  = con.prepareStatement( "INSERT INTO Product VALUES (?,?,?,?)" ); 
  }

  protected BadLine handleLineImpl( String[] pFields, int pNumFields, 
                                    long pLineNumber, String pOriginalLine) 
    throws Exception 
  {
    insertRow( pFields, ps );      
    return null;
  }

  protected void endProcessImpl() throws Exception {
    con.close();
  }
}

As you can see, the actual loading code in the doLoadEvent method is very small — just a few lines. Notice again that we only start on the second line to avoid the headers. The meat of the problem is attacked in the DatabaseListener class. This extends CustomLineListener which is the recommended way to implement the LineListener interface.

What you need to do here is implement some methods so that CSV Manager can let you know what's going on during the loading process. The startProcessImpl and endProcessImpl methods let you setup your database connection and close it again. They are called before loading of CSV data begins, and after it finishes.

But how do you actually get the data? This happens in the handleLineImpl method. The CSV data comes in via the pFields String[] array. In our case, we're just passing this array on to the insertRow method that we used the last time. (We return null from handleLineImpl to indicate that there were no errors, otherwise we can return a BadLine object to let CSV Manager know that something was wrong with the data.)

So what's happening here? Instead of loading all the data into one big data structure, we've given CSV Manager a new object, DatabaseListener. CSV Manager knows how to use this object to pass on the data that it loads from the CSV file. And inside DatabaseListener, we actually save the data to the database. Because all of this happens one line at a time, we can keep processing lines for a very long time. This means that we can handle files with millions of lines (or more).

What about saving lots of data? What if your database table contains millions of rows and you need to get them into a CSV file? Well, we can use the same idea. Here's the code:


public static void doSaveEvent() throws Exception {
  CsvManager csvman = new CsvManager();
  csvman.save( "products-export.csv", new DatabaseProvider() );
}


public static void loadRow( String[] pFields, 
                            ResultSet pResultSet ) 
  throws Exception 
{
  pFields[0] = String.valueOf(pResultSet.getInt(1));
  pFields[1] = pResultSet.getString(2);
  pFields[2] = pResultSet.getString(3);
  pFields[3] = String.valueOf(pResultSet.getInt(4));
}


public static final class DatabaseProvider extends CustomLineProvider {
  private Connection con   = null;
  private ResultSet  rs    = null;
  private boolean    first = true; 

  protected void startProcessImpl() throws Exception {
    con = getConnection();
    PreparedStatement ps  = con.prepareStatement( "SELECT * FROM Product" ); 
    rs = ps.executeQuery();
  }

  protected boolean hasNextLineImpl() throws Exception {
    if( first ) { return true; }
    return rs.next();
  }

  protected String[] nextLineImpl() throws Exception {
    String[] fields = new String[4];

    if( first ) {
      fields = new String[] {"id","code","name","price"};
      first  = false;
    }
    else {
      loadRow( fields, rs );
    }
    
    return fields;
  }

  protected void endProcessImpl() throws Exception {
    con.close();
  }
}

Yeah, this is a bit more code than the other examples. Actually this is the longest piece of code in this example. Anyway, the starting method doSaveEvent is pretty much the same as for event-based loading. In this case we use a DatabaseProvider object.

We've also defined a new utility method: loadRow. This method just gets data out of a ResultSet row and puts it into a String[] array.

So let's look at DatabaseProvider. The structure is pretty much the same as DatabaseListener. The startProcessImpl and endProcessImpl methods do pretty much the same stuff. However, startProcessImpl also executes the database query. Now, you might be saying to yourself, "hold on! that means that you're loading all the data at once!" Actually we're safe. Almost all database drivers implement ResultSet using a fetch-size concept. This means that only 10 or so rows (the fetch-size, which depends on the database) are ever loaded at one time. The database driver goes back to the database for more when the current set runs out. It just appears from the outside that you have immediate access to all the data. Clever, eh?

Back to DatabaseProvider. Let's look at where all the work happens, inside the hasNextLineImpl and nextLineImpl methods. Inside hasNextLineImpl we move to the next ResultSet row. We also check to see if this is the first line of CSV data. If so, we don't go near the ResultSet and just return true. This allows us to output the CSV headers in nextLineImpl.

So we get to nextLineImpl. CSV Manager is expecting to get back a String[] array of data from us. This String[] array will be written out to the CSV file as the next line of data. We create an empty array and get the data for the array using the loadRow utility method. The only special case is for the first line of the CSV file. In this case we manually return the headers.

Well this approach is very nice and certainly solves the problem of handling big files. Still, you have to implement interfaces and handle callback methods. Sometimes is nicer just to use direct method calls. Let's look at that option now...

The Stream-based Approach

The CSV Manager streaming API works like ResultSet, so it should be fairly familiar. Basically, you keep calling the hasNext and next methods until there is no more data (hasNext returns false). The best way to see it is in code, so here it is:


public static void doLoadStream() throws Exception {
  Connection        con = getConnection();
  PreparedStatement ps  
    = con.prepareStatement( "INSERT INTO Product VALUES (?,?,?,?)" ); 

  CsvManager csvman = new CsvManager();
  CsvLoader loader = csvman.makeLoader( "products.csv" );    

  boolean first = true;

  loader.begin();
  while( loader.hasNext() ) {
    if( first ) { 
      first = false; 
      continue;
    }
    String[] fields = loader.next();
    insertRow( fields, ps );
  }    
  loader.end();
  
  con.close();
}

All the database stuff is the same. The new bit is the CsvManager.makeLoader method call. This creates a CsvLoader object that is setup to load data lines from the specified file (products.csv). You call methods on the CsvLoader in a defined sequence, just like ResultSet.

The method calls work like so: at the start and end you call CsvLoader.begin and CsvLoader.end. This lets CSV Manager know that it should reserve resources for reading the CSV file, and then free them up again afterwards.

After calling begin, you then enter while loop based on the CsvLoader.hasNext and CsvLoader.next methods. The hasNext method returns true so long as there is another line of data to deal with. The next method returns that line of data as a String[] array. As before, we then use the insertRow utility method inside this loop to actually insert the data into the database. And also as before, we ignore the first line becuase it is the field name headers.

In principle the event-based and stream-based approaches are actually the same. They each load a line of data at a time. In the event-based approach however, you must respond to a method call, so you do not determine when a line of data is loaded. In the stream-based case, you do decide. No data is loaded until you start a hasNext, next cycle (small point: the data is actually loaded in the hasNext method, as it has to be loaded anyway to tell if there is any data).

So you can use the stream-based API if you need exact control of the loading process. And it's also sometimes easier to think in terms of method calls that you make, rather than callbacks that are made for you. The stream-based API is also useful if you want to chain data, by wrapping the CsvLoader in a custom class or classes. This would be similar to the way you can chain InputStreams.

OK, let's look at saving data using the streaming API. Here's the code:


public static void doSaveStream() throws Exception {
  Connection        con = getConnection();
  PreparedStatement ps  = con.prepareStatement( "SELECT * FROM Product" ); 

  CsvManager csvman = new CsvManager();
  CsvSaver saver = csvman.makeSaver( "products-export.csv" );

  ResultSet rs = ps.executeQuery();

  saver.begin();

  String[] fields = new String[] {"id","code","name","price"};
  saver.next( fields );
  
  while( rs.next() ) {
    loadRow( fields, rs );
    saver.next( fields );
  }    
  saver.end();
  
  con.close();
}

Saving is a little bit simpler than loading. You don't have to worry about a hasNext method, you just keep calling CsvSaver.next while you have more data. In this example code we setup the database as usual, open a ResultSet, and then use a CsvSaver to save the data one line at a time. As before, the header lines are a special case. We use the loadRow utility method to get the data out of the database and into a String[] array.

Source Code

Here is a list of all the files used in this example. Note that the actual source code is slightly longer than the examples above, which have been abridged for clarity.

Feel free to experiment with these files and see what happens. You can also use them as the basis for your own solutions.

Questions and Comments

Please feel free to email us at examples@ricebridge.com if you have any questions or comments about this example.

Got a question for us?
Just Ask!


$15 Gift Certificate for every bug you find.

Bookmark Ricebridge Java Components - user friendly and well documented at del.icio.us Digg Ricebridge Java Components - user friendly and well documented at Digg.com Bookmark Ricebridge Java Components - user friendly and well documented at reddit.com Bookmark Ricebridge Java Components - user friendly and well documented at YahooMyWeb Bookmark Ricebridge Java Components - user friendly and well documented at Spurl.net Bookmark Ricebridge Java Components - user friendly and well documented at Simpy.com Bookmark Polyphasic Mutants at NewsVine Blink this Ricebridge Java Components - user friendly and well documented at blinklist.com Bookmark Ricebridge Java Components - user friendly and well documented at Furl.net Fark Ricebridge Java Components - user friendly and well documented at Fark.com

comment on this page Home | Search | About Us | Contact Us | Our Products | Documentation | Resources | Login
Copyright © 2004-2008 Ricebridge. All Rights Reserved.