Ricebridge
Feb 04 2012 16:17 UTC


Java Beans and Large Files Example

Summary

This is a longer example showing you how to:

  • load data records from XML and use them to initialise Java Beans
  • handle different kinds of data record in one load operation
  • work with very large files

The Data

We will use a small XML file as our initial example data, before we look at working with large files. This file contains a list of the projects and employees in a company, and indicates which employee is in which project.

company.xml

<company>

  <project num="1" name="Foo">
    <deadline>2006-10-20</deadline>
    <employees>1:2:3</employees>
  </project>

  <project num="2" name="Bar">
    <deadline>2006-11-15</deadline>
    <employees>3:4:5</employees>
  </project>

  <employee num="1" manager="false">
    <name>John Doe</name>
    <hired>2005-01-01</hired>
    <projects>1</projects>
  </employee>

  <employee num="2" manager="false">
    <name>Jane Doe</name>
    <hired>2005-02-02</hired>
    <projects>1</projects>
  </employee>

  <employee num="3" manager="true">
    <name>Mr. Manager</name>
    <hired>2005-03-03</hired>
    <projects>1:2</projects>
  </employee>

  <employee num="4" manager="false">
    <name>Joe Bloggs</name>
    <hired>2005-04-04</hired>
    <projects>2</projects>
  </employee>

  <employee num="5" manager="false">
    <name>Jane Bloggs</name>
    <hired>2005-05-05</hired>
    <projects>2</projects>
  </employee>

</company>

The data contains some special formats that we need to deal with. First, the dates are given in the format YYYY-MM-DD, so we'll have to parse them using a DateFormat. Also, the employees and projects elements are colon-separated lists of numbers. These will correspond to indexed properties in our beans. Luckily, XML Manager can handle these lists automatically.
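To make the two special formats concrete, here is a minimal plain-Java sketch of the conversions involved: parsing a YYYY-MM-DD string with SimpleDateFormat, and splitting a colon-separated list into an int[]. The class and method names are hypothetical, and this is not a description of how XML Manager performs these conversions internally.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Arrays;
import java.util.Date;

public class FormatSketch {
  // Parses dates such as "2006-10-20". Lenient parsing is turned off so
  // out-of-range values such as "2006-13-45" are rejected, not rolled over.
  // A new SimpleDateFormat is created per call because the class is not thread-safe.
  public static Date parseDate(String value) throws ParseException {
    SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd");
    format.setLenient(false);
    return format.parse(value);
  }

  // Splits a colon-separated list such as "1:2:3" into {1, 2, 3};
  // an empty string gives an empty array.
  public static int[] parseNumberList(String value) {
    String trimmed = value.trim();
    if (trimmed.isEmpty()) {
      return new int[0];
    }
    String[] parts = trimmed.split(":");
    int[] numbers = new int[parts.length];
    for (int i = 0; i < parts.length; i++) {
      numbers[i] = Integer.parseInt(parts[i].trim());
    }
    return numbers;
  }

  public static void main(String[] args) throws ParseException {
    System.out.println(parseDate("2006-10-20"));
    System.out.println(Arrays.toString(parseNumberList("1:2:3"))); // prints [1, 2, 3]
  }
}
```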

We want to display this data as a set of HTML tables showing the projects and employees, and who is assigned to what. Here is the output we want:


Project  Deadline     Employees
Bar      15/Nov/2006  Jane Bloggs, Joe Bloggs, Mr. Manager
Foo      20/Oct/2006  Jane Doe, John Doe, Mr. Manager

Employee     Manager  Hired        Projects
Jane Bloggs  NO       05/May/2005  Bar
Jane Doe     NO       02/Feb/2005  Foo
Joe Bloggs   NO       04/Apr/2005  Bar
John Doe     NO       01/Jan/2005  Foo
Mr. Manager  YES      03/Mar/2005  Bar, Foo
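The output tables use a different date style from the input (dd/MMM/yyyy, for example 15/Nov/2006) and show the manager flag as YES or NO. Here is a minimal sketch of those two formatting steps in plain Java. The names are hypothetical, and the real MakeTable code may well do this differently.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

public class OutputFormatSketch {
  // Input style used in company.xml, e.g. "2006-11-15".
  private static final String INPUT_PATTERN = "yyyy-MM-dd";
  // Output style used in the HTML tables, e.g. "15/Nov/2006".
  private static final String OUTPUT_PATTERN = "dd/MMM/yyyy";

  // Converts an input date string to the table's display style.
  // Locale.ENGLISH keeps month abbreviations such as "Nov" independent of
  // the default locale; both formats use the default time zone, so the day
  // does not shift between parsing and formatting.
  public static String reformatDate(String input) throws ParseException {
    Date date = new SimpleDateFormat(INPUT_PATTERN).parse(input);
    return new SimpleDateFormat(OUTPUT_PATTERN, Locale.ENGLISH).format(date);
  }

  // Renders the boolean manager flag the way the employee table shows it.
  public static String yesNo(boolean flag) {
    return flag ? "YES" : "NO";
  }

  public static void main(String[] args) throws ParseException {
    System.out.println(reformatDate("2006-11-15")); // prints 15/Nov/2006
    System.out.println(yesNo(true));                // prints YES
  }
}
```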

To get the employee data we will use the following XPath expressions:

  • To get each employee: /company/employee
  • To define the type of bean: 'employee'
  • To get the employee number: @num
  • To indicate if the employee is a manager: @manager
  • To get the employee name: name
  • To get the date the employee was hired: hired
  • To get the list of projects that employee is assigned to: projects

To get the project data we will use the following XPath expressions:

  • To get each project: /company/project
  • To define the type of bean: 'project'
  • To get the project number: @num
  • To get the project name: @name
  • To get the project deadline: deadline
  • To get the employees assigned to this project: employees
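To see what these expressions select, here is a sketch that evaluates them with the standard JAXP XPath API (javax.xml.xpath) against a cut-down company.xml. It only illustrates the XPath semantics: each record path selects a set of nodes, and each field expression is evaluated relative to one of those nodes. It uses plain JAXP, not XML Manager, and the class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathSketch {
  // Evaluates the record path, then each field expression relative to every
  // matching node, returning one String[] per record.
  public static List<String[]> extract(String xml, String recordPath, String[] fieldPaths)
      throws Exception {
    Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
        .parse(new InputSource(new StringReader(xml)));
    XPath xpath = XPathFactory.newInstance().newXPath();
    NodeList nodes = (NodeList) xpath.evaluate(recordPath, doc, XPathConstants.NODESET);
    List<String[]> records = new ArrayList<>();
    for (int i = 0; i < nodes.getLength(); i++) {
      Node node = nodes.item(i);
      String[] record = new String[fieldPaths.length];
      for (int f = 0; f < fieldPaths.length; f++) {
        // A quoted literal such as 'employee' simply evaluates to that string.
        record[f] = xpath.evaluate(fieldPaths[f], node);
      }
      records.add(record);
    }
    return records;
  }

  public static void main(String[] args) throws Exception {
    String xml = "<company>"
        + "<employee num=\"3\" manager=\"true\"><name>Mr. Manager</name>"
        + "<hired>2005-03-03</hired><projects>1:2</projects></employee>"
        + "</company>";
    String[] fields = {"'employee'", "@num", "@manager", "name", "hired", "projects"};
    for (String[] record : extract(xml, "/company/employee", fields)) {
      // prints employee | 3 | true | Mr. Manager | 2005-03-03 | 1:2
      System.out.println(String.join(" | ", record));
    }
  }
}
```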

The Beans

Here are the beans that we want to use:


// Employee.java
import java.util.Date;

public class Employee {
  private int     iNumber         = 0;
  private String  iName           = "";
  private Date    iHireDate       = new Date();
  private boolean iManager        = false;
  private int[]   iProjectNumbers = new int[] {};

  public int getNumber() {
    return iNumber;
  }
  public void setNumber( int pNumber ) {
    iNumber = pNumber;
  }

  public String getName() {
    return iName;
  }
  public void setName( String pName ) {
    iName = pName;
  }

  public Date getHireDate() {
    return iHireDate;
  }
  public void setHireDate( Date pHireDate ) {
    iHireDate = pHireDate;
  }

  public boolean isManager() {
    return iManager;
  }
  public void setManager( boolean pManager ) {
    iManager = pManager;
  }

  public int[] getProjectNumbers() {
    return iProjectNumbers;
  }
  public void setProjectNumbers( int[] pProjectNumbers ) {
    iProjectNumbers = pProjectNumbers;
  }
  public int getProjectNumber( int pIndex ) {
    return iProjectNumbers[pIndex];
  }
  public void setProjectNumber( int pIndex, int pNumber ) {
    iProjectNumbers[pIndex] = pNumber;
  }
}

// Project.java
import java.util.Date;

public class Project {
  private int      iNumber          = 0;
  private String   iName            = "";
  private Date     iDeadline        = new Date();
  private int[]    iEmployeeNumbers = new int[] {};

  public int getNumber() {
    return iNumber;
  }
  public void setNumber( int pNumber ) {
    iNumber = pNumber;
  }

  public String getName() {
    return iName;
  }
  public void setName( String pName ) {
    iName = pName;
  }

  public Date getDeadline() {
    return iDeadline;
  }
  public void setDeadline( Date pDeadline ) {
    iDeadline = pDeadline;
  }

  public int[] getEmployeeNumbers() {
    return iEmployeeNumbers;
  }
  public void setEmployeeNumbers( int[] pEmployeeNumbers ) {
    iEmployeeNumbers = pEmployeeNumbers;
  }
  public int getEmployeeNumber( int pIndex ) {
    return iEmployeeNumbers[pIndex];
  }
  public void setEmployeeNumber( int pIndex, int pNumber ) {
    iEmployeeNumbers[pIndex] = pNumber;
  }
}

And here are the RecordSpecs that will read them in:


// This is the XPath to extract the employee details.
RecordSpec rs_employee
  = new RecordSpec("/company/employee", 
      new String[]{"'employee'","@num","@manager","name","hired","projects"},
      new String[]{"","Number","Manager","Name","HireDate","ProjectNumbers"});

// This is the XPath to extract the project details.
RecordSpec rs_project
  = new RecordSpec("/company/project", 
      new String[]{"'project'","@num","@name","deadline","employees"},
      new String[]{"","Number","Name","Deadline","EmployeeNumbers"});

The first data field in each of these RecordSpecs is used as a code to identify the type of data record. We just use literal string values: 'employee' for employees and 'project' for projects. Our custom RecordListener will use this code to handle the data records for each bean separately.

Notice also that we use the bean property names as the data field names. The RecordSpec constructor takes a third argument, also a String[] array, containing these field names. They are used to identify the Java Bean set and get methods, so each name must match the bean's method name exactly once set or get is put in front of it. For example, "Manager" means look for methods called setManager and getManager (for a boolean property the JavaBeans convention also allows isManager, which is what the Employee bean uses).
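If you want to check which methods a field name such as "Manager" corresponds to, the standard java.beans.Introspector gives the JavaBeans view of a class. This sketch assumes XML Manager follows the usual JavaBeans naming rules; it does not use XML Manager itself. The cut-down bean, Person, is a hypothetical stand-in for Employee.

```java
import java.beans.BeanInfo;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;

public class BeanNamesSketch {
  // Cut-down stand-in for the Employee bean.
  public static class Person {
    private boolean manager;
    private int[] projectNumbers = new int[0];

    public boolean isManager() { return manager; }
    public void setManager(boolean pManager) { manager = pManager; }

    public int[] getProjectNumbers() { return projectNumbers; }
    public void setProjectNumbers(int[] pNumbers) { projectNumbers = pNumbers; }
  }

  // Finds the descriptor for a field name such as "Manager". JavaBeans
  // property names start with a lower-case letter, so decapitalize first.
  // Returns null if the bean has no such property.
  public static PropertyDescriptor find(Class<?> beanClass, String fieldName) throws Exception {
    String propertyName = Introspector.decapitalize(fieldName);
    BeanInfo info = Introspector.getBeanInfo(beanClass);
    for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
      if (pd.getName().equals(propertyName)) {
        return pd;
      }
    }
    return null;
  }

  public static void main(String[] args) throws Exception {
    PropertyDescriptor manager = find(Person.class, "Manager");
    System.out.println(manager.getReadMethod().getName());  // prints isManager
    System.out.println(manager.getWriteMethod().getName()); // prints setManager
    PropertyDescriptor projects = find(Person.class, "ProjectNumbers");
    System.out.println(projects.getWriteMethod().getName()); // prints setProjectNumbers
  }
}
```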

The final thing that we need is a BeanSpec object for each bean. This is a utility object that describes the Java Bean for XML Manager. The default way to construct a BeanSpec is to pass it the bean's class, using the .class literal. For example:

BeanSpec bs_employee = new BeanSpec( Employee.class );

For our case, we need to provide a bit more information, because we are using a custom date field. What we need is a way to tell XML Manager how to convert between a java.util.Date and our String format, YYYY-MM-DD. To do that we use another utility class called a StringConverter. This is a very simple interface that takes a String and makes an Object, and also does the opposite. Here is our converter for dates:

DateConverter.java

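import java.text.DateFormat;
import java.util.Date;

// DefaultStringConverter is an XML Manager class (xmlman.jar); its import is omitted here.
public class DateConverter extends DefaultStringConverter {
  private DateFormat iDateFormat = null;

  public DateConverter( DateFormat pDateFormat ) {
    iDateFormat = pDateFormat;
  }

  // Parse a "YYYY-MM-DD" string into a Date.
  protected Object makeObjectImpl( String pValue ) throws Exception {
    return iDateFormat.parse(pValue);
  }

  // Default value used when parsing fails and defaults are enabled.
  protected Object makeDefaultObjectImpl() {
    return new Date();
  }

  // Format a Date back into the input format.
  protected String makeStringImpl( Object pValue ) throws Exception {
    return iDateFormat.format((Date)pValue);
  }

  // Default string, formatted with the same DateFormat so it can be parsed back.
  protected String makeDefaultStringImpl() {
    return iDateFormat.format(new Date());
  }
}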

The converter actually inherits from the DefaultStringConverter class. The DefaultStringConverter handles any formatting errors for us and returns default values if required (this is how the Bean.useDefault setting is implemented).
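The split between DateConverter and DefaultStringConverter follows the template-method pattern: the base class calls the ...Impl methods and turns their failures into default values. The self-contained sketch below shows that idea. The interface and class names (SimpleConverter, FallbackConverter, IsoDateConverter) are hypothetical and are not XML Manager's actual StringConverter API. It also uses a fixed default (the epoch) rather than the current time, purely so the behaviour is easy to check.

```java
import java.text.SimpleDateFormat;
import java.util.Date;

public class ConverterSketch {
  // Hypothetical two-way converter, not XML Manager's StringConverter.
  interface SimpleConverter {
    Object makeObject(String value, boolean useDefault);
    String makeString(Object value, boolean useDefault);
  }

  // Base class: subclasses supply the conversions; this class turns any
  // failure into a default value, or rethrows when defaults are off.
  abstract static class FallbackConverter implements SimpleConverter {
    protected abstract Object makeObjectImpl(String value) throws Exception;
    protected abstract Object makeDefaultObjectImpl();
    protected abstract String makeStringImpl(Object value) throws Exception;
    protected abstract String makeDefaultStringImpl();

    public Object makeObject(String value, boolean useDefault) {
      try {
        return makeObjectImpl(value);
      } catch (Exception e) {
        if (useDefault) {
          return makeDefaultObjectImpl();
        }
        throw new IllegalArgumentException("cannot convert: " + value, e);
      }
    }

    public String makeString(Object value, boolean useDefault) {
      try {
        return makeStringImpl(value);
      } catch (Exception e) {
        if (useDefault) {
          return makeDefaultStringImpl();
        }
        throw new IllegalArgumentException("cannot format: " + value, e);
      }
    }
  }

  // Converter for the YYYY-MM-DD format, in the spirit of DateConverter.
  static class IsoDateConverter extends FallbackConverter {
    private final SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd");

    protected Object makeObjectImpl(String value) throws Exception { return format.parse(value); }
    protected Object makeDefaultObjectImpl() { return new Date(0L); }
    protected String makeStringImpl(Object value) { return format.format((Date) value); }
    protected String makeDefaultStringImpl() { return format.format(new Date(0L)); }
  }

  public static void main(String[] args) {
    SimpleConverter dc = new IsoDateConverter();
    Object hired = dc.makeObject("2005-03-03", false);
    System.out.println(dc.makeString(hired, false)); // prints 2005-03-03
    System.out.println(dc.makeObject("not a date", true).equals(new Date(0L))); // prints true
  }
}
```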

Loading the Data

OK, now that we've got all the pieces, let's assemble them. Normally when you use XML Manager to load Java Beans, you just call the XmlManager.loadBeans method. This loads all the beans in your XML document using the specified RecordSpec and BeanSpec objects. The problem is that they must all be beans of the same type. We want to load Employee and Project beans at the same time.

We can do this by writing a custom RecordListener. In fact, most of the work is already done for us. We can copy the code of BeanRecordListener, and modify it to handle multiple beans. Here is the result:

MultipleBeansRecordListener.java

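import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

// RecordListenerSupport, BeanRecordListener, BeanSpec, RecordSpec, XmlSpec
// and BadRecord are XML Manager classes from xmlman.jar.
public class MultipleBeansRecordListener extends RecordListenerSupport {
  protected HashMap iBeanListMap   = new HashMap(); // code -> List of beans
  protected HashMap iBeanFieldsMap = new HashMap(); // code -> String[] field names
  protected HashMap iBeanSpecMap   = new HashMap(); // code -> BeanSpec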
  protected boolean iUseDefault    = false;

  public void addBeanSpec( String pCodeName, BeanSpec pBeanSpec, RecordSpec pRecordSpec ) {
    iBeanSpecMap.put( pCodeName, pBeanSpec );
    iBeanFieldsMap.put( pCodeName, pRecordSpec.getFieldNames() );
    iBeanListMap.put( pCodeName, new ArrayList() );
  }

  public List getBeans( String pCodeName) {
    return (List) iBeanListMap.get( pCodeName );
  }

  protected void setXmlSpecImpl( XmlSpec pXmlSpec ) {
    iUseDefault = pXmlSpec.getBooleanProperty( BeanRecordListener.PROP_Bean_useDefault );
  }

  protected BadRecord handleRecordImpl( String[] pRecord, long pRecordNumber ) throws Exception {
    String codename = pRecord[0];
    if( iBeanSpecMap.containsKey( codename ) ) {
      BeanSpec bs         = (BeanSpec) iBeanSpecMap.get( codename );
      String[] fieldnames = (String[]) iBeanFieldsMap.get( codename );
      Object   bean       = bs.getBeanClass().newInstance();
    
      for( int fI = 1; fI < fieldnames.length; fI++ ) {
        bs.setStringValue( bean, fieldnames[fI], pRecord[fI], iUseDefault );
      }

      ArrayList beanlist = (ArrayList) iBeanListMap.get( codename );
      beanlist.add( bean );

      return null;
    }
    else {
      return new BadRecord( pRecordNumber, pRecord, "unknown code: "+codename );
    }
  }
}

This is the core class of our Java Bean reader. The most important line is:

bs.setStringValue( bean, fieldnames[fI], pRecord[fI], iUseDefault );

This uses the BeanSpec.setStringValue method to set the bean property using a String representation of the property value. To make this work, we have to find the correct BeanSpec object for the current record, and the correct list of field names. We do this by using the code (the first element of the record array) to look these objects up in the iBeanSpecMap and iBeanFieldsMap HashMaps respectively.

Notice that the loop starts at index 1, not 0: the first element of both the record and the fieldnames array corresponds to the bean code, which is not a bean property.

Once we have set all the bean properties, we add the new bean object to the correct list of beans of that type. We store the different bean lists in the iBeanListMap HashMap.

If the code is not recognised, that is, if a BeanSpec for that code has not been added with the addBeanSpec method, then we return a BadRecord to XML Manager. We could also throw an Exception, but in general it is better to return a BadRecord so that the error can be properly described by the actual application. XML Manager can only create generic error messages when an Exception is thrown by a RecordListener.
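The same routing idea can be shown without XML Manager: keep one factory and one result list per code, and collect unknown codes as descriptive error entries instead of throwing. In this sketch RecordRouterSketch and BadRow are hypothetical names; BadRow only mimics the role of XML Manager's BadRecord.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class RecordRouterSketch {
  // Hypothetical stand-in for XML Manager's BadRecord.
  static class BadRow {
    final long number;
    final String[] record;
    final String reason;

    BadRow(long number, String[] record, String reason) {
      this.number = number;
      this.record = record;
      this.reason = reason;
    }
  }

  private final Map<String, Function<String[], Object>> factories = new HashMap<>();
  private final Map<String, List<Object>> results = new HashMap<>();
  private final List<BadRow> badRows = new ArrayList<>();

  // Registers a code (the first field of each record) and a factory that
  // builds an object from the remaining fields.
  public void register(String code, Function<String[], Object> factory) {
    factories.put(code, factory);
    results.put(code, new ArrayList<>());
  }

  // Routes one record by its code; unknown codes are recorded, not thrown.
  public void handle(String[] record, long number) {
    Function<String[], Object> factory = factories.get(record[0]);
    if (factory == null) {
      badRows.add(new BadRow(number, record, "unknown code: " + record[0]));
      return;
    }
    // Skip the code itself, just as the listener's loop starts at index 1.
    String[] fields = Arrays.copyOfRange(record, 1, record.length);
    results.get(record[0]).add(factory.apply(fields));
  }

  public List<Object> get(String code) { return results.get(code); }

  public List<BadRow> badRows() { return badRows; }

  public static void main(String[] args) {
    RecordRouterSketch router = new RecordRouterSketch();
    router.register("project", f -> "Project " + f[0] + " (" + f[1] + ")");
    router.register("employee", f -> "Employee " + f[0]);
    router.handle(new String[] {"project", "1", "Foo"}, 1);
    router.handle(new String[] {"employee", "3"}, 2);
    router.handle(new String[] {"contractor", "9"}, 3);
    System.out.println(router.get("project"));         // prints [Project 1 (Foo)]
    System.out.println(router.badRows().get(0).reason); // prints unknown code: contractor
  }
}
```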

Let's tie everything together. Here is the make method of the MakeTable class that runs XML Manager and generates the HTML tables from the XML file:


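public void make( File pCompanyXmlFile ) throws Exception {
  XmlManager    xmlman = new XmlManager();
  // sDateInputFormat is a DateFormat for the YYYY-MM-DD input dates,
  // defined elsewhere in MakeTable.
  DateConverter dc     = new DateConverter( sDateInputFormat );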

  RecordSpec rs_project
    = new RecordSpec("/company/project", 
                     new String[]{"'project'","@num","@name","deadline","employees"},
                     new String[]{"","Number","Name","Deadline","EmployeeNumbers"});
  HashMap project_stringconv = new HashMap();
  project_stringconv.put( "Deadline", dc );
  BeanSpec bs_project = new BeanSpec( Project.class, project_stringconv );

  RecordSpec rs_employee
    = new RecordSpec("/company/employee", 
                     new String[]{"'employee'","@num","@manager","name","hired","projects"},
                     new String[]{"","Number","Manager","Name","HireDate","ProjectNumbers"});
  HashMap employee_stringconv = new HashMap();
  employee_stringconv.put( "HireDate", dc );
  BeanSpec bs_employee = new BeanSpec( Employee.class, employee_stringconv );

  MultipleBeansRecordListener mb = new MultipleBeansRecordListener();
  mb.addBeanSpec( "project", bs_project, rs_project );
  mb.addBeanSpec( "employee", bs_employee, rs_employee );

  xmlman.load( pCompanyXmlFile, new RecordSpec( rs_project, rs_employee ), mb );
  List employees = mb.getBeans("employee");
  List projects  = mb.getBeans("project");
}

The source code above is simplified so that you can see the flow more clearly. The full version is in the MakeTable.java file. Notice that we associate the DateConverter object with the relevant fields by using a HashMap and passing it to the BeanSpec.

All that this method does is assemble the various utility classes and call XML Manager's load method. This is the most important line:

xmlman.load( pCompanyXmlFile, new RecordSpec( rs_project, rs_employee ), mb );

We use one of the RecordSpec convenience constructors so that XML Manager will know to use both RecordSpecs at the same time.

We are left with two Lists containing Employee beans and Project beans. These lists are then used to create the HTML table, using some very simple String generating methods that just output standard HTML table tags. The gory details are in the MakeTable.java file.
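Because every bean is in memory in the non-streaming version, the project table can show employee names rather than numbers. One straightforward way to do that is a number-to-name lookup map. The sketch below is a hypothetical illustration using plain arrays instead of the Employee bean; the full MakeTable.java may organise this differently.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CrossReferenceSketch {
  // Builds a number -> name lookup from parallel arrays of employee data.
  public static Map<Integer, String> index(int[] numbers, String[] names) {
    Map<Integer, String> byNumber = new HashMap<>();
    for (int i = 0; i < numbers.length; i++) {
      byNumber.put(numbers[i], names[i]);
    }
    return byNumber;
  }

  // Resolves a project's employee numbers to names, sorted alphabetically
  // as in the Employees column; unknown numbers are shown as "#n".
  public static String employeeNames(int[] employeeNumbers, Map<Integer, String> byNumber) {
    List<String> names = new ArrayList<>();
    for (int number : employeeNumbers) {
      String name = byNumber.get(number);
      names.add(name != null ? name : "#" + number);
    }
    Collections.sort(names);
    return String.join(", ", names);
  }

  public static void main(String[] args) {
    Map<Integer, String> byNumber = index(
        new int[] {1, 2, 3, 4, 5},
        new String[] {"John Doe", "Jane Doe", "Mr. Manager", "Joe Bloggs", "Jane Bloggs"});
    System.out.println(employeeNames(new int[] {3, 4, 5}, byNumber)); // prints Jane Bloggs, Joe Bloggs, Mr. Manager
    System.out.println(employeeNames(new int[] {1, 2, 3}, byNumber)); // prints Jane Doe, John Doe, Mr. Manager
  }
}
```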

To get this example to run, just compile all the Java files in the doc/examples/beans folder, and run the MakeTable class without any arguments. A file called company.htm will be created from the company.xml file.

Streaming the Data

Well, this is all very nice, but the example company.xml file only has two projects and five employees. What happens when we have thousands of employees and hundreds of projects? Or millions?

The answer is to stream the data. By this we mean that we will not load all the data into memory at once. Instead we will load each data record one at a time, and output each HTML table row one at a time. We will never need to store more than one row in memory. Our current example just generates HTML, but you can also use this technique for loading large volumes of data into databases, or for handling large volumes of web service transactions.

The HTML generated from the streaming solution does have one drawback. The cross-reference links will have to use numbers, not the names of the employees, because we cannot look ahead in the file to records that have not yet been parsed. As always, scaling up requires trade-offs.
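The one-record-at-a-time idea can also be seen without XML Manager, using the standard StAX pull parser (javax.xml.stream). This is a swapped-in technique for illustration, not a description of how XML Manager works internally. The sketch reads employee elements one by one, hands each to a callback, and then drops it, so only the current record is held in memory. The class and method names are hypothetical.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxStreamingSketch {
  // Reads <employee> elements one at a time. Each record is a small map of
  // the num attribute plus child element text; it is passed to the callback
  // and then discarded. Returns the number of employee records seen.
  public static int streamEmployees(Reader input, Consumer<Map<String, String>> callback)
      throws XMLStreamException {
    XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(input);
    int count = 0;
    Map<String, String> current = null;
    try {
      while (reader.hasNext()) {
        int event = reader.next();
        if (event == XMLStreamConstants.START_ELEMENT) {
          String name = reader.getLocalName();
          if (name.equals("employee")) {
            current = new HashMap<>();
            current.put("num", reader.getAttributeValue(null, "num"));
          } else if (current != null) {
            // Child elements such as <name> contain only text.
            current.put(name, reader.getElementText());
          }
        } else if (event == XMLStreamConstants.END_ELEMENT
            && reader.getLocalName().equals("employee")) {
          callback.accept(current);
          current = null;
          count++;
        }
      }
    } finally {
      reader.close();
    }
    return count;
  }

  public static void main(String[] args) throws XMLStreamException {
    String xml = "<company>"
        + "<employee num=\"1\" manager=\"false\"><name>John Doe</name></employee>"
        + "<employee num=\"2\" manager=\"false\"><name>Jane Doe</name></employee>"
        + "</company>";
    int count = streamEmployees(new StringReader(xml),
        rec -> System.out.println(rec.get("num") + ": " + rec.get("name")));
    System.out.println(count + " employees"); // prints 2 employees
  }
}
```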

To create a streaming solution, we subclass the MultipleBeansRecordListener class. Most of the bean logic is the same, but instead of storing the bean in a List, now we call the static HTML generating methods of MakeTable to output HTML directly. We use a PrintWriter as the destination for our HTML. Here is the code:

StreamingMultipleBeansRecordListener.java

public class StreamingMultipleBeansRecordListener extends MultipleBeansRecordListener {
  protected PrintWriter iPrintWriter     = null;
  protected boolean     iLastWasEmployee = false;
  protected boolean     iFirst           = true;

  public StreamingMultipleBeansRecordListener( PrintWriter pPrintWriter ) {
    iPrintWriter = pPrintWriter;
  }


  protected BadRecord handleRecordImpl( String[] pRecord, long pRecordNumber ) throws Exception {
    String codename = pRecord[0];
    if( iBeanSpecMap.containsKey( codename ) ) {
      BeanSpec bs         = (BeanSpec) iBeanSpecMap.get( codename );
      String[] fieldnames = (String[]) iBeanFieldsMap.get( codename );
      Object   bean       = bs.getBeanClass().newInstance();
    
      for( int fI = 1; fI < fieldnames.length; fI++ ) {
        bs.setStringValue( bean, fieldnames[fI], pRecord[fI], iUseDefault );
      }

      saveBean( bean );

      return null;
    }
    else {
      return new BadRecord( pRecordNumber, pRecord, "unknown code: "+codename );
    }
  }


  protected void saveBean( Object pBean ) {
    if( pBean instanceof Project ) {
      if( iLastWasEmployee || iFirst ) {
        if( !iFirst ) { 
          MakeTable.outputEndEmployeeTable( iPrintWriter );
        }
        else {
          iFirst = false;
        }
        MakeTable.outputStartProjectTable( iPrintWriter );
      }
      MakeTable.outputProject( (Project) pBean, null, iPrintWriter );
      iLastWasEmployee = false;
    }
    else if( pBean instanceof Employee ) {
      if( !iLastWasEmployee || iFirst ) {
        if( !iFirst ) { 
          MakeTable.outputEndProjectTable( iPrintWriter );
        }
        else {
          iFirst = false;
        }
        MakeTable.outputStartEmployeeTable( iPrintWriter );
      }
      MakeTable.outputEmployee( (Employee) pBean, null, iPrintWriter );
      iLastWasEmployee = true;
    }
  }
}

As you can see, the handleRecordImpl method is nearly identical, but once we have the bean, we send it to the saveBean method to generate the HTML.

The saveBean method checks the type of the bean: for an Employee it outputs an employee table row, and for a Project it outputs a project table row. There is some fiddly bookkeeping code to close the current table and start a new one whenever the record type changes.
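The bookkeeping in saveBean amounts to a tiny state machine: remember which table is open, and when the record type changes, close it and open the other one. Here is a simplified sketch that uses a single "current table" field instead of the two booleans, plus a finish step that closes the last table (the abridged listener above does not show where that happens). The HTML and all names here are hypothetical, not MakeTable's real output methods.

```java
public class TableSwitcherSketch {
  private final StringBuilder out = new StringBuilder();
  private String currentTable = null; // null until the first row arrives

  // Writes one row, closing the previous table and opening a new one
  // whenever the record type changes.
  public void row(String type, String cells) {
    if (!type.equals(currentTable)) {
      if (currentTable != null) {
        out.append("</table>\n");
      }
      out.append("<table class=\"").append(type).append("\">\n");
      currentTable = type;
    }
    out.append("<tr>").append(cells).append("</tr>\n");
  }

  // Closes the last open table; call once after the final record.
  public String finish() {
    if (currentTable != null) {
      out.append("</table>\n");
      currentTable = null;
    }
    return out.toString();
  }

  public static void main(String[] args) {
    TableSwitcherSketch tables = new TableSwitcherSketch();
    tables.row("project", "<td>Foo</td>");
    tables.row("project", "<td>Bar</td>");
    tables.row("employee", "<td>John Doe</td>");
    System.out.print(tables.finish());
  }
}
```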

And that's it. Now, how many records can this handle? Well, we've included a utility class called MakeReallyBigFile.java. This takes a single numeric argument indicating how many records you want to generate. Try it with 1000000. Go on, you know you want to! It generates an XML file called reallybig.xml. To test this file, run MakeTable in non-streaming mode (Windows command line version):

java -cp .;..\..\lib\xmlman.jar MakeTable reallybig.xml

If the file is big enough, you'll probably get an OutOfMemoryError. Now try it in streaming mode (this time using the Unix command line version):

java -cp .:../../lib/xmlman.jar MakeTable reallybig.xml stream

To keep you updated, status information is output every 100 records. This includes memory usage. Compare the memory usage of the streaming and non-streaming versions. You'll notice that the non-streaming version just keeps using more memory until there's none left. The streaming version maintains a relatively constant memory load and will keep going until your hard disk is full.
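If you want to add similar progress reporting to your own streaming code, the standard Runtime methods give a rough figure for heap usage. This is a hypothetical sketch of a status line printed every 100 records; it is not MakeTable's actual status output, and the memory figure is approximate because it depends on when the garbage collector last ran.

```java
public class MemoryStatusSketch {
  // Approximate bytes currently in use on the heap: allocated minus free.
  public static long usedBytes() {
    Runtime rt = Runtime.getRuntime();
    return rt.totalMemory() - rt.freeMemory();
  }

  // Returns a status line every `interval` records, or null otherwise.
  public static String status(long recordNumber, int interval) {
    if (recordNumber % interval != 0) {
      return null;
    }
    return "records: " + recordNumber + ", used memory: " + (usedBytes() / 1024) + " KB";
  }

  public static void main(String[] args) {
    for (long record = 1; record <= 300; record++) {
      // ... process one record here ...
      String line = status(record, 100);
      if (line != null) {
        System.out.println(line);
      }
    }
  }
}
```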

Source Code

Here is a list of all the files used in this example. Note that the actual source code is slightly longer than the examples above, which have been abridged for clarity.

Feel free to experiment with these classes and see what happens. You can also use them as the basis for your own streaming solution.

Questions and Comments

Please feel free to email us at examples@ricebridge.com if you have any questions or comments about this example.

Copyright © 2004-2012 Ricebridge. All Rights Reserved.