Ricebridge
Search This Site
Jul 03 2009 02:11 UTC

Got a question for us?
Just Ask!


$15 Gift Certificate for every bug you find.

Bookmark How to convert CSV into XML and XML into CSV using the Ricebridge CSV Manager and XML Manager Java Components at del.icio.us Digg How to convert CSV into XML and XML into CSV using the Ricebridge CSV Manager and XML Manager Java Components at Digg.com Bookmark How to convert CSV into XML and XML into CSV using the Ricebridge CSV Manager and XML Manager Java Components at reddit.com Bookmark How to convert CSV into XML and XML into CSV using the Ricebridge CSV Manager and XML Manager Java Components at YahooMyWeb Bookmark How to convert CSV into XML and XML into CSV using the Ricebridge CSV Manager and XML Manager Java Components at Spurl.net Bookmark How to convert CSV into XML and XML into CSV using the Ricebridge CSV Manager and XML Manager Java Components at Simpy.com Bookmark Polyphasic Mutants at NewsVine Blink this How to convert CSV into XML and XML into CSV using the Ricebridge CSV Manager and XML Manager Java Components at blinklist.com Bookmark How to convert CSV into XML and XML into CSV using the Ricebridge CSV Manager and XML Manager Java Components at Furl.net Fark How to convert CSV into XML and XML into CSV using the Ricebridge CSV Manager and XML Manager Java Components at Fark.com


How to convert CSV into XML and XML into CSV using Java

You can use the Ricebridge XML Manager and CSV Manager Java components together to easily convert from CSV to XML and back again. You can create XML with multiple levels of grouped elements, and simple CSV files from complex XML documents. You can even handle multi-gigabyte files and never run out of memory. In this article we take you through the code, line by line, and explain all the details. Don't worry, we've designed our components so that this stuff is easy.

You get a FREE copy of CSV Manager when you buy XML Manager, so you only need to invest in one component to implement the source code in this article.

Turning XML into CSV


XML to CSV

Why would you want to do that? What's so cool about converting XML into CSV?

"My boss told me to." Or rather, you're getting business data in as an XML document, and you need to pump it into another system as a CSV file. Most databases accept CSV files for data import, and it's a handy way of loading really large amounts of data.

So what's the big deal? Well if you're going to be creating a CSV file, then you have to convert your XML data into a list of data records, so you'll need to pull out all the XML elements that represent these data records. I'm betting they're given as a big long list of ITEM or RECORD elements, containing a set of FIELD or PROPERTY elements, right? Of course, your XML may not be so nice. You might have a real mess of subelements and attributes, with records grouped by parent elements. All sorts of fun. That's where XML Manager comes in. It can pluck all that data out of the XML and turn into a nice list of data fields, ready to be saved out to CSV one line at a time.

So the basic idea is:

  • Get your XML file and decide on the main element for data records
  • Use XML Manager to pull out the data as a list
  • Save the list of records to a CSV file, one line per record

Some eBay XML Data

OK. Let's look at using some real-world XML. Let's use some XML from the eBay web service. One of the things you can do with the eBay service is search for auctions using a keyword. Here's the query:

http://rest.api.sandbox.ebay.com/restapi
?CallName=GetSearchResults
&RequestToken=[your request token]
&RequestUserId=[your user id]
&Query=toy&Schema=1

The eBay documentation explains how to create this query. You can find out all the details on their site - it's pretty easy to figure it out. It just asks for all the auctions that are for "toys" (The data is from the eBay sandbox and is not live data). What do you get back? Here's an extract from the XML we're going to work with:

GetSearchResults.xml

<GetSearchResultsResponse xmlns="urn:ebay:apis:eBLBaseComponents">
  <Timestamp>2005-12-20T15:34:02.286Z</Timestamp>
  <Ack>Success</Ack>
  <Version>437</Version>
  <Build>e437_core_Bundled_2146580_R1</Build>
  <SearchResultItemArray>
    <SearchResultItem>
      <Item>
        <ItemID>4504216603</ItemID>
        <ListingDetails>
          <StartTime>2005-12-19T10:00:10.000Z</StartTime>
          <EndTime>2005-12-20T10:00:10.000Z</EndTime>
          <ViewItemURL>http://cgi.sandbox.ebay.com/ws/eBayISAPI.dll
            ?ViewItem&item=4504216603&category=26424&rd=1</ViewItemURL>
        </ListingDetails>
        <SellingStatus>
          <BidCount>0</BidCount>
          <CurrentPrice currencyID="AUD">0.99</CurrentPrice>
          <ConvertedCurrentPrice currencyID="USD">0.73953</ConvertedCurrentPrice>
        </SellingStatus>
        <Site>Australia</Site>
        <Title>12x NEW AQUATIC SPORTS WATER PARK TOY, LARGE, BULK LOTS</Title>
        <ListingType>Chinese</ListingType>
        <Currency>AUD</Currency>
        <GiftIcon>0</GiftIcon>
        <SiteHostedPicture>
          <GalleryType>Gallery</GalleryType>
        </SiteHostedPicture>
        <VendorHostedPicture>
          <GalleryURL>http://thumbs.ebay.com/pict/4504216603.jpg</GalleryURL>
          <GalleryType>Gallery</GalleryType>
        </VendorHostedPicture>
        <SubTitle>
        </SubTitle>
        <Country>AU</Country>
        <Storefront>
          <StoreCategoryID>0</StoreCategoryID>
          <StoreCategory2ID>0</StoreCategory2ID>
          <StoreURL>http://stores.sandbox.ebay.com/id=428741</StoreURL>
          <StoreName>S3 Group Store</StoreName>
        </Storefront>
        <PostalCode>2020</PostalCode>
        <ShippingDetails>
          <ShippingType>NotSpecified</ShippingType>
        </ShippingDetails>
        <SearchDetails>
          <BuyItNowEnabled>false</BuyItNowEnabled>
        </SearchDetails>
      </Item>
      <SearchResultValues>New</SearchResultValues>
      <SearchResultValues>Picture</SearchResultValues>
    </SearchResultItem>
    ...
  </SearchResultItemArray>
  <ItemsPerPage>100</ItemsPerPage>
  <PageNumber>1</PageNumber>
  <HasMoreItems>false</HasMoreItems>
  <PaginationResult>
    <TotalNumberOfPages>1</TotalNumberOfPages>
    <TotalNumberOfEntries>40</TotalNumberOfEntries>
  </PaginationResult>
  <CategoryArray/>
</GetSearchResultsResponse>

Getting Data Out of the XML

Dude!

Dude! How does eBay expect people to work with this stuff? Well that's web services for you. Luckily we have XML Manager and we can grab the data we want with a few simple XPath expressions.

So what do we want? For each item, It'd be nice to get the price, the title, the currency, stuff like that. We'll have to tell XML Manager which XML element contains all the data for one data record. One data record represents one line of our CSV file. From this record XML element, we can specify further subelements inside that element that contain the data for each data field.

We use XPath to point at the record we are interested in, so let's write a little XPath to do that:

/e:GetSearchResultsResponse/e:SearchResultItemArray/e:SearchResultItem

What does that do? First it looks for a GetSearchResultsResponse element, then inside that a SearchResultItemArray, and finally inside that it looks for each SearchResultItem. We want to output one line of CSV for each SearchResultItem element, so this is our record element. This XPath reflects the structure of that beautiful eBay XML document in the previous section. What does the e: at the start of each element name mean? Well, you may have noticed that the eBay XML uses a namespace: urn:ebay:apis:eBLBaseComponents. We use the letter e to point to this namespace (Don't worry, you'll see the code in a minute!).

What About the Price and Title and the Other Data?

We're going to need some more XPath to pull out the data fields. These XPath expressions will be relative to the data record element. Here's how we do it:

<SearchResultItem>
  <Item>
    <ItemID>4504216603</ItemID>
    <ListingDetails>
      <StartTime>2005-12-19T10:00:10.000Z
        </StartTime>
      <EndTime>2005-12-20T10:00:10.000Z
        </EndTime>
      <ViewItemURL>http://cgi.sandbox.ebay.com/
        ws/eBayISAPI.dll?ViewItem&
        item=4504216603&category=26424&
        rd=1</ViewItemURL>
    </ListingDetails>
    <SellingStatus>
      <BidCount>0</BidCount>
      <CurrentPrice
        currencyID="AUD">0.99</CurrentPrice>
      ...
    </SellingStatus>
    <Site>Australia</Site>
    <Title>12x NEW AQUATIC SPORTS 
      WATER PARK TOY, LARGE, BULK LOTS</Title>
    ... 
    <Country>AU</Country>
  </Item>
</SearchResultItem>

To make this example easier, we've removed all the bits from the eBay XML we're not interested in.

e:Item/e:ItemID
This is the eBay code for the auction item.
e:Item/e:ListingDetails/e:StartTime
The start time of the auction.
e:Item/e:ListingDetails/e:EndTime
The end time of the auction.
e:Item/e:ListingDetails/e:ViewItemURL
The URL of the auction.
e:Item/e:SellingStatus/e:BidCount
The number of bids.
e:Item/e:SellingStatus/e:CurrentPrice/@currencyID
The currency.
e:Item/e:SellingStatus/e:CurrentPrice
The current price.
e:Item/e:Site
The eBay site.
e:Item/e:Title
The title of the auction.
e:Item/e:Country
The country of the auction.

Using XML Manager to Load the Data

Alright, lets get some source code on the table. The following example shows how to use the XML Manager API to load the eBay XML file and produce a nice list of String[] arrays containing our data fields.

Xml2Csv.java
// Step 1. Create an XmlManager
File       xmlfile = new File("GetSearchResults.xml");
XmlManager xmlman  = new XmlManager(); 

// Step 2. Define the eBay namespace
XmlSpec    xmlspec = xmlman.getXmlSpec(); 
xmlspec.addNamespace("e","urn:ebay:apis:eBLBaseComponents");

// Step 3. Define the data fields
String[] fieldpaths = new String[] { 
    "e:Item/e:ItemID",
    "e:Item/e:ListingDetails/e:StartTime",
    "e:Item/e:ListingDetails/e:EndTime",
    "e:Item/e:ListingDetails/e:ViewItemURL",
    "e:Item/e:SellingStatus/e:BidCount",
    "e:Item/e:SellingStatus/e:CurrentPrice/@currencyID",
    "e:Item/e:SellingStatus/e:CurrentPrice",
    "e:Item/e:Site",
    "e:Item/e:Title",
    "e:Item/e:Country",
  };

// Step 3. cont. Define the data record
RecordSpec itemspec 
  = new RecordSpec( "/e:GetSearchResultsResponse/"
                    + "e:SearchResultItemArray/e:SearchResultItem", fieldpaths );

// Step 4. XML Manager returns a List of String[] arrays
List xmldata = xmlman.load( xmlfile, itemspec );

You can use XML Manager by following these steps:

  • 1. Create a new XmlManager object.
  • 2. Define any namespaces you need (we've added the eBay urn:ebay:apis:eBLBaseComponents namespace in the code above).
  • 3. Create a RecordSpec object to hold the XPath expressions
  • 4. Call one of the load methods of the XmlManager class.
XML elements converted to CSV lines

Let's Save the Data as a CSV File

This table shows the data we've just loaded:

ItemIDStartTimeEndTimeViewItemURLBidsCurrencyPriceSiteTitleCC
4...6032005-12-19...2005-12-20...http://...ebay.com/...item=4...603...0AUD0.99Australia12x NEW AQ...AU
4...8462005-12-13...2005-12-20...http://...ebay.com/...item=4...846...0USD1.0USVintage 70's...US
4...6402005-12-13...2005-12-20...http://...ebay.com/...item=4...640...0USD1.0USVintage LEGO...US
4...1022005-12-19...2005-12-20...http://...ebay.com/...item=4...102...0AUD0.99Australia12x NEW AQ...AU
..............................

We want to turn this table of data into a CSV file that looks like this:

GetSearchResults.csv

ItemID,StartTime,EndTime,ViewItemURL,BidCount,Currency,Price,Site,Title,Country
4504216603,2005-12-19T10:00:10.000Z,2005-12-20T10:00:10.000Z,http://cgi.sandbox.ebay.com/ws/
  eBayISAPI.dll?ViewItem&item=4504216603&category=26424&rd=1,0,AUD,0.99,Australia,
    "12x NEW AQUATIC SPORTS WATER PARK TOY, LARGE, BULK LOTS",AU
4504162846,2005-12-13T11:07:29.000Z,2005-12-20T11:07:29.000Z,http://cgi.sandbox.ebay.com/ws/
  eBayISAPI.dll?ViewItem&item=4504162846&category=19011&rd=1,0,USD,1.0,US,
    Vintage 70's LEGO Building Set w/ Motor-excellent cond!,US
4504163640,2005-12-13T13:18:00.000Z,2005-12-20T13:18:00.000Z,http://cgi.sandbox.ebay.com/ws/
  eBayISAPI.dll?ViewItem&item=4504163640&category=19011&rd=1,0,USD,1.0,US,
    Vintage LEGO Building Set from the 70's -Excellent Cond,US
4504221102,2005-12-19T19:53:25.000Z,2005-12-20T19:53:25.000Z,http://cgi.sandbox.ebay.com/ws/
  eBayISAPI.dll?ViewItem&item=4504221102&category=26424&rd=1,0,AUD,0.99,Australia,
    "12x NEW AQUATIC SPORTS WATER PARK TOY, LARGE, BULK LOTS",AU
...

The data records lines are pretty long, so we've broken them up and indented them so you can see all the data for each row.

So we have these rows of data as a list of String[] arrays, loaded by XML Manager. To save the String[] arrays to a CSV file, we'll use CSV Manager. Handily enough, CSV Manager can save a list of String[] arrays to a CSV file. Here's the code, it's really easy:


// xmldata is just a List of String[] arrays,
ArrayList csvdata = xmldata; 

// which CSV Manager also accepts!
csvman.save( new File("GetSearchResults.csv"), csvdata );


Oh. You wanted a header row in your CSV file? Sure thing!


ArrayList csvdata = new ArrayList();
csvdata.add( new String[] {"ItemID","StartTime","EndTime","ViewItemURL","BidCount",
                           "Currency","Price","Site","Title","Country"} );
csvdata.addAll( xmldata );

csvman.save( new File("GetSearchResults.csv"), csvdata );

And You're Done

That's it. No more work to do. All you did was load up the XML, use a few simple XPaths, got back the just about the simplest data structure possible: a list of strings, and saved them out to CSV. XML Manager and CSV Manager are designed to work together, so this type of conversion is no problem for them. OK. What about going the other way? What about converting CSV to XML. Stay tuned.

Turning CSV into XML


CSV to XML

Not satisfied with CSV? Need to add some tree-structured magic to your day?

"My boss told me to." Again? Demanding, aren't they?

Alrighty then. With CSV you get a big long list of data records. Each data record consists of a set of data fields. What you're going to do is take those data fields and put them into XML elements. Now they can be used as the text inside an element, or as attributes of an element, XML Manager can handle both. In the simplest case, you just want to have a set of subelements of the data record element, for each of the data fields. But XML Manager can also handle grouping of records, which allows you to create pretty complex XML documents from a simple CSV file.

In this case, the basic idea is:

  • Get your CSV file and load it with CSV Manager
  • Define the XML output format you want to create.
  • Save the list of records to an XML file, using XML Manager.

Show Me the CSV!

Let's use some more real-world data. Let's use the CSV report from the Google Adwords system. Google Adwords lets you bid on keywords that you want your adverts to appear beside. But you knew that already. They give you a nice report showing how much each keyword is costing you and how many clicks you're getting. And you can download it as a CSV file. So let's turn this CSV file into an XML document.

report.csv

REPORT:,Site / Keyword Report
ACCOUNT:,A Company
DATE RANGE:,"Nov 1, 2005 - Nov 30, 2005"
CAMPAIGNS:,A Campaign 
SITES / KEYWORDS:,Active
ADWORDS TYPE:,All
Date,Campaign,Ad Group,Site / Keyword,Match Type,Status,Keyword Min CPC,Current Maximum CPC,
  Current Maximum CPM,Keyword Destination URL,Impressions,Clicks,CTR,Avg CPC,Avg CPM,Cost,Avg Position
11/1/05,A Campaign,A Group,key1,Broad,Active,$0.52,$0.42,
  $0.00,default URL,1501,42,0.8%,$0.24,$0.94,$1.92,5.0
11/2/05,A Campaign,A Group,key1,Broad,Active,$0.52,$0.42,
  $0.00,default URL,3391,12,0.1%,$0.24,$1.03,$1.82,1.8
11/2/05,A Campaign,A Group,key2 key1,Broad,Active,$0.12,$0.12,
  $0.00,default URL,531,3,1.3%,$0.12,$0.65,$0.15,1.1
11/3/05,A Campaign,A Group,key3 key1,Broad,Active,$0.09,$0.09,
  $0.00,default URL,15,3,50.0%,$0.04,$8.00,$0.12,1.1
11/3/05,A Campaign,A Group,key1,Broad,Active,$0.52,$0.42,
  $0.00,default URL,1524,8,0.5%,$0.52,$0.31,$1.00,1.8
11/3/05,A Campaign,A Group,key2 key1,Broad,Active,$0.12,$0.12,
  $0.00,default URL,509,6,5.1%,$0.12,$0.44,$0.30,1.1
...

Don't get all excited now, it's fake data. We're only interested in the data field format of this CSV file. We'll also have to ignore the first seven lines, and we'll identify each data field by it's position in the header list. Once again, we've indented the lines to fit them on the page.

Show Me the XML!

report.xml

<report>
  <campaign name="A Campaign" group="A Group">
    <day date="11/1/05">
      <keyword text="key1" match="Broad" status="Active">
        <cpc min="$0.52" max="$0.42"></cpc>
        <cpm max="$0.00"></cpm>
        <url>default URL</url>
        <clicks imp="1501" count="42" ctr="0.8%" 
                avgcpc="$0.24" avgcpm="$0.94"></clicks>
        <cost>$1.92</cost>
        <avgpos>5.0</avgpos>
      </keyword>
    </day>
    <day date="11/2/05">
      <keyword text="key1" match="Broad" status="Active">
        <cpc min="$0.52" max="$0.42"></cpc>
        <cpm max="$0.00"></cpm>
        <url>default URL</url>
        <clicks imp="3391" count="12" ctr="0.1%" 
                avgcpc="$0.24" avgcpm="$1.03"></clicks>
        <cost>$1.82</cost>
        <avgpos>1.8</avgpos>
      </keyword>
      <keyword text="key2 key1" match="Broad" status="Active">
        <cpc min="$0.12" max="$0.12"></cpc>
        <cpm max="$0.00"></cpm>
        <url>default URL</url>
        <clicks imp="531" count="3" ctr="1.3%" 
                avgcpc="$0.12" avgcpm="$0.65"></clicks>
        <cost>$0.15</cost>
        <avgpos>1.1</avgpos>
      </keyword>
    </day>
    ...
  </campaign>
</report>

Here's the XML document we want to create. We've chosen to group the keywords by date. If a keyword was included in a search on the given date, and somebody clicked on the ad, then a keyword element will appear inside the day element. Days in turn are grouped by advertising campaign, and the root element is named report.

The nice thing about XML Manager is that we can use XPath expressions to save XML data, just the same as we use them to load XML data. The XPath is interpreted as a direct description of the XML to create. We just have to use a few simple rules...

How to Turn XPath into XML

Let's build our XPath expressions one step at a time. First up, how do we create the basic structure of the document? Each CSV line from the Google CSV file represents a keyword data record. We want to put this data into a keyword element inside a day element, inside a campaign element, all in the root element, called record.

We'll use the XPath: /report/campaign/day/keyword to define our XML output data record. This produces the skeleton of our output XML:

<report>
  <campaign>
    <day>
      <keyword></keyword>
    </day>
  </campaign>
<report>

Now you can put some data fields into this XML skeleton. Let's start with the text of the keywords. Let's put this in as an attribute on the keyword element, called text. Here's the XPath: @text. Is that all? Yes. XML Manager knows that data field XPath expressions are relative to the data record element, keyword. So @text becomes /report/campaign/day/keyword/@text (This rule doesn't just apply to attributes, you can also use it to create subelements).

<report>
  <campaign>
    <day>
      <keyword text="key1"></keyword>
      <keyword text="key1"></keyword>
      ...
    </day>
  </campaign>
<report>

We can handle most of the data fields this way. But let's look at the day element for a moment. This is the parent element of keyword, and we want to group keyword elements by day. Each day has a date, so we'll put that data field into a date attribute. How are we going to do all this? Well, XML Manager makes it really easy, just use the XPath: /report/campaign/day/@date. By defining an explicit absolute XPath you can group XML data record elements. What happens is that, each time the date changes, a new group is created. Here's what we get:

<report>
  <campaign>
    <day date="11/1/05">
      <keyword text="key1"></keyword>
    </day>
    <day date="11/2/05">
      <keyword text="key1"></keyword>
      <keyword text="key2"></keyword>
    </day>
    ...
  </campaign>
<report>

We can apply these two simple rules to define all the data fields. Here they are, in the order they appear in the CSV file:

CSV HeaderXPath
Date/report/campaign/day/@date
Campaign/report/campaign/@name
Ad Group/report/campaign/@group
Site / Keyword@text
Match Type@match
Status@status
Keyword Min CPCcpc/@min
Current Maximum CPCcpc/@max
Current Maximum CPMcpm/@max
Keyword Destination URLurl
Impressionsclicks/@imp
Clicksclicks/@count
CTRclicks/@ctr
Avg CPCclicks/@avgcpc
Avg CPMclicks/@avgcpm
Costcost
Avg Positionavgpos

Show Me the Code!

And here's the code to tie it all together:

Csv2XML.java
// Step 1. Use CSV Manager to load the CSV
CsvManager csvman  = new CsvManager();
CsvSpec    csvspec = csvman.getCsvSpec();
csvspec.setEncoding("UTF-8");
csvspec.setStartLine(8);

// Step 2. Prepare XML Manager 
XmlManager xmlman  = new XmlManager();
XmlSpec xmlspec = xmlman.getXmlSpec();
xmlspec.setHeader( "<?xml version='1.0'"
                   +" encoding='UTF-8'?>\n" );

// Step 3. Define output XML
String[] fieldpaths 
  = new String[] { 
    "/report/campaign/day/@date",
    "/report/campaign/@name",
    "/report/campaign/@group",
    "@text",
    "@match",
    "@status",
    "cpc/@min",
    "cpc/@max",
    "cpm/@max",
    "url",
    "clicks/@imp",
    "clicks/@count",
    "clicks/@ctr",
    "clicks/@avgcpc",
    "clicks/@avgcpm",
    "cost",
    "avgpos",
  };

// Step 3. cont.
RecordSpec keyspec
  = new RecordSpec("/report/campaign/day/keyword", 
                   fieldpaths );

// Step 4. Load and save the data
List csvdata = csvman.load( pCsvFile );
csvdata.remove( csvdata.size()-1 );
xmlman.save( pXmlFile, keyspec, csvdata );

And here are the steps:

  • 1. Load the CSV using a CsvManager object, and ignore any lines before line eight.
  • 2. Create an XmlManager object and set the XML header we want to use.
  • 3. Create a RecordSpec object to hold the output XPath expressions.
  • 4. Load the data, remove the last line (the totals), and save as XML.

XML elements converted to CSV lines

And you're done! Saving XML is really just the same as loading it as far as XML Manager is concerned. Both operations use XPath expressions in the same way. And loading and saving CSV with CSV Manager is even easier as the data format does not need to be specified.

What About Really Big Files?

Hmm...

So far we've been using String[] arrays to store the data we've loaded before saving it again. But what about running out of memory? If you try to load a file that's too big, you'll get a nasty OutOfMemoryError from Java. What are you going to do?

Stream, of course! If you load the data as a stream, one record at a time, and save that record right away, you can handle files of any size. Gigabytes, even. XML Manager and CSV Manager both provide streaming functionality for exactly this type of situation. Let's take a look at the XML-to-CSV code and convert it into a streaming solution. Then you can convert files of any size!

How Streaming Works

Listener Flow

Streaming works by loading or saving just one data record at time. You get hold of the data stream by implementing Listeners. These are interfaces designed to handle the data stream via method callbacks, which basically means that XML Manager or CSV Manager just keeps calling a method that you have written.

For example, let's say we want to load a CSV file as a stream. You need to implement the LineListener interface and pass your object to CSV Manager. CSV Manager then calls the addLine method each time a data record is loaded. In the addLine method you can handle the data as you see fit &mdash save it to a file or insert it into a database.

OK. They only way to really see how this works is to code it up. Here we go...

A Streaming XML to CSV Converter

This code shows you how to do the conversion. We'll use the eBay example, with the same data as before. We're leaving out the XPath code, as that stays the same. Instead, let's focus on the StreamingRecordListener class. This is where all the magic happens. First, the code:

StreamingXml2Csv.java
public class StreamingXml2Csv {
  public static void convert( File pXmlFile, File pCsvFile ) {
    // Steps 2,3. as before, Step 1. occurs inside the Listener
    
    // Step 4. Create a Listener to do the hard work
    StreamingRecordListener streamer = new StreamingRecordListener( pCsvFile ); 

    // Step 5. Use the Listener to handle the data stream
    xmlman.load( pXmlFile, itemspec, streamer );
  }

  // This is our Listener class. We use an internal CSV Manager.
  public static class StreamingRecordListener extends RecordListenerSupport {
    private CsvManager       iCsvManager = null;
    private File             iCsvFile    = null;
    private FileOutputStream iCsvOS      = null;

    public StreamingRecordListener( File pCsvFile ) {
      iCsvFile = pCsvFile;
    }

    protected void startProcessImpl() throws Exception {
      iCsvManager = new CsvManager();
      CsvSpec csvspec = iCsvManager.getCsvSpec();
      csvspec.setEncoding( "UTF-8" );      
      csvspec.setCloseOutputStream( false );      
      csvspec.setFlushEachLine( true );      

      iCsvOS = new FileOutputStream( iCsvFile, true );

      ArrayList headers = new ArrayList();
      headers.add( new String[] {"ItemID","StartTime","EndTime","ViewItemURL","BidCount",
                                 "Currency","Price","Site","Title","Country"} );
      iCsvManager.save( iCsvOS, headers );
    }

    protected BadRecord handleRecordImpl( String[] pRecord, long pRecordNumber ) throws Exception {
      ArrayList row = new ArrayList();
      row.add( pRecord );
      iCsvManager.save( iCsvOS, row );
      return null;
    }

    protected void endProcessImpl() throws Exception {
      iCsvOS.close();
    }
  } 
}

StreamingRecordListener extends RecordListenerSupport, and implements the RecordListener interface.

Here's the key point: put the CSV Manager that saves the data inside the Listener. Let's look at what happens to a single data record to understand the code above. We'll follow the path of this data record through the code:

  • 1. The data record is extracted from the XML file by XML Manager.
  • 2. The data record is handed to the handleRecordImpl method of our Listener.
  • 3. We save the data record to an open OutputStream as CSV using CSV Manager

In order to get this to work, we have to setup the RecordListener so that it can save to a CSV file. We use the startProcessImpl and endProcessImpl to do this. These methods are called by XML Manager are the start and the end of the XML loading process. We can use them to do the initialisation and cleanup work.

At the start we create a CSV Manager. We also create a FileOutputStream and set it to append output. That way we can keep appending new lines of CSV as we get data from the XML. And finally we output a header to the new CSV file.

At the end, all we have to do is close that open FileOutputStream, and we're done. There is one small catch. To get this to work you need to change some settings in CSV Manager. As you can see above, we make sure that CSV Manager does not close our open FileOutputStream (normally CSV Manager closes streams for you), by calling the CsvSpec.setCloseOutputStream method. We also make sure that data is flushed out as soon as possible, so that it is written to the CSV file as soon as we have it.

...and The Other Way, Streaming CSV to XML

Converting CSV into XML as a stream is done the same way. We put an XML Manager inside a LineListener and away we go. Here's the code:

StreamingCsv2Xml.java

public class StreamingCsv2Xml {
  public static void convert( File pCsvFile, File pXmlFile ) {
    // Steps 1,3. as before, Step 2. occurs inside the Listener

    // Step 4. Create a Listener to do the hard work
    StreamingLineListener streamer = new StreamingLineListener( pXmlFile, keyspec );
    
    // Step 5. Use the Listener to handle the data stream
    csvman.load( pCsvFile, streamer );
  }

  // This is our Listener class, using an internal XML Manager
  public static class StreamingLineListener extends LineListenerSupport {
    private XmlManager iXmlManager = null;
    private RecordSpec iRecordSpec = null;
    private File iXmlFile = null;
    private FileOutputStream iXmlOS = null;

    public StreamingLineListener( File pXmlFile, RecordSpec pRecordSpec ) {
      iXmlFile = pXmlFile;
      iRecordSpec = pRecordSpec;
    }

    public void startLoadImpl() throws Exception {
      iXmlManager = new XmlManager( iRecordSpec );
      XmlSpec xmlspec = iXmlManager.getXmlSpec();
      xmlspec.setHeader( "<?xml version='1.0' encoding='UTF-8'?>\n" );
      xmlspec.setStreamOutput( true );
      iXmlOS = new FileOutputStream( iXmlFile, true );
    }

    public BadLine addLineImpl( String[] pLine, int pNumFields, 
                                long pLineNumber, String pOriginalLine ) throws Exception {
      if( 0 < pLine.length && !"Totals and Overall Averages:".equals(pLine[0]) ) {
        ArrayList line = new ArrayList();
        line.add( pLine );
        iXmlManager.save( iXmlOS, line );
      }
      return null;
    }

    public void endLoadImpl() throws Exception {
      iXmlManager.finishSave();
      iXmlOS.close();
    }
  }
}

StreamingLineListener extends LineListenerSupport, and implements the LineListener interface.

And again, the trick is to put the XML Manager that saves the data as XML inside the Listener. Here's the data record path:

  • 1. The data record is extracted from the CSV file by CSV Manager.
  • 2. The data record is handed to the addLineImpl method of our Listener.
  • 3. We save the data record to an open OutputStream as XML using XML Manager

To setup the LineListener we implement the startLoadImpl and endLoadImpl methods. These methods are called by CSV Manager are the start and the end of the CSV loading process. No surprise there!

Before loading we create an XML Manager and a FileOutputStream for the XML. As with the previous example, we keep the stream open for appending. XML Manager will append blocks of XML that correspond to our data records.

When we're done, we need to do a little bit of work to finish off the XML file. We need to call the finishSave method. This outputs the final closing XML tags to the file, so that you will get a valid XML document. It uses the most recent OutputStream, as passed to the save method. Finally, we close the stream when we're done.

There is an additional XML Manager setting that is needed for streaming. We need to tell XML Manager that streaming is what we want to do. So call the XmlSpec.setStreamOutput method to do this.

Sorted!

Sorted!

So that's it. You can now convert XML to CSV, and CSV to XML, in one go, or as a continuous stream. No need to muck about trying to parse the XML document elements or output valid XML. And no need to parse the rather quaint CSV format and handle all of it's gotchas. All you have to deal with is String[] arrays. Nifty!

Where do you go from here? Well, if you want to try this out on your own data, without writing any code, you can use the XML Manager online demo. It allows you to test conversions in both directions &mdash XML input and XML output.

For a quick introduction to the components used for this article, try the CSV Manager Getting Started Guide, and the XML Manager Getting Started Guide.

The Source Code

The source code to fully implement all the functionality described in this article is made freely available under a BSD-style license. This means that you can freely use it for commercial or private use, without any restrictions.

All the required classes are listed opposite, linked to a downloadable text file of the source code.

Comments and Suggestions

Contact Us

Thanks for taking the time to read this article. We hope you've found something of use here. We'd love to hear what you think, and we'd definitely like to hear about anything that we got wrong, so please send us your comments and suggestions.

Finally, if you'd like to get an email whenever we publish a new article, sign up here:

Name:

Email:



Best Regards,
Richard Rodger
Ricebridge

PS. If you liked this article, please bookmark it. Thanks!
Bookmark How to convert CSV into XML and XML into CSV using the Ricebridge CSV Manager and XML Manager Java Components at del.icio.us Digg How to convert CSV into XML and XML into CSV using the Ricebridge CSV Manager and XML Manager Java Components at Digg.com Bookmark How to convert CSV into XML and XML into CSV using the Ricebridge CSV Manager and XML Manager Java Components at reddit.com Bookmark How to convert CSV into XML and XML into CSV using the Ricebridge CSV Manager and XML Manager Java Components at YahooMyWeb Bookmark How to convert CSV into XML and XML into CSV using the Ricebridge CSV Manager and XML Manager Java Components at Spurl.net Bookmark How to convert CSV into XML and XML into CSV using the Ricebridge CSV Manager and XML Manager Java Components at Simpy.com Bookmark Polyphasic Mutants at NewsVine Blink this How to convert CSV into XML and XML into CSV using the Ricebridge CSV Manager and XML Manager Java Components at blinklist.com Bookmark How to convert CSV into XML and XML into CSV using the Ricebridge CSV Manager and XML Manager Java Components at Furl.net Fark How to convert CSV into XML and XML into CSV using the Ricebridge CSV Manager and XML Manager Java Components at Fark.com




comment on this page Home | Search | About Us | Contact Us | Our Products | Documentation | Resources | Login
Copyright © 2004-2009 Ricebridge. All Rights Reserved.