Ricebridge
May 16 2008 10:10 UTC

Previous Competition Winners

Competition #3, June 26th, 2006

XML Manager Winner:
Jose Ignacio Santa Cruz G. and Gustavo Valdés Aracena

The data to extract:

http://www.example.com/first    First
http://www.example.com/ipsum    ipsum
http://www.example.com/amet     amet
http://www.example.com/semper   semper
http://www.example.com/second   Second
http://www.example.com/nunc     nunc

...from this XML:

<html>
<body>
<div>
<h1>First Paragraph</h1>
<a href="http://www.example.com/first">First</a>
<p>Lorem <a href="http://www.example.com/ipsum">ipsum</a> dolor sit amet, 
consectetuer adipiscing elit. Etiam sit 
<a href="http://www.example.com/amet">amet</a> diam vestibulum diam posuere suscipit. 
Nam convallis.</p>
<p>Phasellus <a href="http://www.example.com/semper">semper</a>, nisl ac malesuada sodales, 
libero sapien vestibulum libero, quis cursus enim ligula sit amet justo.</p>
</div>
<div>
<h1>Second Paragraph</h1>
<a href="http://www.example.com/second">Second</a>
<p>Donec iaculis, enim sit amet lobortis dapibus, 
felis <a href="http://www.example.com/nunc">nunc</a> pulvinar nibh, 
id varius eros ante et nunc.</p>
</div>
</body>
</html>

And the winning solutions were:

RECORD:  //a
FIELD 1: @href
FIELD 2: .

Comments
This one was from Jose Ignacio Santa Cruz G. This is the most efficient solution. The principal idea is that you can ignore almost all of the structure and go directly for the a elements. The //element syntax is extremely useful for this type of problem. Combine this with XML Manager's ability to stream the XML, and you have a very powerful and fast data extraction engine.
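To make the record and field idea concrete, here is a minimal sketch in Python using the standard-library xml.etree.ElementTree module. It is not XML Manager's API; the function name extract_links and the abridged sample XML are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the competition XML (two divs, three links).
XML = """<html>
<body>
<div>
<h1>First Paragraph</h1>
<a href="http://www.example.com/first">First</a>
<p>Lorem <a href="http://www.example.com/ipsum">ipsum</a> dolor sit amet.</p>
</div>
<div>
<h1>Second Paragraph</h1>
<a href="http://www.example.com/second">Second</a>
</div>
</body>
</html>"""


def extract_links(xml_text):
    """Return one (href, text) tuple per a element, in document order."""
    root = ET.fromstring(xml_text)
    records = []
    for a in root.iter("a"):            # RECORD:  //a
        href = a.get("href")            # FIELD 1: @href
        text = "".join(a.itertext())    # FIELD 2: . (all text inside the element)
        records.append((href, text))
    return records


rows = extract_links(XML)
for href, text in rows:
    print(href, text)
```

Note that this sketch parses the whole document into memory, so it says nothing about the streaming behaviour described above.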

RECORD:  /html/body/div[*]//a
FIELD 1: @href
FIELD 2: .

This one was from Gustavo Valdés Aracena. In this solution, the record XPath is used to specify the exact subset of a elements to extract. This solution is useful when you need to exclude elements that are not relevant. No a elements outside of a div will be returned.

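One other minor point: the predicate on the div element is not strictly necessary. The [*] predicate only requires the div to have at least one child element, and any div that contains an a element already satisfies that, so the record XPath /html/body/div//a will work just as well.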
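The effect of restricting the record path to div elements can be sketched in Python with xml.etree.ElementTree. This is an illustration, not XML Manager itself. ElementTree supports only a subset of XPath, so the sketch uses the predicate-free form of the path, written relative to the html root element. The "nav" link outside any div is a hypothetical addition to the sample data.

```python
import xml.etree.ElementTree as ET

# Sample data with one extra link outside any div (hypothetical addition).
XML = """<html>
<body>
<a href="http://www.example.com/nav">Nav</a>
<div>
<a href="http://www.example.com/first">First</a>
<p>Lorem <a href="http://www.example.com/ipsum">ipsum</a> dolor.</p>
</div>
</body>
</html>"""

root = ET.fromstring(XML)  # root is the html element

# Equivalent of //a: every a element anywhere in the document.
all_links = [a.get("href") for a in root.findall(".//a")]

# Equivalent of /html/body/div//a: only a elements somewhere inside a div.
div_links = [a.get("href") for a in root.findall("./body/div//a")]

print(all_links)
print(div_links)
```

The second list leaves out the link that sits directly under body, which is the kind of exclusion the more specific record path is meant to provide.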

Competition #2, March 20th, 2006

XML Manager Winner:
Roman Nowak

The challenge was to extract this data...

Isaac    Newton    051 8706351  1 Apple Avenue.
                                Lincolnshire.
Micheal  Faraday   051 8706352  2 Spark Street.
                                Surrey.
Albert   Einstein  051 8706353  3 Relative Road.
                                Princeton.

...from this XML:

<addressbook>

  <person firstname="Isaac" lastname="Newton">
    <phone>051 8706351</phone>
    <address>
1 Apple Avenue.
Lincolnshire.
    </address>
  </person>

  <person firstname="Micheal" lastname="Faraday">
    <phone>051 8706352</phone>
    <address>
2 Spark Street.
Surrey.
    </address>
  </person>

  <person firstname="Albert" lastname="Einstein">
    <phone>051 8706353</phone>
    <address>
3 Relative Road.
Princeton.
    </address>
  </person>

</addressbook>

And the winning solution was:

RECORD:  /addressbook/person
FIELD 1: @firstname
FIELD 2: @lastname
FIELD 3: phone
FIELD 4: address

Comments
This challenge demonstrates a very simple use case for XML Manager: the direct extraction of data fields from simple XML records. Some of the data is in attributes and some of the data is in element text.

There is one additional refinement to this solution. In a field path, the name of an element returns all the text inside that element. Here, for example, address was used to return the address text. However, it returns all the text between the start and end address tags, including the newlines and the surrounding whitespace. It would be nice to get rid of this, and XML Manager provides a way to do so: rb:trim(address). This performs a java.lang.String.trim() operation on the element text, removing the leading and trailing whitespace. This function is not standard XPath, but is provided by XML Manager to make things a bit easier.
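The difference between the raw and trimmed address can be seen in a small Python sketch using xml.etree.ElementTree. This is only an analogue: Python's str.strip() stands in for rb:trim, and it is assumed here to behave like java.lang.String.trim() for this data (spaces and newlines only). Only the first person record is included.

```python
import xml.etree.ElementTree as ET

# First record of the address book, with the original indentation.
XML = """<addressbook>
  <person firstname="Isaac" lastname="Newton">
    <phone>051 8706351</phone>
    <address>
1 Apple Avenue.
Lincolnshire.
    </address>
  </person>
</addressbook>"""

root = ET.fromstring(XML)            # root is the addressbook element
person = root.find("person")         # RECORD:  /addressbook/person

firstname = person.get("firstname")  # FIELD 1: @firstname
lastname = person.get("lastname")    # FIELD 2: @lastname
phone = person.find("phone").text    # FIELD 3: phone

raw = person.find("address").text    # FIELD 4: address
trimmed = raw.strip()                # stand-in for rb:trim(address)

print(repr(raw))      # shows the leading newline and trailing indentation
print(repr(trimmed))  # only the inner newline between the two lines remains
```

Trimming removes only the whitespace at the two ends, so the line break between the street and the county is kept.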

Competition #1, March 6th, 2006

XML Manager Winner:
Andre Luepke

The challenge was to extract this data...

Bugs    Bunny
Homer   Simpson
Fred    Flintstone

...from this XML:

<table>
  <tr>
    <td>Bugs</td><td>Bunny</td>
  </tr>
  <tr>
    <td>Homer</td><td>Simpson</td>
  </tr>
  <tr>
    <td>Fred</td><td>Flintstone</td>
  </tr>
</table>

And the winning solution was:

RECORD: /table/tr
FIELD 1: td[1]/text()
FIELD 2: td[2]/text()

Comments
This solution shows how you can use the position of an element to get at the right data. The XML document uses the elements of an HTML table, so td elements are repeated. To get at each of them, you need to use the XPath position predicate: [x]. In XPath, positions start at one, not zero, so watch out for that as well.

The winning entry uses the text() node test to get the text of the td elements. In fact, you don't need it in this case: the expression td[1], for example, will work just as well.
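Both points, the one-based positions and the equivalence of td[1] and td[1]/text() for this data, can be checked with a short Python sketch using xml.etree.ElementTree. It is an illustration only and not XML Manager's API. The .text attribute plays the role of text(), and joining itertext() plays the role of the element's full text value.

```python
import xml.etree.ElementTree as ET

XML = """<table>
  <tr>
    <td>Bugs</td><td>Bunny</td>
  </tr>
  <tr>
    <td>Homer</td><td>Simpson</td>
  </tr>
  <tr>
    <td>Fred</td><td>Flintstone</td>
  </tr>
</table>"""

root = ET.fromstring(XML)                # root is the table element

names = []
for tr in root.findall("tr"):            # RECORD:  /table/tr
    first_td = tr.find("td[1]")          # XPath positions are one-based
    second_td = tr.find("td[2]")

    # Python lists are zero-based, so td[1] corresponds to index 0.
    assert first_td.text == tr.findall("td")[0].text

    # For a td that holds only text, text() and the full text value agree.
    assert first_td.text == "".join(first_td.itertext())

    names.append((first_td.text, second_td.text))  # FIELD 1, FIELD 2

for first, last in names:
    print(first, last)
```

The two forms would differ only if a td contained child elements, since the full text value also includes the text of descendants.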

Copyright © 2004-2008 Ricebridge. All Rights Reserved.