java.lang.Object
  com.ricebridge.xmlman.RecordSpec
Specify the XPath expressions to pull record data out of your XML document.
This is the most important class in the XML Manager component after the XmlManager class.
You use this Record Specification class to specify the location of your data records, and the
data fields of those records. You do this using XPath expressions that point at the data you need.
Let's start with this XML document:
<?xml version='1.0' encoding='UTF-8'?>
<root>
<record name="r1">
<foo>f1</foo>
</record>
<record name="r2">
<foo>f2</foo>
</record>
<record name="r3">
<foo>f3</foo>
</record>
</root>
You can pull out each data record by using the XPath expression:
/root/record
This will return each record element in the order that it appears in the document. Now to get the
record data, we use another set of XPath expressions, one for each data field. These XPath expressions
are evaluated relative to the record element. For example:
@name
will extract the value of the name attribute from each record element. And
foo
will get the text content of the child foo element in each record element.
To specify all this we create a new RecordSpec object, like so:
new RecordSpec( "/root/record", new String[] {"@name","foo"} );
If we hand this record specification over to XML Manager, then we will obtain the following data:
| Name | Foo |
|---|---|
| r1 | f1 |
| r2 | f2 |
| r3 | f3 |
Here you can see that we have extracted three records from the XML document.
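To see what these expressions select, here is a minimal sketch that evaluates the same record path and field paths with the standard JDK XPath API. It does not use XML Manager at all: the class RecordPathDemo and its extract method are hypothetical names chosen for illustration, and the JDK builds a full DOM tree in memory, which XML Manager is designed to avoid.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper, not part of XML Manager: uses only the JDK. */
final class RecordPathDemo {

    /** The example document from above, written on one line. */
    static final String XML =
        "<?xml version='1.0' encoding='UTF-8'?>"
        + "<root>"
        + "<record name=\"r1\"><foo>f1</foo></record>"
        + "<record name=\"r2\"><foo>f2</foo></record>"
        + "<record name=\"r3\"><foo>f3</foo></record>"
        + "</root>";

    /** Selects the record elements, then evaluates each field path relative to each record. */
    static List<String[]> extract(String xml, String recordPath, String[] fieldPaths) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList records = (NodeList) xpath.evaluate(recordPath, doc, XPathConstants.NODESET);
            List<String[]> rows = new ArrayList<>();
            for (int i = 0; i < records.getLength(); i++) {
                Node record = records.item(i);
                String[] row = new String[fieldPaths.length];
                for (int f = 0; f < fieldPaths.length; f++) {
                    // The record element is the context node for the field path.
                    row[f] = xpath.evaluate(fieldPaths[f], record);
                }
                rows.add(row);
            }
            return rows;
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate XPath expressions", e);
        }
    }

    public static void main(String[] args) {
        for (String[] row : extract(XML, "/root/record", new String[] {"@name", "foo"})) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

Running main prints one line per record (r1 | f1, r2 | f2, r3 | f3), matching the table above.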
Record-based Parsing
XML Manager parses and loads XML documents using the standard SAX Java interface.
This interface loads the XML document one element at a time, and does not store any of the elements in memory. This means that
very large XML documents can be loaded quickly and easily. XML Manager then applies your XPath expressions to this
continuous stream of elements, picking out the ones that match.
XML Manager is thus a record-oriented XML parser. That means that it is best suited to handling documents with a regular structure, that is, with repeated data records. Although XML Manager can in fact extract any data out of any XML document, some documents are easier to deal with than others. Luckily, most of the XML documents used for data exchange, web services and configuration are very easy to work with using XML Manager.
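The streaming idea can be illustrated with a hand-written SAX handler. This is only a sketch of the general technique, not XML Manager's internal implementation: the class StreamingRecordDemo is a hypothetical name, and the two paths it matches (/root/record and /root/record/foo) are hard-coded rather than compiled from XPath expressions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical sketch of record-oriented SAX parsing, not XML Manager code. */
final class StreamingRecordDemo {

    static final String XML =
        "<root>"
        + "<record name=\"r1\"><foo>f1</foo></record>"
        + "<record name=\"r2\"><foo>f2</foo></record>"
        + "<record name=\"r3\"><foo>f3</foo></record>"
        + "</root>";

    /** Collects {name, foo} for each /root/record element as the SAX events arrive. */
    static final class RecordHandler extends DefaultHandler {
        final List<String[]> rows = new ArrayList<>();
        // Path of the element currently open, for example /root/record/foo.
        private final StringBuilder path = new StringBuilder();
        private String name;
        private StringBuilder foo;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            path.append('/').append(qName);
            if (path.toString().equals("/root/record")) {
                name = atts.getValue("name");
                foo = new StringBuilder();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (path.toString().equals("/root/record/foo")) {
                foo.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (path.toString().equals("/root/record")) {
                // The record is complete: deliver it and keep nothing else in memory.
                rows.add(new String[] {name, foo.toString()});
            }
            path.setLength(path.length() - qName.length() - 1);
        }
    }

    static List<String[]> extract(String xml) {
        try {
            RecordHandler handler = new RecordHandler();
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return handler.rows;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        for (String[] row : extract(XML)) {
            System.out.println(row[0] + " " + row[1]);
        }
    }
}
```

The handler only ever holds the fields of the record currently being read, which is why this style of parsing copes with very large documents. A real application would normally pass each record to a listener instead of collecting them all in a list.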
XPath Expressions
The use of XPath expressions is what makes XML Manager so effective. Instead of writing large amounts of node traversal code,
as with the DOM interface,
you can just write a few XPath expressions and get your data out directly.
However, because XML Manager is designed for high speed and low memory usage, and because it uses the SAX API,
it cannot support the full functionality of XPath. XML Manager cannot see beyond the current element,
and it cannot return to previous elements. That means that only the common XPath expression types are supported, such as
normal/child/elements, descendant//elements and element[predicates].
In particular, you can't use the .. expression. This might seem like a bit of a problem, but XML Manager provides
an alternative. It is not necessary for record field paths to be relative. They can also be absolute. In this case, XML Manager
provides you with the most recently seen value of the expression. In practical terms, this works the same way as the
.. path element.
Here is an example to demonstrate this method:
<?xml version='1.0' encoding='UTF-8'?>
<root>
<group code="g1">
<record name="r1">
<foo>f1</foo>
</record>
<record name="r2">
<foo>f2</foo>
</record>
</group>
<group code="g2">
<record name="r3">
<foo>f3</foo>
</record>
</group>
</root>
In order to access the group element, we can use the following XPath expressions:
//record - this selects each record
@name - this selects the record name attribute
//group/@code - this will select the most recent group element
and take the value of the code attribute. It is thus the same as ../@code.
These expressions will produce the following data records:
| Name | Group-Code |
|---|---|
| r1 | g1 |
| r2 | g1 |
| r3 | g2 |
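The "most recently seen value" behaviour can be sketched with a plain SAX handler that remembers the last group code and attaches it to every record that follows. This is an illustration of the idea under stated assumptions, not XML Manager's implementation: LastSeenValueDemo is a hypothetical class, and the element and attribute names are hard-coded to match the example document.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical sketch: emulates //group/@code as "the most recently seen group code". */
final class LastSeenValueDemo {

    static final String XML =
        "<root>"
        + "<group code=\"g1\">"
        + "<record name=\"r1\"><foo>f1</foo></record>"
        + "<record name=\"r2\"><foo>f2</foo></record>"
        + "</group>"
        + "<group code=\"g2\">"
        + "<record name=\"r3\"><foo>f3</foo></record>"
        + "</group>"
        + "</root>";

    /** Returns one {name, groupCode} row per record element. */
    static List<String[]> extract(String xml) {
        final List<String[]> rows = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            // Most recent value of the absolute expression //group/@code.
            private String lastGroupCode = "";

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (qName.equals("group")) {
                    lastGroupCode = atts.getValue("code");
                } else if (qName.equals("record")) {
                    rows.add(new String[] {atts.getValue("name"), lastGroupCode});
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return rows;
    }

    public static void main(String[] args) {
        for (String[] row : extract(XML)) {
            System.out.println(row[0] + " " + row[1]);
        }
    }
}
```

Because a group element always opens before the records it contains, the remembered value is the code of the enclosing group, and no backward navigation is needed.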
XML Manager fully supports XPath predicates, so that you can say things like:
/root/record[@name='r1']
to extract the record where the name attribute has the value r1.
XML Manager also supports most of the XPath functions, so you can use expressions such as
concat(foo,bar)
to concatenate the text content of two elements.
Of course, some functions cannot be supported by XML Manager, due to the streaming nature of its input.
In particular, the last function is not fully supported, as XML Manager cannot tell whether another
element of the same kind follows the current one.
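If you want to check what a predicate or a function call means before using it in a RecordSpec, you can try it against a small document with the standard JDK XPath API. The sketch below is an assumption-laden illustration: PredicateDemo and eval are hypothetical names, and the JDK evaluates the expression against an in-memory DOM tree rather than a stream, so it says nothing about which expressions XML Manager accepts.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Hypothetical helper, not part of XML Manager: evaluates an XPath expression to a string. */
final class PredicateDemo {

    static final String XML =
        "<root>"
        + "<record name=\"r1\"><foo>f1</foo></record>"
        + "<record name=\"r2\"><foo>f2</foo></record>"
        + "<record name=\"r3\"><foo>f3</foo></record>"
        + "</root>";

    /** Parses the document and returns the string value of the expression. */
    static String eval(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Predicate: the foo child of the record whose name attribute is r1.
        System.out.println(eval(XML, "/root/record[@name='r1']/foo"));
        // Function: join the name and foo values of the second record.
        System.out.println(eval(XML, "concat(/root/record[2]/@name, ':', /root/record[2]/foo)"));
    }
}
```

Running main prints f1 and then r2:f2.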
XML Manager also provides access to the text content of elements. The expression
foo
concatenates the child text nodes of the element foo and of all its child elements.
Note that all XML elements are dropped and only the text content is returned.
The expression
foo/text()
concatenates the child text nodes of the element foo, but does not include the text content of child elements. The expression
foo/text()[n]
returns the contents of the nth child text node. XML Manager can also return the actual XML text of any element; see the Special Functions section for more details.
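The difference between the three forms is easiest to see on an element with mixed content. The following sketch reproduces the described behaviour with the JDK DOM API. It is not XML Manager code: TextContentDemo and its methods are hypothetical, and returning an empty string when the nth text node does not exist is an assumption made for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical sketch of the three text-access forms, using only the JDK DOM API. */
final class TextContentDemo {

    /** A foo element with three text nodes (a, c, d) and two child elements. */
    static final String XML = "<foo>a<b>B</b>c<i>I</i>d</foo>";

    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    /** Like the expression foo: all text, including text inside child elements. */
    static String allText(Element e) {
        return e.getTextContent();
    }

    /** Like foo/text(): only the element's own child text nodes, concatenated. */
    static String ownText(Element e) {
        StringBuilder sb = new StringBuilder();
        NodeList children = e.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                sb.append(child.getNodeValue());
            }
        }
        return sb.toString();
    }

    /** Like foo/text()[n]: the nth (1-based) child text node, or "" if absent (assumption). */
    static String nthText(Element e, int n) {
        int seen = 0;
        NodeList children = e.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE && ++seen == n) {
                return child.getNodeValue();
            }
        }
        return "";
    }

    public static void main(String[] args) {
        Element foo = parseRoot(XML);
        System.out.println(allText(foo));    // aBcId
        System.out.println(ownText(foo));    // acd
        System.out.println(nthText(foo, 2)); // c
    }
}
```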
Which Constructor Should I Use?
The RecordSpec object provides a number of convenience constructors. However, most of these
simply allow you to specify additional secondary RecordSpecs without having to create
a List object to put them in.
The essential parameters of every RecordSpec are always the record XPath expression String,
and the String[] array containing the field XPath expressions. These are always the first two parameters.
Thus, the simplest way to create a RecordSpec is with the RecordSpec(String,String[]) constructor.
Sometimes you will need to provide names for the data fields. You can do this using a second String[] array.
This should be the same length as the field expressions array, and contain the name of each data field. This extra
information is required by some of the load methods in XmlManager, such as the
XmlManager.loadBeans(File,RecordSpec,BeanSpec) method. This parameter, if present, is always the third parameter.
Finally, you may wish to use secondary RecordSpecs. These secondary RecordSpecs are used to extract extra
information from the XML document at the same time as the primary data records are extracted. For more information about using them,
see the Multiple RecordSpecs section below. To add secondary RecordSpecs
to your primary RecordSpec, add them at the end of the constructor parameter list (RecordSpec(String,String[],RecordSpec)).
If there are more than three secondary RecordSpecs, then you will have to put them in a List; see the
RecordSpec(String,String[],List) constructor.
If you already have a set of RecordSpecs, and you want to create a primary RecordSpec from one of them,
and add the rest as secondaries, use the RecordSpec(RecordSpec,RecordSpec) constructors. These use
the record specification details of the first argument to create a new primary RecordSpec.
Special Functions
XML Manager provides some additional functions to aid you in accessing the data in your XML documents.
These functions are placed in the http://www.ricebridge.com/xmlman namespace, which has the default
prefix rb (you can change this prefix). These functions are:
rb:trim( string ) - remove surrounding whitespace from a string
rb:xml( path ) - extract the XML text of an element
The rb:trim function provides exactly the same functionality as the Java String.trim
method. It removes any whitespace characters from the start and end of its string argument.
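Since rb:trim is described as equivalent to String.trim, its effect can be previewed in plain Java. The sketch below only calls String.trim. TrimDemo and clean are hypothetical names, and the sample values imitate the whitespace that pretty-printed XML often leaves around element text.

```java
/** Hypothetical sketch: shows the String.trim behaviour that rb:trim is described as matching. */
final class TrimDemo {

    /** Removes leading and trailing whitespace; whitespace inside the value is kept. */
    static String clean(String raw) {
        return raw.trim();
    }

    public static void main(String[] args) {
        System.out.println("[" + clean("\n    f1\n  ") + "]"); // [f1]
        System.out.println("[" + clean("  a  b \t") + "]");    // [a  b]
    }
}
```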
The rb:xml function allows you to get at the actual XML of your document. In cases where you need to
process the XML afterwards, but you still want to use XML Manager to parse the XML file, this function is the one to use.
It is especially useful in cases where XML content is the data that you want. For example, this occurs in the
content element of the Atom specification.
Using our first example XML document above, consider the RecordSpec:
new RecordSpec("/root/record", new String[]{"rb:xml(foo)"})
For the first data record, this will return the value:
<foo>f1</foo>
You can also define your own functions. See the XmlSpec.addFunction method.
Using Multiple RecordSpecs
Sometimes your XML document will contain more than one type of data record. Many documents also contain header information.
To handle these cases, XML Manager allows you to use more than one RecordSpec at a time.
In this case, every time any RecordSpec matches a data record, the data fields are delivered back to you.
Let's look at an example to show how this works:
<root>
<title name="t1" />
<other bar="b1" />
<other bar="b2" />
<record name="r1">
<foo>f1</foo>
</record>
<record name="r2">
<foo>f2</foo>
</record>
</root>
From this data we want to get:
title header
record elements
other elements
It is impossible to get all this data in one pass using just a single RecordSpec. Rather, we need three:
title: new RecordSpec("root/title",new String[]{"'title'","@name"})
record: new RecordSpec("root/record",new String[]{"'record'","@name","foo"})
other: new RecordSpec("/root/other",new String[]{"'other'","@bar"})
Notice that we have put a constant value in the first field. This serves as a marker to identify where the
data came from. This is very useful if you are accessing the data using the convenience methods such as
load(File,RecordSpec). Even if you are using your own RecordListener, this is probably
a good idea, as you cannot rely on the order of the elements in the source XML document.
In order to use these RecordSpecs together, we use the
appropriate constructor of RecordSpec:
ArrayList secondaries = new ArrayList();
secondaries.add( new RecordSpec("root/title",new String[]{"'title'","@name"}) );
secondaries.add( new RecordSpec("/root/other",new String[]{"'other'","@bar"}) );
RecordSpec primary = new RecordSpec("root/record",new String[]{"'record'","@name","foo"}, secondaries );
This makes the record RecordSpec the primary RecordSpec,
and the other two RecordSpecs are then secondary RecordSpecs.
What's the difference between a primary and secondary RecordSpec?
The field names of the primary RecordSpec are the only field names
that are used. Field names are used by the XmlManager.loadBeans(File,RecordSpec,BeanSpec) method, for example, to
identify the property methods of the Java Beans to be loaded. Any field names defined in the
secondary RecordSpecs are ignored. Normally you don't have to worry about this, especially if you
are just using the load and
loadAsLists methods. As a rule of thumb, always access the main data record
using the primary RecordSpec.
Let's look at the data that is returned by the three RecordSpecs:
| First Element | Second Element | Third Element |
|---|---|---|
| title | t1 | |
| other | b1 | |
| other | b2 | |
| record | r1 | f1 |
| record | r2 | f2 |
You can see how the undefined data fields of the secondary RecordSpecs are simply returned as empty strings.
When you process this list of String[] arrays, you can use the first element of each array to identify the
type of data: title, other or record.
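Processing the mixed list by marker can be sketched as follows. The rows are hard-coded to the values that the three RecordSpecs would produce for the example document, so the snippet runs without XML Manager. MarkerDispatchDemo and groupByMarker are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: sorts mixed String[] rows by the marker in their first element. */
final class MarkerDispatchDemo {

    /** Stand-in for the loaded data: marker first, then the data fields. */
    static List<String[]> sampleRows() {
        return Arrays.asList(
            new String[] {"title", "t1", ""},
            new String[] {"other", "b1", ""},
            new String[] {"other", "b2", ""},
            new String[] {"record", "r1", "f1"},
            new String[] {"record", "r2", "f2"});
    }

    /** Groups the rows by marker and strips the marker from each row. */
    static Map<String, List<String[]>> groupByMarker(List<String[]> rows) {
        Map<String, List<String[]>> groups = new LinkedHashMap<>();
        for (String[] row : rows) {
            groups.computeIfAbsent(row[0], marker -> new ArrayList<>())
                  .add(Arrays.copyOfRange(row, 1, row.length));
        }
        return groups;
    }

    public static void main(String[] args) {
        Map<String, List<String[]>> groups = groupByMarker(sampleRows());
        for (Map.Entry<String, List<String[]>> entry : groups.entrySet()) {
            for (String[] fields : entry.getValue()) {
                System.out.println(entry.getKey() + ": " + String.join(", ", fields));
            }
        }
    }
}
```

Grouping by marker rather than by position means the code keeps working if the title, other and record elements appear in a different order in the source document.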
Limitations
XML Manager is designed for speed and stable memory use. This means you can use it on arbitrarily large XML documents
and you will be able to process them without running into resource problems. In order to achieve this, XML Manager only makes
one pass through the XML file, using a SAX parser.
It uses the SAX events to construct an internal view of the XML document
that can then be checked against the XPath expressions you are using to extract your data.
Because only one pass is made this means that XML Manager cannot "see into the future". This means that XPath expressions that refer to elements ahead of the current element cannot be used. Also, because we want to prevent memory errors, XML Manager does not try to keep a record of all the XML it has already seen. With these limitations in mind, let's take a look at the subset of XPath that XML Manager does support:
self::node(), .
root/child::foo, root/foo
root/descendant::foo
descendant-or-self::node()/foo, //foo
namespace::foo
attribute::foo, @foo
foo[1=position()], foo[1]
string(foo)
concat(foo,bar)
starts-with('foo','f')
contains('foo','fo')
substring-before('foo','oo')
substring-after('foo','f')
substring('foo',2,1)
string-length(foo)
normalize-space(foo)
translate('foo','f','F')
boolean(foo)
not(true())
true()
false()
number(foo)
floor(1.1)
ceiling(1.9)
round(1.5)
lang('en')
last() (always returns true): foo/text()[last()]
local-name(.)
namespace-uri(.)
name(.)
foo[single]
foo[mul][tiple]
foo[nes[ted]]
*
text()
node()
/
//
[
]
.
@
::
:
* (multiply)
+
-
=
!=
<
<=
>
>=
and
or
mod
div
* (any node)
$
Note: the last function always returns true, as this is the most useful default. But this does mean that
it will be true on the last element, and since XML Manager only uses the most recent value of an expression, this
is most often what you want, especially in data field expressions.
The XPath specification relies heavily on the idea of a context. XML Manager can only provide a partial context, because it is not possible to determine the context size, and the context node is not a real node in a document model. This means that XML Manager has no concept of a node set, or rather, that all node sets contain just one node, the current one. As a result, the following parts of the XPath specification are not implemented:
parent::node(), ..
root/ancestor::foo
ancestor-or-self::foo
following::foo
following-sibling::foo
preceding::foo
preceding-sibling::foo
comment
processing-instruction
() (node sets grouping)
| (based on node sets)
As you can see, the subset of XPath supported is much greater than the unsupported subset. When you use XML Manager you are trading a few infrequently used XPath expressions for a very fast and memory-stable parser.
See Also:
XmlManager, XmlSpec
Constructor Summary
| Constructor | Description |
|---|---|
| RecordSpec(RecordSpec pPrimaryRecordSpec, List pSecondaryRecordSpecs) | Convenience constructor that uses the first RecordSpec argument as the primary RecordSpec, and places the remaining RecordSpecs in the list argument as secondary RecordSpecs into the first. |
| RecordSpec(RecordSpec pPrimaryRecordSpec, RecordSpec pSecondaryRecordSpec0) | Convenience constructor that uses the first RecordSpec argument as the primary RecordSpec, and places the second RecordSpec argument as a secondary RecordSpec into the first. |
| RecordSpec(RecordSpec pPrimaryRecordSpec, RecordSpec pSecondaryRecordSpec0, RecordSpec pSecondaryRecordSpec1) | Convenience constructor that uses the first RecordSpec argument as the primary RecordSpec, and places the remaining RecordSpec arguments as secondary RecordSpecs into the first. |
| RecordSpec(RecordSpec pPrimaryRecordSpec, RecordSpec pSecondaryRecordSpec0, RecordSpec pSecondaryRecordSpec1, RecordSpec pSecondaryRecordSpec2) | Convenience constructor that uses the first RecordSpec argument as the primary RecordSpec, and places the remaining RecordSpec arguments as secondary RecordSpecs into the first. |
| RecordSpec(String pRecordPath, String[] pFieldPaths) | Create a new record specification, giving the record path, and an array of field paths. |
| RecordSpec(String pRecordPath, String[] pFieldPaths, List pSecondaryRecordSpecs) | Create a new record specification, and include a list of secondary record specifications. |
| RecordSpec(String pRecordPath, String[] pFieldPaths, RecordSpec pRecordSpec0) | Create a new record specification, and include a secondary record specification. |
| RecordSpec(String pRecordPath, String[] pFieldPaths, RecordSpec pRecordSpec0, RecordSpec pRecordSpec1) | Create a new record specification, and include two secondary record specifications. |
| RecordSpec(String pRecordPath, String[] pFieldPaths, RecordSpec pRecordSpec0, RecordSpec pRecordSpec1, RecordSpec pRecordSpec2) | Create a new record specification, and include three secondary record specifications. |
| RecordSpec(String pRecordPath, String[] pFieldPaths, String[] pFieldNames) | Create a new record specification, and include field names for the record data fields. |
| RecordSpec(String pRecordPath, String[] pFieldPaths, String[] pFieldNames, List pSecondaryRecordSpecs) | Create a new record specification, include field names for the record data fields, and include a list of secondary RecordSpecs. |
| RecordSpec(String pRecordPath, String[] pFieldPaths, String[] pFieldNames, RecordSpec pRecordSpec0) | Create a new record specification, include field names for the record data fields, and include a secondary RecordSpec. |
| RecordSpec(String pRecordPath, String[] pFieldPaths, String[] pFieldNames, RecordSpec pRecordSpec0, RecordSpec pRecordSpec1) | Create a new record specification, include field names for the record data fields, and include two secondary RecordSpecs. |
| RecordSpec(String pRecordPath, String[] pFieldPaths, String[] pFieldNames, RecordSpec pRecordSpec0, RecordSpec pRecordSpec1, RecordSpec pRecordSpec2) | Create a new record specification, include field names for the record data fields, and include three secondary RecordSpecs. |
Method Summary
| Return type | Method | Description |
|---|---|---|
| String[] | getFieldNames() | Get the field names. |
| String[] | getFieldPaths() | Get the field XPath expressions. |
| String | getRecordPath() | Get the record XPath expression. |
| List | getSecondaryRecordSpecs() | Get the list of secondary RecordSpecs. |
| String | toString() | Get a textual description of the RecordSpec, suitable for debugging. |
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Constructor Detail
public RecordSpec(String pRecordPath,
String[] pFieldPaths)
This is the standard constructor for the RecordSpec object, and the
one you will see used most often in the documentation examples.
Parameters:
pRecordPath - XPath expression to extract the record elements
pFieldPaths - XPath expressions to extract the record data fields
See Also:
RecordSpec(String,String[],RecordSpec)
public RecordSpec(String pRecordPath,
String[] pFieldPaths,
RecordSpec pRecordSpec0)
Secondary RecordSpecs are applied to the XML document at the
same time as the primary RecordSpec. They are useful for extracting
additional information. All RecordSpecs send their data to the same RecordListener,
so it is often helpful to include a constant data field expression (such as 'item')
as a marker for the RecordSpec that produced the data.
This is a convenience constructor. To include any number of secondary RecordSpecs,
use the RecordSpec(String,String[],List) constructor.
Parameters:
pRecordPath - XPath expression to extract the record elements
pFieldPaths - XPath expressions to extract the record data fields
pRecordSpec0 - additional secondary RecordSpec
See Also:
RecordSpec(String,String[],List)
public RecordSpec(String pRecordPath,
String[] pFieldPaths,
RecordSpec pRecordSpec0,
RecordSpec pRecordSpec1)
This is a convenience constructor. To include any number of secondary RecordSpecs,
use the RecordSpec(String,String[],List) constructor.
Parameters:
pRecordPath - XPath expression to extract the record elements
pFieldPaths - XPath expressions to extract the record data fields
pRecordSpec0 - additional secondary RecordSpec
pRecordSpec1 - additional secondary RecordSpec
See Also:
RecordSpec(String,String[],RecordSpec),
RecordSpec(String,String[],List)
public RecordSpec(String pRecordPath,
String[] pFieldPaths,
RecordSpec pRecordSpec0,
RecordSpec pRecordSpec1,
RecordSpec pRecordSpec2)
This is a convenience constructor. To include any number of secondary RecordSpecs,
use the RecordSpec(String,String[],List) constructor.
Parameters:
pRecordPath - XPath expression to extract the record elements
pFieldPaths - XPath expressions to extract the record data fields
pRecordSpec0 - additional secondary RecordSpec
pRecordSpec1 - additional secondary RecordSpec
pRecordSpec2 - additional secondary RecordSpec
See Also:
RecordSpec(String,String[],RecordSpec),
RecordSpec(String,String[],List)
public RecordSpec(String pRecordPath,
String[] pFieldPaths,
List pSecondaryRecordSpecs)
Parameters:
pRecordPath - XPath expression to extract the record elements
pFieldPaths - XPath expressions to extract the record data fields
pSecondaryRecordSpecs - additional secondary RecordSpecs
See Also:
RecordSpec(String,String[],RecordSpec)
public RecordSpec(String pRecordPath,
String[] pFieldPaths,
String[] pFieldNames)
This constructor allows you to associate names with each of the record data field
XPath expressions. These field names are used by specific record listeners to implement
additional functionality. For example, the TableModelRecordListener uses the
field names as the table column names, and the BeanRecordListener uses them
to identify the Java Bean property methods.
Parameters:
pRecordPath - XPath expression to extract the record elements
pFieldPaths - XPath expressions to extract the record data fields
pFieldNames - data field names
See Also:
RecordSpec(String,String[])
public RecordSpec(String pRecordPath,
String[] pFieldPaths,
String[] pFieldNames,
RecordSpec pRecordSpec0)
Create a new record specification, include field names for the record data fields, and include a secondary RecordSpec.
Note that only the data field names of the primary RecordSpec are used.
The data field names in the secondary RecordSpec are ignored.
Parameters:
pRecordPath - XPath expression to extract the record elements
pFieldPaths - XPath expressions to extract the record data fields
pFieldNames - data field names
pRecordSpec0 - additional secondary RecordSpec
See Also:
RecordSpec(String,String[],String[]),
RecordSpec(String,String[],String[],List)
public RecordSpec(String pRecordPath,
String[] pFieldPaths,
String[] pFieldNames,
RecordSpec pRecordSpec0,
RecordSpec pRecordSpec1)
Create a new record specification, include field names for the record data fields, and include two secondary RecordSpecs.
Parameters:
pRecordPath - XPath expression to extract the record elements
pFieldPaths - XPath expressions to extract the record data fields
pFieldNames - data field names
pRecordSpec0 - additional secondary RecordSpec
pRecordSpec1 - additional secondary RecordSpec
See Also:
RecordSpec(String,String[],String[]),
RecordSpec(String,String[],String[],RecordSpec)
public RecordSpec(String pRecordPath,
String[] pFieldPaths,
String[] pFieldNames,
RecordSpec pRecordSpec0,
RecordSpec pRecordSpec1,
RecordSpec pRecordSpec2)
Create a new record specification, include field names for the record data fields, and include three secondary RecordSpecs.
Parameters:
pRecordPath - XPath expression to extract the record elements
pFieldPaths - XPath expressions to extract the record data fields
pFieldNames - data field names
pRecordSpec0 - additional secondary RecordSpec
pRecordSpec1 - additional secondary RecordSpec
pRecordSpec2 - additional secondary RecordSpec
See Also:
RecordSpec(String,String[],String[]),
RecordSpec(String,String[],String[],RecordSpec)
public RecordSpec(RecordSpec pPrimaryRecordSpec,
RecordSpec pSecondaryRecordSpec0)
Convenience constructor that uses the first RecordSpec argument as the primary RecordSpec, and
places the second RecordSpec argument as a secondary RecordSpec into the first.
Parameters:
pPrimaryRecordSpec - primary RecordSpec
pSecondaryRecordSpec0 - additional secondary RecordSpec
See Also:
RecordSpec(String,String[],RecordSpec)
public RecordSpec(RecordSpec pPrimaryRecordSpec,
RecordSpec pSecondaryRecordSpec0,
RecordSpec pSecondaryRecordSpec1)
Convenience constructor that uses the first RecordSpec argument as the primary RecordSpec, and
places the remaining RecordSpec arguments as secondary RecordSpecs into the first.
Parameters:
pPrimaryRecordSpec - primary RecordSpec
pSecondaryRecordSpec0 - additional secondary RecordSpec
pSecondaryRecordSpec1 - additional secondary RecordSpec
See Also:
RecordSpec(RecordSpec,RecordSpec)
public RecordSpec(RecordSpec pPrimaryRecordSpec,
RecordSpec pSecondaryRecordSpec0,
RecordSpec pSecondaryRecordSpec1,
RecordSpec pSecondaryRecordSpec2)
Convenience constructor that uses the first RecordSpec argument as the primary RecordSpec, and
places the remaining RecordSpec arguments as secondary RecordSpecs into the first.
Parameters:
pPrimaryRecordSpec - primary RecordSpec
pSecondaryRecordSpec0 - additional secondary RecordSpec
pSecondaryRecordSpec1 - additional secondary RecordSpec
pSecondaryRecordSpec2 - additional secondary RecordSpec
See Also:
RecordSpec(RecordSpec,RecordSpec)
public RecordSpec(RecordSpec pPrimaryRecordSpec,
List pSecondaryRecordSpecs)
Convenience constructor that uses the first RecordSpec argument as the primary RecordSpec, and
places the remaining RecordSpecs in the list argument as secondary RecordSpecs into the first.
Parameters:
pPrimaryRecordSpec - primary RecordSpec
pSecondaryRecordSpecs - additional secondary RecordSpecs
See Also:
RecordSpec(RecordSpec,RecordSpec)
public RecordSpec(String pRecordPath,
String[] pFieldPaths,
String[] pFieldNames,
List pSecondaryRecordSpecs)
Create a new record specification, include field names for the record data fields, and include a list of secondary RecordSpecs.
Parameters:
pRecordPath - XPath expression to extract the record elements
pFieldPaths - XPath expressions to extract the record data fields
pFieldNames - data field names
pSecondaryRecordSpecs - additional secondary RecordSpecs
See Also:
RecordSpec(String,String[],String[]),
RecordSpec(String,String[],String[],RecordSpec),
RecordSpec(String,String[],List)
Method Detail
public String getRecordPath()
public String[] getFieldPaths()
public String[] getFieldNames()
public List getSecondaryRecordSpecs()
Get the list of secondary RecordSpecs.
public String toString()
Get a textual description of the RecordSpec, suitable for debugging.