Peter Komisar © Conestoga College v.1.2 / 2005
references:
'Developing Java Web Services', R. Nagappan et al., Wiley Press
'The J2EE Tutorial', http://java.sun.com/j2ee/1.4/docs/tutorial/doc/index.html
JDK 1.5 Documentation, http://java.sun.com/j2se/1.5.0/docs/api/index.html
Overview
JAXP puts three standards under one umbrella: SAX, DOM and XSLT. Compliant parsers conform to a standard set of interfaces, and thus provide a standard level of functionality. Whether processing XML as an event stream, loading a document object into memory or transforming an XML document into another document, the approach is largely similar thanks to the JAXP API. The uniformity supplied by JAXP also makes it easy to use these different APIs together.
JAXP includes a rendering of SAX, an event-driven processing model that uses the classic 'listener' pattern to respond to the different elements that are encountered in a document as it is streamed through a parser. JAXP also supplies the DOM API, which loads a document object into memory in the form of a tree data structure that can then be traversed to obtain data selectively. Finally, JAXP includes transformers that facilitate using XSLT, or Extensible Stylesheet Language Transformations, to transform source XML files into entirely new documents.
'Pluggable' Architecture
Application <--> JAXP Pluggable Interface <--> { list of compliant parsers }
JAXP is based on a 'pluggable' architecture which enables different parser implementations to be exchanged based on some extra parameter information that is explicitly supplied. This is the Factory design pattern, where several classes supply their own specific implementations of a particular interface and a factory object selects one based on some invocation criteria, for instance a parameter to a method or a property supplied from a file or the command line. In the event that no explicit parameter is supplied, the JAXP 'Factory' classes will supply a default implementation.
The following table shows the different ways in which a specific parser can be supplied by a factory.
Options for Specifying a Particular Factory
Option | Description | Example |
---|---|---|
Default | The Java system supplies a default | // no explicit actions are taken to specify a factory |
System Property | The particular factory is specified at the command line | // java -Djavax.xml.parsers.DocumentBuilderFactory= |
Property | The name of the factory can be set in a properties file | // $JAVA_HOME/jre/lib/jaxp.properties |
JAR Service | A Factory class may be supplied in a JAR services file | // META-INF/services/javax.xml.parsers.SAXParserFactory |
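To see which implementation the lookup in the table above has selected, the factory objects can simply be asked for their class names. The following is a minimal sketch; the class name FactoryLookup and its two helper methods are made up for illustration, and the names that get printed depend on the Java installation.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;

public class FactoryLookup {

    // Name of the concrete DOM factory class chosen by the JAXP lookup.
    static String domFactoryName() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    // Name of the concrete SAX factory class chosen by the JAXP lookup.
    static String saxFactoryName() {
        return SAXParserFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println("DOM factory: " + domFactoryName());
        System.out.println("SAX factory: " + saxFactoryName());
    }
}
```

Running the program with one of the -D system properties from the table set to another factory class on the classpath should change the name that is printed.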
How SAX Works
SAX is described as an 'event-based' model as opposed to a 'tree-based' model such as the one described by the DOM (the Document Object Model). This means the XML parser reads and parses the XML document as a stream in real time. During the process, as the parser hits each item in the document, an internal notification or event is created that triggers a call on a corresponding method defined in a content handler. It is in these methods that different sorts of processing can be done, for instance extracting element data and collecting it to send to a database.
The approach used with the DOM API is different. In DOM the whole hierarchical tree is built in memory before the application begins to process it. This makes SAX more efficient to use under circumstances where documents are very large and resources are limited.
SAXParserFactory supplies a SAXParser that parses an XML document. Different listeners (ContentHandler, ErrorHandler, DTDHandler and EntityResolver) supply responses to document events that occur as the XML document is streamed through the parser.
SAX Processing Model
SAXParserFactory __ returns parser object __
                                            |
XML file --- to Input Stream --> SAXParser (parses) --> Stream events trigger Handler responses -> generates Output
Specifics include calling the parse( ) method on the parser and supplying 'callback' methods for the Content, Error and DTD handler classes as well as the Entity Resolver.
Package | Description |
---|---|
org.xml.sax | SAX interfaces |
org.xml.sax.ext | extensions for specialized processing |
org.xml.sax.helpers | default helper classes |
javax.xml.parsers | contains SAXParserFactory, SAXParser & Exception classes |
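The event model described above can be sketched in a few lines. The class below is only an illustration, with hypothetical names SaxEventSketch and CountingHandler. It parses XML held in a String, counts the element start events and gathers the character data. Checked exceptions are wrapped in an unchecked one just to keep the sketch short.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxEventSketch {

    // Handler that counts startElement events and collects character data.
    static class CountingHandler extends DefaultHandler {
        int elements = 0;
        final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            elements++;
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    // Streams the XML text through a default SAXParser and returns the handler.
    static CountingHandler parse(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            CountingHandler handler = new CountingHandler();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        CountingHandler h = parse("<list><item>a</item><item>b</item></list>");
        System.out.println(h.elements + " elements, text = " + h.text);
    }
}
```

Nothing is held in memory beyond what the handler itself chooses to keep, which is the point of the event-based approach.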
Again we see a Factory used to supply a processor, this time a DocumentBuilder. The DocumentBuilder will often make private use of an underlying SAX parser to parse( ) a document. The stream of events generated by the parse( ) method is captured in memory in the form of a DOM object. This tree data structure can then be queried for pertinent information.
DOM Processing Model
DocumentBuilderFactory ______
                              |
XML Data ---------- DocumentBuilder ----> Document Object in Memory --- query ---> Output
Specifics of using the DOM API include using a DocumentBuilderFactory to obtain a DocumentBuilder object whose parse( ) methods produce a Document object. DocumentBuilder's newDocument( ) method can also be used to produce an empty Document object that is an implementation of the org.w3c.dom.Document interface.
Package | Description |
---|---|
org.w3c.dom | DOM interfaces |
javax.xml.parsers | contains DocumentBuilderFactory, DocumentBuilder and Exception classes |
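Both routes to a Document object, parsing and newDocument( ), can be sketched as follows. The class name DomBuildSketch and its helper methods are hypothetical, and checked exceptions are wrapped only to keep the sketch brief.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class DomBuildSketch {

    // Obtains a DocumentBuilder from the default factory.
    static DocumentBuilder newBuilder() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Parses XML text into a Document and returns the name of its root element.
    static String rootName(String xml) {
        try {
            Document doc = newBuilder().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement().getTagName();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Starts from an empty Document and adds a single root element to it.
    static Document emptyWithRoot(String name) {
        Document doc = newBuilder().newDocument();
        Element root = doc.createElement(name);
        doc.appendChild(root);
        return doc;
    }

    public static void main(String[] args) {
        System.out.println(rootName("<memo><to>staff</to></memo>"));
        System.out.println(emptyWithRoot("memo").getDocumentElement().getTagName());
    }
}
```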
XSLT, The Extensible Stylesheet Language Transformations
XSLT is an XML technology that allows one XML document to be 'screened' by a set of templates in order to produce an entirely different document. The starting XML document is called the 'source' document and the document created after transformation is called the 'result'. Again, the XSLT support builds on the other APIs of JAXP.
XSLT Processing Model
TransformerFactory _ returns transformer _
                                          |
XML file -- to Source Object --> Transformer (transforms) -- templates are applied -> generates Result Object
Specifics include the use of a TransformerFactory to instantiate and return a Transformer object. The Transformer object is a custom creation that relates to the particular XSLT stylesheet it is modeled on. The source object, a Java object version of the XML source document, is transformed into the result object, which may be rendered as a new XML document.
XSLT Packages
Package | Description |
---|---|
javax.xml.transform | Transformer classes |
javax.xml.transform.dom | to create source & result objects from a DOM object |
javax.xml.transform.sax | to create source & result objects from SAX parser(s) & event handler |
javax.xml.transform.stream | to create source & result objects from an I/O stream |
SAX Mechanics
Really there are two important phases in using SAX. First the plumbing has to be handled, where a SAXParserFactory is obtained from which a SAXParser object is returned. This object is then used to parse our XML documents. The other part of the formula is to decide what you will do with the XML information during the parsing process. This is achieved by overriding the methods of a handler class. Rather than implementing the handler interface(s) directly, the stock approach taken in JAXP is to extend the DefaultHandler class. This has the advantage of allowing a developer to override only the methods of interest.
First of all we need classes. The following imports have everything needed to use the SAX API.
Example of Typical SAX Imports
import org.xml.sax.*;
import org.xml.sax.ext.*;
import org.xml.sax.helpers.*;
import javax.xml.parsers.*;
import java.io.*;
The next two lines show the code to get a SAXParserFactory and obtain a SAXParser from it.
Example
SAXParserFactory factory = SAXParserFactory.newInstance( );
SAXParser parser = factory.newSAXParser( );
Extra exception handling code will need to be added, as the following detailed look at the newSAXParser( ) method indicates.
Example
public abstract SAXParser newSAXParser( ) throws ParserConfigurationException, SAXException
A detailed description of one of the parse( ) methods is shown next. It shows that two Exception types must be handled when using this method, so exception handling code will also need to be added to our code.
Detailed Sample of the parse( ) method
public void parse (File f, DefaultHandler dh) throws SAXException, IOException
Let us consider the arguments to the parse( ) method. One is the handler. For the moment let us assume a DefaultHandler subclass has been coded and only needs to be instantiated. In this case an instantiation like the following would supply us with a reference to a content handler.
Example
DefaultHandler handler = new MyHandler( );
The other argument is our XML source object. We can use Java's File constructor to create a handle to our XML document.
Example
new File( "XMLMessage.xml");
Now we have all the pieces of the plumbing of the parsing process, which we can place inside a main( ) method along with exception handling code. Note we will allow our XML file to be specified at the command line as an argument following the name of the parser program. This entails adding code to pass the file in as an argument to the main method and also to report to the user that an input file is required. Following is a complete main( ) method example.
The main( ) method of a SAX Runner Program
public static void main(String[ ] args)
{
   // code if the user fails to supply an input file
   if (args.length != 1) {
      System.err.println("Usage: An xml file name is required as an argument" );
      System.exit(1);
   }
   // the handler instantiation
   DefaultHandler handler = new MyHandler( );
   // the default parser
   SAXParserFactory factory = SAXParserFactory.newInstance( );
   try { // parse input
      SAXParser parser = factory.newSAXParser( );
      parser.parse( new File(args[0]), handler );
   }
   catch (Exception e){
      e.printStackTrace( );
   }
   // catching SAXException, IOException and ParserConfigurationException
}
All the exceptions have been replaced by a single generic Exception catch. There would be circumstances where catching each of SAXException, IOException and ParserConfigurationException would be more suitable, for instance, to supply different courses of action depending on the type of error that was generated.
(In one text an interesting technique was used, where an IOException was caught in a catch clause and rethrown as a SAXException with an associated message stating that an IOException was thrown.)
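Both ideas can be sketched briefly. In the hypothetical class below (CatchSketch, tryParse and parseOrWrap are invented names), the first method catches each exception type separately and reports which one occurred, while the second shows the rethrow technique with the IOException kept as the cause of the new SAXException.

```java
import java.io.File;
import java.io.IOException;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class CatchSketch {

    // Parses a file and reports which kind of failure occurred, if any.
    static String tryParse(File file) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(file, new DefaultHandler());
            return "ok";
        } catch (ParserConfigurationException e) {
            return "configuration problem: " + e.getMessage();
        } catch (SAXException e) {
            return "XML problem: " + e.getMessage();
        } catch (IOException e) {
            return "I/O problem: " + e.getMessage();
        }
    }

    // The rethrow technique: callers only need to deal with SAX related exceptions.
    static void parseOrWrap(File file)
            throws SAXException, ParserConfigurationException {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(file, new DefaultHandler());
        } catch (IOException e) {
            throw new SAXException("An IOException was thrown", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(tryParse(new File(args.length > 0 ? args[0] : "missing.xml")));
    }
}
```

A file that does not exist should land in the IOException branch, while a file that is not well-formed should land in the SAXException branch.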
Setting Features & Properties
In addition to the default parser settings, the SAXParserFactory object supplies important methods for setting properties and features. The most common of these are the methods to set the validating feature and namespace support.
Example
factory.setNamespaceAware(true);
factory.setValidating(true);
Other useful methods are one to set a schema and a more generic method that sets a feature. Note that setSchema( ) takes a javax.xml.validation.Schema object rather than a file name.
Example
// aSchema is a javax.xml.validation.Schema, for instance one built from "ASchema.xsd"
factory.setSchema(aSchema);
factory.setFeature("A_feature_represented_by_a_URI", true);
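As a concrete illustration of these setter methods, the sketch below configures a factory and turns on one feature that every JAXP implementation is expected to recognize, the secure processing feature named by XMLConstants.FEATURE_SECURE_PROCESSING. The class and method names (FeatureSketch, configuredFactory) are invented for the example.

```java
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;

public class FeatureSketch {

    // Returns a factory with namespace support on and DTD validation off.
    static SAXParserFactory configuredFactory() {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(false);
        try {
            // A feature is named by a URI; this one asks for secure processing limits.
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        } catch (Exception e) {
            throw new IllegalStateException("feature not supported", e);
        }
        return factory;
    }

    public static void main(String[] args) {
        SAXParserFactory f = configuredFactory();
        System.out.println("Namespace aware: " + f.isNamespaceAware());
        System.out.println("Validating: " + f.isValidating());
    }
}
```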
ContentHandler
The other key player in the SAX parsing model is the DefaultHandler, which is a stubbed implementation of the ContentHandler interface along with the subordinate interfaces DTDHandler, EntityResolver and ErrorHandler.
We can focus on the methods of the ContentHandler interface, which are called in response to the parsing events that are generated during the parsing of the XML document. This popular approach, where a stubbed implementation of a set of interfaces is provided, is easy to use as the only requirement is that we override those interface methods we are interested in. Practically speaking this involves being careful not to tamper with the given method signatures of the interface(s), restricting our changes and code additions to the braced areas of the overridden methods.
// caution: coincidence of names: java.net.ContentHandler class & org.xml.sax.ContentHandler interface
Following is a class that provides stub implementations for all the methods listed in the ContentHandler interface. The DefaultHandler supplies stubs for all four SAX related interfaces.
Example of a Stub Implementation of the ContentHandler Interface
// all the methods throw SAXException except setDocumentLocator
class AHandler implements ContentHandler {
   // sets Locator for Document
   public void setDocumentLocator(Locator locator) { }
   // called at document start
   public void startDocument() throws SAXException { }
   // called at end of parse
   public void endDocument() throws SAXException { }
   // called when a 'PI' is encountered
   public void processingInstruction(String target, String data) throws SAXException { }
   // start of an XML Namespace prefix mapping
   public void startPrefixMapping(String prefix, String uri) throws SAXException { }
   // end of an XML Namespace prefix mapping
   public void endPrefixMapping(String prefix) throws SAXException { }
   // reports element namespace URI, local name part and attributes
   public void startElement(String namespaceURI, String localName,
         String rawName, Attributes atts) throws SAXException { }
   // marks end of element
   public void endElement(String namespaceURI, String localName,
         String rawName) throws SAXException { }
   // reports character data inside an element
   public void characters(char[] ch, int start, int length) throws SAXException { }
   // reports whitespace that can be ignored in the originating document
   public void ignorableWhitespace(char[] ch, int start, int length) throws SAXException { }
   // reports skipped entity
   public void skippedEntity(String name) throws SAXException { }
}
The following application shows DefaultHandler extended. It also shows examples of how the Locator object may be used. Notice the Locator is declared in class scope, assigned when the parser calls setDocumentLocator( ) and then referenced in the other overridden methods of the DefaultHandler.
Example
// class header and Locator field restored so that the fragment compiles
class MyHandler extends DefaultHandler {
   private Locator locator;
   // prints the 'Document Start' banner that appears in the output further on
   public void startDocument(){
      System.out.println("_________________\nDocument Start\n_________________");
   }
   public void startElement(String uri, String localName, String qName, Attributes attributes){
      System.out.println("\nLine Number: " + locator.getLineNumber() + "\n______");
      System.out.println("URI: " + uri);
      System.out.println("Qualified name: " + qName);
      System.out.println("Local name: " + localName);
      int a = attributes.getLength();
      System.out.println("Number of attributes: " + a);
      AttributesImpl ai = new AttributesImpl(attributes);
      // instantiating AttributesImpl is redundant but I wanted to show it
      // index 2 is the third attribute; null is returned when there is none
      System.out.println("Value of attribute at second index: " + ai.getValue(2));
   }
   public void characters(char[] ch, int start, int length){
      System.out.println("Text: " + new String(ch, start, length).trim());
   }
   public void setDocumentLocator(Locator locator){
      this.locator = locator;
      System.out.println("Initial line number setting: " + locator.getLineNumber());
   }
   public void endDocument(){
      System.out.println("__________\n\n End of Document\n_____________");
   }
}
Now we can supply a complete example. Note that the above Handler class is instantiated inside the following code.
Complete SAX Example
import org.xml.sax.*;
import org.xml.sax.ext.*;
import org.xml.sax.helpers.*;
import javax.xml.parsers.*;
import java.io.*;
public class SAXRunner{
   public static void main(String[ ] args)
   {
      // code if the user fails to supply an input file
      if (args.length != 1) {
         System.err.println("Usage: An xml file name is required as an argument" );
         System.exit(1);
      }
      // the handler instantiation
      DefaultHandler handler = new MyHandler( );
      // the default parser non-validating
      SAXParserFactory factory = SAXParserFactory.newInstance( );
      // investigative methods
      System.out.println("Validation On: " + factory.isValidating());
      System.out.println("Namespace Aware On: " + factory.isNamespaceAware());
      factory.setValidating(true);
      System.out.println("Validation On: " + factory.isValidating());
      try { // parse input
         SAXParser parser = factory.newSAXParser( );
         parser.parse( new File(args[0]), handler );
      }
      catch (Exception e){
         e.printStackTrace( );
      }
      System.exit(0);
   }
}
XML Sample To Parse
Following is a simple XML instance that provides us some fodder for our parsing process.
An XML Instance
<?xml version="1.0"?>
<bug:FlyingObjects xmlns:bug="www.bugs.bug">
<bug:Plane wingspan="30" seats="2" horsepower="150"> Composite KitPlane </bug:Plane>
<bug:Rocket wingspan="5" seats="1" horsepower="10000"> V1 Rocket </bug:Rocket>
<bug:Mosquito wingspan="1" seats="none" horsepower="2"> Muskoka Mosquito </bug:Mosquito>
</bug:FlyingObjects>
To run this at the command line you will need to use a command something like the following.
> java SAXRunner UFO.xml
The output is as follows.
Output
Initial line number setting: 1
_________________
Document Start
_________________
Line Number: 2
_______________
URI: www.bugs.bug
Qualified name: bug:FlyingObjects
Local name: FlyingObjects
Number of attributes: 0
Value of attribute at second index: null
Line Number: 3
_______________
URI: www.bugs.bug
Qualified name: bug:Plane
Local name: Plane
Number of attributes: 3
Value of attribute at second index: 150
Text: Composite KitPlane
Line Number: 4
_______________
URI: www.bugs.bug
Qualified name: bug:Rocket
Local name: Rocket
Number of attributes: 3
Value of attribute at second index: 10000
Text: V1 Rocket
Line Number: 5
_______________
URI: www.bugs.bug
Qualified name: bug:Mosquito
Local name: Mosquito
Number of attributes: 3
Value of attribute at second index: 2
Text: Muskoka Mosquito
_____________
End of Document
________________
The DOM API
DOM is most suited for processing documents like articles and books. The J2EE tutorial suggests using 'JDOM' and 'dom4j' for applications that are data structure oriented. DOM was designed to be used by a number of languages including non-object oriented languages like C and Perl.
Unfortunately, what constitutes a node in DOM is different from other models such as XPath. In simple terms, the DOM makes more things nodes. For instance CDATA sections are considered nodes whereas in XPath they are not. The DOM model facilitates use with mixed content models, the sort of model that is common in long documents.
The Mixed Content Model in XML describes the documentary style where text and tags are mixed together. Consider the following example.
Mixed Content Example
<role> Mr. Brennar is the <key> President </key> of the company. </role>
From a DOM point of view this would break down to a tree representation like the following.
Hierarchical View of the 'role' Tag
ELEMENT: role
|__ TEXT: Mr. Brennar is the
|__ ELEMENT: key
|   |__ TEXT: President
|__ TEXT: of the company.
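The tree view above can be checked with a few lines of DOM code. The sketch below uses a hypothetical class MixedContentSketch that parses the 'role' example and lists the kind of each direct child of the root element.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class MixedContentSketch {

    // Lists the direct children of the root element as TEXT or ELEMENT entries.
    static List<String> childKinds(String xml) {
        List<String> kinds = new ArrayList<String>();
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Node child = doc.getDocumentElement().getFirstChild();
            while (child != null) {
                if (child.getNodeType() == Node.TEXT_NODE) {
                    kinds.add("TEXT: " + child.getNodeValue().trim());
                } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                    kinds.add("ELEMENT: " + child.getNodeName());
                }
                child = child.getNextSibling();
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return kinds;
    }

    public static void main(String[] args) {
        String xml = "<role>Mr. Brennar is the <key>President</key> of the company.</role>";
        for (String kind : childKinds(xml)) {
            System.out.println(kind);
        }
    }
}
```

The 'role' element reports three children: a text node, the 'key' element and a second text node.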
Following are the key DOM interfaces used to represent the DOM tree model.
The Node Interface
The heart of the DOM model is the Node interface. The Node interface is used to represent a single node in the document tree. It supplies a set of methods that allow querying the DOM tree in a non-sequential fashion.
As a sample of the methods that the Node interface supplies consider the following. The getNodeType( ) method is interesting as the DOM API uses the 'short' primitive type to create constants that represent each of the Node types.
Sample of Method Signatures for the Node Interface // from the JDK1.5 Documentation
String getNodeName( ) // the name of this node
short getNodeType( ) // returns constants that represent underlying object type
String getNodeValue( ) // returns the value of this node
The Node interface also serves as the general parent interface for a set of sub-interfaces that provide details for the various sub-components of the DOM tree. Each of these sub-component types uniquely describes the particular part of the DOM that it represents. For instance, only the 'Element' type will have attributes. Attributes are stored in a datatype called the 'NamedNodeMap' which supplies methods for getting and setting attributes by name, getting attributes by index and getting the length of the collection via the getLength( ) method.
The 'Text' type, though implementing the 'Node' interface, cannot have children. Adding children to a 'Text' Node will result in a DOMException being thrown.
The Document interface deserves mention as it is used to represent the entire XML or HTML document.
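The two points about NamedNodeMap and Text nodes can be tried out with a short sketch. NodeTypesSketch and its methods are hypothetical names used only for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class NodeTypesSketch {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Looks up one attribute of the root element by name via the NamedNodeMap.
    static String attribute(String xml, String name) {
        NamedNodeMap attrs = parse(xml).getDocumentElement().getAttributes();
        Node attr = attrs.getNamedItem(name);
        return attr == null ? null : attr.getNodeValue();
    }

    // Tries to add a child to the first Text node; true if the DOM refuses.
    static boolean textRejectsChildren(String xml) {
        Document doc = parse(xml);
        Node text = doc.getDocumentElement().getFirstChild();
        try {
            text.appendChild(doc.createElement("extra"));
            return false;
        } catch (DOMException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String xml = "<Plane wingspan='30' seats='2' horsepower='150'>Composite KitPlane</Plane>";
        System.out.println("seats = " + attribute(xml, "seats"));
        System.out.println("Text rejects children: " + textRejectsChildren(xml));
    }
}
```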
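The following table shows the various sub-interfaces, the name that is associated with them and the node value. Note the value for attributes is null for all types except the 'Element' type.
Table of Node Sub-interfaces // from the JDK 1.5 documentation
Interface | nodeName | nodeValue | attributes |
---|---|---|---|
Attr | same as Attr.name | same as Attr.value | null |
CDATASection | "#cdata-section" | same as CharacterData.data, the content of the CDATA section | null |
Comment | "#comment" | same as CharacterData.data, the content of the comment | null |
Document | "#document" | null | null |
DocumentFragment | "#document-fragment" | null | null |
DocumentType | same as DocumentType.name | null | null |
Element | same as Element.tagName | null | NamedNodeMap |
Entity | entity name | null | null |
EntityReference | name of entity referenced | null | null |
Notation | notation name | null | null |
ProcessingInstruction | same as ProcessingInstruction.target | same as ProcessingInstruction.data | null |
Text | "#text" | same as CharacterData.data, the content of the text node | null |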
DOM Mechanics
DOM is a little different than SAX in that it loads the Document object into memory. From this vantage point the tree can be traversed to obtain pertinent information. Instead of the ContentHandler used in SAX, a set of methods of the Node interface are used to query the data structure.
Here is where we benefit from the uniform style supplied by the JAXP API. You will notice that the programming mechanics for the DOM API are very similar to those used in SAX.
import javax.xml.parsers.*;
import org.xml.sax.*;   // assuming DOM builds on SAX parser
import java.io.*;       // for File processing and IOException
import org.w3c.dom.*;   // DOM imports
Instead of using SAXParserFactory and SAXParser we use similar instantiations of DocumentBuilderFactory and DocumentBuilder.
DOM Factory Code
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
. . . .
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse( new File(args[0]) );
Again related exceptions will need to be caught. We will use the generic catch for Exception.
Example
try {
   DocumentBuilder builder = factory.newDocumentBuilder();
   Document document = builder.parse( new File(args[0]) );
}
catch (Exception e){
   e.printStackTrace( );
}
Again we avoid complexity and catch the generic Exception. This is not as coarse an approach as it may sound. The stack trace printed from the generic catch still reports the specific kind of exception that was thrown.
Example
// inserting throw new SAXException(); in the try clause yields the following Exception report at the command line.
org.xml.sax.SAXException
   at DOMRunner.main(DOMRunner.java:26)
// or opening a non-existent file yields
java.io.FileNotFoundException: /home/peter/java/../XML/Samples/NoSuchFile (No such file or directory)
// DOM also allows setting namespace awareness and validation
Querying the Document Node // parallels providing the ContentHandler in SAX
We are missing one major component. The above code, when it is stitched together, will leave us with a DOM object in memory. We still need to do something with it. This we do with the methods of the Node interface and its sub-interfaces.
A subtle but important point is to recognize that the Document object that the DOM parser has returned is in fact one of the sub-interfaces of the Node interface. Therefore, Node methods can be called on it.
One approach we can take to keep the Node query code somewhat modular is to create a class method that takes the Document object as an argument and calls Node methods on the DOM object from inside the method. Notice this method exercises six of Node's methods. It is recursive, so it picks off child nodes and iterates over them as well.
Example
static void queryNode(Node document){
   Node child = document.getFirstChild();
   do {
      System.out.println(indent + "CHILD NAME: " + child.getNodeName( ) );
      System.out.println(indent + "NODE TYPE: " + child.getNodeType( ));
      System.out.println(indent + "VALUE: " + child.getNodeValue( ));
      if(child.hasChildNodes( )){
         indent = indent + " . . . ";
         queryNode(child);
         // note: this resets the indent to the top level rather than to the previous level
         indent = "";
      }
      child = child.getNextSibling( );
   }
   while (child != null );
}
We can collect all the code in a complete DOM sample below.
DOM Sample Code
import javax.xml.parsers.*;
import org.xml.sax.*;   // assuming DOM builds on SAX parser
import java.io.*;       // for File processing and IOException
import org.w3c.dom.*;   // DOM imports
class DOMRunner{
   static Document document;
   static String indent="";
   public static void main(String[ ] args)
   {
      // code if the user fails to supply an input file
      if (args.length != 1) {
         System.err.println("Usage: An XML file name is required as an argument" );
         System.exit(1);
      }
      // the DOM Factory
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance( );
      try {
         DocumentBuilder builder = factory.newDocumentBuilder( );
         document = builder.parse( new File(args[0]) );
         // method that calls Node methods on DOM object
         queryNode(document);
      }
      catch (Exception e){
         e.printStackTrace( );
      }
   }
   // method to query Document Nodes
   static void queryNode(Node document){
      Node child = document.getFirstChild();
      do {
         System.out.println(indent + "CHILD NAME: " + child.getNodeName( ) );
         System.out.println(indent + "NODE TYPE: " + child.getNodeType( ));
         System.out.println(indent + "VALUE: " + child.getNodeValue( ));
         if(child.hasChildNodes()){
            indent = indent + " . . . ";
            queryNode(child);
            indent = "";
         }
         child = child.getNextSibling( );
      }
      while (child != null );
   }
}
If we apply this code to the XML Sample supplied earlier we get the following output.
Output
[peter@localhost java]$ java DOMRunner UFO.xml
CHILD NAME: bug:FlyingObjects
NODE TYPE: 1
VALUE: null
. . . CHILD NAME: #text
. . . NODE TYPE: 3
. . . VALUE:
. . . CHILD NAME: bug:Plane
. . . NODE TYPE: 1
. . . VALUE: null
. . . . . . CHILD NAME: #text
. . . . . . NODE TYPE: 3
. . . . . . VALUE: Composite KitPlane
CHILD NAME: #text
NODE TYPE: 3
VALUE:
CHILD NAME: bug:Rocket
NODE TYPE: 1
VALUE: null
. . . CHILD NAME: #text
. . . NODE TYPE: 3
. . . VALUE: V1 Rocket
CHILD NAME: #text
NODE TYPE: 3
VALUE:
CHILD NAME: bug:Mosquito
NODE TYPE: 1
VALUE: null
. . . CHILD NAME: #text
. . . NODE TYPE: 3
. . . VALUE: Muskoka Mosquito
CHILD NAME: #text
NODE TYPE: 3
VALUE:
This is not the greatest example in that it doesn't show the real power of DOM, which is the ability to query different data items on the DOM tree out of sequence. We will leave this aspect of the DOM as part of the assignment.
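One detail of the queryNode( ) method is worth a second look. It keeps its indent in a static field and resets it to the empty string after each recursive call, which is why siblings that follow a nested element are printed flush left in the output. A variation, sketched below under the invented names DomWalkSketch, walk and outline, passes the depth as a parameter instead and also skips text nodes that hold only whitespace.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class DomWalkSketch {

    // Appends one line per node; the depth parameter controls the indentation.
    static void walk(Node node, int depth, StringBuilder out) {
        for (Node child = node.getFirstChild(); child != null;
                child = child.getNextSibling()) {
            boolean isText = child.getNodeType() == Node.TEXT_NODE;
            if (isText && child.getNodeValue().trim().length() == 0) {
                continue; // skip whitespace-only text between elements
            }
            for (int i = 0; i < depth; i++) {
                out.append("  ");
            }
            out.append(child.getNodeName());
            if (isText) {
                out.append(": ").append(child.getNodeValue().trim());
            }
            out.append('\n');
            walk(child, depth + 1, out);
        }
    }

    // Parses XML text and returns an indented outline of the whole tree.
    static String outline(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            StringBuilder out = new StringBuilder();
            walk(doc, 0, out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(outline("<a><b>hi</b><c/></a>"));
    }
}
```

Because the depth travels with each call, the indentation returns to the right level automatically when the recursion unwinds.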
Setting Up Validation with the JAXP API // reference J2EE tutorial
The J2EE tutorial first sets up constants.
static final String JAXP_SCHEMA_LANGUAGE = "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
static final String W3C_XML_SCHEMA = "http://www.w3.org/2001/XMLSchema";
The J2EE tutorial next shows setting the factory to validate.
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance( );
factory.setNamespaceAware(true);
factory.setValidating(true);
try {
   factory.setAttribute(JAXP_SCHEMA_LANGUAGE, W3C_XML_SCHEMA);
}
catch (IllegalArgumentException x) {
   // Happens if the parser does not support JAXP 1.2
}
There are two ways to associate an XML schema document with an XML document. One is a schema declaration in the XML document itself.
<documentRoot xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xsi:noNamespaceSchemaLocation='YourSchemaDefinition.xsd' >
// this is the noNamespace variation
The other is to set the schema in the JAXP code, for which the J2EE Tutorial shows the following.
static final String schemaSource = "YourSchemaDefinition.xsd";
static final String JAXP_SCHEMA_SOURCE = "http://java.sun.com/xml/jaxp/properties/schemaSource";
...
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance()
...
factory.setAttribute(JAXP_SCHEMA_SOURCE, new File(schemaSource));
// there is also the attractive setSchema( ) method whose use we will test
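As a first test of the setSchema( ) route, the sketch below compiles a tiny schema with the javax.xml.validation package introduced in JDK 1.5 and hands it to the factory. The schema text, the class name SchemaSketch and the isValid( ) helper are all made up for this illustration. An error handler is installed that rethrows validation errors, since otherwise they would only be reported and parsing would carry on.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class SchemaSketch {

    // A made-up schema: the document must be a single 'count' element holding an int.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='count' type='xs:int'/>"
        + "</xs:schema>";

    // True if the XML parses and validates; false on any SAX error,
    // which includes documents that are not well-formed.
    static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory
                    .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setSchema(schema);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) throws SAXParseException {
                    throw e; // turn validation errors into exceptions
                }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<count>3</count>"));
        System.out.println(isValid("<count>three</count>"));
    }
}
```

Note that setValidating(true) is not called here. That flag asks for DTD validation, while a schema supplied through setSchema( ) is applied on its own.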
XSLT, Extensible Stylesheet Language Transformations
XSL is an abbreviation of 'Extensible StyleSheet Language'. XSL acts as the umbrella for three specifications: XSLT, which stands for XSL Transformations, XSL-FO, which is short for XSL Formatting Objects, and XPath, the XML Path Language, which is an expression language used by XSLT to access or refer to different parts of an XML document. The two XSL standards, XSLT and XSL-FO, represent different metaphors in XSL. The former is based on templates and the latter on formatting objects.
Hierarchy of XML Stylesheet Technologies
XSL
|____ XSLT   // XSL Transformations transform documents based on 'rules' described in templates
|____ XSL-FO // XSL Formatting Objects, a powerful and fine grained version of CSS
|____ XPath  // tree querying language, further relates to XLink & XPointer
Snapshot of How XSLT Works
We don't have time to describe XSLT in any detail. However we should understand that XSLT works like a CSS stylesheet in that a sheet is prepared that can be applied to an XML document to change it in some way. While in CSS the changes are confined to visual presentation, XSLT really has no restriction on what can be done. For instance, a style sheet could be applied to a law document or a medieval poem and supply a simple interpretation for each word, phrase or sentence in the original work.
XSLT is an XML application and so is made up of tags. The key tag, <template>, is used to house rules that are associated with elements inside an XML document using 'match' and 'select' attributes. Along with tags that return values and XPath query statements, filters can be created to return a transformed value for any part or set of parts of an XML tree structure.
The W3Schools website supplies the following example that gives us some idea of the form XSLT takes. First, in the XML document we see an <?xml-stylesheet ?> processing instruction that references the stylesheet.
W3Schools XML Example
<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet type="text/xsl" href="cdcatalog.xsl"?>
<catalog>
  <cd>
    <title>Empire Burlesque</title>
    <artist>Bob Dylan</artist>
    <country>USA</country>
    <company>Columbia</company>
    <price>10.90</price>
    <year>1985</year>
  </cd>
</catalog>
In the associated XSL stylesheet that is referenced in the xml-stylesheet processing instruction, we see the XSLT <template> tag enclosed in the characteristic <stylesheet> tag. In this example we see other 'main players' of XSLT such as the 'match' and 'select' attributes and the <value-of> and <for-each> tags. They are prefixed with the orthodox 'xsl' prefix which stands in for the XSLT namespace URI. The 'match' and 'select' attributes are assigned path phrases written in 'XPath', which is a topic in its own right. XPath is a full-powered pattern matching language that can describe minute details of an XML tree. The granularity of XPath comes at a price. XPath can easily compete with the regular expressions of languages like Perl for cryptic complexity.
W3Schools StyleSheet Example
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
  <!-- XPath is used to describe paths into the XML tree -->
  <html>
  <body>
    <h2>My CD Collection</h2>
    <table border="1">
    <tr bgcolor="#9acd32">
      <th align="left">Title</th>
      <th align="left">Artist</th>
    </tr>
    <xsl:for-each select="catalog/cd">
    <tr>
      <td><xsl:value-of select="title"/></td>
      <td><xsl:value-of select="artist"/></td>
    </tr>
    </xsl:for-each>
    </table>
  </body>
  </html>
</xsl:template>
</xsl:stylesheet>
We can't go into XSLT in any detail, but we have described enough that we may comfortably take a given XSLT stylesheet and use it to do transformations inside the context of the JAXP environment.
Again, kudos to JAXP for maintaining the same basic pattern where a factory returns an engine, in this case a transformer. The XSLT model is a little different at this point, as the object that dictates how the input object will be changed or processed is built into the transformer object. In other words, the XSLT stylesheet configures the Transformer object, which is created to transform the input XML object in a manner uniquely described by that stylesheet. (Where in SAX the ContentHandler supplied the actions, and in DOM calling methods on the Document object supplied output, in the XSLT API the Transformer interprets the XSLT stylesheet, applies it to the input XML and outputs the transformed result document.)
TransformerFactory __ returns a transformer ____
                                                |
XML file -- to Input Stream --> Transformer (based on XSL sheet) transforms --- applies templates -> generates Output
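The flow in the diagram can be sketched with a stylesheet small enough to hold in a String. The stylesheet below is a cut-down, text-output variation on the W3Schools sheet, and the names XsltSketch and transform( ) are invented for the illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltSketch {

    // A small stylesheet that prints 'title by artist' for each cd as plain text.
    static final String XSL =
          "<xsl:stylesheet version='1.0' "
        + "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        + "<xsl:for-each select='catalog/cd'>"
        + "<xsl:value-of select='title'/>"
        + "<xsl:text> by </xsl:text>"
        + "<xsl:value-of select='artist'/>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Builds a Transformer from the stylesheet and applies it to the XML text.
    static String transform(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(
            "<catalog><cd><title>Empire Burlesque</title>"
          + "<artist>Bob Dylan</artist></cd></catalog>"));
    }
}
```

The stylesheet is passed to newTransformer( ) as a Source, which is how the Transformer comes to be configured by that particular sheet.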
The
XSLT API allows inputing reading the input XML from several sources,
including from standard IO, input from a DOM tree or
input from SAX input
events. The corresponding packages are listed
below as import statements.
import
javax.xml.transform.stream.StreamSource; //
InputStream, Reader or String
import
javax.xml.transform.dom.DOMSource;
// DOM Node object
import
javax.xml.transform.sax.SAXSource;
// SAX InputSource or
Reader
Outputs can also be directed to forms
useful to different departments of the JAXP API.
import javax.xml.transform.stream.StreamResult; // OutputStream
import javax.xml.transform.dom.DOMResult;       // writes a DOM Document object
import javax.xml.transform.sax.SAXResult;       // SAX ContentHandler methods
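As a hedged illustration of the non-stream result types, the sketch below sends the output of a transformation first into a DOMResult and then into a SAXResult. The class name ResultKinds, the helper names and the sample strings are hypothetical; a stylesheet-free Transformer is used so the output mirrors the input.

```java
import java.io.StringReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class ResultKinds {
    private static Transformer newCopier() throws TransformerException {
        return TransformerFactory.newInstance().newTransformer();
    }

    // DOMResult: the output is built as a DOM tree instead of text.
    static Document toDom(String xml) throws TransformerException {
        DOMResult result = new DOMResult();
        newCopier().transform(new StreamSource(new StringReader(xml)), result);
        // With no node supplied, the transformer creates the Document itself.
        return (Document) result.getNode();
    }

    // SAXResult: the output is delivered as calls on a ContentHandler.
    static int countElements(String xml) throws TransformerException {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                count[0]++;
            }
        };
        newCopier().transform(new StreamSource(new StringReader(xml)),
                              new SAXResult(handler));
        return count[0];
    }

    public static void main(String[] args) throws TransformerException {
        Document doc = toDom("<note><to>reader</to></note>");
        System.out.println(doc.getDocumentElement().getNodeName());
        System.out.println(countElements("<a><b/><b/></a>"));
    }
}
```

A DOMResult is convenient when the transformed document needs further processing with the DOM API, while a SAXResult lets an existing SAX handler consume the output directly.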
In addition to inputs and outputs, the XSLT API has its own exceptions,
namely TransformerConfigurationException and TransformerException.
TransformerConfigurationException is thrown by the TransformerFactory.
TransformerException is thrown when an error condition occurs in the
Transformer object. These classes are carried in the javax.xml.transform
package, so they come with that import.
// JAXP also defines an ErrorListener interface with the methods warning( ),
// error( ) and fatalError( ). Each takes a TransformerException and may in
// turn throw one. The error listener is set via
// 'factory.setErrorListener(new MyErrorListener( ));'
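A sketch of how the exceptions and the ErrorListener fit together is given below. It is an illustration under assumptions rather than part of the course code: the names ListenerSketch and CollectingListener are hypothetical, and the listener simply records the messages it receives. A deliberately malformed stylesheet is used to provoke a TransformerConfigurationException from the factory.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.transform.ErrorListener;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamSource;

public class ListenerSketch {
    // Hypothetical listener that records whatever the factory reports.
    static class CollectingListener implements ErrorListener {
        final List<String> messages = new ArrayList<String>();

        public void warning(TransformerException e) {
            messages.add("warning: " + e.getMessage());
        }

        public void error(TransformerException e) {
            messages.add("error: " + e.getMessage());
        }

        public void fatalError(TransformerException e) throws TransformerException {
            messages.add("fatal: " + e.getMessage());
            throw e; // processing cannot continue after a fatal error
        }
    }

    // Returns true if the factory can build a Transformer from the stylesheet.
    static boolean compiles(String xsl, ErrorListener listener) {
        TransformerFactory factory = TransformerFactory.newInstance();
        factory.setErrorListener(listener);
        try {
            factory.newTransformer(new StreamSource(new StringReader(xsl)));
            return true;
        } catch (TransformerConfigurationException e) {
            return false; // thrown by the factory for an unusable stylesheet
        }
    }

    public static void main(String[] args) {
        String good = "<xsl:stylesheet version='1.0' "
            + "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'/>";
        String bad = "<xsl:stylesheet"; // not well-formed XML

        CollectingListener listener = new CollectingListener();
        System.out.println("good compiles: " + compiles(good, listener));
        System.out.println("bad compiles:  " + compiles(bad, listener));
        System.out.println("reported:      " + listener.messages);
    }
}
```

Exactly which listener methods are called, and with what messages, depends on the transformer implementation that is plugged in.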
The standard import is the javax.xml.transform package, which supplies the
Transformer. Together the imports might take the following form.
import javax.xml.transform.*;
import javax.xml.transform.stream.*;
import javax.xml.transform.dom.*;
import javax.xml.transform.sax.*; // SAX ContentHandler methods
The next example shows a code excerpt from the J2EE Tutorial. Here a DOM
DocumentBuilder is returned from a factory that has already been created.
The builder parses our XML document and represents it in memory as a DOM
object. A new transformer is returned by the TransformerFactory instance.
Next the DOM document object is wrapped in a DOMSource. The result of the
transformation is prepared and directed to the console. Finally the
transform( ) method is called on the source and result, and the
transformation is streamed to the console.
Example // from the J2EE Tutorial

try {
    File f = new File(argv[0]);
    DocumentBuilder builder = factory.newDocumentBuilder();
    document = builder.parse(f);

    // Use a Transformer for output
    TransformerFactory tFactory = TransformerFactory.newInstance();
    Transformer transformer = tFactory.newTransformer();

    DOMSource source = new DOMSource(document);
    StreamResult result = new StreamResult(System.out);
    transformer.transform(source, result);
} catch (Exception e) {
    // parser, IO and transformer exceptions all arrive here
    e.printStackTrace();
}
Because no XSLT stylesheet has been supplied here, the result is the same as
the input. The J2EE Tutorial calls this an 'identity transform'.
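One common use of the identity transform is to write out a DOM tree that was built or modified in memory. The sketch below assumes a hypothetical class DomToText and a made-up greeting element; it builds a tiny document with the DOM API and lets a stylesheet-free Transformer turn it into text.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class DomToText {
    // Builds a small document in memory (hypothetical content).
    static Document buildGreeting() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .newDocument();
        Element root = doc.createElement("greeting");
        root.setAttribute("lang", "en");
        root.appendChild(doc.createTextNode("hello"));
        doc.appendChild(root);
        return doc;
    }

    // No stylesheet is supplied, so the Transformer copies the tree unchanged.
    static String serialize(Document doc) throws Exception {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serialize(buildGreeting()));
    }
}
```

Replacing the StringWriter with a FileWriter or System.out would send the same text to a file or the console.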
Following is a complete example. The earlier examples have been
modified to require two files supplied at the command line, the XML
source document and the XSLT stylesheet.
A Complete DOM Code Example

import javax.xml.parsers.*;
import org.w3c.dom.*;
import org.xml.sax.*;
import java.io.*;
import javax.xml.transform.*;
import javax.xml.transform.stream.*;
import javax.xml.transform.dom.*;

class XSLTRunner {
    static Document xmlDocument, xsltDocument;
    static String indent = "";

    public static void main(String[] args) {
        // code if the user fails to supply an input file
        if (args.length != 2) {
            System.err.println("Usage: An XML file name is required as a first argument");
            System.err.println("An XSLT stylesheet file is required as a second argument");
            System.exit(1);
        }

        // the DOM Factory
        DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
        domFactory.setNamespaceAware(true); // XSLT requires namespace awareness

        // the Transformer Factory
        TransformerFactory xsltFactory = TransformerFactory.newInstance();

        try {
            // The XML and XSL files are loaded into memory as
            // Document objects then converted to DOMSource objects
            DocumentBuilder builder = domFactory.newDocumentBuilder();
            xmlDocument = builder.parse(new File(args[0]));
            DOMSource xmlSource = new DOMSource(xmlDocument); // input XML as a DOMSource

            // The result object is set up to be streamed to standard output ( console )
            StreamResult result = new StreamResult(System.out);

            xsltDocument = builder.parse(new File(args[1]));
            DOMSource xsltSource = new DOMSource(xsltDocument); // xslt source object

            // a Transformer object is created configured to the XSLT Source object
            Transformer transformer = xsltFactory.newTransformer(xsltSource);

            // the transform method applies the transformation and streams result object as output
            transformer.transform(xmlSource, result);
        } catch (Exception e) {
            // catching SAXException, IOException and ParserConfigurationException
            // TransformerConfigurationException and TransformerException
            e.printStackTrace();
        }
    }
}
Following is an XML document and an associated XSL stylesheet. These will
serve as the first and second arguments at the command line.
Example

[peter@localhost java]$ java XSLTRunner ../XML/Samples/Z.xml ../XML/Samples/Z.xsl
// Save to Z.xml and Z.xsl respectively
An XML Sample to Transform
<?xml version="1.0" ?>
<?xml-stylesheet version="1.0" type="text/xsl" href="Z.xsl"?>
<AtoZ>
<A>Aardvark</A>
<B>Boomerang</B>
<C>Caterpillar</C>
<D>Demolition</D>
<E>Elephant</E>
<Z keyword="paper"> Paper is made from XML trees </Z>
<Z keyword="rock"> Mines are dug into rock </Z>
<Z keyword="scissors"> Scissors cut cloth </Z>
<Z keyword="water"> Water is hard on paper </Z>
<Z keyword="fire"> Water is hard on fire </Z>
</AtoZ>
A Corresponding XSL Stylesheet
<?xml version="1.0" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="AtoZ">
<html>
<head><title>AtoZ</title></head>
<body >
<xsl:for-each select="Z">
<B>Z : </B> <xsl:value-of select="."/>
<BR/>
</xsl:for-each>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
Output
The output is an HTML page with values of the XML extracted and located
inside HTML tags. This looks visually formatted in a browser.
<html>
<head>
<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>AtoZ</title>
</head>
<body>
<B>Z : </B> Paper is made from XML trees <BR>
<B>Z : </B> Mines are dug into rock <BR>
<B>Z : </B> Scissors cut cloth <BR>
<B>Z : </B> Water is hard on paper <BR>
<B>Z : </B> Water is hard on fire <BR>
</body>
</html>
Reading in the Text Associated With Material Covered in this Note:
'Developing Java Web Services', Chapter 8, pages 337 - 383
Exercise
1) Printing to the console is OK for demonstration, but ultimately we will
want to capture the information we have processed in other formats. One
useful format is the plain text document. In the following method, taken
from the J2EE Tutorial, a StringBuffer and its append( ) method are used to
gather the results each time the characters( ) method is called in the SAX
parsing process.
Method from the J2EE Tutorial

public void characters(char buf[], int offset, int len)
throws SAXException
{
    String s = new String(buf, offset, len);
    if (textBuffer == null) {
        textBuffer = new StringBuffer(s);
    } else {
        textBuffer.append(s);
    }
}
The echoText( ) method takes the StringBuffer resulting from the series of
appends and converts it into a String object. The method takes advantage of
the simple concatenation technique, where the textBuffer is implicitly
rendered through its toString( ) method and concatenated to an empty String,
"". The toString( ) method might have been called directly instead.
Method from the J2EE Tutorial

private void echoText()
throws SAXException
{
    if (textBuffer == null) return;
    String s = "" + textBuffer; // might have used textBuffer.toString();
    textBuffer = null;
}
This method definition can be modified to return the String object created.
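One way such a modification might look is sketched below. It is not the tutorial's code: the class name TextCollector and the method names takeText( ) and collect( ) are hypothetical, and the handler keeps only the character data, ignoring element names and attributes.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class TextCollector extends DefaultHandler {
    private StringBuffer textBuffer;

    // Same accumulation logic as the tutorial's characters( ) method.
    @Override
    public void characters(char[] buf, int offset, int len) throws SAXException {
        String s = new String(buf, offset, len);
        if (textBuffer == null) {
            textBuffer = new StringBuffer(s);
        } else {
            textBuffer.append(s);
        }
    }

    // Variant of echoText( ) that hands the gathered text back to the caller.
    String takeText() {
        if (textBuffer == null) return "";
        String s = textBuffer.toString();
        textBuffer = null; // start fresh for the next run of text
        return s;
    }

    // Parses the given XML string and returns all of its character data.
    static String collect(String xml) throws Exception {
        TextCollector handler = new TextCollector();
        SAXParserFactory.newInstance()
            .newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.takeText();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(collect("<a>one<b>two</b>three</a>")); // prints onetwothree
    }
}
```

Returning an empty String when nothing was buffered is a design choice of this sketch; returning null would work as well, provided the caller checks for it.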
We can then do some classic IO by writing this String to a file. We can use
the following constructor form of the FileWriter class in the java.io
package, which allows us to choose whether to replace or append to a file.
One of FileWriter's Constructors

FileWriter(String fileName, boolean append); // opens the named file; appends if true, replaces if false
The following write method can be used to write the String we created in
the parsing process to the file.
One of FileWriter's Overloaded write() methods
public void write(String str, int off, int len) throws IOException
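The constructor and write method just listed can be tried out in isolation with a small sketch such as the following. The class name AppendSketch, the helper names and the sample text are hypothetical, and a temporary file is used so that nothing of importance is overwritten.

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.nio.file.Files;

public class AppendSketch {
    // Writes the whole String to the named file, replacing or appending.
    static void save(String fileName, String text, boolean append) throws IOException {
        try (FileWriter writer = new FileWriter(fileName, append)) {
            writer.write(text, 0, text.length());
        } // the writer is closed, and so flushed, here
    }

    // Reads the file back so the effect of each call can be inspected.
    static String read(String fileName) throws IOException {
        return new String(Files.readAllBytes(new File(fileName).toPath()));
    }

    public static void main(String[] args) throws IOException {
        File temp = File.createTempFile("parsed", ".txt");
        temp.deleteOnExit();
        String name = temp.getPath();

        save(name, "first", false);      // replace: file holds "first"
        save(name, " second", true);     // append:  file holds "first second"
        System.out.println(read(name));

        save(name, "replaced", false);   // replace again: earlier text is gone
        System.out.println(read(name));
    }
}
```

Closing the writer matters: FileWriter buffers its output, so text may not reach the file until the writer is flushed or closed.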
Now we have enough pieces to modify the supplied SAXRunner parsing program
to capture the characters( ) method events in a String and write it to a
file.
2) Using DOM to extract out-of-order data clearly demonstrates one of its
finer qualities. Create a simple XML document that has several pieces of
information on an employee. Using the DOM code, create a method that selects
three pieces of data out of order from a DOM tree that represents this
document. Use logic like the following to secure the information. Printing
to the console will be all that is needed to demonstrate this query.

Example

static void getShortEmployeeID(Node child){
    if ( employeesName.equals(child.getNodeName( )) ){
        System.out.println(
            " Employee's Name: " + child.getNextSibling( ).getNodeValue( ) );
    }
    // two more pieces of information, captured out of order
}
Do a screen capture of your output to demonstrate the results.