JAXP, The Java API for XML Processing

Peter Komisar ©   Conestoga College      v.1.2 / 2005

references:     'Developing Java Web Services', R. Nagappan et al., Wiley Press
'The J2EE Tutorial',http://java.sun.com/j2ee/1.4/docs/tutorial/doc/index.html
JDK 1.5 Documentation, http://java.sun.com/j2se/1.5.0/docs/api/index.html


Overview


JAXP puts under one umbrella three standards, SAX, DOM and XSLT.
Compliant parsers will conform to a standard set of interfaces, and thus
provide a standard of functionality. Whether processing XML as an event
stream, a document object in memory or transforming an XML document
into another document, the approach is largely similar thanks to the JAXP
API. The uniformity supplied by JAXP also makes it easy to use these
different APIs together.

JAXP includes a new rendering of SAX, an event-driven processing model
that uses the classic 'listener' pattern to respond to the different elements
that are encountered in a document as it is streamed through a parser. JAXP
also supplies the DOM API, which allows loading a document into memory
in the form of a tree data structure that can then be traversed to obtain data
selectively. Finally, JAXP includes transformers that facilitate using XSLT,
the Extensible Stylesheet Language Transformations, to transform source
XML files into entirely new documents.

'Pluggable' Architecture

Application <--> JAXP Pluggable Interface <--> { list of compliant parsers }
 
JAXP is based on a 'pluggable' architecture which enables different parser
implementations to be exchanged based on some extra parameter information
that is explicitly supplied. This is the Factory design pattern, where several classes
supply their own specific implementations of a particular interface and a factory
object selects among them based on some invocation criteria, for instance, a parameter
to a method or a property supplied from a file or the command line. In the event that
no explicit parameter is supplied, the JAXP 'Factory' classes will supply a default
implementation.

The following table shows the different ways in which a specific parser can
be supplied by a factory.

Options for Specifying a Particular Factory

Option                 Description / Example

Default                The Java system supplies a default parser if none is
                       explicitly specified.
                       // no explicit actions are taken to specify a factory

System Property        The particular factory is specified at the command line
                       using the -D flag.
                       // java -Djavax.xml.parsers.DocumentBuilderFactory=org.apache.crimson.jaxp.DocumentBuilderFactoryImpl
                       // no space between = and org.apache

Property File          The name of the factory can be specified in a property file.
                       // $JAVA_HOME/jre/lib/jaxp.properties

JAR Service Provider   A Factory class may be supplied from inside a JAR archive file.
                       // META-INF/services/javax.xml.parsers.SAXParserFactory
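To see which factory the lookup actually settles on, a small check along the following lines can help. This is only a sketch: the class name FactoryLookupSketch and its helper methods are made up for illustration, and the implementation names printed will depend on the JDK and on any of the options above that are in effect.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;

class FactoryLookupSketch {

    // Returns the class name of whatever DocumentBuilderFactory JAXP selected.
    static String documentBuilderFactoryName() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    // Returns the class name of whatever SAXParserFactory JAXP selected.
    static String saxParserFactoryName() {
        return SAXParserFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        // If -Djavax.xml.parsers.DocumentBuilderFactory=... was given, it shows up here.
        System.out.println("System property: "
                + System.getProperty("javax.xml.parsers.DocumentBuilderFactory"));
        System.out.println("DOM factory in use: " + documentBuilderFactoryName());
        System.out.println("SAX factory in use: " + saxParserFactoryName());
    }
}
```

Running it once with no options and once with the -D flag is a quick way to confirm that the system property takes effect.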



How SAX Works



SAX is described as an 'event-based' model as opposed to a 'tree-based'
model such as is described in the DOM, (the Document Object Model). This
means the XML parser reads and parses the XML document as a stream in
real time. During the process, as the parser hits each unique item in the
document, an internal notification or event is created that triggers a call on
a corresponding method defined in a content handler. It is in these methods
that different sorts of processing can be done, for instance extracting element
data and collecting it to send to a database.

The approach used with the DOM API is different. In DOM the whole hierarchical
tree is built in memory during parsing, before the application begins to work with
the data. This makes SAX more efficient to use under circumstances where documents
are very large and resources are limited.

SAXParserFactory supplies a SAXParser that parses an XML document.
Different listeners (ContentHandler, ErrorHandler, DTDHandler and
EntityResolver) supply responses to document events that occur as the XML
document is streamed through the parser.


SAX Processing Model

SAXParserFactory -- returns parser object --+
                                            |
XML file -- to Input Stream ----------> SAXParser (parses) --> Stream events trigger Handler responses --> generates Output


Specifics include calling the parse( ) method on the parser and supplying 'callback'
methods through the Content, Error and DTD handler classes as well as the Entity Resolver.
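The processing model above can be sketched in a few lines. The example below is a minimal illustration, not a full application: the class names SaxModelSketch and ElementCounter are invented for the sketch, and the XML is read from a String rather than a file so that the example is self-contained.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxModelSketch {

    // Counts startElement events; the other callbacks keep DefaultHandler's empty bodies.
    static class ElementCounter extends DefaultHandler {
        int elements;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            elements++;
        }
    }

    // Streams the XML text through a SAXParser and returns how many elements were seen.
    static int countElements(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        ElementCounter handler = new ElementCounter();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.elements;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countElements("<a><b/><b/></a>"));  // prints 3
    }
}
```

Nothing is held in memory here beyond the running count, which is the point of the event-based model.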

SAX Packages

Package               Description

org.xml.sax         - SAX interfaces
org.xml.sax.ext     - extensions for specialized processing
org.xml.sax.helpers - supply default helper classes
javax.xml.parsers   - contains SAXParserFactory, SAXParser
                      & Exception classes



The DOM API

Again we see a Factory is used to supply a processor, this time a DocumentBuilder.
The DocumentBuilder often will make private use of an underlying SAX parser to
parse( ) a document. The event stream generated by the parse( ) method is captured
in memory in the form of a DOM object. This tree data structure can then be queried
for pertinent information.

DOM Processing Model


DocumentBuilderFactory -- returns builder --+
                                            |
XML Data --------------------------> DocumentBuilder ----> Document Object in Memory --- query ---> Output



Specifics of using the DOM API include using a DocumentBuilderFactory to obtain
a DocumentBuilder object whose parse methods produce a Document object. 
DocumentBuilder's newDocument( ) method can also be used to produce an
empty Document object that is an implementation of the org.w3c.dom.Document
interface.
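Both routes to a Document object, parsing and newDocument( ), might be sketched as follows. The class name DomModelSketch, its helper methods and the sample XML are assumptions made for the illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class DomModelSketch {

    static DocumentBuilder newBuilder() throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder();
    }

    // Parses XML text into a Document and returns the root element's name.
    static String rootNameOf(String xml) throws Exception {
        Document doc = newBuilder().parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getNodeName();
    }

    // Builds a Document from scratch, starting from newDocument().
    static Document buildFromScratch() throws Exception {
        Document doc = newBuilder().newDocument();
        Element root = doc.createElement("message");
        root.appendChild(doc.createTextNode("hello"));
        doc.appendChild(root);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(rootNameOf("<note><to>Ann</to></note>"));                    // prints note
        System.out.println(buildFromScratch().getDocumentElement().getTextContent());  // prints hello
    }
}
```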


The DOM Packages

Package               Description

org.w3c.dom         - DOM interfaces
javax.xml.parsers   - contains DocumentBuilderFactory,
                      DocumentBuilder and Exception classes




XSLT, The Extensible Stylesheet Language Transformations

XSLT is an XML API that allows one XML document to be 'screened' by a set of
templates in order to produce an entirely different document. The starting  XML
document is called the 'source' document and the document created after
transformation is called the 'result'. Again XSLT builds on other APIs of JAXP.



XSLT Processing Model


TransformerFactory -- returns transformer --+
                                            |
XML file -- to Source Object ---------> Transformer (transforms) -- templates are applied --> generates Result Object


Specifics include the use of a TransformerFactory to instantiate and return a Transformer
object. The Transformer object is a custom creation that relates to the particular XSLT
stylesheet it is modeled on. The source object, a Java object version of the XML source
document is transformed into the resultant XML object which may be rendered as a new
XML document.
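A minimal sketch of this model follows. The stylesheet, the class name XsltModelSketch and the sample source document are all hypothetical; stream sources and results are used only because they keep the example self-contained.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltModelSketch {

    // Hypothetical stylesheet: turns <greeting>text</greeting> into <p>text</p>.
    static final String STYLESHEET =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:template match='/greeting'><p><xsl:value-of select='.'/></p></xsl:template>"
        + "</xsl:stylesheet>";

    // Builds a Transformer from the stylesheet and applies it to the source XML.
    static String transform(String sourceXml) throws Exception {
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer =
                factory.newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(sourceXml)),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform("<greeting>Hello</greeting>"));
    }
}
```

The same Transformer could equally be fed a DOMSource or SAXSource, which is where the packages listed below come in.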


XSLT Packages

Package                       Description

javax.xml.transform         - Transformer classes
javax.xml.transform.dom     - to create source & result objects from a DOM object
javax.xml.transform.sax     - to create source & result objects from SAX parser(s)
                              & event handler
javax.xml.transform.stream  - to create source & result objects from an I/O stream




SAX Mechanics



Really there are two important phases in using SAX. First, the plumbing has to be
handled: a SAXParserFactory is obtained, from which a SAXParser object is
returned. This object is then used to parse our XML documents. The other part
of the formula is to decide what you will do with the XML information during the
parsing process. This is achieved by overriding the methods of a handler
class. Rather than implementing the handler interface(s) directly, the stock approach
taken in JAXP is to extend the DefaultHandler class. This has the advantage
of allowing a developer to override only the methods of interest.

First of all we need classes. The following imports have everything needed to
use the SAX API.

Example of Typical SAX Imports

import org.xml.sax.*;
import org.xml.sax.ext.*;
import org.xml.sax.helpers.*;
import javax.xml.parsers.*;
import java.io.*;


The next two lines show the code to get a SAXParserFactory and obtain a SAXParser
from it.

Example

 SAXParserFactory factory = SAXParserFactory.newInstance( );
 SAXParser parser = factory.newSAXParser( ); 

Extra Exception handling code will need to be added as the following
detailed look at the newSAXParser( ) method indicates.


Example

public abstract SAXParser newSAXParser( )
throws ParserConfigurationException, SAXException
Once a parser object has been obtained, one of the variations on the parse( )
method is called. It takes a Java representation of our XML document
and an instance of our DefaultHandler.

The following table, supplied for reference shows different variations of the
parse( ) method.

 

 
The parse( ) Method 
// for reference



Different parse( ) methods are available. The overloaded versions are
duplicated for the older SAX 1 class, HandlerBase, ( which implements
the deprecated DocumentHandler, ) and the newer SAX 2 DefaultHandler.
There are versions for File, InputSource, InputStream and a String
representing a URI.



void parse(File f, DefaultHandler dh)
void parse(File f, HandlerBase hb)

void parse(InputSource is, DefaultHandler dh)
void parse(InputSource is, HandlerBase hb)

void parse(InputStream is, DefaultHandler dh)
void parse(InputStream is, HandlerBase hb)

void parse(InputStream is, DefaultHandler dh, String systemId)
void parse(InputStream is, HandlerBase hb, String systemId)

void parse(String uri, DefaultHandler dh)
void parse(String uri, HandlerBase hb)

// these versions all are public and throw IOException & SAXException


A detailed description of one of the parse( ) methods is shown next. It shows that
two Exception types must be handled when using this method, so exception
handling code will need to be added to our code.


Detailed Sample of the parse( ) method


public void parse ( File f, DefaultHandler dh) throws SAXException, IOException


Let us consider the arguments to the parse(  ) method. One is the handler. For
the moment let us assume the DefaultHandler has been coded and only needs
to be instantiated. In this case an instantiation like the following would supply
us with a reference to a content handler.


Example
  DefaultHandler handler = new MyHandler( );


The other argument is our XML source object. We can use Java's File constructor
to create a handle to our XML document.


Example
  new File( "XMLMessage.xml");


Now we have all the pieces of the plumbing of the parsing process, which we
can place inside a main( ) method, adding exception handling code. Note
we will allow our XML file to be specified at the command line as an argument
following the name of the parser program. This entails adding code to pass the
file in as an argument to the main method and also to report to the user that an
input file is required. Following is a complete main( ) method example.


The main( )  method of a SAX Runner Program

public static void main(String[ ] args)
{
   // code if the user fails to supply an input file
  if (args.length != 1) {
    System.err.println("Usage: An xml file name is required as an argument" );
    System.exit(1);
  }
   // the handler instantiation
   DefaultHandler handler = new MyHandler( );
   // the default parser
   SAXParserFactory factory = SAXParserFactory.newInstance( );

  try
    {  // parse input
      SAXParser parser = factory.newSAXParser( );
      parser.parse( new File(args[0]), handler );
    }
   catch (Exception e){
      e.printStackTrace( );
      }
// catching SAXException, IOException and ParserConfigurationException


}


All the Exceptions have been replaced by a single generic Exception catch.
There would be circumstances where catching each of SAXException,
IOException and ParserConfigurationException would be more suitable,
for instance, to supply different courses of action depending on the type
of error that was generated. 

(In one text, an interesting technique was used where an IOException
was caught in a catch clause and rethrown as a SAXException, with an associated
message stating that an IOException was thrown.)
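Both alternatives to the generic catch might look something like the following sketch. The class name SaxErrorHandlingSketch, the method names and the returned message strings are made up for illustration; only the exception types come from the parse( ) and newSAXParser( ) signatures shown earlier.

```java
import java.io.File;
import java.io.IOException;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class SaxErrorHandlingSketch {

    // Parses the file and reports which kind of failure occurred, if any.
    static String tryParse(File file) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(file, new DefaultHandler());
            return "ok";
        } catch (ParserConfigurationException e) {
            return "configuration problem: " + e.getMessage();
        } catch (SAXException e) {
            return "parse problem: " + e.getMessage();
        } catch (IOException e) {
            return "I/O problem: " + e.getMessage();
        }
    }

    // Variation: wrap an IOException in a SAXException so callers handle one type.
    static void parseOrWrap(File file) throws SAXException, ParserConfigurationException {
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(file, new DefaultHandler());
        } catch (IOException e) {
            throw new SAXException("An IOException was thrown while reading " + file, e);
        }
    }

    public static void main(String[] args) {
        // A file that does not exist should end up in the IOException branch.
        System.out.println(tryParse(new File("NoSuchFile.xml")));
    }
}
```

Separate catch clauses pay off when each failure needs a different response, for example asking the user for another file name only in the I/O case.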


Setting Features & Properties

In addition to the default parser settings, the SAXParserFactory object
supplies important methods for setting properties and features. The most
common of these are methods to set the validating feature and namespace
support.

Example  factory.setNamespaceAware(true);
               factory.setValidating(true);


Other useful methods are ones to set a schema and a more generic
method that sets a feature.

Example factory.setSchema(schema);   // schema is a javax.xml.validation.Schema object, not a file name
              factory.setFeature("A_feature_represented_by_a_URI", true);

// there are a few more details with respect to setting up a parser that
// validates which we will cover when we look at JAXB.

 
Whether such properties are set can be checked by calls to 'isXXX'
methods, such as isValidating( ) which returns a boolean value. In
the following example, whether a parser validates is checked first
in its default condition, which is with validating off, and then checked
after validation has been switched on. This code can be cut and pasted
into the main method shown above.

Example

 
  System.out.println("Validation On: " + factory.isValidating( ));
  factory.setValidating(true);
  System.out.println("Validation On: " + factory.isValidating( ));

The setFeature( ) method is used to set features using a URI represented
by a String followed by a boolean value. The method sets features supported
by the underlying org.xml.sax.XMLReader implementation. A list of the core features
and properties can be found at the SAX site,
http://www.saxproject.org/.
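As an illustration of how the factory settings relate to the feature URIs, the sketch below reads the core SAX 'namespaces' feature back from the XMLReader that sits behind the SAXParser. The class name SaxFeatureSketch and its helper method are invented for the example; the feature URI is one of the core features listed at the SAX site.

```java
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

class SaxFeatureSketch {

    // Core SAX feature URI for namespace processing.
    static final String NAMESPACES = "http://xml.org/sax/features/namespaces";

    // Reads the namespaces feature from the XMLReader behind a parser built by the factory.
    static boolean namespacesFeature(boolean namespaceAware) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(namespaceAware);
        SAXParser parser = factory.newSAXParser();
        XMLReader reader = parser.getXMLReader();
        return reader.getFeature(NAMESPACES);
    }

    public static void main(String[] args) throws Exception {
        System.out.println("namespaces feature: " + namespacesFeature(true));
    }
}
```

In other words, setNamespaceAware(true) is a convenience for a feature that could also be addressed by its URI.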

ContentHandler 

The other key player in the SAX parsing model is the DefaultHandler which
is a stubbed implementation of the ContentHandler interface along with the
subordinate interfaces, DTDHandler, EntityResolver & ErrorHandler.

We can focus on the methods of the ContentHandler interface, which are called
in response to the parsing events generated during the parsing of the XML
document. This popular approach, where a stubbed implementation of a set
of interfaces is provided, is easy to use as the only requirement is that we
override those interface methods we are interested in.

Practically speaking this involves being careful not to tamper with the given
method signatures of the interface(s), restricting our changes and code additions
to the braced areas of the overridden methods.
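As a small illustration of overriding only the methods of interest, the following sketch collects the text of each element. The class names TextCollectorSketch and TextCollector are hypothetical. The text is accumulated in a StringBuilder because a parser is allowed to deliver one run of text through several calls to characters( ).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class TextCollectorSketch {

    // Overrides only the three callbacks it needs; everything else stays a no-op.
    static class TextCollector extends DefaultHandler {
        final List<String> lines = new ArrayList<String>();
        private final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            text.setLength(0);                  // start fresh for each element
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);     // may be called more than once per text run
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            String value = text.toString().trim();
            if (value.length() > 0) {
                lines.add(qName + "=" + value);
            }
            text.setLength(0);
        }
    }

    // Parses the XML text and returns one "name=text" entry per element that holds text.
    static List<String> collect(String xml) throws Exception {
        TextCollector handler = new TextCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.lines;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(collect("<list><item> one </item><item>two</item></list>"));
        // prints [item=one, item=two]
    }
}
```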

// caution: coincidence of names: java.net.ContentHandler class & org.xml.sax.ContentHandler interface

Following is a class that provides stub implementations for all the methods listed
in the ContentHandler interface. The DefaultHandler supplies stubs for all four
SAX related interfaces.


Example of a Stub Implementation of the ContentHandler Interface

// all the methods throw SAXException except setDocumentLocator

class AHandler implements ContentHandler {

    // sets Locator for Document
    public void setDocumentLocator(Locator locator) { }
    // called at document start
    public void startDocument() throws SAXException { }
    // called at end of parse
    public void endDocument() throws SAXException { }
    // called when a 'PI' is encountered
    public void processingInstruction(String target, String data) throws SAXException { }
    // start of an XML Namespace prefix mapping
    public void startPrefixMapping(String prefix, String uri) throws SAXException { }
    // end of an XML Namespace prefix mapping
    public void endPrefixMapping(String prefix) throws SAXException { }
    // reports element namespace URI, local name part and attributes
    public void startElement(String namespaceURI, String localName,
                             String rawName, Attributes atts) throws SAXException { }
    // marks end of element
    public void endElement(String namespaceURI, String localName,
                           String rawName) throws SAXException { }
    // reports character data inside an element
    public void characters(char[] ch, int start, int length) throws SAXException { }
    // reports whitespace that can be ignored in the originating document
    public void ignorableWhitespace(char[] ch, int start, int length) throws SAXException { }
    // reports skipped entity
    public void skippedEntity(String name) throws SAXException { }
}


Example of a SAX Application that extends DefaultHandler

The following application shows DefaultHandler extended. It also shows
examples of how the Locator object may be used. Notice the Locator is
declared in class scope and given a placeholder LocatorImpl in the constructor.
The parser then supplies its own Locator through setDocumentLocator( ), and it is
that object which is referenced in the other overridden methods of the DefaultHandler.


Example

import java.io.*;
import org.xml.sax.*;
import org.xml.sax.helpers.*;

class MyHandler extends DefaultHandler{
         Locator locator;
         int count;
        public MyHandler( ){
          super( );
          locator=new LocatorImpl();
          }
   
// Overriding methods DefaultHandler provides in implementing ContentHandler

 public void startDocument(){         
            System.out.println("_____________ \n\nDocument Start\n _____________");

                }

public void startElement(String uri,String localName, String qName, Attributes attributes){
            System.out.println("\nLine Number: " + locator.getLineNumber() + "\n______");
            System.out.println("URI: " + uri);
            System.out.println("Qualified name: " + qName);
            System.out.println("Local name: " + localName);          
            int a = attributes.getLength();
            System.out.println("Number of attributes: " + a);
         
            AttributesImpl ai= new AttributesImpl(attributes);
            // instantiating AttributesImpl is redundant but I wanted to show it
            System.out.println("Value of attribute at second index: " + ai.getValue(2));     
            }

public void characters(char[] ch,int start,int length){
            System.out.println("Text: " + new String(ch, start,length).trim());
            }

public void setDocumentLocator(Locator locator){
            this.locator=locator;
            System.out.println("Initial line number setting: " + locator.getLineNumber());
            } 

 public void endDocument(){
            System.out.println("__________\n\n End of Document\n_____________"); 
            }
}


Now we can supply a complete example.  Note that the above Handler class
is instantiated inside the following code.

Complete SAX Example

import org.xml.sax.*;
import org.xml.sax.ext.*;
import org.xml.sax.helpers.*;
import javax.xml.parsers.*;
import java.io.*;



public class SAXRunner{

public static void main(String[ ] args)
{
   // code if the user fails to supply an input file
  if (args.length != 1) {
    System.err.println("Usage: An xml file name is required as an argument" );
    System.exit(1);
  }
   // the handler instantiation
   DefaultHandler handler = new MyHandler( );
   // the default parser non-validating
   SAXParserFactory factory = SAXParserFactory.newInstance( );
   // investigative methods
   System.out.println("Validation On: " + factory.isValidating());
   System.out.println("Namespace Aware On: " + factory.isNamespaceAware());
   factory.setValidating(true);
    System.out.println("Validation On: " + factory.isValidating());
  try
    {  // parse input
      SAXParser parser = factory.newSAXParser( );
      parser.parse( new File(args[0]), handler );
    }
   catch (Exception e){
      e.printStackTrace( );
      }

  System.exit(0);
}

}

XML Sample To Parse

Following is a simple XML instance that provides us some fodder for our parsing
process.

An XML Instance

<?xml version="1.0"?>
<bug:FlyingObjects xmlns:bug="www.bugs.bug">
<bug:Plane wingspan="30" seats="2" horsepower="150"> Composite KitPlane </bug:Plane>      
<bug:Rocket wingspan="5" seats="1" horsepower="10000"> V1 Rocket </bug:Rocket>
<bug:Mosquito wingspan="1" seats="none" horsepower="2"> Muskoka Mosquito </bug:Mosquito>

</bug:FlyingObjects>

To run this at the command line you will need to use a command something like
the following.

> java    SAXRunner    UFO.xml

The output is as follows.


Output


Initial line number setting: 1
_________________

Document Start
_________________

Line Number: 2
_______________
URI: www.bugs.bug
Qualified name: bug:FlyingObjects
Local name: FlyingObjects
Number of attributes: 0
Value of attribute at second index: null


Line Number: 3
_______________
URI: www.bugs.bug
Qualified name: bug:Plane
Local name: Plane
Number of attributes: 3
Value of attribute at second index: 150
Text: Composite KitPlane


Line Number: 4
_______________
URI: www.bugs.bug
Qualified name: bug:Rocket
Local name: Rocket
Number of attributes: 3
Value of attribute at second index: 10000
Text: V1 Rocket


Line Number: 5
_______________
URI: www.bugs.bug
Qualified name: bug:Mosquito
Local name: Mosquito
Number of attributes: 3
Value of attribute at second index: 2
Text: Muskoka Mosquito

_____________

 End of Document
________________



The DOM API 



DOM is most suited for processing documents like articles and books. The
J2EE tutorial suggests using 'JDOM' and 'dom4j'  for applications that are
data structure oriented. DOM was designed to be used by a number of
languages including non-object oriented languages like C and Perl.

Documents Versus Data

Unfortunately, what constitutes a node in DOM is different from other models
such as XPath. In simple terms, the DOM makes more things nodes. For
instance CDATA sections are considered nodes where in XPath they are
not. The DOM model facilitates use with mixed content models, the sort
of model that is common in long documents.

Mixed-Content Model

The Mixed Content Model in XML describes the documentary style where
text and tags are mixed together. Consider the following example.


Mixed Content Example


<role> Mr. Brennar is the <key> President </key > of the company. </role >


From a DOM point of view this would break down to a tree representation like
the following.

 
Hierarchical View of the 'role' Tag

ELEMENT: role
          |__ TEXT: Mr.Brennar is the
          |__ ELEMENT: key
          |               |__ TEXT: President
          |__ TEXT: of the company.
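
The breakdown can be checked with a short program. The sketch below, whose class name MixedContentSketch is made up for the illustration, parses the 'role' example from a String and lists the type and name of each direct child of the root element.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class MixedContentSketch {

    static final String XML =
        "<role> Mr. Brennar is the <key> President </key> of the company. </role>";

    // Describes each direct child of the root as "TYPE:name".
    static String describeChildren(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList children = doc.getDocumentElement().getChildNodes();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            String type = child.getNodeType() == Node.ELEMENT_NODE ? "ELEMENT"
                        : child.getNodeType() == Node.TEXT_NODE ? "TEXT" : "OTHER";
            if (i > 0) sb.append(", ");
            sb.append(type).append(':').append(child.getNodeName());
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(describeChildren(XML));
        // prints TEXT:#text, ELEMENT:key, TEXT:#text
    }
}
```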



Following are the key DOM interfaces used to represent the DOM tree model.


The Node Interface

The heart of the DOM model is the Node interface. The Node interface is
used to represent a single node in the document tree. It supplies a set of methods
that allow querying the DOM tree in a non-sequential fashion.

As a sample of the methods that the Node interface supplies consider the
following. The getNodeType( ) method is interesting as the DOM API uses
the 'short' primitive type to create constants that represent each of the
Node types.



Sample of Method Signatures for the Node Interface //  from the JDK1.5 Documentation

String  getNodeName( )     // the name of this node
short   getNodeType( )       // returns constants that represent underlying object type
String getNodeValue( )      // returns the value of this node


The Node interface also serves as the general parent interface for a set of
sub-interfaces that provide details for the various sub-components of the DOM
tree.

Each of these sub-component types uniquely describes the particular part of the
DOM that it represents. For instance, only the 'Element' type will have attributes.
Attributes are stored in the datatype called the 'NamedNodeMap', which supplies
methods for getting and setting attributes by name, getting attributes by index and
getting the length of the collection via the getLength( ) method.

The 'Text' type, though implementing the 'Node' interface, cannot have children.
Adding children to a 'Text' Node will result in a DOMException being thrown.
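
Both points can be tried out with a sketch like the following. The class name NodeTypesSketch and its helper methods are invented for the example, and the Plane element is borrowed from the XML sample used earlier, without its namespace prefix to keep things short.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NodeTypesSketch {

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Looks an attribute up by name through the element's NamedNodeMap.
    static String attributeValue(Element element, String name) {
        NamedNodeMap attributes = element.getAttributes();
        Node attr = attributes.getNamedItem(name);
        return attr == null ? null : attr.getNodeValue();
    }

    // Returns true if the DOM refuses to add a child to a Text node.
    static boolean textRejectsChildren(Document doc) {
        Node text = doc.createTextNode("leaf");
        try {
            text.appendChild(doc.createElement("child"));
            return false;
        } catch (DOMException e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(
            "<Plane wingspan='30' seats='2' horsepower='150'>Composite KitPlane</Plane>");
        Element plane = doc.getDocumentElement();
        System.out.println("attributes: " + plane.getAttributes().getLength());  // prints attributes: 3
        System.out.println("seats: " + attributeValue(plane, "seats"));          // prints seats: 2
        System.out.println("Text rejects children: " + textRejectsChildren(doc));
    }
}
```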

The Document interface deserves mention as it is used to represent the entire
XML or HTML document.


The following table shows the various sub-interfaces, the name that is associated
with them and the node value. Note that attributes is null for all types
except the 'Element' type.


Table of Node Sub-interfaces // from the JDK 1.5 documentation

Interface              nodeName                              nodeValue                           attributes

Attr                   same as Attr.name                     same as Attr.value                  null
CDATASection           "#cdata-section"                      same as CharacterData.data,         null
                                                             the content of the CDATA Section
Comment                "#comment"                            same as CharacterData.data,         null
                                                             the content of the comment
Document               "#document"                           null                                null
DocumentFragment       "#document-fragment"                  null                                null
DocumentType           same as DocumentType.name             null                                null
Element                same as Element.tagName               null                                NamedNodeMap
Entity                 entity name                           null                                null
EntityReference        name of entity referenced             null                                null
Notation               notation name                         null                                null
ProcessingInstruction  same as ProcessingInstruction.target  same as ProcessingInstruction.data  null
Text                   "#text"                               same as CharacterData.data,         null
                                                             the content of the text node




DOM Mechanics



DOM is a little different from SAX in that it loads the Document object into
memory. From this vantage point the tree can be traversed to obtain pertinent
information. Instead of the ContentHandler used in SAX, a set of methods of the
Node interface are used to query the data structure.

Here is where we benefit from the uniform style supplied by the JAXP API.
You will notice that the programming mechanics for the DOM API are very
similar to those used in SAX.

DOM Related Imports

The imports are similar to SAX however also include the DOM packages.

import javax.xml.parsers.*;      // DocumentBuilderFactory and DocumentBuilder
import org.xml.sax.*;            // assuming DOM builds on SAX parser
import java.io.*;                // for File processing and IOException
import org.w3c.dom.*;            // DOM imports



Instead of using SAXParserFactory and SAXParser we use similar instantiations
of DocumentBuilderFactory and DocumentBuilder.


DOM Factory Code

   DocumentBuilderFactory factory =
   DocumentBuilderFactory.newInstance();

      . . . .
  DocumentBuilder builder = factory.newDocumentBuilder();
  Document document = builder.parse( new File(args[0]) );


Again related exceptions will need to be caught. We will use the
generic catch for Exception. 


Example  

  try {
          DocumentBuilder builder = factory.newDocumentBuilder();
          Document document = builder.parse( new File(args[0]) );
          }
   catch (Exception e){
      e.printStackTrace( );
      }


Again we avoid complexity and catch the generic Exception. This is not as
general an approach as it may sound. The stack trace printed for the caught
Exception still reports the specific kind of exception that was thrown.



Example  // inserting   throw new SAXException();   in the try clause
         // yields the following Exception report at the command line.


    org.xml.sax.SAXException
        at DOMRunner.main(DOMRunner.java:26)



       // or opening on a non-existent file yields

    java.io.FileNotFoundException: 
    /home/peter/java/../XML/Samples/NoSuchFile (No such file or directory)

// DOM also allows setting namespace awareness and validation


Querying the Document Node 
 // parallels providing the ContentHandler in SAX


We are missing one major component. The above code, when it is stitched
together, will leave us with a DOM object in memory. We still need to do
something with it. This we do with the methods of the Node interface and
its sub-interfaces.

A subtle but important point is to recognize that the Document object that the
DOM parser has returned is in fact one of the sub-interfaces of the Node
interface. Therefore, Node methods can be called on it.

One approach we can take to keep the Node query code somewhat modular
is to create a class method that takes the Document object as an argument
and calls Node methods on the DOM object from inside the method. Notice
this method exercises six of Node's methods. It is recursive, so it picks off
child nodes and visits their children as well.

Example

static void queryNode(Node document){
     // 'indent' is a static String field of the enclosing class
     Node child = document.getFirstChild();

     do {
           System.out.println(indent + "CHILD NAME: " + child.getNodeName( ) );
           System.out.println(indent + "NODE TYPE: " + child.getNodeType( ));
           System.out.println(indent + "VALUE: " + child.getNodeValue( ));

           if(child.hasChildNodes( )){
               indent = indent + " . . . ";   // deepen the indent for the children
               queryNode(child);              // recurse into the child nodes
               indent = "";                   // reset the indent to the left margin
           }
           child = child.getNextSibling( );
         }
     while (child != null );
}


We can collect all the code in a complete DOM sample below.


DOM Sample Code


import javax.xml.parsers.*; // DocumentBuilderFactory and DocumentBuilder
import org.xml.sax.*; // assuming DOM builds on SAX parser
import java.io.*; // for File processing and IOException
import org.w3c.dom.*; // DOM imports

class DOMRunner{
  static Document document;
  static String indent="";
public static void main(String[ ] args)
{
   // code if the user fails to supply an input file
  if (args.length != 1) {
    System.err.println("Usage: An XML file name is required as an argument" );
    System.exit(1);
  }
   // the DOM Factory
  DocumentBuilderFactory factory =
    DocumentBuilderFactory.newInstance( );

  try {
        DocumentBuilder builder = factory.newDocumentBuilder( );
       document = builder.parse( new File(args[0]) );
       // method that calls Node methods on DOM object
       queryNode(document);
      }

  catch (Exception e){
      e.printStackTrace( );
      }
}

// method to query Document Nodes

static void queryNode(Node document){
     Node child = document.getFirstChild();

     do {
           System.out.println(indent + "CHILD NAME: " + child.getNodeName( ) );
           System.out.println(indent + "NODE TYPE: " + child.getNodeType( ));
           System.out.println(indent + "VALUE: " + child.getNodeValue( ));

           if(child.hasChildNodes()){
            indent = indent + " . . . ";
            queryNode(child);
        indent = "";
        }
       child = child.getNextSibling( );
           }
       while (child != null );

      }
}


If we apply this code to the XML Sample supplied earlier we get the following output.

Output


[peter@localhost java]$ java  DOMRunner UFO.xml
CHILD NAME: bug:FlyingObjects
NODE TYPE: 1
VALUE: null
 . . . CHILD NAME: #text
 . . . NODE TYPE: 3
 . . . VALUE:

 . . . CHILD NAME: bug:Plane
 . . . NODE TYPE: 1
 . . . VALUE: null
 . . .  . . . CHILD NAME: #text
 . . .  . . . NODE TYPE: 3
 . . .  . . . VALUE:  Composite KitPlane
CHILD NAME: #text
NODE TYPE: 3
VALUE:

CHILD NAME: bug:Rocket
NODE TYPE: 1
VALUE: null
 . . . CHILD NAME: #text
 . . . NODE TYPE: 3
 . . . VALUE:  V1 Rocket
CHILD NAME: #text
NODE TYPE: 3
VALUE:

CHILD NAME: bug:Mosquito
NODE TYPE: 1
VALUE: null
 . . . CHILD NAME: #text
 . . . NODE TYPE: 3
 . . . VALUE:  Muskoka Mosquito
CHILD NAME: #text
NODE TYPE: 3
VALUE:



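This is not the greatest example in that it doesn't show the real power of DOM,
which is the ability to query different data items on the DOM tree out of sequence.
We will leave this aspect of the DOM as part of the assignment. Note also that
queryNode( ) resets the indent to an empty string after each recursive call, which is
why the nodes that follow bug:Plane are printed at the left margin rather than
one level in.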


 Setting Up Validation with the JAXP API  // reference



  Factories need to be configured, error handlers set and the document must be
  associated with at least one schema. The J2EE tutorial starts by setting constants
  to represent JAXP properties and the schema language namespace.

     // J2EE tutorial sets up constants 
    static final String JAXP_SCHEMA_LANGUAGE =
"http://java.sun.com/xml/jaxp/properties/schemaLanguage";
    static final String W3C_XML_SCHEMA ="http://www.w3.org/2001/XMLSchema";

   The J2EE tutorial next shows setting the factory to validate and to be
    namespace aware.

      
     // setting the factory to validate

     DocumentBuilderFactory factory =
DocumentBuilderFactory.newInstance( );
    factory.setNamespaceAware(true);
    factory.setValidating(true);
      try {
        factory.setAttribute(JAXP_SCHEMA_LANGUAGE, W3C_XML_SCHEMA);
         } 
      catch (IllegalArgumentException x) {
     // Happens if the parser does not support JAXP 1.2
     }

    There are two ways to associate an XML schema document with an XML
    document: either via a schema declaration in the XML document or by
    specifying the schema to use in the Java application.

      // We have seen the first approach in our discussion of schema

    
     Example Showing Specifying schema in the Document Root // from the J2EE Tutorial     

   <documentRoot
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:noNamespaceSchemaLocation='YourSchemaDefinition.xsd'>
   <!-- this is the noNamespace variation -->

    The J2EE Tutorial shows the following code to set the schema in the JAXP
     application.    

    static final String schemaSource = "YourSchemaDefinition.xsd";
    static final String JAXP_SCHEMA_SOURCE =
        "http://java.sun.com/xml/jaxp/properties/schemaSource";
    ...
    DocumentBuilderFactory factory =
        DocumentBuilderFactory.newInstance();
    ...
    factory.setAttribute(JAXP_SCHEMA_SOURCE,
        new File(schemaSource));

      // there is also the attractive setSchema( )  method whose use we will  test
        //  when looking at JAXB
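
    The pieces above can be put together in one small program. The following
    is only a sketch under stated assumptions, not the tutorial's code: the
    class name ValidationSketch, the one-element 'note' schema and the
    validate( ) helper are all invented for illustration, and the schema is
    written to a temporary file so the example needs no external files
    (Java 7 or later is assumed for java.nio.file.Files). An ErrorHandler
    collects the validation problems instead of letting them go to the console.

```java
import java.io.File;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class ValidationSketch {

    static final String JAXP_SCHEMA_LANGUAGE =
        "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
    static final String W3C_XML_SCHEMA =
        "http://www.w3.org/2001/XMLSchema";
    static final String JAXP_SCHEMA_SOURCE =
        "http://java.sun.com/xml/jaxp/properties/schemaSource";

    // Hypothetical schema: a single <note> element that holds plain text
    static final String NOTE_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='note' type='xs:string'/>"
        + "</xs:schema>";

    // Parses the XML text with schema validation on and returns the problems reported
    static List<String> validate(String xml) throws Exception {
        // write the schema to a temporary file so it can be supplied as a File
        File schemaFile = File.createTempFile("note", ".xsd");
        schemaFile.deleteOnExit();
        Files.write(schemaFile.toPath(), NOTE_XSD.getBytes(StandardCharsets.UTF_8));

        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(true);
        factory.setAttribute(JAXP_SCHEMA_LANGUAGE, W3C_XML_SCHEMA);
        factory.setAttribute(JAXP_SCHEMA_SOURCE, schemaFile);

        final List<String> problems = new ArrayList<String>();
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) {
                problems.add("warning: " + e.getMessage());
            }
            public void error(SAXParseException e) {
                problems.add("error: " + e.getMessage());
            }
            public void fatalError(SAXParseException e) throws SAXParseException {
                throw e;   // not well-formed, parsing cannot continue
            }
        });
        builder.parse(new InputSource(new StringReader(xml)));
        return problems;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(validate("<note>hello</note>"));      // conforms to the schema
        System.out.println(validate("<note><extra/></note>"));   // child element not allowed
    }
}
```

    Validation errors are reported to the error( ) method and parsing carries
    on, which is why the handler collects them rather than throwing.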

    



XSLT, XML  Stylesheet Language Transformations



XSL is an abbreviation of 'Extensible Stylesheet Language'. XSL acts as the
umbrella for three specifications: XSLT, which stands for XSL Transformations;
XSL-FO, which is short for XSL Formatting Objects; and XPath, the XML Path
Language, an expression language used by XSLT to access or refer to
different parts of an XML document. The two XSL standards, XSLT and XSL-FO,
represent different metaphors in XSL: the former is based on templates and
the latter on formatting objects.


Hierarchy of XML Stylesheet Technologies


      XSL
        |____ XSLT      // XSL Transformations transform documents based on 'rules' described in templates
        |____ XSL-FO    // XSL Formatting Objects, a powerful and fine-grained version of CSS
        |____ XPath     // tree querying language, further relates to XLink & XPointer


Snapshot of How XSLT Works

We don't have time to describe XSLT in any detail. However we should understand
that XSLT works like a CSS stylesheet in that a page is prepared that can be applied
to an XML document and change it in some way. While in CSS the changes are
confined to visual presentation, XSLT really has no restriction on what can be done.
For instance, a style sheet could be applied to a law document or a medieval poem
and supply simple interpretation for each word, phrase or sentence in the original work.

XSLT is an XML application and so is made up of tags. The key tag, <template>,
houses rules that are associated with elements of an XML document through its
'match' attribute, while the 'select' attribute of other tags picks out the nodes
to work on. Along with tags that return values and XPath query expressions,
filters can be created to return a transformed value for any part or set
of parts of an XML tree structure.

The W3Schools website supplies the following example, which gives us some idea
of the form XSLT takes. First, in the XML document we see an <?xml-stylesheet ?>
processing instruction that references the stylesheet.


W3Schools XML Example

<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet type="text/xsl" href="cdcatalog.xsl"?>
<catalog>
  <cd>
    <title>Empire Burlesque</title>
    <artist>Bob Dylan</artist>
    <country>USA</country>
    <company>Columbia</company>
    <price>10.90</price>
    <year>1985</year>
  </cd>
</catalog>
 

In the associated XSL stylesheet that is referenced in the xml-stylesheet
processing instruction, we see the XSLT <template> tag enclosed in the
characteristic <stylesheet> tag.  In this example we see other 'main players'
of XSLT such as the 'match' and 'select' attributes and the <value-of> and
<for-each> tags. The tags are prefixed with the conventional 'xsl' prefix, which
stands in for the XSLT namespace URI.  The 'match' and 'select' attributes
are assigned path expressions written in 'XPath', which is a topic in its own right.
XPath is a full-powered pattern matching language that can describe minute
details of an XML tree.  The granularity of XPath comes at a price: XPath
can easily compete with the regular expressions of languages like Perl for
cryptic complexity.


W3Schools StyleSheet Example

<?xml version="1.0" encoding="ISO-8859-1"?>

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="/">  <!-- XPath is used to describe paths into the XML tree -->
  <html>
  <body>
    <h2>My CD Collection</h2>
    <table border="1">
    <tr bgcolor="#9acd32">
      <th align="left">Title</th>
      <th align="left">Artist</th>
    </tr>
    <xsl:for-each select="catalog/cd">
    <tr>
      <td><xsl:value-of select="title"/></td>
      <td><xsl:value-of select="artist"/></td>
    </tr>
    </xsl:for-each>
    </table>
  </body>
  </html>
</xsl:template>
</xsl:stylesheet>
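

The path expressions assigned to 'select' can also be tried on their own,
outside of a stylesheet, using the javax.xml.xpath package that ships with
JDK 1.5. The following is a minimal sketch only; the class name XPathSketch
and the select( ) helper are invented for illustration, and the catalog is
a cut-down copy of the W3Schools document held in a String.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class XPathSketch {

    // A shortened copy of the catalog document
    static final String CATALOG =
        "<catalog><cd>"
        + "<title>Empire Burlesque</title>"
        + "<artist>Bob Dylan</artist>"
        + "</cd></catalog>";

    // Evaluates an XPath expression against the catalog and returns the string value
    static String select(String path) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(path, new InputSource(new StringReader(CATALOG)));
    }

    public static void main(String[] args) throws XPathExpressionException {
        System.out.println(select("/catalog/cd/title"));
        System.out.println(select("/catalog/cd/artist"));
    }
}
```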


We can't go into XSLT in any detail, but we have described enough that we may
comfortably take a given XSLT stylesheet and use it to do transformations inside
the context of the JAXP environment.

Again, kudos to JAXP for maintaining the same basic pattern, where a factory returns
an engine, in this case a transformer. The XSLT model is a little different at this point,
as the object that dictates how the input will be changed or processed is
built into the transformer object. In other words, the XSLT stylesheet
configures the Transformer object that is created, so that it transforms the input XML
in the manner uniquely described by that stylesheet. Where in SAX the ContentHandler
supplied the actions, and in DOM calling methods on the Document object
supplied output, in the XSLT API the Transformer interprets the XSLT stylesheet,
applies it to the input XML and outputs the transformed result document.


XSLT Processing Model


TransformerFactory ---- returns a Transformer ----
                                                  |
                                                  v
XML file -- to Input Stream -->  Transformer (based on XSL sheet)
                                                  |
                                  transforms, applies templates
                                                  |
                                                  v
                                          generates Output
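

The processing model can be sketched in a few lines of code. This is an
illustration under stated assumptions rather than a definitive program: the
class name PipelineSketch and the transform( ) helper are invented, the XML
and stylesheet are held in Strings instead of files, and the stylesheet
uses the text output method so the result is a plain line of text.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class PipelineSketch {

    static final String XML =
        "<catalog><cd>"
        + "<title>Empire Burlesque</title>"
        + "<artist>Bob Dylan</artist>"
        + "</cd></catalog>";

    // Hypothetical stylesheet: prints 'title by artist' for each cd
    static final String XSL =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        + "<xsl:for-each select='catalog/cd'>"
        + "<xsl:value-of select='title'/>"
        + "<xsl:text> by </xsl:text>"
        + "<xsl:value-of select='artist'/>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String transform(String xml, String xsl) throws TransformerException {
        // the factory returns a Transformer configured by the stylesheet
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer =
            factory.newTransformer(new StreamSource(new StringReader(xsl)));

        // the input XML is streamed in and the result is collected in a StringWriter
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(transform(XML, XSL));
    }
}
```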



XSLT Mechanics


Again we see the uniform approach used in all divisions of the JAXP APIs.

XSLT Packages

The XSLT API allows reading the input XML from several sources,
including standard IO, a DOM tree or SAX input
events. The corresponding classes are listed below as import statements.


import javax.xml.transform.stream.StreamSource;  // InputStream, Reader, File or system ID String
import javax.xml.transform.dom.DOMSource;        // DOM Node object
import javax.xml.transform.sax.SAXSource;        // SAX InputSource, optionally with an XMLReader


Outputs can also be directed to forms useful to different departments of the JAXP API.

import javax.xml.transform.stream.StreamResult;  // OutputStream, Writer, File or system ID String
import javax.xml.transform.dom.DOMResult;        // writes a DOM Document object
import javax.xml.transform.sax.SAXResult;        // calls SAX ContentHandler methods
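

To see what 'directing output to other departments of JAXP' means in
practice, here is a small sketch (the class ResultSketch and its two
helper methods are invented for illustration). A Transformer created
without a stylesheet simply copies its input, so the same XML text can be
delivered as a DOM tree through a DOMResult or as SAX events through a
SAXResult.

```java
import java.io.StringReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class ResultSketch {

    static final String XML = "<AtoZ><A>Ardvark</A><B>Boomerang</B></AtoZ>";

    // Stream in, DOM out: returns the name of the root element of the new tree
    static String rootViaDom(String xml) throws TransformerException {
        Transformer copier = TransformerFactory.newInstance().newTransformer();
        DOMResult result = new DOMResult();
        copier.transform(new StreamSource(new StringReader(xml)), result);
        Document doc = (Document) result.getNode();
        return doc.getDocumentElement().getNodeName();
    }

    // Stream in, SAX events out: counts the startElement( ) callbacks received
    static int countElementsViaSax(String xml) throws TransformerException {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                count[0]++;
            }
        };
        Transformer copier = TransformerFactory.newInstance().newTransformer();
        copier.transform(new StreamSource(new StringReader(xml)),
                         new SAXResult(handler));
        return count[0];
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(rootViaDom(XML));
        System.out.println(countElementsViaSax(XML));
    }
}
```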


In addition to inputs and outputs, the XSLT API has its own exceptions,
namely TransformerConfigurationException and TransformerException.
TransformerConfigurationException is thrown by the TransformerFactory.
TransformerException is thrown when an error condition occurs in the
Transformer object.  These classes are carried in the javax.xml.transform
package, so they come with that import.

// JAXP also defines an ErrorListener interface with methods warning( ), error( ) and
// fatalError( ). Each takes a TransformerException and may in turn throw one. The error
// listener is set via ' factory.setErrorListener(new MyErrorListener( ) ); '
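

A listener of this kind might look like the following sketch. The names
ErrorListenerSketch, CollectingListener and compiles( ) are invented for
illustration. The listener only records what it is told; exactly which of
its methods a given XSLT processor calls for a faulty stylesheet is
implementation dependent, so the sketch relies only on the
TransformerConfigurationException thrown by the factory.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.transform.ErrorListener;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamSource;

class ErrorListenerSketch {

    // Records every report instead of printing it or rethrowing it
    static class CollectingListener implements ErrorListener {
        final List<String> messages = new ArrayList<String>();

        public void warning(TransformerException e) {
            messages.add("warning: " + e.getMessage());
        }
        public void error(TransformerException e) {
            messages.add("error: " + e.getMessage());
        }
        public void fatalError(TransformerException e) {
            messages.add("fatal: " + e.getMessage());
        }
    }

    // Returns true if the factory can build a Transformer from the stylesheet text
    static boolean compiles(String xsl, ErrorListener listener) {
        TransformerFactory factory = TransformerFactory.newInstance();
        factory.setErrorListener(listener);
        try {
            factory.newTransformer(new StreamSource(new StringReader(xsl)));
            return true;
        } catch (TransformerConfigurationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String good = "<xsl:stylesheet version='1.0'"
            + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'/>";
        // the closing tags are missing, so this stylesheet is not well-formed
        String bad = "<xsl:stylesheet version='1.0'"
            + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:template match='/'>";

        CollectingListener listener = new CollectingListener();
        System.out.println(compiles(good, listener));
        System.out.println(compiles(bad, listener));
        System.out.println(listener.messages);
    }
}
```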


The standard import is the javax.xml.transform package, which supplies the Transformer.
Together the imports might take the following form.

import javax.xml.transform.*;          // Transformer, TransformerFactory and exceptions
import javax.xml.transform.stream.*;   // StreamSource and StreamResult
import javax.xml.transform.dom.*;      // DOMSource and DOMResult
import javax.xml.transform.sax.*;      // SAXSource and SAXResult



The next example shows a code excerpt from the J2EE Tutorial. Here we
see a DOM DocumentBuilder returned from a factory that is already provided.
The builder parses our XML document and represents it in memory as a DOM
object.  A new transformer is returned by the TransformerFactory instance.
The DOM document object is then wrapped in a DOMSource.  The result of the
transformation is prepared and directed to the console. Finally the transform( )
method is called on the source and result, and the transformation is streamed
to the console.



Example     // from the J2EE Tutorial

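try {
    File f = new File(argv[0]);
    // 'factory' (a DocumentBuilderFactory) and 'document' are declared elsewhere
    DocumentBuilder builder = factory.newDocumentBuilder();
    document = builder.parse(f);

    // Use a Transformer for output
    TransformerFactory tFactory =
        TransformerFactory.newInstance();
    Transformer transformer = tFactory.newTransformer();
    DOMSource source = new DOMSource(document);
    StreamResult result = new StreamResult(System.out);
    transformer.transform(source, result);
} catch (Exception e) {
    // the catch blocks are not part of the excerpt; a broad catch stands in for them
    e.printStackTrace();
}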


Because no XSLT file has been supplied here, the result is the same
as the input. The J2EE tutorial has a name for this, calling it an
'identity transform.'
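

The identity transform is easy to try without any files. In the sketch
below, which assumes a made-up class IdentitySketch and helper echo( ),
a short XML String is parsed into a DOM Document and pushed through a
stylesheet-less Transformer into a StringWriter. The XML declaration is
switched off with an output property so that only the document content
comes back.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class IdentitySketch {

    // Parses the XML text into a DOM tree and serializes the tree back to text
    static String echo(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document document = factory.newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));

        // no stylesheet is supplied, so this Transformer copies input to output
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(document), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(echo("<AtoZ><A>Ardvark</A></AtoZ>"));
    }
}
```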

Following is a complete example. The earlier examples have been
modified to require two files supplied at the command line, the XML
source document and the XSLT stylesheet.


A Complete DOM Code Example

import javax.xml.parsers.*;
import org.w3c.dom.*;
import org.xml.sax.*;
import java.io.*;
import javax.xml.transform.*;
import javax.xml.transform.stream.*;
import javax.xml.transform.dom.*;

class XSLTRunner{
  static Document xmlDocument, xsltDocument;
  static String indent="";

  public static void main(String[ ] args) {
    // code if the user fails to supply an input file
    if (args.length != 2) {
      System.err.println("Usage: An XML file name is required as a first argument" );
      System.err.println("An XSLT stylesheet file is required as a second argument" );
      System.exit(1);
    }

    // the DOM Factory
    DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
    domFactory.setNamespaceAware(true);   // XSLT requires namespace awareness

    // the Transformer Factory
    TransformerFactory xsltFactory = TransformerFactory.newInstance();

    try{
      // The XML and XSL files are loaded into memory as
      // Document objects then converted to DOMSource objects
      DocumentBuilder builder = domFactory.newDocumentBuilder( );
      xmlDocument = builder.parse( new File(args[0]) );
      DOMSource xmlSource= new DOMSource(xmlDocument);   // input XML as a DOMSource

      // The result object is set up to be streamed to standard output ( console )
      StreamResult result = new StreamResult(System.out);

      xsltDocument = builder.parse( new File(args[1]) );
      DOMSource xsltSource = new DOMSource(xsltDocument);   // xslt source object

      // a Transformer object is created configured to the XSLT Source object
      Transformer transformer = xsltFactory.newTransformer(xsltSource);

      // the transform method applies the transformation and streams result object as output
      transformer.transform(xmlSource,result);
    }
    catch (Exception e){
      e.printStackTrace( );
    }
    // catching SAXException, IOException and ParserConfigurationException
    // TransformerConfigurationException and TransformerException
  }
}

Following is an XML document and an associated XSL stylesheet. These will
serve as the first and second arguments at the command line.


Example
[peter@localhost java]$ java XSLTRunner ../XML/Samples/Z.xml ../XML/Samples/Z.xsl

// Save to Z.xml and Z.xsl respectively


An XML Sample to Transform

<?xml version="1.0" ?>
<?xml-stylesheet version="1.0" type="text/xsl" href="Z.xsl"?>

<AtoZ>

<A>Ardvark</A>
<B>Boomerang</B>
<C>Caterpillar</C>
<D>Demolition</D>
<E>Elephant</E>


<Z keyword="paper"> Paper is made from XML trees </Z>
<Z keyword="rock"> Mines are dug into rock </Z>
<Z keyword="scissors"> Scissors cut cloth </Z>
<Z keyword="water"> Water is hard on paper </Z>
<Z keyword="fire"> Water is hard on fire </Z>

</AtoZ>


A Corresponding XSL Stylesheet

<?xml version="1.0" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="AtoZ">
<html>
<head><title>AtoZ</title></head>
<body >
<xsl:for-each select="Z">
<B>Z : </B> <xsl:value-of select="."/>
<BR/>
</xsl:for-each>
</body>
</html>
</xsl:template>
</xsl:stylesheet>


Output

The output is an HTML page with values of the XML extracted and located
inside HTML tags. This looks visually formatted in a browser.



<html>
<head>
<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>AtoZ</title>
</head>
<body>
<B>Z : </B> Paper is made from XML trees <BR>
<B>Z : </B> Mines are dug into rock <BR>
<B>Z : </B> Scissors cut cloth <BR>
<B>Z : </B> Water is hard on paper <BR>
<B>Z : </B> Water is hard on fire <BR>
</body>
</html>



Reading in the Text Associated With Material Covered in this Note:



'Developing Java Web Services', Chapter 8, pages 337 - 383


Exercise  




1)  Printing to console is OK for demonstration, but ultimately we will
want to capture the information we have processed in other formats.
One useful format is the plain text document.  In the following method,
taken from the J2EE tutorial, a StringBuffer and its append( ) method
are used to gather the results each time the characters( ) method is
called in the SAX parsing process.


Method from the J2EE Tutorial


public void characters(char buf[], int offset, int len)
throws SAXException
{
  String s = new String(buf, offset, len);
  if (textBuffer == null) {
    textBuffer = new StringBuffer(s);
  } else {
    textBuffer.append(s);
  }
}


The echoText( ) method takes the StringBuffer resulting
from the series of appends and converts it into a String object.
The method takes advantage of the simple concatenation
technique, where the textBuffer is concatenated to an empty
String, "", which implicitly calls its toString( ) method. The
toString( ) method might have been called directly instead.


Method from the J2EE Tutorial

private void echoText()
throws SAXException
{
  if (textBuffer == null) return;
  String s = ""+textBuffer;     // might have used textBuffer.toString();
  textBuffer = null;
}


This method definition can be modified to return the String object
created.

We can then do some classic IO by writing this String to file.
We can use the following constructor form of the FileWriter class
in the java.io package which allows us to choose whether to replace
or append to a file.


One of FileWriter's Constructors


FileWriter(String fileName, boolean append); // writes to the file with the name specified; append = true adds to the end of the file


The following write method can be used to write the String we created
in the parsing process to the file.


One of FileWriter's Overloaded write() methods

 public void write(String str, int off, int len) throws IOException

Now we have enough pieces to modify the SAXRunner parsing program
supplied to capture the characters( ) method events in a String and
write it to a file.

Declare a StringBuffer and a String in class scope of the DefaultHandler class.
Override the DefaultHandler's characters( ) method to collect the character data
of the document inside the StringBuffer as it is being parsed.

From inside the endDocument( ) method, transfer the contents of the StringBuffer to
a String object and write it to a log file using the FileWriter class write( ) method. Set
the append value of the FileWriter constructor to true so additional entries will be added
to the file.

After you have instantiated the FileWriter and called the write( ) method on its
reference, you will need to close the FileWriter object before it will complete the
write to file (i.e. writer.close( ); ). You will also have to catch IOException when
using the writer.

// Better yet flush( ) and close( )
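

The file-writing half of the exercise, taken on its own, might be sketched
as follows. The class name LogWriterSketch, the appendToLog( ) helper and
the file name sax.log are invented for illustration, and the
try-with-resources form assumes Java 7 or later; it closes the writer
even if the write fails. Hooking this up to the SAX handler is left as
the exercise.

```java
import java.io.FileWriter;
import java.io.IOException;

class LogWriterSketch {

    // Appends the text to the named file, creating the file if it does not exist
    static void appendToLog(String fileName, String text) throws IOException {
        // the second constructor argument, true, selects append mode
        try (FileWriter writer = new FileWriter(fileName, true)) {
            writer.write(text, 0, text.length());
            writer.flush();
        }   // the writer is closed here automatically
    }

    public static void main(String[] args) throws IOException {
        appendToLog("sax.log", "first entry\n");
        appendToLog("sax.log", "second entry\n");
    }
}
```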
 


2 ) Using DOM to extract out-of-order data clearly demonstrates one
of its finer qualities.

Create a simple XML document that has several pieces of information
on an employee. Using the DOM code, create a method that selects for
three pieces of data out of order from a DOM tree that represents this
document. Use logic like the following to secure the information. Printing
to console will be all that is needed to demonstrate this query.


Example

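// employeesName is assumed to be a String field holding the element name to look for
static void getShortEmployeeID(Node child){

 if ( employeesName.equals(child.getNodeName( )) ){
    // an element's text is held in its first child, a #text node
    System.out.println( " Employee's Name: " + child.getFirstChild( ).getNodeValue( ) );
    }
 // two more pieces of information, captured out of order
 }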

Do a screen capture of your output to demonstrate the results.