XML at a Glance
Peter Komisar
version 1.0
© 2008
References:
'XML & Web Services Unleashed', R. Schmelzer et al.
'The XML Bible', Elliotte Rusty Harold
'The Birth of XML', Jon Bosak, http://java.sun.com/xml/birth_of_xml.html
'XML in a Nutshell', E. R. Harold & W. S. Means
'Mastering XML', Navarro, White & Burman
'Professional XML Schemas', J. Duckett et al.
GML, SGML & HTML
In 1969 at IBM, Charles Goldfarb, Ed Mosher and Ray Lorie
created GML, the Generalized Markup Language, to simplify the
handling of legal documents. The acronym also matches their
initials. The documents were in different formats and relied on
different platforms, so they sought to design a unified,
cross-platform markup language. The standardized version of GML
became known as SGML, the Standard Generalized Markup Language.
The key design feature of SGML is that it allowed the creation
of custom tags.
SGML, though, became complex and difficult to use. It was also
overshadowed by Tim Berners-Lee's HTML, a simple language
defined as an application of SGML. HTML became a 'best seller'
because it was relatively easy and worked for 90% of uses.
HTML was still limited, though, as it was not customizable.
The Advent of XML
// a refinement of SGML
Jon Bosak, Tim Bray, C. M. Sperberg-McQueen, Jean Paoli and
James Clark, many of whom were SGML pioneers, sought to filter
out SGML's best features and port them to the web. In a sense
XML was a distillation of SGML, a little like Java is a
simplification of C++.
Jon Bosak offers his own recollection of how XML came to be at
http://java.sun.com/xml/birth_of_xml.html
// might have to search at Sun
Hello World in XML
We can introduce the general look of XML with a quick Hello
World version.
Example
<?xml version="1.0" standalone="yes"?>
<Earth>
Hello World in XML!
</Earth>
Write or copy and paste this text into a simple text editor
like Notepad and save it under a name with an .xml ending, such
as HelloWorld.xml. Once saved, it can be opened in a browser.
The result is not very exciting, as there is no formatting
associated with the text.
If we look at the <Earth> element, we can see that XML uses
'tags', or pairs of angle brackets that surround identifiers. A
'start' tag and an 'end' tag are distinguishable. The 'end' tag
includes a forward slash ahead of the element identifier.
A start tag, its matching end tag and the content between them
make up an element. Elements can also carry attributes. In the
following example the attribute called 'type' holds the value
'planet'.
Example
<Earth type="planet"></Earth>
Note that in this version of the element we left the content
out. This creates an 'empty' element. XML supplies an
abbreviated form for an 'empty' element, as follows.
Example
<Earth type="planet" />
This form is recommended, as it reduces the risk of creating
an 'orphaned' end tag.
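As a quick check of the two forms, the following sketch parses both
with Python's standard xml.etree.ElementTree module and prints what the
parser sees. The variable names are only illustrative; the point is that
the explicit end tag and the abbreviated form describe the same empty
element.

```python
import xml.etree.ElementTree as ET

# Two spellings of the same empty element:
# an explicit end tag versus the abbreviated form.
long_form = ET.fromstring('<Earth type="planet"></Earth>')
short_form = ET.fromstring('<Earth type="planet" />')

for element in (long_form, short_form):
    # tag is the element name, attrib maps attribute names to values,
    # and text is None when the element has no content.
    print(element.tag, element.attrib, element.text)
```

Both lines of output should be identical, since the parser does not
distinguish between the two ways of writing an empty element.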
Speaking of form, XML defines what makes a document 'well
formed'.
Well Formed and Valid XML
Rules Governing XML Structure
1) XML Elements must have closing tags. That means all tags.
Example
<Break></Break>
or <Break />
2) XML element names, unlike those of HTML, are case sensitive.
Example
<GO /> is not the same as <go />
3) XML tags must all be properly nested. In other words, tags
must be closed in the reverse of the order in which they were
opened. Below, the tags open One, Two, Three and close Three,
Two, One.
Example
<One>           <!-- opens -->
  <Two>         <!-- opens -->
    <Three>     <!-- opens -->
    </Three>    <!-- closes -->
  </Two>        <!-- closes -->
</One>          <!-- closes -->
4) XML documents must have a single root element. This implies
that all other elements of a document are nested inside the
root element. The name of the root element is the same as the
name declared in the document type declaration, if one is
present.
5) Attribute values must all be quoted, with either single or
double quotes. By convention double quotes are used.
Example
number="1029383454738"
6) Attributes may only appear once in an element.
Example
<!-- can't have -->
<X x="y" x="z">
7) Attribute values cannot contain references to external
entities. Element content can reference external entities, but
attribute values cannot. Attribute values can use internally
defined and predefined entity references.
Example
<ANC nac="CNA&apos;S">
8) Entities must be declared before they are used. Predefined
entities are already defined, so they are ready to go.
// entities can't be forward referenced
Well Formed XML
// defining what a well formed XML document is
First, XML requires that a document be 'well formed'. To be
well formed, a document must follow the rules stated above and,
in addition, must not contain markup or characters that XML
cannot process.
Formula For Well Formed XML
Adherence to Structural Rules + Correct Syntax = Well Formed XML
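One way to see these rules in action is to hand small documents to a
parser and watch which ones it rejects. The sketch below uses Python's
standard xml.etree.ElementTree module. The helper name is_well_formed
and the sample strings are illustrative assumptions, not part of any
specification.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document: str) -> bool:
    """Return True if the parser accepts the text as well formed XML."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# One sample that follows the rules, then one violation per rule.
samples = {
    "properly nested": "<One><Two><Three></Three></Two></One>",
    "mis-nested tags": "<One><Two></One></Two>",
    "missing end tag": "<Break>",
    "case mismatch": "<GO></go>",
    "two root elements": "<One/><Two/>",
    "unquoted attribute": "<X x=y/>",
    "repeated attribute": '<X x="y" x="z"/>',
    "undeclared entity": "<X>&nope;</X>",
}

for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
```

Only the first sample should be reported as well formed. A parser of
this kind checks structure and syntax only; it says nothing about
whether a document is valid against a DTD or schema.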
Valid XML
An XML document is considered valid if it is first well formed
and, in addition, has a document type definition (DTD) or an
XML Schema describing constraints with which the document
complies.
A well formed document can be used without a schema. Doing so
keeps it from using certain advanced XML features that are
available only through some form of document type declaration.
This is just a glimpse of the detailed specification behind
XML. It is enough, though, to make looking at the XML
configuration files used in the J2EE specification more
meaningful.
Web applications, as well as the larger enterprise
applications, have configuration files written in XML as part
of the J2EE specification. Following are some examples.
Following is a brief example of an application.xml file, which
comes with the Sun Enterprise download. It is the deployment
descriptor for an EAR file, which is an enterprise application
packaged in J2EE format.
Sample Location Inside Sun Download
C:\Sun\SDK\docs\firstcup\example\firstcup\src\conf
Brief Description of the application.xml File
The following XML shows a root tag called application. It
contains two modules. The first holds a reference to a war file
that contains web resources. The context root, from where the
web pages will be referenced, is also noted. The second holds a
jar that houses a sample Enterprise JavaBean.
The long xsi:schemaLocation attribute says where the XML schema
may be found. The schema is the data structure type definition
for the file.
An application.xml Sample // java.sun.com
<application version="5"
    xmlns="http://java.sun.com/xml/ns/javaee"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://java.sun.com/xml/ns/javaee
        http://java.sun.com/xml/ns/javaee/application_5.xsd">
  <display-name>firstcup</display-name>
  <module>
    <web>
      <web-uri>firstcup-war.war</web-uri>
      <context-root>/firstcup</context-root>
    </web>
  </module>
  <module>
    <ejb>firstcup-ejb.jar</ejb>
  </module>
</application>
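To make the structure of this descriptor concrete, here is a small
sketch that reads it with Python's standard xml.etree.ElementTree
module and lists the modules. It embeds a trimmed copy of the sample
without the namespace and schema attributes, so that element names can
be looked up without namespace qualification. The constant name
APPLICATION_XML is only illustrative.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the descriptor: namespace and schema attributes are
# omitted so that plain element names work in the lookups below.
APPLICATION_XML = """
<application version="5">
  <display-name>firstcup</display-name>
  <module>
    <web>
      <web-uri>firstcup-war.war</web-uri>
      <context-root>/firstcup</context-root>
    </web>
  </module>
  <module>
    <ejb>firstcup-ejb.jar</ejb>
  </module>
</application>
"""

root = ET.fromstring(APPLICATION_XML)
print("root element:", root.tag)

# Each module holds either a web element or an ejb element.
for module in root.findall("module"):
    web = module.find("web")
    ejb = module.find("ejb")
    if web is not None:
        print("web module:", web.findtext("web-uri"),
              "context root:", web.findtext("context-root"))
    elif ejb is not None:
        print("ejb module:", ejb.text)
```

With the default namespace declared on the root element, each lookup
would need the namespace-qualified element name instead.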
Looking around the directories in the Sun SDK, we can find the
web.xml file, which is the deployment descriptor for the web
archive (or 'war' file) associated with the above application.
Brief Description of the web.xml File
The root element of the web.xml file is web-app. Configuration
information is shown for a servlet. It includes the name of the
servlet, its fully qualified class name and an indicator that
says to load it on start-up. There is also a mapping tag that
gives the URL pattern, relative to the context root, under
which the servlet is reached. Finally, a session time-out
period is specified.
A web.xml Sample
<web-app version="2.5"
    xmlns="http://java.sun.com/xml/ns/javaee"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://java.sun.com/xml/ns/javaee
        http://java.sun.com/xml/ns/javaee/web-app_2_5.xsd">
  <servlet>
    <display-name>FacesServlet</display-name>
    <servlet-name>FacesServlet</servlet-name>
    <servlet-class>javax.faces.webapp.FacesServlet</servlet-class>
    <load-on-startup>1</load-on-startup>
  </servlet>
  <servlet-mapping>
    <servlet-name>FacesServlet</servlet-name>
    <url-pattern>/firstcupWeb/*</url-pattern>
  </servlet-mapping>
  <session-config>
    <session-timeout>
      30
    </session-timeout>
  </session-config>
</web-app>
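The relationship between the servlet, its mapping and the session
time-out can be read back out of the file with a few lookups. The
following sketch uses Python's standard xml.etree.ElementTree module on
a trimmed copy of the descriptor without namespace and schema
attributes. The names WEB_XML, classes and patterns are illustrative
assumptions.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the descriptor, without namespace attributes.
WEB_XML = """
<web-app version="2.5">
  <servlet>
    <servlet-name>FacesServlet</servlet-name>
    <servlet-class>javax.faces.webapp.FacesServlet</servlet-class>
    <load-on-startup>1</load-on-startup>
  </servlet>
  <servlet-mapping>
    <servlet-name>FacesServlet</servlet-name>
    <url-pattern>/firstcupWeb/*</url-pattern>
  </servlet-mapping>
  <session-config>
    <session-timeout>
      30
    </session-timeout>
  </session-config>
</web-app>
"""

root = ET.fromstring(WEB_XML)

# The servlet-name value ties a servlet to its servlet-mapping.
classes = {s.findtext("servlet-name"): s.findtext("servlet-class")
           for s in root.findall("servlet")}
patterns = {m.findtext("servlet-name"): m.findtext("url-pattern")
            for m in root.findall("servlet-mapping")}

for name, class_name in classes.items():
    print(name, "->", class_name, "mapped to", patterns.get(name))

# Element text keeps the surrounding whitespace and line breaks;
# strip() removes them before the value is converted to a number.
raw_timeout = root.findtext("session-config/session-timeout")
timeout_minutes = int(raw_timeout.strip())
print("session time-out in minutes:", timeout_minutes)
```

The time-out value in a web.xml file is expressed in minutes, so the
sample asks for sessions to expire after half an hour of inactivity.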
A Simple ejb-jar Example
The following can be found at the Apple site.
Source Location
http://developer.apple.com/internet/java/examples/entitybean5_source.html
This code is a little older and uses a DTD to define the data
structure rather than the newer XML Schema language. The root
tag is called ejb-jar and holds an enterprise-beans tag. This
tag in turn holds an 'entity' tag, which tells what type of
Enterprise JavaBean this is. Besides a name and a description,
the names of the two RMI interfaces are included, represented
by the 'home' and 'remote' tags. The persistence model is
specified as 'Container', meaning container managed persistence
or CMP. The primary key field is also named. This is the handle
the system uses to access this entity bean.
This description will make more sense after we have
looked at Enterprise JavaBeans.
An ejb-jar.xml Example
<?xml version="1.0"?>
<!DOCTYPE ejb-jar PUBLIC
    "-//Sun Microsystems, Inc.//DTD Enterprise JavaBeans 1.1//EN"
    "http://java.sun.com/j2ee/dtds/ejb-jar_1_1.dtd">
<ejb-jar>
  <display-name>HelloEntity</display-name>
  <enterprise-beans>
    <entity>
      <description>Extremely Simple Entity bean, models a person</description>
      <ejb-name>HelloEntity</ejb-name>
      <home>HelloEntityHome</home>
      <remote>HelloEntity</remote>
      <ejb-class>HelloEntityEJB</ejb-class>
      <persistence-type>Container</persistence-type>
      <prim-key-class>java.lang.String</prim-key-class>
      <primkey-field>name</primkey-field>
      <reentrant>False</reentrant>
      <cmp-field><field-name>name</field-name></cmp-field>
      <cmp-field><field-name>email</field-name></cmp-field>
    </entity>
  </enterprise-beans>
</ejb-jar>
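A document that names a DTD can still be read by a parser that only
checks well-formedness. The sketch below, using Python's standard
xml.etree.ElementTree module on a shortened copy of the descriptor,
pulls out the entity bean's details. This parser does not validate: it
accepts the DOCTYPE line but does not check the document against the
DTD. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the descriptor. The XML declaration must be the
# very first thing in the text, so the string starts with it.
EJB_JAR_XML = """<?xml version="1.0"?>
<!DOCTYPE ejb-jar PUBLIC
    "-//Sun Microsystems, Inc.//DTD Enterprise JavaBeans 1.1//EN"
    "http://java.sun.com/j2ee/dtds/ejb-jar_1_1.dtd">
<ejb-jar>
  <enterprise-beans>
    <entity>
      <ejb-name>HelloEntity</ejb-name>
      <home>HelloEntityHome</home>
      <remote>HelloEntity</remote>
      <ejb-class>HelloEntityEJB</ejb-class>
      <persistence-type>Container</persistence-type>
      <prim-key-class>java.lang.String</prim-key-class>
      <primkey-field>name</primkey-field>
      <cmp-field><field-name>name</field-name></cmp-field>
      <cmp-field><field-name>email</field-name></cmp-field>
    </entity>
  </enterprise-beans>
</ejb-jar>
"""

root = ET.fromstring(EJB_JAR_XML)
entity = root.find("enterprise-beans/entity")

# Collect the container managed fields and the primary key field.
cmp_fields = [f.findtext("field-name") for f in entity.findall("cmp-field")]
primary_key = entity.findtext("primkey-field")

print("bean:", entity.findtext("ejb-name"))
print("persistence:", entity.findtext("persistence-type"))
print("fields:", cmp_fields)
print("primary key:", primary_key, "is a managed field:",
      primary_key in cmp_fields)
```

Checking that the primary key names one of the managed fields is the
kind of constraint an application has to verify itself, since it lies
outside what well-formedness covers.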
Just to get a hands-on feel for XML, consider that XML is also
the data conduit for Web Services. To this end, create an XML
file that will serve as a sample representation of a Memo
object. The root tag will be Memo. Internal tags will be 'To',
'From', 'Date' and 'Message'.
Make sure the tag view is viewable in a browser. This
tells you, you have correctly formed the