XML Schema Peter Komisar latest version .5.1 ©
References:
The XML Schema Primer, http://www.w3.org/TR/xmlschema-0
The XML Schema Structures (Part 1), http://www.w3.org/TR/xmlschema-1
'XML Schema Part 2: Datatypes', www.w3.org/TR/2001/REC-xmlschema-2-20010502/
Examples from the W3Schools Web site, www.w3Schools.com
'XML and Web Services Unleashed', R. Schmelzer et al.
'XML in a Nutshell', E.R. Harold & W.S. Means
'Professional XML Schemas', J. Duckett et al.
Overview
In May 2001 the W3C published its recommendation for the XML Schema Definition Language. The specification allows the creation of different sorts of simple and complex XML elements that govern typing in an associated XML 'instance' document. The recommendation also introduces a large set of data types that allow data in an XML document to be strongly typed.
Official Documents
In addition to the general information available at http://www.w3c.org/TR/Schema, three documents are frequently cited as the official sources for the XML Schema recommendation. They are listed below.
The XML Schema Primer (Part 0) - http://www.w3.org/TR/xmlschema-0/
The XML Schema Structures (Part 1) - http://www.w3.org/TR/xmlschema-1/
The XML Schema Datatypes (Part 2) - http://www.w3.org/TR/xmlschema-2/
What is a Schema?
Definition: A schema is a set of rules that is used to govern data structure and content.
The schema Element
The <schema> Element
The root element of an XML schema document is the 'schema' element. It houses the set of schema elements that are used to create an XML schema definition.
The following example shows a 'skeleton' schema. Notice that the official namespace of the World Wide Web Consortium's XML Schema language is specified and assigned to the namespace prefix 'xs'. This is a hallmark of a schema document. Also shown is the 'targetNamespace' attribute, which holds a URI that represents the namespace that will be associated with this particular schema.
A Skeleton Form of the 'schema' Element
<!-- xmlns:xs declares the schema language namespace -->
<!-- targetNamespace is the namespace that this schema will be associated with -->
<xs:schema
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://www.example.com/example">
  . . .
</xs:schema>
A complete schema may be represented in XML by one or more 'schema documents', that is, one or more <schema> element information items.
// in other words, a schema can be composed of a set of schema documents
Form of the schema Element
<schema
  id = ID
The schema element is a skeleton container for its components, which may be sets of:
// may include imported definitions and nested elements
simple & complex type definitions // <complexType> and <simpleType> elements
'top-level' attribute declarations // <attribute> elements
'top-level' elements // <element> elements
attribute groups // <attributeGroup>
model group definitions // <group>
notation elements // <notation>
annotation information items // <annotation>
// <include> and <import> elements are not schema constituent components. Rather, they are
// directives that serve the same purpose as in programming languages: they bring in
// definitions that are defined outside the local document.
XML Schema Namespaces
XML Schema has three namespaces that are 'hallmarks' of the language and will typically be present in schemas and schema instances.
"http://www.w3.org/2001/XMLSchema" // xmlns:xs="http://www.w3.org/2001/XMLSchema" or with the xsd prefix
The XMLSchema namespace identifies the elements that are part of the schema language itself. These are the building blocks that XML Schema uses, such as <schema>, <element>, <attribute> and <complexType>. This namespace declaration appears in the schema document.
"http://www.w3.org/2001/XMLSchema-instance" // xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
The XMLSchema-instance namespace is found in XML instance documents. It is most often used to qualify the 'xsi:schemaLocation' attribute or the 'xsi:noNamespaceSchemaLocation' attribute. It is also used to specify a 'nil' value, in the form xsi:nil, which explicitly states that no value is given for a field. (This differs from simply leaving the element empty: an empty string may not be a valid value for the element's type.) The XMLSchema-instance namespace is also used with the 'xsi:type' attribute to specify a derived type that a schema defines.
Following is an example of how the xsi:nil attribute is used.
W3C Example
<xs:element name="shipDate" type="xs:date" nillable="true"/>
// boolean attribute 'nillable' set to true
A subsequent 'shipDate' tag in a document instance can then have its 'nil' value set to true.
W3C Example
<shipDate xsi:nil="true"></shipDate>
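How an application might recognize a nil element can be sketched in Python with the standard library. This is only an illustration, not part of the W3C example: the wrapper element 'order', the 'comment' element and the helper name is_nil are assumptions made for the sketch, and ElementTree does not itself validate against a schema.

```python
import xml.etree.ElementTree as ET

# The XMLSchema-instance namespace that the 'xsi' prefix is bound to.
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"


def is_nil(element):
    """Return True when the element carries xsi:nil set to 'true' or '1'."""
    value = element.get("{%s}nil" % XSI_NS, "false")
    return value.strip() in ("true", "1")


# Hypothetical instance fragment: 'order' and 'comment' are illustrative only.
doc = ET.fromstring(
    '<order xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">'
    '<shipDate xsi:nil="true"></shipDate>'
    '<comment></comment>'
    '</order>'
)
ship_date = doc.find("shipDate")   # explicitly nil
comment = doc.find("comment")      # merely empty, not nil
```

The point of the sketch is the distinction the text draws: 'shipDate' is explicitly nil, while 'comment' is only empty.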
// A third namespace, xmlns:sxdatatypes="http://www.w3.org/2001/XMLSchema-datatypes",
// is defined that may be used in place of the general XMLSchema namespace for qualifying
// built-in data types. It is a specialized namespace that could be used to represent Schema
// data types in applications other than XML Schema.
Schema Samples
The following W3Schools example is a skeleton of the more typical form that the schema tag might take.
Example 1 From the W3Schools Website
<?xml version="1.0"?>
<!-- xmlns:xs specifies the official XML Schema namespace and associates it with the 'xs' prefix -->
<!-- targetNamespace specifies the namespace to which elements of this particular schema are associated -->
<!-- xmlns creates a default namespace -->
<xs:schema
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://www.w3schools.com"
    xmlns="http://www.w3schools.com"
    elementFormDefault="qualified">
  <!-- where different elements go -->
</xs:schema>
Following is an example of a simple schema and a corresponding document instance that has been verified against the latest standards. (This time we have a simple element included.)
Example 2 // save to SomeName.xsd
<xs:schema
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="www.example.com"
    elementFormDefault="qualified">
  <xs:element name="elem1" type="xs:string"/>
</xs:schema>
Next we show the corresponding 'instance' document.
Example 2 XML Instance for the above Schema // was saved as SomeName.xml
<?xml version="1.0"?>
<elem1
    xmlns="www.example.com"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="www.example.com SomeName.xsd">
It's Schema Time!
</elem1>
// again, note that the default namespace brings the instance into
// 'qualification' with the target namespace specified in the schema
Although namespace names serve only as identifiers, the W3C would prefer that you use real URIs. To this end, some domain names have been reserved for use in examples. They are listed below.
Reserved Domains
http://www.example.com
http://www.example.net
http://www.example.org
Following is a complete W3Schools example of a short XML Schema. By convention, this file would be saved with an .xsd extension.
// Like DTDs, schemas have their own conventional file extension.
// Unlike DTDs, schemas are 'real' XML documents, written in XML.
A Short Schema Example From W3Schools
<?xml version="1.0"?> <!-- schemas share the hallmark of an xml file -->
<xs:schema
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://www.w3schools.com"
    xmlns="http://www.w3schools.com"
    elementFormDefault="qualified">
  <xs:element name="note">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="to" type="xs:string"/>
        <xs:element name="from" type="xs:string"/>
        <xs:element name="heading" type="xs:string"/>
        <xs:element name="body" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <!-- this whole section is one element definition! More on this later! -->
</xs:schema>
(If you are going to test run this code, make sure you use the same file name as is referenced in the instance document. Below it is called note.xsd.)
The XML Instance Document
Following is the W3Schools document that references the above XML Schema.
A W3Schools XML Example Implementing the Schema
This XML document has a reference to an XML Schema:
<?xml version="1.0"?>
<!-- xmlns is the default namespace declaration -->
<!-- xmlns:xsi declares the XMLSchema-instance namespace -->
<!-- xsi:schemaLocation gives the schema's location; notice its two parts -->
<note
    xmlns="http://www.w3schools.com"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3schools.com note.xsd">
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>
Notes on the Instance Document
Inside the instance document's root element, <note>, there are two namespace declarations and a 'schemaLocation' attribute.
1) The instance document declares the 'xsi' namespace prefix, which represents the special W3C namespace "http://www.w3.org/2001/XMLSchema-instance" and is used in conjunction with the 'schemaLocation' attribute. (Alternatively, there is a 'noNamespaceSchemaLocation' attribute that can be used when no target namespace has been declared in the schema document.)
Example xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
2) Both the schema and the instance declare a default namespace for their documents.
Example xmlns="http://www.w3schools.com"
Effectively, this makes all elements that are not prefixed 'qualified' and places them in the default namespace. (This keeps the instance in sync with the assignment elementFormDefault="qualified" found in the corresponding schema.)
3) Once the 'xsi' prefix is declared, we use it with the 'schemaLocation' attribute to specify the location of the schema. The example below extracts the two lines of the instance document that show this.
Example
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3schools.com note.xsd">
4) The value of the 'schemaLocation' attribute is specified in two parts, as can be seen in the second line of the above example. This points to what might seem at first glance to be a slightly ambiguous part of the Schema Language specification. Although the value looks like a path followed by a file name, the XML Schema specification does not require that the two identifiers be related, as the following example demonstrates. It is really up to the parser to interpret them.
Example
xsi:schemaLocation="http://www.example.net note.xsd">
The XML Schema specification treats this attribute only as a "hint": the first value names a namespace and the second suggests where the parser can find a schema document for that namespace. It is reasonable to expect the hint to lead to the 'note.xsd' file, but a processor is under no obligation to use it.
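The two-part structure of 'schemaLocation' and the effect of the default namespace can be seen by parsing the instance with Python's standard library. This is a sketch only: ElementTree does not validate, and the helper name schema_location_pairs is an assumption introduced for illustration.

```python
import xml.etree.ElementTree as ET

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"

# Shortened copy of the W3Schools instance document shown earlier.
INSTANCE = """<?xml version="1.0"?>
<note xmlns="http://www.w3schools.com"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3schools.com note.xsd">
  <to>Tove</to>
</note>"""


def schema_location_pairs(root):
    """Split xsi:schemaLocation into (namespace, location hint) pairs."""
    tokens = root.get("{%s}schemaLocation" % XSI_NS, "").split()
    return list(zip(tokens[0::2], tokens[1::2]))


root = ET.fromstring(INSTANCE)
pairs = schema_location_pairs(root)

# Unprefixed elements are 'qualified' by the default namespace,
# which ElementTree shows as a {namespace}localname tag.
root_tag = root.tag
child_tag = root[0].tag
```

Whether a processor actually loads 'note.xsd' from the hint is up to the processor; the sketch only shows how the attribute value is structured.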
Global Elements & Attributes // global is the scope of the schema element
Global elements are declared as children of the 'schema' element. Because attribute declarations are themselves <attribute> elements in the schema document, the same can be said about attributes: they are global when they are children of the schema element.
Global elements or attributes can be referenced using the 'ref' attribute. The following is an incomplete example that emphasizes the use of the 'ref' attribute.
Example
. . . .
<xs:element ref="History"/>
// element references the global 'History' element
. . . .
<xs:element name="History">
<xs:complexType>
<xs:sequence>
// etc.
Where to go to validate
Java Validating Parsers
1) The Java world now supports a unifying architecture for parsing and transforming XML documents, called JAXP, the Java API for XML Processing. Within this context several parsers are available that are able to validate against a schema. Because JAXP is our next topic, we will leave validating with the JAXP API to the next note.
DecisionSoft's Online Validation Tool
2) If you don't have the Java JDK or Xerces on your machine, you can access the same functionality by using DecisionSoft's online tool, which runs your documents against the Xerces parser for you. Go to the following link and use the browse function to find your files.
DecisionSoft's Schema Validation Page
http://tools.decisionsoft.com/schemaValidate.html
Microsoft's XML Validation Page
3) Microsoft has an XML validation page (which most of you have found; a link will be added later). It is important that you are using IE6 or have downloaded the MSXML 4.x package, which supports the latest version of the W3C's standards. The MSN XMLValidator is available at the Microsoft Developer Network.
http://msdn.microsoft.com/downloads/samples/internet/xml/xml_validator/default.asp?frame=true
XMLWriter
4) XMLWriter, as was pointed out in class, is based on Microsoft's XML engine, so for it to do the most recent forms of validation it too must have the MSXML 4.x package present.
XMLSpy
5) XMLSpy is an all-in-one dedicated XML application that is a leader in the field. IBM also supplies validating parsers, which can be found at their web site.
Complex & Simple Types
Simple & Complex Types
An XML document consists of a main element and sub-elements. Sub-elements may in turn contain other sub-elements. In XML Schema descriptions, elements that contain sub-elements or carry attributes are complex types. Elements that contain primitive data (numbers, strings, dates, etc.) and that do not contain any sort of sub-element or attribute are simple types. Attributes themselves are represented as simple types. It follows that only complex types can have attributes.
// If an element just holds primitive data --> it is a simple type
// If an element has attributes or other elements --> it is a complex type
A <complexType> element is used to create a complex type. In parallel, a <simpleType> element is used to create a simple type.
We begin by describing the simple element, which is used to create basic definitions in our schema specifications.
Simple Elements // contain text data and not other elements or attributes
A simple element is an XML element that can only contain text. It cannot contain any other elements or attributes. "Only text" in this context means a value of one of a number of different types, such as the 'built-in' types 'token', 'string' or 'boolean'. Custom types, or types derived from the built-in types, can also be used to specify the content type. Following is the base syntax for a simple element.
Simple Element Form <xs:element name="xxx" type="yyy"/>
Following are three simple elements as they might appear nested inside an XML Schema document.
Simple Element Examples
. . . .
<xs:element name="treeType" type="xs:string"/>
<xs:element name="leafPoints" type="xs:integer"/>
<xs:element name="idDate" type="xs:date"/>
. . . .
The above schema definitions would map to corresponding instance elements such as the following.
Corresponding Elements in Instance Document
. . . .
<treeType> Basswood </treeType>
<leafPoints> 1 </leafPoints>
<idDate> 2005-06-01 </idDate>
. . . .
// We will see later that simple elements can be further 'extended' or 'restricted'
// to create custom sets of simple types, using the 'simpleType' element.
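To make the idea of typed text content concrete, the following Python sketch converts the text of the three instance elements above according to their declared types. It is a rough approximation under stated assumptions: the function name parse_simple is hypothetical, only three types are handled, and Python's int() and date.fromisoformat() are not exact implementations of the XML Schema lexical rules.

```python
from datetime import date


def parse_simple(type_name, lexical):
    """Convert element text to a Python value for a few built-in types."""
    if type_name == "xs:string":
        # Strings preserve their whitespace as written.
        return lexical
    text = lexical.strip()  # integer and date ignore surrounding whitespace
    if type_name == "xs:integer":
        return int(text)
    if type_name == "xs:date":
        return date.fromisoformat(text)
    raise ValueError("unsupported type: " + type_name)


tree_type = parse_simple("xs:string", " Basswood ")
leaf_points = parse_simple("xs:integer", " 1 ")
id_date = parse_simple("xs:date", " 2005-06-01 ")
```

Text that does not fit the declared type, such as "one" for xs:integer, raises ValueError here, which loosely mirrors a validation failure.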
Complex Elements
Several XML schema elements work together to create complex types. Complex types may contain nested elements and attributes. They may also be formulated to create a 'mixed' form that contains elements mixed with text. The complex type is hallmarked by the presence of the <complexType> tag.
The complexType element - The <complexType> element acts as a container for a set of elements, which may include attribute declarations. The following skeleton example is simplified by leaving out nested elements and shows a named version of the element called 'skeleton'. This name can be assigned to the type attribute of an element. (Note that the 'xsd' prefix used here is an alternative to the 'xs' prefix for the XML Schema namespace. It is increasingly being replaced by the shorter 'xs' prefix, which you also see in many examples.)
The following example is also referred to as a complex type declaration.
Example of a Named Form of a Complex Type Declaration
<xsd:complexType name="skeleton">
  <!-- needs a top-level nested compositor, either <sequence>, <all> or <choice> -->
  <!-- elements nested in the compositor -->
</xsd:complexType>
The above example is the named form of the complexType element; the name here is "skeleton". This allows the type to be used as a template that can be referenced at some other point in the schema document. The example below shows such a reference to a named type, in this case one called 'USZip'. Notice that the 'type' attribute is used to specify what the type of the element named 'PostalExtension' will be.
Complex Element Example
// elsewhere in the schema
<xsd:element name="PostalExtension" type="USZip"/>
The Anonymous or Unnamed Form
Elements are frequently composed with anonymous or 'inlined' 'complexType' elements nested inside them. The following example from 'XML in a Nutshell' shows this form. Notice that the <complexType> element in the example has no name.
Abbreviated Example of the Anonymous Form of a Complex Element
// from 'XML in a Nutshell', Harold & Means
<xs:element name="fullName">
  <xs:complexType>
    <!-- here the complexType is not named and is 'inlined' into another element -->
    <xs:sequence>
      <xs:element name="first" type="addr:nameComponent"/>
      <xs:element name="last" type="addr:nameComponent"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
The next example shows an inlined complexType element that is made up of simple elements that use built-in schema types.
Example 2
<xs:element name="root">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="treeType" type="xs:string"/>
      <xs:element name="leafPoints" type="xs:integer"/>
      <xs:element name="idDate" type="xs:date"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
The 'element' element - In many examples so far we have seen the <element> element. It is used both as a nested unit and as a base to house different complex types. Consider the inlined example above, where the <complexType> element is wrapped inside a named element while at the same time simple elements are nested within it.
Example of an Element Element
<xsd:element name="name" type="xsd:string"/>
The 'attribute' element - The 'attribute' element is used to declare attributes for the 'complexType'. In the following attribute example, the attribute value is fixed and must be "US". Notice that the attribute type is a predefined XML Schema simple type called NMTOKEN. All attribute declarations must reference simple types. (As just mentioned, we will see how to define custom simple types later.) Unlike element declarations, attributes must be simple and cannot contain other elements or other attributes.
Example of an Attribute Element
<xsd:attribute name="country" type="xsd:NMTOKEN" fixed="US"/>
By looking at the following example of a complex type you can see all three element types used in conjunction with each other. The example includes the 'sequence' element, which contains an ordered grouping of elements. The elements grouped in a sequence must appear in the exact order in which they are specified within the 'sequence' element of the schema. This has an effect equivalent to an 'Element Only' definition in a DTD.
W3 Example of a ComplexType Declaration // from the W3C example
<xsd:complexType name="USAddress">
  <xsd:sequence>
    <xsd:element name="name" type="xsd:string"/>
    <xsd:element name="street" type="xsd:string"/>
    <xsd:element name="city" type="xsd:string"/>
    <xsd:element name="state" type="xsd:string"/>
    <xsd:element name="zip" type="xsd:decimal"/>
  </xsd:sequence>
  <xsd:attribute name="country" type="xsd:NMTOKEN" fixed="US"/>
</xsd:complexType>
<!-- just above is the attribute element added to this example -->
This definition can then be referenced in another tag from inside the schema document. For instance, in the W3C Schema example at the end of the note, the above defined type is referenced as follows.
Complex Type Declaration Referenced Later in Schema
<xsd:element name="shipTo" type="USAddress"/>
The specification that is dictated by this schema element is adhered to by the corresponding element in the instance document. (This code can be found in the second part of the W3C example at the end of the note.)
Valid Form of Element Referencing Complex Type in Subsequent Instance Document
<shipTo country="US">
  <name>Alice Smith</name>
  <street>123 Maple Street</street>
  <city>Mill Valley</city>
  <state>CA</state>
  <zip>90952</zip>
</shipTo>
Occurrence Constraints For Elements
Occurrence constraints decide exactly how many times an element can occur. This is an improvement over DTDs, which can only limit occurrences of elements to zero, one or many.
The following table compares DTD cardinality controls with those used in XML Schema.
Table Comparing DTD Cardinality Controls with Values of minOccurs & maxOccurs
// facsimile of a similar table found in 'Professional XML Schemas', J.Duckett et al., Wrox Press
DTD Cardinality | minOccurs | maxOccurs | Element Occurrences |
none // default | 1 | 1 | Once & only once |
? | 0 | 1 | Zero or one |
* | 0 | unbounded | Zero or more |
+ | 1 | unbounded | One or more |
// unbounded is assigned as a literal value inside double quotes, i.e. maxOccurs="unbounded"
Schema Cardinality Controls
'minOccurs' & 'maxOccurs' - The attributes that control element occurrences are 'minOccurs' and 'maxOccurs'. The default value for both is 1. With 'minOccurs' at the default value, the element is required to appear at least once. With 'maxOccurs' at the default value of 1, the element is allowed to appear at most once. 'maxOccurs' can be set to whatever value is appropriate. By setting 'minOccurs' to 0, the element becomes optional. The next example explicitly states that this element is optional.
Example
<xsd:element ref="comment" minOccurs="0"/>
'unbounded' - The 'unbounded' value can be used with 'maxOccurs' to emulate the behaviour found in DTDs when * and + are used.
The following is a modification of the W3Schools example allowing the note to specify up to 5 receivers. We have explicitly stated the default value for the 'minOccurs' attribute.
W3Schools Schema Example Modified
<?xml version="1.0"?>
<xs:schema
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://www.w3schools.com"
    xmlns="http://www.w3schools.com"
    elementFormDefault="qualified">
  <xs:element name="note">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="to" type="xs:string" minOccurs="1" maxOccurs="5"/>
        <xs:element name="from" type="xs:string"/>
        <xs:element name="heading" type="xs:string"/>
        <xs:element name="body" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
Matching Schema Instance
<?xml version="1.0"?>
<note
    xmlns="http://www.w3schools.com"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3schools.com Note.xsd">
  <to>Jack</to>
  <to>Jill</to>
  <to>Elvis</to>
  <to>Nephilum</to>
  <to>Santa</to>
  <!-- <to>Hercules</to> adding this 'to' element is one too many & won't validate -->
  <from>Elmo</from>
  <heading>Clarification</heading>
  <body>Concerning Fairy Tales, Myths and Legends</body>
</note>
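The occurrence rule that a validating parser applies to the 'to' element can be sketched in a few lines of Python. This is an illustration only, not how any particular parser is implemented; the function name occurs_ok is an assumption, and the instance is abbreviated.

```python
import xml.etree.ElementTree as ET


def occurs_ok(count, min_occurs=1, max_occurs=1):
    """Check a count against minOccurs/maxOccurs; both default to 1."""
    if count < min_occurs:
        return False
    return max_occurs == "unbounded" or count <= max_occurs


NS = "{http://www.w3schools.com}"

# Abbreviated copy of the instance above with its five 'to' elements.
note = ET.fromstring(
    '<note xmlns="http://www.w3schools.com">'
    '<to>Jack</to><to>Jill</to><to>Elvis</to><to>Nephilum</to><to>Santa</to>'
    '<from>Elmo</from></note>'
)
to_count = len(note.findall(NS + "to"))
```

With minOccurs="1" and maxOccurs="5", five 'to' elements pass the check and a sixth would not; passing max_occurs="unbounded" reproduces the DTD '+' behaviour.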
Attributes Behave Differently
Attributes, on the other hand, can appear once or not at all. They are controlled with a different syntax. The 'use' attribute can be used with them to make their appearance 'required', 'optional' or 'prohibited', as shown in the examples below.
Attributes
Attribute Simple Form
Attributes have the same form as simple elements. A simple representation of the form of attributes is as follows.
Simple Form of the XML Schema Attribute
<xs:attribute name="xxx" type="yyy"/>
The following, more imposing W3C representation indicates that attribute definitions have many additional features. They may be given defaults or fixed values. Attributes may be qualified or unqualified. They may have a unique identifier, a name and perhaps may reference another type definition. Their use may be specified as optional, prohibited or required. They may be annotated (documented) and may be custom composed using a <simpleType> element.
The W3 Representation of an attribute element // for reference
<attribute
  default = string
  fixed = string
  form = (qualified | unqualified)
  id = ID
  name = NCName // an NCName as defined by XML-Namespaces
  ref = QName
  type = QName
  use = (optional | prohibited | required) : optional
  {any attributes with non-schema namespace . . .}>
  Content: (annotation?, (simpleType?))
</attribute>
Example
<xs:attribute name="age" type="xs:positiveInteger" use="required"/>
Default Values for Elements and Attributes
// Attributes can have a default value OR a fixed value specified
Attribute Default Values
In XML Schema Language, there is an actual 'default' attribute which is used to assign a default value to an attribute. Default values only make sense if attributes are optional. In fact, in XML Schema Language it is an error to specify a default for anything but an optional attribute.
Both 'attribute' elements and 'element' elements have the 'default' attribute and may use it to provide a default value. With attributes, the value used is whatever is provided in the instance document. If the attribute does not appear in the instance document, the schema processor provides the default attribute value that was supplied in the schema.
In other words, a default value is automatically assigned to the attribute when no other value is specified. In the following example the default value is "EN":
<xs:attribute name="lang" type="xs:string" default="EN"/>
// in the instance document, if the attribute wasn't provided it would default to "EN".
Element Default Values - When an element is declared with a default value, it takes the value specified in the element's content as found in the instance document. If the element appears without content (is empty), the schema processor provides the value that is given to the default attribute. If the element doesn't appear at all in the document, however, the schema processor doesn't provide the element at all.
The W3C paper summarizes,
"Default attribute values apply when attributes are missing, and default element values apply when elements are empty."
They might have added, but felt it went without saying, 'nothing is applied if elements are missing'.
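The three cases in the W3C summary can be modelled with a small Python sketch. The helper names attribute_value and element_value, the sample document, and the default strings are all assumptions made for illustration; a real schema processor applies the defaults during validation rather than through calls like these.

```python
import xml.etree.ElementTree as ET


def attribute_value(element, name, default):
    """A missing attribute takes the default supplied in the schema."""
    return element.get(name, default)


def element_value(parent, name, default):
    """An empty element takes the default; a missing element gets nothing."""
    child = parent.find(name)
    if child is None:
        return None                      # element absent: nothing is applied
    text = child.text or ""
    return text if text else default     # element empty: default applies


# Hypothetical instance: the second 'item' sets lang, 'title' is empty,
# and there is no 'subtitle' element at all.
doc = ET.fromstring(
    '<doc><item/><item lang="FR">bonjour</item><title></title></doc>'
)
items = doc.findall("item")
```

Here attribute_value(items[0], "lang", "EN") yields the default, element_value(doc, "title", "Untitled") yields the element default, and element_value(doc, "subtitle", "Untitled") yields None.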
Creating Optional and Required Attributes
The 'use' attribute is used to control whether an attribute may, must or can't appear in a schema-governed document. All attributes are optional by default. To explicitly specify that the attribute is optional, the 'use' attribute is used as shown in the following W3Schools example.
W3Schools Example
<xs:attribute name="lang" type="xs:string" use="optional"/>
To make the attribute required, the 'use' attribute is assigned the 'required' value.
Example
<xs:attribute name="lang" type="xs:string" use="required"/>
There is a third value that can be assigned to the 'use' attribute. This is the 'prohibited' value, indicating the attribute cannot appear at all in the parent element.
// testing showed that a default didn't need to be specified in the optional case,
// which may be interpreted as 'optional with no default specified'
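The meaning of the three 'use' values reduces to a simple presence check, sketched here in Python. The function name use_is_satisfied is hypothetical and the sketch ignores everything else a validator checks about an attribute.

```python
def use_is_satisfied(use, attribute_present):
    """Return True when an attribute's presence agrees with its 'use' value."""
    if use == "required":
        return attribute_present          # must appear
    if use == "prohibited":
        return not attribute_present      # must not appear
    if use == "optional":
        return True                       # may or may not appear
    raise ValueError("unknown use value: " + use)


# The default for 'use' is 'optional', so an undeclared 'use' behaves like this:
default_use_ok = use_is_satisfied("optional", False)
```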
The Fixed Attribute
Both attribute and element declarations use the 'fixed' attribute to 'fix' specific values. We saw this with the country attribute, which was declared with the fixed value "US". Consider the following example. The 'use' attribute is not specified, therefore it has the default value, which is 'optional'. Accordingly, if the country attribute appears it must have the value 'US'. If the country attribute doesn't appear, the schema processor will provide the value 'US'.
// optional mixed with fixed means that the attribute may or may not be
// specified in the instance; in both cases it will have the fixed value.
Example
<xsd:attribute name="country" type="xsd:NMTOKEN" fixed="US"/>
A fixed value is automatically assigned to the attribute. You cannot specify another value. In the following W3Schools example the fixed value is "EN":
W3Schools Example
<xs:attribute name="lang" type="xs:string" fixed="EN"/>
Providing a fixed value and providing a default value are mutually exclusive. It is an error to declare both the 'fixed' and 'default' attributes in the same declaration.
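The behaviour of an optional attribute with a fixed value can be summarized in a short Python sketch. Both function names are hypothetical, and the comparison is a plain string comparison, which is a simplification of how a schema processor compares values of a given type.

```python
def fixed_is_valid(present_value, fixed):
    """An attribute that appears must carry exactly the fixed value."""
    return present_value is None or present_value == fixed


def effective_fixed_value(present_value, fixed):
    """A missing attribute is supplied with the fixed value."""
    return fixed if present_value is None else present_value


# country attribute declared with fixed="US"; None models an absent attribute.
absent = effective_fixed_value(None, "US")
present = effective_fixed_value("US", "US")
```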
Schema Data Types Are Used by Attribute & Simple Elements
We saw in the schema we looked at earlier examples of simple types being used in both elements and attributes. Simple types are like the primitive types of the Java programming language. They are the atomic and prime types of the XML Schema Language. For instance, in the following two examples, the first being an element and the second an attribute, we see the type decimal and the type NMTOKEN being used.
Examples of Simple Types used in both Element and Attribute
<xsd:element name="zip" type="xsd:decimal"/>
<!-- . . . . -->
<xsd:attribute name="country" type="xsd:NMTOKEN"/>
Built in Schema Data Types
The following table lists the collection of type definitions that are described in the XML Schema Primer hosted at the W3C site. Although the list is imposing, you will note that to build an equivalent set of types in a standard programming language would call for the creation of a library of classes to represent each of the variations provided. XML Schema does this at the primitive type level.
Simple Types Built In to XML Schema // http://www.w3.org/TR/xmlschema-0
Simple Type | Example (comma delimited) | Notes |
string | Confirm this is electric | |
normalizedString | Confirm this is electric | 3 |
token | Confirm this is electric | 4 |
byte | -1, 126 | 2 |
unsignedByte | 0, 126 | 2 |
base64Binary | GpM7 | |
hexBinary | 0FB7 | |
integer | -126789, -1, 0, 1, 126789 | 2 |
positiveInteger | 1, 126789 | 2 |
negativeInteger | -126789, -1 | 2 |
nonNegativeInteger | 0, 1, 126789 | 2 |
nonPositiveInteger | -126789, -1, 0 | 2 |
int | -1, 126789675 | 2 |
unsignedInt | 0, 1267896754 | 2 |
long | -1, 12678967543233 | 2 |
unsignedLong | 0, 12678967543233 | 2 |
short | -1, 12678 | 2 |
unsignedShort | 0, 12678 | 2 |
decimal | -1.23, 0, 123.4, 1000.00 | 2 |
float | -INF, -1E4, -0, 0, 12.78E-2, 12, INF, NaN | 2 |
double | -INF, -1E4, -0, 0, 12.78E-2, 12, INF, NaN | 2 |
boolean | true, false, 1, 0 | |
time | 13:20:00.000, 13:20:00.000-05:00 | 2 |
dateTime | 1999-05-31T13:20:00.000-05:00 | 2 |
duration | P1Y2M3DT10H30M12.3S | 2 |
date | 1999-05-31 | 2 |
gMonth | --05--, May | 2, 5 |
gYear | 1999 | 2, 5 |
gYearMonth | 1999-02, the month of February 1999 | 2, 5 |
gDay | ---31, the 31st day | 2, 5 |
gMonthDay | --05-31, every May 31st | 2, 5 |
Name | shipTo, XML 1.0 Name type | |
QName | po:USAddress, XML Namespace QName | |
NCName | USAddress, XML Namespace NCName | |
anyURI | http://www.example.com/ | |
language | en-GB, en-US, fr, valid for xml:lang as defined in XML 1.0 | |
ID | XML 1.0 ID attribute type | 1 |
IDREF | XML 1.0 IDREF attribute type | 1 |
IDREFS | XML 1.0 IDREFS attribute type | 1 |
ENTITY | XML 1.0 ENTITY attribute type | 1 |
ENTITIES | XML 1.0 ENTITIES attribute type | 1 |
NOTATION | XML 1.0 NOTATION attribute type | 1 |
NMTOKEN | XML 1.0 NMTOKEN attribute type | 1 |
NMTOKENS | XML 1.0 NMTOKENS attribute type | 1 |
Notes From the Table
(1) To retain compatibility between XML Schema and XML 1.0 DTDs, the simple types ID, IDREF, IDREFS, ENTITY, ENTITIES, NOTATION, NMTOKEN, NMTOKENS should only be used in attributes.
(2) A value of this type can be represented by more than one lexical format, e.g. 100 and 1.0E2 are both valid float formats representing "one hundred". However, rules have been established for this type that define a canonical lexical format, see XML Schema Part 2.
(3) Newline, tab and carriage-return characters in a normalizedString type are converted to space characters before schema processing.
(4) As normalizedString, and adjacent space characters are collapsed to a single space character, and leading and trailing spaces are removed.
(5) The "g" prefix signals time periods in the Gregorian calendar.
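Notes (2), (3) and (4) describe concrete text transformations, which can be approximated in Python as follows. The function names normalized_string and token are chosen here to mirror the type names and are not part of any library; Python's float() is used only to show that two lexical forms denote the same value.

```python
import re


def normalized_string(value):
    """Note (3): replace tab, newline and carriage return with a space."""
    return re.sub(r"[\t\n\r]", " ", value)


def token(value):
    """Note (4): as normalizedString, then collapse and trim spaces."""
    collapsed = re.sub(r" +", " ", normalized_string(value))
    return collapsed.strip(" ")


# Note (2): two lexical formats for the same float value.
same_float = float("100") == float("1.0E2")
```

For instance, token("  Confirm   this\n is electric ") gives "Confirm this is electric", while normalized_string leaves the repeated spaces in place.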
Custom Simple Types
XML Schema also permits defining custom simple types that are derived from the built-in simple types. In fact, many of the 'built-in' types are themselves derived from more primitive built-in types. There are three kinds of derivation: restriction, list and union. The following W3Schools description of the 'simpleType' element shows that a 'restriction', 'list' or 'union' child is at the heart of the element's form.
Following is a form description of the simpleType element, which is the base element used to build custom simple types.
W3Schools Form Description of the simpleType element.
<simpleType
id=ID // optional, takes a unique identifier
name=NCName // required if the simpleType is a child of the schema element, otherwise not allowed
any attributes // optional, any other attributes
>
( annotation?,( restriction | list | union ) )
</simpleType>
// NCName - a 'non-colonized' name, a name without a prefix.
Restrictions
Custom simple types are enclosed in a <simpleType> element. In the case of a restriction, a <restriction> element is used, and its 'base' attribute names the built-in type on which the restriction is built. In the following example, the 'xs:integer' built-in type is used. We use elements called 'facets' to further restrict the range of integers we want to allow. In the next example the range is limited to an integer between 10000 and 99999 inclusive.
Defining myInteger, Range 10000-99999
<xs:simpleType name="myInteger">
  <xs:restriction base="xs:integer">
    <xs:minInclusive value="10000"/>
    <xs:maxInclusive value="99999"/>
  </xs:restriction>
</xs:simpleType>
In an example adapted from the W3C site, the base of the restriction is the 'xs:string' type, restricted by the 'pattern' facet to a single uppercase character between A and Z inclusive. The W3C recommendation describes this type as having been derived by restriction from the simple type 'string'.
Example
<xs:simpleType name="SKU">
  <xs:restriction base="xs:string">
    <!-- the hyphen denotes a range -->
    <xs:pattern value="[A-Z]"/>
  </xs:restriction>
</xs:simpleType>
The XML Schema 'pattern' facet uses a regular expression language that includes support for Unicode and is described in 'XML Schema Part 2', http://www.w3.org/TR/2001/REC-xmlschema-2-20010502/#regexs
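To give a sense of what the regular expression language allows beyond a single character class, here is a minimal sketch of a pattern that uses a digit escape and repetition counts. The pattern itself is the one the Primer uses for its part numbers; the type name 'partNumber' is a hypothetical stand-in, chosen so that it does not clash with the simplified 'SKU' type above.

```xml
<!-- 'partNumber' is a hypothetical name used only for this sketch -->
<xsd:simpleType name="partNumber">
  <xsd:restriction base="xsd:string">
    <!-- three digits, a literal hyphen, then two uppercase letters -->
    <xsd:pattern value="\d{3}-[A-Z]{2}"/>
  </xsd:restriction>
</xsd:simpleType>
```

A pattern is matched against the whole value, so no anchors such as ^ or $ are needed. A value such as "926-AA" would be accepted, while "926-AAA" or "x926-AA" would not.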
One more example of a restriction based on the string built-in type is listed below. Here the facet is 'enumeration', and the strings provided are abbreviations for states. A series of 'enumeration' elements provides a set of choices from which a single value can be selected.
Enumeration Facet Example
<xsd:simpleType name="USState">
  <xsd:restriction base="xsd:string">
    <xsd:enumeration value="AK"/>
    <xsd:enumeration value="AL"/>
    <xsd:enumeration value="AR"/>
    <!-- and so on ... -->
  </xsd:restriction>
</xsd:simpleType>
Facets
Using Facets in Restrictions
The above examples use 'minInclusive', 'maxInclusive', 'pattern' and 'enumeration' elements to apply the limits to the restriction being created. Each of these is a member of a special class of elements called 'facets'. Facets are used to fine tune the specification of a custom simple type. For those who like formal statements, the following two statements are each a part of the official definition of a facet and a value space.
Abbreviated W3C Formal Definition of a Facet and a Value Space
Definition: A 'facet' is a single, defining aspect of a value space. A 'value space' is in turn defined as the set of values for a given data type.
// paraphrase
A facet is a defining aspect of a value or set of values.
XML Schema Part 2 defines 12 constraining facets that can be applied to simple types. In a schema document each facet is written as an element nested inside a restriction. Facets allow greater control over the specificity of definitions for simple types.
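As an illustration, several facets can be combined in one restriction. The sketch below is not taken from the W3C or W3Schools material; the type names 'priceType' and 'shortCode' are hypothetical.

```xml
<!-- Hypothetical type: a non-negative decimal with at most 7 digits in total,
     at most 2 of them after the decimal point -->
<xsd:simpleType name="priceType">
  <xsd:restriction base="xsd:decimal">
    <xsd:totalDigits value="7"/>
    <xsd:fractionDigits value="2"/>
    <xsd:minInclusive value="0"/>
  </xsd:restriction>
</xsd:simpleType>

<!-- Hypothetical type: a string of 2 to 8 characters, measured after
     white space has been collapsed -->
<xsd:simpleType name="shortCode">
  <xsd:restriction base="xsd:string">
    <xsd:whiteSpace value="collapse"/>
    <xsd:minLength value="2"/>
    <xsd:maxLength value="8"/>
  </xsd:restriction>
</xsd:simpleType>
```

With these definitions 12345.67 and 0.5 would be acceptable 'priceType' values, while 1.999 (three fraction digits) and -1 (below the lower bound) would be rejected.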
Following is a list of the facets in alphabetical order.
XML Schema Facets
enumeration // a list of acceptable values
fractionDigits // the max. # of decimal places allowed *
length // exact # of characters or list items allowed *
maxExclusive // upper bound for numeric values ( must be less than this value )
maxInclusive // upper bound for numeric values ( must be less than or equal to this value )
maxLength // max. # of characters or list items allowed *
minExclusive // lower bound for numeric values ( must be greater than this value )
minInclusive // lower bound for numeric values ( must be greater than or equal to this value )
minLength // the min. # of characters or list items allowed *
pattern // a regular expression that acceptable values must match
totalDigits // the max. # of digits allowed +
whiteSpace // white space handling (line feeds, tabs, spaces, and carriage returns)
// * zero or greater, + greater than zero
Element Content
So far, we have seen many combinations of elements nested inside other elements. The XML Schema Primer goes on to describe three special cases: the case where an element contains only character data and has attributes, the case of the mixed type, where child elements and character data together make up the content of an element, and the case where an element is defined that has no content.
1) Declaring an element that has an attribute and contains a simple value
Declaring an element that has an attribute and contains a simple value sounds simple. So what is the problem? The instance document might have an element that looks like the following W3C example.
A tag in an instance document
<internationalPrice currency="EUR">423.46</internationalPrice>
As a starting point we attempt to declare the element with a simple type as follows.
A simple type example
<xsd:element name="internationalPrice" type="xsd:decimal"/>
// can't add an attribute to an element declared with a simple type
Here is the catch: simple types can't have attributes. The solution is to derive a complex type that is based on simple content, using the <simpleContent> element. In the next example the xsd:decimal type is named in the 'base' attribute of an 'extension' element. The extension element itself contains an attribute declaration whose type is 'xsd:string'. The <complexType> element is used to house the structure. In the following example, the complexType element is in the 'anonymous' form, being an inlined sub-section of the 'internationalPrice' element. ( In "Developing Java Web Services" by R. Nagappan et al. the 'anonymous' form is described as 'implicit' or 'nameless'. )
Deriving a Complex Type from a Simple Type
<!-- no type attribute is supplied on the element tag -->
<xsd:element name="internationalPrice">
  <xsd:complexType>
    <xsd:simpleContent>
      <xsd:extension base="xsd:decimal">
        <xsd:attribute name="currency" type="xsd:string"/>
      </xsd:extension>
    </xsd:simpleContent>
  </xsd:complexType>
</xsd:element>
Summary of Steps
// to create an element that holds a simple value and has an attribute
1) Inside an element nest a complexType tag.
2) Nest a simpleContent tag to describe the content.
3) Use an extension to specify the base.
4) Specify in an attribute tag the attribute name and its type.
// A named complexType tag of this variety would also be possible
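The named alternative could look like the following sketch. The type name 'PriceWithCurrency' and the second element 'domesticPrice' are hypothetical, and the unprefixed type references assume, as the other examples in these notes do, that the schema's own definitions need no namespace prefix.

```xml
<!-- A named complex type with simple content; the name is hypothetical -->
<xsd:complexType name="PriceWithCurrency">
  <xsd:simpleContent>
    <xsd:extension base="xsd:decimal">
      <xsd:attribute name="currency" type="xsd:string"/>
    </xsd:extension>
  </xsd:simpleContent>
</xsd:complexType>

<!-- Because the type is named, more than one element can share it -->
<xsd:element name="internationalPrice" type="PriceWithCurrency"/>
<xsd:element name="domesticPrice" type="PriceWithCurrency"/>
```

The trade-off is the usual one: the anonymous form keeps a one-off definition next to the element that uses it, while the named form allows reuse.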
2) How to create elements that support mixed content
Notice in the following W3C example, the text appears between elements and their child elements. The form 'inlines' text, elements and sub-elements.
Example
<letterBody>
  <salutation>Dear Mr.<name>Robert Smith</name>.</salutation>
  Your order of <quantity>1</quantity> <productName>Baby Monitor</productName> shipped from our warehouse on <shipDate>1999-05-21</shipDate>.
  ....
</letterBody>
Following is the schema that makes this XML possible. The key to making the mixture possible is the use of the 'mixed' attribute. Setting the 'mixed' attribute to 'true' allows character data to appear between child elements.
Example
<xsd:element name="letterBody">
  <xsd:complexType mixed="true">
    <xsd:sequence>
      <xsd:element name="salutation">
        <xsd:complexType mixed="true">
          <xsd:sequence>
            <xsd:element name="name" type="xsd:string"/>
          </xsd:sequence>
        </xsd:complexType>
      </xsd:element>
      <xsd:element name="quantity" type="xsd:positiveInteger"/>
      <xsd:element name="productName" type="xsd:string"/>
      <xsd:element name="shipDate" type="xsd:date" minOccurs="0"/>
      <!-- etc. -->
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
The XML Schema Primer notes that the mixed model in XML Schema is fundamentally different from the DTD mixed model used in XML 1.0. Under the XML Schema mixed model, the order and number of child elements appearing in an instance must agree with the order and number of child elements specified in the model. In contrast, under the XML 1.0 mixed model, the order and number of child elements appearing in an instance cannot be constrained. This means XML Schema provides full validation of mixed models, while XML 1.0 DTDs provide only partial validation.
The following example from the W3Schools web site shows an element from an XML instance that adheres to the mixed content schema dictated in the element declaration that follows.
Example from W3Schools Website
<letter>
  Dear Mr.<name>John Smith</name>.
  Your order <orderid>1032</orderid>
  will be shipped on <shipdate>2001-07-13</shipdate>.
</letter>
Corresponding Schema
<xs:element name="letter">
  <xs:complexType mixed="true">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="orderid" type="xs:positiveInteger"/>
      <xs:element name="shipdate" type="xs:date"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
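To make the point about order and number concrete, the following sketch contrasts two instance fragments against the 'letter' declaration above. The wording of the second letter is invented for illustration.

```xml
<!-- Valid: name, orderid and shipdate each appear once, in the declared order;
     the surrounding text is free-form -->
<letter>Dear Mr.<name>John Smith</name>. Your order <orderid>1032</orderid>
will be shipped on <shipdate>2001-07-13</shipdate>.</letter>

<!-- Not valid: shipdate appears before orderid, which breaks the declared sequence -->
<letter>Dear Mr.<name>John Smith</name>. Shipping on <shipdate>2001-07-13</shipdate>
for order <orderid>1032</orderid>.</letter>
```

A DTD mixed content model could list the three child elements but could not reject the second fragment.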
Empty Content
Now suppose that we want the internationalPrice element to convey both the unit of currency and the price as attribute values, as in the following example.
Example <internationalPrice currency="EUR" value="423.46"/>
Such an element has no content at all; its content model is empty. To define a type whose content is empty, we use a type that first disallows anything but elements in its content and then declares no elements. This way the type's content model is left empty.
An Empty Complex Type
<xsd:element name="internationalPrice">
  <xsd:complexType>
    <!-- complexContent, yet no elements are defined -->
    <xsd:complexContent>
      <xsd:restriction base="xsd:anyType">
        <xsd:attribute name="currency" type="xsd:string"/>
        <xsd:attribute name="value" type="xsd:decimal"/>
      </xsd:restriction>
    </xsd:complexContent>
  </xsd:complexType>
</xsd:element>
However, a complexContent element with a restriction of 'anyType' is the default form, so these elements may be omitted to create the more natural form that follows.
// 'anyType' is the primordial root type of the schema data types.
Shorthand for an Empty Complex Type
<xsd:element name="internationalPrice">
  <xsd:complexType>
    <xsd:attribute name="currency" type="xsd:string"/>
    <xsd:attribute name="value" type="xsd:decimal"/>
  </xsd:complexType>
</xsd:element>
Lists & Unions
The XML Schema Concept of a List Type
XML Schema makes use of the concept of a 'list' type. List types are categorized as simple types because their values are sequences of 'atomic' values. Atomic types are the simple types whose values are considered indivisible. For instance, the name token, or NMTOKEN, value "US" is atomic. There are no intended sub-units of "US" such as "U" or "S".
A list value is represented as a white-space separated sequence of atomic values.
Following we rephrase the form description for 'simpleType' without the attribute descriptions to emphasize that 'simpleType' may include a list element.
Simplified simpleType Form
<simpleType>
( annotation?,( restriction | list | union ) )
</simpleType>
Creating New List Types
You can create new list types by derivation from existing atomic types. (You cannot, however, create list types from existing list types, nor from complex types.) Following is the W3C example of a list of integer values.
// implies you cannot build a list of lists
W3C Example of a List of 'myInteger', Custom Integer Types
<xsd:simpleType name="listOfMyIntType">
  <xsd:list itemType="myInteger"/>
</xsd:simpleType>
The subsequent example shows that a conforming XML instance element can contain a space separated list of values of the item type.
W3C Example That Conforms to the Above List Type Definition
<listOfMyInt>20003 15037 95977 95945</listOfMyInt>
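Although a list cannot be built from another list, a list type can itself be restricted with the length facets, which then count list items rather than characters. The sketch below follows the pattern of a similar example in the Primer and reuses the 'USState' enumeration defined earlier; the instance element name 'sixStates' is assumed to be declared with the 'SixUSStates' type.

```xml
<!-- A list whose items are USState values -->
<xsd:simpleType name="USStateList">
  <xsd:list itemType="USState"/>
</xsd:simpleType>

<!-- The same list, restricted to exactly six items -->
<xsd:simpleType name="SixUSStates">
  <xsd:restriction base="USStateList">
    <xsd:length value="6"/>
  </xsd:restriction>
</xsd:simpleType>

<!-- A conforming instance: six white-space separated items; repeats are allowed -->
<sixStates>AK AL AR AK AL AR</sixStates>
```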
Union Types
While a list defines an aggregation of several values of the same type, a union adds a level of flexibility: a value of a union type may conform to any one of its 'member' types, which may be atomic types or list types. A union type is always a derived type and is normally made up of at least two different member types.
The following example of a union type is from W3Schools. Notice the union type element is composed from the two simple type definitions that follow it, 'sizebyno' and 'sizebystring'.
W3Schools Example
<xsd:element name="jeans_size">
  <xsd:simpleType>
    <!-- notice the space separated member types -->
    <xsd:union memberTypes="sizebyno sizebystring"/>
  </xsd:simpleType>
</xsd:element>
<xsd:simpleType name="sizebyno">
  <xsd:restriction base="xsd:positiveInteger">
    <xsd:maxInclusive value="42"/>
  </xsd:restriction>
</xsd:simpleType>
<xsd:simpleType name="sizebystring">
  <xsd:restriction base="xsd:string">
    <xsd:enumeration value="small"/>
    <xsd:enumeration value="medium"/>
    <xsd:enumeration value="large"/>
  </xsd:restriction>
</xsd:simpleType>
In the above example, legal values for the union type are an integer value from 1 to 42 inclusive, or one of "small", "medium" or "large".
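A few instance fragments may help show how a value is tested against each member type in turn. The particular values are chosen for illustration only.

```xml
<!-- Valid: matches 'sizebyno', a positive integer no greater than 42 -->
<jeans_size>34</jeans_size>

<!-- Valid: matches 'sizebystring' -->
<jeans_size>medium</jeans_size>

<!-- Not valid: 43 exceeds the limit of 'sizebyno' and is not one of the three strings -->
<jeans_size>43</jeans_size>

<!-- Not valid: enumeration values are case sensitive, so "Large" matches neither member -->
<jeans_size>Large</jeans_size>
```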
All, Choice & Sequence Groups
Recall that a <complexType> element with element content uses a 'compositor' element, one of 'sequence', 'all' or 'choice'. Compositors get their name from the fact that they compose groups of elements.
Choice Groups
XML Schema offers the ability to choose which element is shown in an instance document using the 'choice' element. The choice group element allows only one of its children to appear in an instance. (Note that 'choice' allows selecting between child elements, while the earlier 'enumeration' element was used in simpleTypes to provide a choice of values for the type.) Following is an example that allows the choice of either an 'Air', 'Rail' or 'Sea' element.
Example
<xs:element name="carrier">
  <xs:complexType>
    <xs:choice>
      <xs:element ref="Air"/>
      <xs:element ref="Rail"/>
      <xs:element ref="Sea"/>
    </xs:choice>
  </xs:complexType>
</xs:element>
. . . .
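The 'ref' attributes in the choice point at global element declarations that the excerpt elides. A minimal sketch of what they might look like, together with instance fragments, follows; the 'xs:string' type and the text content are assumptions made for illustration.

```xml
<!-- Schema side: global declarations that the choice refers to (types assumed) -->
<xs:element name="Air" type="xs:string"/>
<xs:element name="Rail" type="xs:string"/>
<xs:element name="Sea" type="xs:string"/>

<!-- Instance side, valid: exactly one of the three alternatives appears -->
<carrier><Rail>overnight freight</Rail></carrier>

<!-- Instance side, not valid: two alternatives appear together -->
<carrier><Air>express</Air><Sea>container</Sea></carrier>
```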
The 'all' Element
The <all> element allows elements to occur in any order. How often each element in an all group appears is dictated by its minOccurs and maxOccurs attributes. In the default scenario, where both attributes are set to '1', all elements must appear, though in any order. If 'minOccurs' is set to zero on an element, that element becomes optional. The 'maxOccurs' attribute cannot be greater than '1'. In other words, no element in the content model can appear more than once.
The following W3C example would permit child elements to appear in any order, with the comment element appearing optionally.
An all group Example
<xsd:complexType name="PurchaseOrderType">
  <xsd:all>
    <xsd:element name="shipTo" type="USAddress"/>
    <xsd:element name="billTo" type="USAddress"/>
    <xsd:element ref="comment" minOccurs="0"/>
    <xsd:element name="items" type="Items"/>
  </xsd:all>
  <xsd:attribute name="orderDate" type="xsd:date"/>
</xsd:complexType>
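As a sketch of what the all group permits, the instance below lists the children in a different order from the declaration and omits the optional comment. It assumes an element named 'purchaseOrder' is declared with this type; the contents of the 'USAddress' and 'Items' types are not shown in this section and are left as placeholders.

```xml
<!-- Schema side: assumed declaration that uses the type -->
<xsd:element name="purchaseOrder" type="PurchaseOrderType"/>

<!-- Instance side: children in a different order, optional comment omitted -->
<purchaseOrder orderDate="1999-10-20">
  <items><!-- Items content --></items>
  <billTo><!-- USAddress content --></billTo>
  <shipTo><!-- USAddress content --></shipTo>
</purchaseOrder>
```

Repeating any child, for example a second shipTo, would not be valid, since no element in an all group may occur more than once.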
The attributeGroup Element
It is often convenient to group a set of attributes together and then reference the group from an element's type. The following attribute group shows three attributes that together form a complicated data structure. Bundling them into a group that can be used in different sorts of similar elements may provide a convenient way to keep schema code readable.
<!-- attributeGroup replaces a set of individual declarations -->
W3C Example of an attributeGroup Element Definition
<xsd:attributeGroup name="ItemDelivery">
  <!-- an attribute based on a reference to an externally defined attribute type -->
  <xsd:attribute name="partNum" type="SKU" use="required"/>
  <!-- a regular simple attribute definition -->
  <xsd:attribute name="weightKg" type="xsd:decimal"/>
  <!-- a custom attribute type based on a restriction to an enumeration -->
  <xsd:attribute name="shipBy">
    <xsd:simpleType>
      <xsd:restriction base="xsd:string">
        <xsd:enumeration value="air"/>
        <xsd:enumeration value="land"/>
        <xsd:enumeration value="any"/>
      </xsd:restriction>
    </xsd:simpleType>
  </xsd:attribute>
</xsd:attributeGroup>
The line that references the definition above into an element's type reuses the attributeGroup element in conjunction with the 'ref' attribute.
W3C Excerpt Showing an Attribute Group Being Referenced
. . .
<xsd:attributeGroup ref="ItemDelivery"/>
. . .
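For context, the reference sits inside a complex type definition, after the content model, in the same position an individual attribute declaration would occupy. The sketch below is modeled on the Primer's 'item' element; the exact child elements shown are an assumption.

```xml
<xsd:element name="item">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="productName" type="xsd:string"/>
      <xsd:element name="quantity" type="xsd:positiveInteger"/>
    </xsd:sequence>
    <!-- the group reference stands in for the partNum, weightKg and shipBy declarations -->
    <xsd:attributeGroup ref="ItemDelivery"/>
  </xsd:complexType>
</xsd:element>
```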
Annotations
XML Schema provides three elements for annotating schemas for human readers or for applications.
The annotation element - The annotation tag is the parent element of the documentation and appinfo elements. The 'documentation' and 'appinfo' elements are nested inside an annotation element.
The documentation element - The documentation element is recommended for providing human-readable material. It is also recommended that the xml:lang attribute be used to indicate the language of this information. You may also indicate the language of all information in a schema by placing an xml:lang attribute on the schema element.
Example <schema xml:lang="en"> // language for all schema information
The appinfo element - The appinfo element provides information for applications that may be associated with processing the XML document, such as stylesheets or graphics programs.
The annotation will often appear at the beginning of a schema construction. Following is a W3C example that shows annotation elements with enclosed documentation elements used to comment the internationalPrice element.
Annotations in Element Declaration & Complex Type Definition
<xsd:element name="internationalPrice">
  <xsd:annotation>
    <xsd:documentation xml:lang="en">
      element declared with anonymous type
    </xsd:documentation>
  </xsd:annotation>
  <xsd:complexType>
    <xsd:annotation>
      <xsd:documentation xml:lang="en">
        empty anonymous type with 2 attributes
      </xsd:documentation>
    </xsd:annotation>
    <xsd:complexContent>
      <xsd:restriction base="xsd:anyType">
        <xsd:attribute name="currency" type="xsd:string"/>
        <xsd:attribute name="value" type="xsd:decimal"/>
      </xsd:restriction>
    </xsd:complexContent>
  </xsd:complexType>
</xsd:element>
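The W3C example uses only documentation elements. A minimal sketch of an appinfo element sitting alongside documentation might look like the following; the 'hint' namespace, its 'display' element and the 'source' URI are hypothetical and stand in for whatever vocabulary a processing application defines.

```xml
<xsd:element name="internationalPrice">
  <xsd:annotation>
    <!-- for human readers -->
    <xsd:documentation xml:lang="en">
      price quoted in a non-domestic currency
    </xsd:documentation>
    <!-- for applications; the content is ignored by schema validation -->
    <xsd:appinfo source="http://www.example.com/hints">
      <hint:display xmlns:hint="http://www.example.com/hints" format="currency"/>
    </xsd:appinfo>
  </xsd:annotation>
  <xsd:complexType>
    <xsd:attribute name="currency" type="xsd:string"/>
    <xsd:attribute name="value" type="xsd:decimal"/>
  </xsd:complexType>
</xsd:element>
```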
Self Test
1) True or False? The root element of a schema document is the schema element. True \ False
2) True or False? A simple element can contain text and attributes. True \ False
3) True or False? The default value for minOccurs and maxOccurs is 1. True \ False
4) True or False? Attributes, like simple elements, use minOccurs and maxOccurs to control occurrences. True \ False
5) True or False? Default values only make sense if attributes are optional. True \ False
6) True or False? The union element can be composed of different sorts of simple and complex types. True \ False
7) True or False? Declaring an element that has an attribute and contains a simple value would require something like the following. True \ False
<xsd:element name="message" type="xsd:string"/>
8) What is the attribute that allows an element to support mixed content? ________
Exercise
Create a schema to govern a client record for a company. The record itself will be a complex element with attributes that specify the date the record was created, and an attribute that holds a unique identifier. ( This might be created as an attribute that is typed to the built-in 'ID' type.)
Nested in the record element will be a complex element that contains a sequence of elements representing a client's first name, an initial and a last name. Make the initial an optional element utilizing the 'minOccurs' attribute.
Create a second complexType element that holds address information. This element should contain elements that reference elements named 'street', 'city', 'country' and 'postalCode'. The 'street' and 'city' elements will be simple elements that use the built-in schema string type. The 'country' element will be a simple type derived by restriction that allows an enumeration of abbreviated tokens limited to countries in North America. The postalCode element will be a complexType element that offers a choice of two simple elements that represent ZIP or POSTAL code.
// You can do the ZIP and POSTAL code as simple string types
// or if you wish use a derived simpleType and the pattern facet
// to limit characters to those appropriate for each format.
Create an element using an 'all' element to classify this client as a cash customer, a private account holder, a corporate representative or all three. If a corporate representative, the name of the company should be specified.
// an all group with all elements marked optional including
// company name would allow combinations of elements
// to be specified.
Use the schema shells described earlier in the note to complete your schema document. Create an instance document that adheres to your schema definition and validate the document using any of the validation methods suggested.
If this is all mystifying and you are quite new to XML, you may use the following summary to help you organize this assignment.
Summary of Requirements
Client_Record date_attribute ID_attribute // root element
  name element // a complex type
    sequence
      first name
      initial // optional
      last name
  // closing tags
  address element
    sequence
      element ref="street"
      element ref="city"
      element ref="country"
      element ref="postalCode"
  element street
  element city
  element country // simple derived type based on enumeration facet
  element postalCode // choice of two simple or derived simple types
  // closing tags
  clientType
    all
      cash
      private account
      corporate account
      companyName
  // closing tags