Database Types & Modeling
Peter Komisar         ©           Conestoga College

References:  "Draft Document for Review August 22,2007
5:25 pm SG24-7501-00 Rational Application Developer
V7 Programming Guide"
Ueli Wahli, IBM, et.al.
, 'UML Data
Modeling Profile'  Davor Gornik, IBM, MySQL 5.0.5 Manual,



Database Types


It is important to understand the limitations

of the the different database types. All databases
support many similar types but frequently they
have their own proprietary type names as well.
Following are types used in MySQL 5.0.5 leaving
out many extra details.


Sample Numeric Types Using MySQL

A summary of the numeric data types follows. 

The maximum legal display width is 255, represented by an
m in the type definition forms listed below. This is just how
a number shows and doesn't relate to it's range.

ZEROFILL will fill empty spaces, where not significant
with zeros.

UNSIGNED, causes the full range of the type to be
expressed as positive numbers.

Following is a general description of types found in
MySQL. You will find variations with types described
in your favorite database.  First we look at the Number
types. There are three sub-categories of number types,
integral types, floating point types and fixed decimal
point types.


Integral Types


Discrete Number Types
// BIT is a synonym for TINYINT(1).
Example


Note though, the values TRUE and FALSE are aliases for 1 and 0.


Floating Point Types


Fixed Point Number Type


String Types


In MySQL the string types are:

The CHAR and VARCHAR Types

Both CHAR and VARCHAR are declared with the
length indicating the maximum number of characters
to be stored.


Example


CHAR(24) // holds up to 24 characters.

The length of a CHAR column is fixed to the length that you
declare when you create the table, a value from 0 to 255.
When CHAR values are stored, they are right-padded with
spaces to fill to the specified length. On retrieval, trailing
spaces are removed.

Values in VARCHAR columns on the other hand, are variable-
length strings. The length can be specified as a value from
0 to 255 before MySQL 5.0.3, and 0 to 65,535 in 5.0.3 and
later versions.

// The maximum column length is subject to a row size of 65,532 bytes.

VARCHAR only stores as many characters as are needed,
plus one byte to record the length. VARCHAR doesn't pad.

// two bytes for columns that are declared with a length longer than 255
// as of MySQL 5.0.3 trailing spaces are stored and retrieved

Exceeding column maximum length, results in a value that
is truncated to fit. If not spaces, a warning is generated.


The BINARY and VARBINARY Types

BINARY and VARBINARY are as a pair similar to CHAR
and VARCHAR however they store binary rather than
non-binary strings. They have no character set associations.

The BINARY and VARBINARY data types are distinct from the
CHAR BINARY
and VARCHAR BINARY data types. In the latter
types, the BINARY attribute does not cause the column to
be treated as a binary string column.

// Instead, it causes the binary collation for the column character
// set to be used, and the column itself contains non-binary
// character strings rather than binary byte strings. 

BINARY type is right padded. VARBINARY has no padding on
insert and no bytes are stripped on select.

Example

mysql> CREATE TABLE t (c BINARY(3));
Query OK, 0 rows affected (0.01 sec)

mysql> INSERT INTO t SET c = 'a';
Query OK, 1 row affected (0.01 sec)

mysql> SELECT HEX(c), c = 'a', c = 'a\0\0' from t;
+--------+---------+-------------+
| HEX(c) | c = 'a' | c = 'a\0\0' |
+--------+---------+-------------+
| 610000 | 0 | 1 |
+--------+---------+-------------+
1 row in set (0.09 sec)


The BLOB and TEXT Types

A BLOB is a binary large object that can hold a variable
amount  of data. The four BLOB types are:

BLOB columns hold binary strings.

The variations differ only in maximum length.

The four text types are:

TEXT columns hold character strings. TEXT
columns are associated with character sets
and their values are sorted and compared based
on collation of the character set.  

In most respects, you can regard a BLOB column as
a VARBINARY column that has no size limitation. Similarly,
a TEXT column may be viewed as an unlimited VARCHAR
column. 

There is no trailing-space removal for BLOB or TEXT columns.

Because BLOB and TEXT values can be extremely long, you
may encounter some limitations when using them.  (For
example the default value of max_sort_length is 1024).

// this variable can be increased in size

The maximum size of a BLOB or TEXT object is practically
limited by the memory that is available and the size of the
communication buffers.


The ENUM Type


"An ENUM is a string object with a value chosen from a list of
allowed values that are enumerated explicitly in the column
specification at table creation time.

An enumeration value must be a quoted string literal; it may not
be an expression, even one that evaluates to a string value. "
                                                           - MySQL documentation

You can create a table with an ENUM column as follows.


Example

CREATE TABLE sizes (
name ENUM('small', 'medium', 'large')
);

However, following does not work:

CREATE TABLE sizes (
c1 ENUM('small', CONCAT('med','ium'), 'large')
);

// a string literal form must be used

Each enumeration value has an index, numbered starting
at 1 and the index of a NULL value is NULL. The index
used with enums have nothing to do with indexes associated
with tables. Another limit, in MySQL an enumeration is limited
to 65,535 elements. Trailing spaces are automatically deleted
from ENUM member values in table definitions.


The SET Type


"A SET is a string object that can have zero or more values,
each of which must be chosen from a list of allowed values
specified when the table is created." - MySQL Documentation

SET column values are specified separated by commas.
Accordingly, member values should not themselves contain
commas.

Example

A column specified as SET('one' , 'two') NOT NULL can have
any of the following values:

Values

''
'one'
'two'
'one,two'

A SET can have a maximum of 64 different members. Trailing
spaces are automatically deleted from SET member values in
the table definitions.


Date and Time Types

The MySQL types in this category are:

Each temporal type has a range of legal values, as well as a “zero
value to represent illegal values. 


DATETIME, DATE & TIMESTAMP Types

These types are related are all related in that they deal
with time values.

The DATETIME type is used when both date and time information
is needed and is displayed in  'YYYY-MM-DD HH:MM:SS' format.
The supported range is '1000-01-01 00:00:00' to
'9999-12-31 23:59:59'.

The DATE type supplies only a date value without a time part.
The  supported range is '1000-01-01' to '9999-12-31'.

You can specify DATETIME, DATE, and TIMESTAMP values
using any of a common set of formats:

  • 'YYYY-MM-DD HH:MM:SS' or 'YY-MM-DD HH:MM:SS' format.
    A “relaxed” syntax is allowed: Any punctuation character may be
    used as the delimiter between date parts or time parts.


Example of MySQL Equivalent Forms

  • '98-12-31 11:30:45'
  • '98.12.31 11+30+45'
  •  '98/12/31 11*30*45'
  • '98@12@31 11^30^45' are equivalent.
Many other forms are permitted. // see the manual

For interest, following is a microsecond selection.


Example
// from MySQL manual
mysql> SELECT MICROSECOND('2010-12-10 14:12:09.019473');
+-------------------------------------------+
| MICROSECOND('2010-12-10 14:12:09.019473') |
+-------------------------------------------+
| 19473 |
+-------------------------------------------+

The Time Type

"MySQL retrieves and displays TIME values in 'HH:MM:SS' format
or 'HHH:MM:SS' format for large hours values. TIME values may
range from '-838:59:59' to '838:59:59'." A number of formats
are permitted. // see manual


The YEAR Type


YEAR is a one-byte type used for representing years
and is displayed in YYYY format. The range is 1901
to 2155. As with the other temporal types variations
in format are permitted.

// MySQL itself is / was Y2K safe!


Database Modeling

Types of Database Users


As a general categorization there are three types of
database users:
// the database developer
// the database designer


Unified Modeling Language

Database technology is extremely reliable and proven
over three decades of use. However, the coupling of
database and and object-oriented technology has been
a little like mixing oil with vinegar.

One technology that has taken a high ground to
see past the differences between different programming
languages and data management processes is UML.
Ratified as a standard in 1997, UML has sought to
create ways to design better software systems.

While it's primary focus has been software development,
it has introduced the idea of 'profiles' to accommodate
designs in specific information niches.

"Profiles customize the UML to a domain without leaving
the standard of the language" - UML Data Modeling Profile

Following is a brief overview of the sorts of thing UML
is designed to model and then we will look at some of
the details of the new UML Data Modeling Profile.


UML Brief Overview


"A model - is a description of a system from a particular
perspective, omitting irrelevant details so that the
characteristics of interest are seen more clearly."

Models are useful for
Modeling promotes
UML is a standardized language for modeling different
aspects of an application.

UML can be used to:
// a UML favorite word, 'artifact'  -'anything made
// by human work or art' - Webster's dictionary

 

The language uses three kinds of building blocks:

Things

Things are the basic elements of a model.

The UML defines four kinds of things:

Relationships


UML defines five kinds of relationships describing
how element are related:

Diagrams


"A diagram -is a graphical presentation of a set of
elements, most often rendered as a connected graph
of things and their relationships."

UML provides thirteen types of diagrams allowing
capturing, communicating and documenting all aspects
of an application in a standard graphical notation.

The diagrams are categorized into three groups:



UML Data Modeling Profile


Modeling Benefits for Databases


"The data model diagram with tables, views, relationships
between tables, dependencies of views, and stored procedure
containers is the exact representation of a part of the data
dictionary. The data administrator can recognize the structure of
the database in a more readable graphical representation.

// a way to map in an easily readable form the complexity
// of a database system

On the design side, the tuning of the database is much easier
with a graphical representation, because you are able to see
the content of a table and the documentation of every detail."

                                         - 'UML Data Modeling Profile', IBM


UML's Authors are Behind the UDMP
// high momentum value

The original authors of UML which is an open standard
managed by the OMG or Object Management Group,
had formed a software company called Rational.
This company,  was recently bought by IBM. So now we
refer to IBM Rational’s UML Data Modeling Profile.

This profile provides an easy to use and understand
adoption of UML for the need of database modeling,
and database design.

The basic object set  of UML Data Modeling Profile


Node

"Node -  The representation of the physical entity (computer) where
the database can be located is a node. The representation is part
of the core UML.The nodes are used in a deployment diagram,
which represents the physical configuration of the software deployment.
The diagram includes the nodes and the connections between nodes.
The connections represent the communication protocols."

                                                           - UML Data Modeling Profile, IBM,


Nodes in deployment diagrams are useful for mapping physical
system layouts.


Information from a Deployment Diagram Sample


WebServer__
XML__Application Server__Database Protocol__Database Server(1..n)

// UML Diagrams use specific symbols , for this diagram each square should
// appear in a 3D box.


Tablespace // physical storage unit --> db files

The Tablespace represents a database system. It is the memory
real estate on the node where the database resides, representing
the layer between the users notion of the database and the physical
machine or node.The Tablespace is represented as a UML 'stereotyped
component'  in the Data Modeling Profile.

The UML Stereotype


Stereotypes are one of three extensibility mechanisms in
Unified Modeling Language.

Stereotypes allow you to extend the vocabulary of the UML so that you can create new model elements, derived from existing ones, but that have specific properties that are suitable for your problem domain. They are used for classifying or marking the UML building blocks in order to introduce new building blocks that speak the language of your domain and that look like primitive, or basic, model elements.

For example, when modeling a network you might need to have symbols for representing routers and hubs. By using stereotyped nodes you can make these things appear as primitive building blocks.

Graphically, a stereotype is rendered as a name enclosed by guillemots, <<tablespace >>,  and placed above the name of another element.

                                          -  Stereotype(UML) - Wikipedia



"Tablespace can be best understood as an area on the physical storage,
which is maintained by a database." // the database data files

The following diagram parallels the earlier physical deployment
diagram but at the software component level. Here the database
files are employed by the networked application.

Tablespace Image
 //
image from
- 'UML Data Modeling Profile',IBM,

IBM's Tablespace diagram


Database


The Database is the system. It is used to access the data stored
on the physical machines. The Database is represented as a
stereotyped component in the UML Data Modeling Profile.

The database defines:

The Database component is used with other component types
in a component diagram. This diagram defines the dependencies
between the application and databases.


Schema

The basic organization unit of tables is the schema. In UML,
a schema is represented by a 'package'. The package is the
basic organizational unit of UML. A Schema is a stereotyped
package in the UML Data Modeling Profile.

Facsimile of a Stereotyped Schema Package in UML
 // looks like a folder with a tab

 ____
|    |___________
  <<schema>>    |
|                |
|________________|


Schemas work at the next level of design, and
are organized into another UML unit, the class
diagram. The schema also works as a security
element and is used to enforce access privileges.

A schema element should be assigned to a database
to define database language constraints, data types,
triggers available and stored procedures types.


Table


The table is a basic database modeling structure
database, representing rows of data which are sets
of records sharing the same structure. Information
about the structure of a table is itself stored in the
database.

In the UML Data Modeling Profile, a table is a
stereotyped class.

Facsimile of Table Diagram in the
UML Data Modeling Profile
 // adapted from - 'UML Data Modeling Profile', IBM,

                        Chat         // table symbol
PK  Chat_ID : NUMBER
FK  Employee_ID : NUMBER
FK  Customer_ID : NUMBER
      active : NUMBER
<<Check>> YesNo( )
<<PK>> PK_Chat26( )
<<FK>> FK_Chat27( )
<<FK>> FK_Chat26( )
<<Index>> TC_Chat59( )
<<Index>> TC_Chat61( )



View

Views are virtual tables. They represent data in the same
form as tables. The only difference is that the data is being
obtained from other tables.

Views have a symbol in the UML Data Profile similar to
a table system however the T symbol in the upper right
hand corner is has a broken-line boundary. They may
also been shown in data model diagrams with the tables
they are 'derived' from.


Diagram Showing a View Derived from Two Tables
// image from - 'UML Data Modeling Profile', IBM


IBM's View Diagram

Column

The column is a basic organizational element in a database. All data is
stored table column ( in association with rows). Columns are described
as stereotyped attributes in, UDMP, if we may now use an acronym.
Other tagged values with columns, specify details such as whether the
value may be null and whether the value is unique.

// NULL | NOT NULL, PK | FK


Key


Primary keys uniquely identify rows and are used to access
tables. Foreign keys access data in related tables. Primary
Keys are often automatically generated to facilitate data
updates. Foreign keys  are always derived from a relationship
to other tables.

Key columns are tagged with one of the following stereotypes:

Table with Primary and Foreign Keys
// image from - 'UML Data Modeling Profile', IBM

IBM UML Table with Keys
Index

An index is a data structure that supports faster data access.
Indexes have no impact on the actual data stored. The Index is
represented as a stereotype in the operation section of a UML
class diagram. Data Modeling Profile.

An Index, like a Key, may contain several columns. The order
of the columns in an index is critical. The Index specification
contains not only the columns of the index, but also the index
type.


Constraint


"A constraint is a rule applied to the database’s structure. The
rule can be applied to the column and/or table and can be
limited to a schema or a database." - 'UML Data Modeling Profile', IBM

The UML Data Modeling Profile defines several types of
constraints. All are defined as stereotypes.


List of Constraints


Relationship

"A dependency of any kind between tables in a data model
is called a relationship. A relationship is a summary of a
stereotyped association and a set of primary and foreign keys.
Every relationship is between a parent and child table, where
a parent table must have a primary key defined. The child table
creates a foreign key column and foreign key constraint to address
the parent table."                  - 'UML Data Modeling Profile', IBM

Non-identifying Association

A non-identifying association represents a relationship between
two independent tables. The foreign key of the child table does
not contain all of the primary key columns.

// a relationship between two independent tables.


Non-Identifying Relationship
// image from - 'UML Data Modeling Profile', IBM

IBM's Non Indentifying Relationship

An Identifying Relationship


"An identifying relationship is a relation between two dependent
tables, where the child table cannot exist without the parent table.
All of the primary keys of the parent table (Person, in this example)
become both primary and foreign key columns in the child table
(Account).

A relationship has two roles associated with it. They define the role
of one table in association with the other. It is possible to assign
more than one relationship between two tables using different roles.
Each relationship creates migrated keys from the parent to the
child table."                                - 'UML Data Modeling Profile', IBM


Identifying Relationship
// image from - 'UML Data Modeling Profile', IBM

IBM's diagram of an identifying relationship

Assignment


We have to jump ahead and use create, and insert
statements which we will formally cover later.


Part A

Following is the specification for the 'STORE' table used
in the course manual.

STORE

STORE NUMBER     NUMBER
MANAGER_ID          NUMBER
STREET                     VARCHAR2(45)
CITY                           
VARCHAR2(25)
STATE                        CHAR(2)
ZIP                               CHAR(9)
AREA_CODE            CHAR(3)
PHONE_NUMBER    CHAR(7)
SALES_TAX_RATE  NUMBER(5.5)

1 ) Convert this table to a Canadian table.
 
(i)   state to province
(ii)  zip to postal code
(iii) tax to GST & PST  // perhaps decimal type

Modify types as appropriate for the database
you have selected.

//  STORE_NUMBER  is underlined which probably indicates it is the primary key.
// VARCHAR2 is probably an intermediary sized VARCHAR type

2 ) In the database you have selected create a database
with an identifier that indicates this is a business of
some type.

3 ) Create a the Canadian version of the above
      table.

4) Use INSERT statements to enter fictitious (or not)
stores two for most provinces. You might have one for
P.E.I. and three or four for Ontario and one other province.

5 )Hand in a printshot of a SELECT ALL statement
on the database table after it has been created.


PART B


Following is a chart from the Sun site comparing
the types for different databases. Create a list
of the 'common denominators' or those types
supported by all the databases.

Put asterisks after types that see a lot of variation
between databases. Note the lowest range value
that would be supported by all the databases.

Submit the list.


Tables from 'Mapping SQL and Java Types', The Sun site
// for reference only
 

http://java.sun.com/j2se/1.5.0/docs/guide/jdbc/getstart/mapping.html#table1

Sun's Database Types Comparison