SQL & Databases Introduction


SQL Overview       


Structured Query Language

SQL, an acronym for Structured Query Language, is,
for the most part, a vendor-neutral programming language
designed for exchanging data with databases. SQL,
(many pronounce it 'sequel'),  has been standardized
by ANSI, the American National Standards Institute.
The standard version is called ANSI SQL. Fortunately,
though all database vendors provide their own extensions,
they all support the ANSI version of SQL.

// everyone supports ANSI SQL


A Brief History

SQL was originally developed at IBM in the late 1960s
under the direction of Ted Codd. Many companies have
developed different versions of the language. In the
early 1980's, ANSI, the American National Standards
Institute, began work on the standardization of SQL.
ANSI published a specification in 1986 and ISO, the
International Standards Organization published their
standard in 1987. ANSI and ISO worked jointly on a
standard was called variably, SQL2, SQL-92 or SQL/92.

// ANSI published in 1986 & ISO in 1987

This version is today's de facto standard while the newer
SQL3, which supports object-oriented and other scientific
features is now gaining wide-spread acceptance.

Relational Database Overview

Databases are all around us. When you shop for a car
part or use your bank card you are accessing a database.
A 'database' is a collection of stored data, the collection
normally referring to a set of tables. The software that
manages a database is called a Database Management
System or DBMS.
  A 'relational database' is one that
stores data in data structures called 'tables'.

// DBMS manages a database, relational databases use tables


Tables Are Made Up of Columns

// columns store data of the same type

Tables provide a useful way to organize data in files.
Tables are made up of one or more columns. The data
in any given column is all of the same type. ( SQL has
specific names for the types that can be stored in
columns. )

Rows Are Distinct Sets of Values

Data in tables are stored in 'rows'. Each row will have a
value entry for each column in the table. A 'record' refers
to the set of data contained in a single row. In the relational
model, a table is referred to as a 'relation' which is where
the table model gets the 'relational database' name.

// a table is referred to as a relation

Below are two very simple tables.

Example 1
 
          Customer 
          John Doe 

//  here's a table of one column and one row with
//  a single field containing the value John Doe

Example 2
 
   Customer    Address
  John Doe   RR# 2 Oron
  Bill May   12 June St., Springstown

//  here's a table of two columns and two rows
//  there are two records stored here in the two rows

 
The Primary Key Holds the Unique Identifier for Each Row

Each row in a table requires a unique identifier. The column
containing each row's unique identifier is called the primary
key
.
This identifier may exist in one of the columns of the
table or may be composed from a set of columns.

Typically a unique number is used to describe each record
whether a customer, order or an account. The primary key
makes extraction of the information from each row possible.
Integers make good primary keys as is shown in the following
example where the Account_No values are the primary key
column.

Example 3
 
Account_No
   Customer    Address
 1000001
  John Doe   RR# 2 Oron
 1000002
  Bill May   12 June St., Springstown

// a key is a column, each row will have a primary key value that
// identifies it


Primary Keys

There are some rules governing establishing primary keys.
No two rows can share the same primary row value. Every
row must have a primary key value ( that is not NULL ).
The column containing key values can't be modified.
Primary keys can't be reused.

These rules serve to ensure every record has a unique
identifier which cannot be replaced or reassigned. When
multiple columns are used to create keys, the rules
extend to all the columns.

Rules Governing Primary Keys



Foreign Keys

A table may contain a column (or columns) corresponding
to primary key of another table. This is called a foreign key.


These and other rules came out of seminal work done by
Ted Codd who worked at IBM in the late 60s. The rules
Codd espoused made the use of a Structured Query
Language a requirement of the relational model. This led
to the development of SQL as we know it today.

Popular Terminology

Following is one definition of a Relational Database
Management Systems Definition
.

RDMS - A database management software system that
organizes data into a series of records that are stored in
linked tables. This provides the ability to relate different
records, fields and tables, and aids data access and data
transformation.

reference:  www.agriculture.purdue.edu/ssmc/Frames/newglossery.htm

Relational Database Management Systems allow for
multiple users to use a database under the overview
of a system administrator. A user can create tables
and control who may use them.

Following is a definition for schema.

schema - "The word schema comes from the Greek word
"σχήμα" (skhēma), which means shape or more generally
plan.                

reference: http://en.wikipedia.org/wiki/Schema

Schema is used in computer programming languages
to describe the overall structural relationship between
entities. In databases, a schema describes how a set
of tables are related. To offer a different context, a
schema in XML describes the hierarchical structural
pattern that a document will take.

A 'System Catalog' or 'Data Dictionary' are tables a
database system uses to describe the tables in a
schema.

A connection to a database is often referred to as a
session.



SQL Language Organization


Data Manipulation Language (DML)

The set of SQL commands that are used to manipulate the
contents of a database
is called the 'Data Manipulation
Language'. These commands are summarized in the

following table.

DML Commands
 
  SELECT    Retrieves data from the database. 
  INSERT    Inserts rows into a table
  UPDATE   Updates rows in a table.
  DELETE   Deletes rows from a table.


Data Definition Language (DDL)  // structural commands

The set of  commands used to change the structure of
database tables is called the Data Definition Language.
Following are SQL commands are used to create, delete
or alter database tables.


DDL Commands

 
 CREATE   Creates a new database object.
 DROP    Drops an existing database object. 
 ALTER    Modifies the structure of a database object. 


A Few SQL Examples

Usually a database will already exist and information
is retrieved from it using SELECT statements or if there
needs to be a change to the table an UPDATE statement
is used. If you are creating a database from scratch you
will need to create tables to store your data.
 

The CREATE Statement

The CREATE command is followed by the keyword
TABLE and the name the table will be given. Following
these keywords are the column names in parentheses
which must be unique, followed by their type.

In addition a setting is specified to determine if the initial
value of fields of columns can be NULL. NULL represents
an absence of a value. A row does not need to specify a
value for columns that are specified as NULL. Here is an
example of a table being created using the CREATE
command. In this case NULL values are not accepted.


CREATE Example

CREATE TABLE  Cars
    (
    InventoryNo  INTEGER     NOT NULL,
    Make         CHAR(40)    NOT NULL,
    Model        CHAR(40)    NOT NULL,
    Year         CHAR(10)    NOT NULL
    );

// column names are followed by type

Once a table has been created data can be inserted into
it. Values stored can be changed via updates. Data also
can be selectively removed. The commands to do these
operations are Insert, Update and Delete.
 

The INSERT command

Once a table has been created data can be inserted into
it. INSERT is used to insert rows or parts of rows of data
into a table. Following the INSERT command the name of
the table is specified. Following the VALUES keyword,
the row items to be inserted are specified.


INSERT Example

INSERT INTO Cars
   VALUES ( 82123,
            'Ford',
            'Focus',
             2000
          );

  // inserts actual data values


The SELECT Statement

Once a table is created, perhaps containing a companies
inventory, users will want to access the information. This
is where the SELECT statement comes in. It is perhaps
the most used statement in SQL.  A select statement is
also called a 'query' because it is asking for information
from the database. In it's simplest form it can be used
to select all the information in the rows and columns of
a table as in the following example.

SELECT Example   

SELECT * FROM Cars;

Assuming that a number of INSERT statements were
performed the result of a SELECT everything statement
like the above might be a table that looks like the following.


OUTPUT

 
 InventoryNo  Make    Model   Year
 00123  Ford  Taurus Wagon  1996
 00456  Toyota   Camry   1997
 00789  Chrysler   Intrepid   2000
 01401   Jeep  Cherokee  1993
 01578  GM   Pickup  1988


If a Single column is specified the column of values will
be returned.
 

Example   SELECT Make             
                FROM  Cars;

// specify column(s) after SELECT

The output of this statement would be a table like the following.

Output
 
 Make
 Ford 
 Toyota
 Chrysler
 Jeep
 GM


WHERE Clauses

A number of conditions can be placed on what is selected
using WHERE clauses. The following example selects
vehicles whose year is less than 1996.  


Example

SELECT Make, Model, Year
FROM cars
WHERE Year > 1996;


Output

 
 Make    Model   Year
 Toyota  Camry  1997
 Chrysler   Intrepid   2000


The UPDATE command

To change data in a table the UPDATE command
is used. The statement takes the following form.

UPDATE Example

UPDATE table_name
SET column_name = 'new_value'
WHERE key_name= '10101';

// for example reset a price

Notice the UPDATE statement first addresses the table
name and then specifies the column that the new value
will be assigned to with the SET keyword. The condition
specifies which row will have it's data updated. If there
was no WHERE clause, all the row's values in this
column would be changed to 'new_value'.


UPDATE Example 2

UPDATE Cars
SET Model = 'sedan DX'
WHERE key_name= 'lexus';

// with no conditional where phrase all row
// values in column would be effected


The DELETE Command

To complete our mini-tour of SQL we add the command
that is used to remove data from a table.  One, many or
all the rows of a table can be deleted using the
DELETE
command. To delete a row use the following form.


DELETE Example

DELETE FROM Cars
WHERE key_name= 'lexus';


Omitting the WHERE clause will cause every row to be
deleted from the specified
table so you have to use
DELETE with care! Delete is a row specific operator
and
takes no column names or wild cards.

// DELETE is a row specific operator

There of course is a lot more to SQL. This gives you
the basic idea and the most commonly used commands.
 

Result Sets

The rows that satisfy the conditions specified in an SQL
query are called the 'result set'. The number of rows
returned can be any number including zero. A cursor is
supplied which is a kind of file pointer that provides the
means of accessing the data from the result set one row
at a time.


Transactions // makes a set of SQL commands 'atomic'

If a database is being accessed concurrently by two
users and both are accessing the same range of data,
where one is updating while the other is selecting, there
is the possibility that the user doing the selection will
get an incorrect result set containing only partially
updated information. To avoid this situations database
management systems use transactions to ensure data
consistency.

// transactions ensure data consistency, like a lock

Transactions allow several users to use a database at
the same time without there being a danger of obtaining
inconsistent data.

A Transaction defines a set of SQL statements with a
well-defined start and end point. The set of statements
work as a unit of change for a database. A transaction
is ended with either a COMMIT or ROLLBACK command.

The COMMIT command makes the changes described
in the set of SQL statements permanent. The ROLLBACK
statement 'undoes' any changes that were made by the
transaction statements.  A lock is a device that prevents
two transactions from manipulating data at the same time.
A lock synchronizes access to data. A row lock prevents
two transactions from modifying the the same row
simultaneously.


An Oracle Example

insert into R values (1, 2);
rollback;
insert into R values (3, 4);
commit;

// from 'Oracle 9i SQL versus Standard SQL,
// from http://infolab.stanford.edu/~ullman/fcdb/oracle/or-nonstandard.html#transactions


Local vs Global Transactions

A 'local transaction' is one executed against a single
Data
Resource. A transaction that is coordinated by
a transaction manager across multiple
heterogeneous
Data Resources is called a 'global transaction'. 


Stored Procedures

A stored procedure is a set of SQL statements that can
be called by name the same way a function or method
can be called. A stored procedure is a routine that is
stored with the database. Normally stored procedures
are written in scripts provided by the database system
software.  Once a stored procedure is written it is ported
to the DBMS and stored so that it can be called by name
anytime it is needed.

The use of stored procedures have advantages. The code
can be executed quickly and efficiently on the database
which may lead to an improvement in  the efficiency of
network exchanges.

Metadata 

Most databases maintain tables that describe layout details
of the various tables that the database contains, for example,
"What are the column names of a table?".
 
Each database management system supplies functions
for
getting this information.


Database Ecology



There are a plethora of databases available for our use.
Each has it's own idiosyncrasies so all our learning efforts
are not entirely transferable between databases.

Just to mention a few of the more famous database system,
there are:

Commonly Known Commercial Database Systems

Home Office Databases


Popular Open Source Database Systems


Which one to Use

You are free to use any of these databases if you have
access to them. This may be the case if you are learning
SQL for work purposes in which case a particular database
is available to you. At the school, life is simplified if we use
one of the open source databases.

MySQL

MySQL is freely available, and a proven performer that
doesn't impose costly licensing agreement the the user
unless the database is being used in a commercial
application. 

Derby

Derby is a new database being developed by the Apache
foundation. It is widely supported in IBM's Eclipse. One might
need to supply a 'front end' for it or use Eclipse. Eclipse itself
is a handful to master, so only consider this path if you are
familiar with the editor.

PostgreSQL

Not as famous as MySQL, but popularly distributed with
Linux, along with MySQL, PostgreSQL is another option
that may be investigated.

General Strategy

If you have no particular preferences, MySQL should probably
be our base database and each student might carry one other
database along to explore the variations in approach to building
database systems.


Assignment


Find the main pages for three of the commercial
and three of the open source databases, and cut
and paste one paragraph that represents the companies
description of their database product. (Show the paragraph
in quotes and note the web page URL. )

// make sure you read the paragraph!

This survey may influence the product(s) you wish to
investigate and use while doing the assignments for the
course. You may optionally include a statement of your
general impression of that database product.

Submit the six paragraphs and any comments you have
added as Assignment 1, SQL Overview,  and include your
name at the top of the first page.