1_SQL_Introduction

SQL & Databases Introduction

SQL Overview

Structured Query Language

SQL, an acronym for Structured Query Language, is,
for the most part, a vendor-neutral programming language
designed for exchanging data with databases. SQL,
(many pronounce it 'sequel'), has been standardized
by ANSI, the American National Standards Institute.
The standard version is called ANSI SQL. Fortunately,
though all database vendors provide their own extensions,
they all support the ANSI version of SQL.

// everyone supports ANSI SQL

A Brief History

SQL was originally developed at IBM in the late 1960s
under the direction of Ted Codd. Many companies have
developed different versions of the language. In the
early 1980's, ANSI, the American National Standards
Institute, began work on the standardization of SQL.
ANSI published a specification in 1986 and ISO, the
International Standards Organization published their
standard in 1987. ANSI and ISO worked jointly on a
standard was called variably, SQL2, SQL-92 or SQL/92.

// ANSI published in 1986 & ISO in 1987

This version is today's de facto standard while the newer
SQL3, which supports object-oriented and other scientific
features is now gaining wide-spread acceptance.

Relational Database Overview

Databases are all around us. When you shop for a car
part or use your bank card you are accessing a database.
A 'database' is a collection of stored data, the collection
normally referring to a set of tables. The software that
manages a database is called a Database Management
System or DBMS. A 'relational database' is one that
stores data in data structures called 'tables'.

// DBMS manages a database, relational databases use tables

Tables Are Made Up of Columns
// columns store data of the same type

Tables provide a useful way to organize data in files.
Tables are made up of one or more columns. The data
in any given column is all of the same type. ( SQL has
specific names for the types that can be stored in
columns. )

Rows Are Distinct Sets of Values

Data in tables are stored in 'rows'. Each row will have a
value entry for each column in the table. A 'record' refers
to the set of data contained in a single row. In the relational
model, a table is referred to as a 'relation' which is where
the table model gets the 'relational database' name.

// a table is referred to as a relation

Below are two very simple tables.

Example 1

Customer

John Doe

// here's a table of one column and one row with
// a single field containing the value John Doe

Example 2

Customer Address

John Doe RR# 2 Oron

Bill May 12 June St., Springstown

// here's a table of two columns and two rows
// there are two records stored here in the two rows

The Primary Key Holds the Unique Identifier for Each Row

Each row in a table requires a unique identifier. The column
containing each row's unique identifier is called the primary
key. This identifier may exist in one of the columns of the
table or may be composed from a set of columns.

Typically a unique number is used to describe each record
whether a customer, order or an account. The primary key
makes extraction of the information from each row possible.
Integers make good primary keys as is shown in the following
example where the Account_No values are the primary key
column.

Example 3

Account_No
Customer Address

1000001
John Doe RR# 2 Oron

1000002
Bill May 12 June St., Springstown

// a key is a column, each row will have a primary key value that
// identifies it

Primary Keys

There are some rules governing establishing primary keys.
No two rows can share the same primary row value. Every
row must have a primary key value ( that is not NULL ).
The column containing key values can't be modified.
Primary keys can't be reused.

These rules serve to ensure every record has a unique
identifier which cannot be replaced or reassigned. When
multiple columns are used to create keys, the rules
extend to all the columns.

Rules Governing Primary Keys

two rows cannot share a primary key value
key value column cannot be modified
every row must have a primary key value (not NULL)
primary keys cannot be reused

Foreign Keys

A table may contain a column (or columns) corresponding
to primary key of another table. This is called a foreign key.

These and other rules came out of seminal work done by
Ted Codd who worked at IBM in the late 60s. The rules
Codd espoused made the use of a Structured Query
Language a requirement of the relational model. This led
to the development of SQL as we know it today.

Popular Terminology

Following is one definition of a Relational Database
Management Systems Definition.

RDMS - A database management software system that
organizes data into a series of records that are stored in
linked tables. This provides the ability to relate different
records, fields and tables, and aids data access and data
transformation.

reference: www.agriculture.purdue.edu/ssmc/Frames/newglossery.htm

Relational Database Management Systems allow for
multiple users to use a database under the overview
of a system administrator. A user can create tables
and control who may use them.

Following is a definition for schema.

schema - "The word schema comes from the Greek word
"σχήμα" (skhēma), which means shape or more generally
plan.

reference: http://en.wikipedia.org/wiki/Schema

Schema is used in computer programming languages
to describe the overall structural relationship between
entities. In databases, a schema describes how a set
of tables are related. To offer a different context, a
schema in XML describes the hierarchical structural
pattern that a document will take.

A 'System Catalog' or 'Data Dictionary' are tables a
database system uses to describe the tables in a
schema.

A connection to a database is often referred to as a
session.

SQL Language Organization

Data Manipulation Language (DML)

The set of SQL commands that are used to manipulate the
contents of a database is called the 'Data Manipulation
Language'. These commands are summarized in the
following table.

DML Commands

SELECT Retrieves data from the database.

INSERT Inserts rows into a table

UPDATE Updates rows in a table.

DELETE Deletes rows from a table.

Data Definition Language (DDL) // structural commands

The set of commands used to change the structure of
database tables is called the Data Definition Language.
Following are SQL commands are used to create, delete
or alter database tables.

DDL Commands

CREATE Creates a new database object.

DROP Drops an existing database object.

ALTER Modifies the structure of a database object.

A Few SQL Examples

Usually a database will already exist and information
is retrieved from it using SELECT statements or if there
needs to be a change to the table an UPDATE statement
is used. If you are creating a database from scratch you
will need to create tables to store your data.

The CREATE Statement

The CREATE command is followed by the keyword
TABLE and the name the table will be given. Following
these keywords are the column names in parentheses
which must be unique, followed by their type.

In addition a setting is specified to determine if the initial
value of fields of columns can be NULL. NULL represents
an absence of a value. A row does not need to specify a
value for columns that are specified as NULL. Here is an
example of a table being created using the CREATE
command. In this case NULL values are not accepted.

CREATE Example

CREATE TABLE Cars
    (
    InventoryNo INTEGER     NOT NULL,
    Make         CHAR(40)    NOT NULL,
    Model        CHAR(40)    NOT NULL,
    Year         CHAR(10)    NOT NULL
    );

// column names are followed by type

Once a table has been created data can be inserted into
it. Values stored can be changed via updates. Data also
can be selectively removed. The commands to do these
operations are Insert, Update and Delete.

The INSERT command

Once a table has been created data can be inserted into
it. INSERT is used to insert rows or parts of rows of data
into a table. Following the INSERT command the name of
the table is specified. Following the VALUES keyword,
the row items to be inserted are specified.

INSERT Example

INSERT INTO Cars
   VALUES ( 82123,
          'Ford',
          'Focus',
             2000
          );

// inserts actual data values

The SELECT Statement

Once a table is created, perhaps containing a companies
inventory, users will want to access the information. This
is where the SELECT statement comes in. It is perhaps
the most used statement in SQL. A select statement is
also called a 'query' because it is asking for information
from the database. In it's simplest form it can be used
to select all the information in the rows and columns of
a table as in the following example.

SELECT Example

SELECT * FROM Cars;

Assuming that a number of INSERT statements were
performed the result of a SELECT everything statement
like the above might be a table that looks like the following.

OUTPUT

InventoryNo Make Model Year

00123 Ford Taurus Wagon 1996

00456 Toyota Camry 1997

00789 Chrysler Intrepid 2000

01401 Jeep Cherokee 1993

01578 GM Pickup 1988

If a Single column is specified the column of values will
be returned.

Example SELECT Make
FROM Cars;

// specify column(s) after SELECT

The output of this statement would be a table like the following.

Output

Make

Ford

Toyota

Chrysler

Jeep

GM

WHERE Clauses

A number of conditions can be placed on what is selected
using WHERE clauses. The following example selects
vehicles whose year is less than 1996.

Example

SELECT Make, Model, Year
FROM cars
WHERE Year > 1996;

Output

Make Model Year

Toyota Camry 1997

Chrysler Intrepid 2000

The UPDATE command

To change data in a table the UPDATE command
is used. The statement takes the following form.

UPDATE Example

UPDATE table_name
SET column_name = 'new_value'
WHERE key_name= '10101';

// for example reset a price

Notice the UPDATE statement first addresses the table
name and then specifies the column that the new value
will be assigned to with the SET keyword. The condition
specifies which row will have it's data updated. If there
was no WHERE clause, all the row's values in this
column would be changed to 'new_value'.

UPDATE Example 2

UPDATE Cars
SET Model = 'sedan DX'
WHERE key_name= 'lexus';

// with no conditional where phrase all row
// values in column would be effected

The DELETE Command

To complete our mini-tour of SQL we add the command
that is used to remove data from a table. One, many or
all the rows of a table can be deleted using the DELETE
command. To delete a row use the following form.

DELETE Example

DELETE FROM Cars
WHERE key_name= 'lexus';

Omitting the WHERE clause will cause every row to be
deleted from the specified table so you have to use
DELETE with care! Delete is a row specific operator
and takes no column names or wild cards.

// DELETE is a row specific operator

There of course is a lot more to SQL. This gives you
the basic idea and the most commonly used commands.

Result Sets

The rows that satisfy the conditions specified in an SQL
query are called the 'result set'. The number of rows
returned can be any number including zero. A cursor is
supplied which is a kind of file pointer that provides the
means of accessing the data from the result set one row
at a time.

Transactions // makes a set of SQL commands 'atomic'

If a database is being accessed concurrently by two
users and both are accessing the same range of data,
where one is updating while the other is selecting, there
is the possibility that the user doing the selection will
get an incorrect result set containing only partially
updated information. To avoid this situations database
management systems use transactions to ensure data
consistency.

// transactions ensure data consistency, like a lock

Transactions allow several users to use a database at
the same time without there being a danger of obtaining
inconsistent data.

A Transaction defines a set of SQL statements with a
well-defined start and end point. The set of statements
work as a unit of change for a database. A transaction
is ended with either a COMMIT or ROLLBACK command.

The COMMIT command makes the changes described
in the set of SQL statements permanent. The ROLLBACK
statement 'undoes' any changes that were made by the
transaction statements. A lock is a device that prevents
two transactions from manipulating data at the same time.
A lock synchronizes access to data. A row lock prevents
two transactions from modifying the the same row
simultaneously.

An Oracle Example

insert into R values (1, 2);
rollback;
insert into R values (3, 4);
commit;

// from 'Oracle 9i SQL versus Standard SQL,
// from http://infolab.stanford.edu/~ullman/fcdb/oracle/or-nonstandard.html#transactions

Local vs Global Transactions

A 'local transaction' is one executed against a single
Data Resource. A transaction that is coordinated by
a transaction manager across multiple heterogeneous
Data Resources is called a 'global transaction'.

Stored Procedures

A stored procedure is a set of SQL statements that can
be called by name the same way a function or method
can be called. A stored procedure is a routine that is
stored with the database. Normally stored procedures
are written in scripts provided by the database system
software. Once a stored procedure is written it is ported
to the DBMS and stored so that it can be called by name
anytime it is needed.

The use of stored procedures have advantages. The code
can be executed quickly and efficiently on the database
which may lead to an improvement in the efficiency of
network exchanges.

Metadata

Most databases maintain tables that describe layout details
of the various tables that the database contains, for example,
"What are the column names of a table?".

Each database management system supplies functions
for getting this information.

Database Ecology

There are a plethora of databases available for our use.
Each has it's own idiosyncrasies so all our learning efforts
are not entirely transferable between databases.

Just to mention a few of the more famous database system,
there are:

Commonly Known Commercial Database Systems

Oracle
Sybase
IBM DB2
Informix
Microsoft SQL Server

Home Office Databases

Microsoft Access

Popular Open Source Database Systems

MySQL // provides commercial licensing for major users
PostgreSQL
Derby // new supported by the Apache Foundation

Which one to Use

You are free to use any of these databases if you have
access to them. This may be the case if you are learning
SQL for work purposes in which case a particular database
is available to you. At the school, life is simplified if we use
one of the open source databases.

MySQL

MySQL is freely available, and a proven performer that
doesn't impose costly licensing agreement the the user
unless the database is being used in a commercial
application.

Derby

Derby is a new database being developed by the Apache
foundation. It is widely supported in IBM's Eclipse. One might
need to supply a 'front end' for it or use Eclipse. Eclipse itself
is a handful to master, so only consider this path if you are
familiar with the editor.

PostgreSQL

Not as famous as MySQL, but popularly distributed with
Linux, along with MySQL, PostgreSQL is another option
that may be investigated.

General Strategy

If you have no particular preferences, MySQL should probably
be our base database and each student might carry one other
database along to explore the variations in approach to building
database systems.

Assignment

Find the main pages for three of the commercial
and three of the open source databases, and cut
and paste one paragraph that represents the companies
description of their database product. (Show the paragraph
in quotes and note the web page URL. )

// make sure you read the paragraph!

This survey may influence the product(s) you wish to
investigate and use while doing the assignments for the
course. You may optionally include a statement of your
general impression of that database product.

Submit the six paragraphs and any comments you have
added as Assignment 1, SQL Overview, and include your
name at the top of the first page.

Customer	Address
John Doe	RR# 2 Oron
Bill May	12 June St., Springstown

Account_No	Customer	Address
1000001	John Doe	RR# 2 Oron
1000002	Bill May	12 June St., Springstown

SELECT	Retrieves data from the database.
INSERT	Inserts rows into a table
UPDATE	Updates rows in a table.
DELETE	Deletes rows from a table.

CREATE	Creates a new database object.
DROP	Drops an existing database object.
ALTER	Modifies the structure of a database object.

InventoryNo	Make	Model	Year
00123	Ford	Taurus Wagon	1996
00456	Toyota	Camry	1997
00789	Chrysler	Intrepid	2000
01401	Jeep	Cherokee	1993
01578	GM	Pickup	1988