SQL
Objective #1: Understand the role of SQL in its use
with relational database systems.
- SQL stands for "Structured Query Language" and is usually
pronounced "Ess-Kue-Ell". But some people pronounce it as "sequel".
- There
are many different database platforms including Oracle, SQL Server, mySQL,
MS
Access,
Informix,
Sybase,
and DB2.
- Most database platforms follow the SQL standards defined by the international
standards organization ANSI. However, some of the database platforms add
additional commands to their version of SQL.
Objectives #2: Design a database and its tables
with the appropriate
fields and data types.
- It is important to plan (i.e. design) a database before creating it.
- You must determine how many tables the database will contain. Then, for
each table, you must determine the columns (i.e. fields) that will be used
in the table. Finally, you must determine the data types for each of the
columns in each table
.
- Tables and columns must be named with alphanumeric names that contain no
spaces. I recommend:
- using only lowercase letters in names for databases, tables and fields.
Note that database and table names are case-sensitive on Linux but
insensitive on Windows. Column names are case-insensitive.
- Name your tables in the plural form and your
columns in the singular form. For example, students would
be the name of a table and first_name would
be the name of one of its columns.
- End primary and foreign keys (will be discussed
later) with the suffix id as in user_id or user_id.
- Use descriptive names but use underscores
to separate multiple words. For example, first_name is better
and more descriptive than name.
- Field names should be unique across every table except for the keys.
- Limit your names to 64 characters.
- Each field must be given a data type. These choices can
affect the usability and performance of your Web site. Data types include
CHAR, VARCHAR,
TEXT, INT, DOUBLE, DATE, and others. However, data types may differ on different
database platforms (i.e. Access, Oracle, MySQL, Sybase, etc.)
Objective #3: Explain the role of primary keys in a relational
database.
- A primary key is the field in a table that is used to uniquely identify
each record. The data stored in the primary key field for one record (i.e.
row) cannot be used for any other record.
- It is a good idea to add the suffix _id to
the name of a primary key field.
- NULL cannot be stored in the primary key
field for any record.
- You cannot change the data stored in the primary key field for any record.
- Only one primary key field is allowed per table.
- Sometimes a primary key field is the combination of several other fields
but remember the data stored for each record in the primary key field must
be unique.
- The best data type to use for a primary key field is one that stores integers.
Often we set the primary key field to be AUTO_INCREMENT so
the database automatically creates a new integer for each record as it is
inserted into the table.
- A foreign key is a field in one table that is linked to the primary key
of another table through a relationship.
Objective #4: Create a database that includes one or more tables using
the SQL CREATE command.
- The SQL statement
CREATE DATABASE school;
creates a database named school using the CREATE command.
- The statement
CREATE TABLE students(
student_id MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT,
first VARCHAR(15) NOT NULL,
last VARCHAR(15) NOT NULL,
email VARCHAR(40),
PRIMARY KEY (student_id));
creates a table with the name students and 4 columns (i.e. fields)
named student_id,
first, last, and email. The field student_id is
set as the table's primary key field. The student_id field has a
type of MEDIUMINT set to be UNSIGNED.
That means that positive whole numbers (i.e. integers) can be stored there
up to 16,777,215. Furthermore, the student_id field is set as AUTO_INCREMENT meaning
that the database server will enter consecutive, ascending integers in this
field automatically when new records are created. The fields that are marked
as NOT NULL must contain values and cannot
be left blank, in which case the NULL value
would be stored there. Note that the statement can be typed over several
lines to make it easier to read as long as the statement ends with a single
semicolon.
- To confirm the existence of a table and columns within a table, you can
use the SHOW command as in
SHOW TABLES;
or
SHOW COLUMNS FROM tablename;
- Technically, you do not have to type an SQL command in uppercase letters
but it is conventional to do so to make it easier to see the commands from
the table and column names.
Objective #5: Insert records into a table using the SQL INSERT command.
- The INSERT command is used to insert
new records into a table. Records are horizontal rows of course.
- The statement
INSERT INTO users (first_name, last_name, password,
registration_date)
VALUES ('John', 'Doe', PASSWORD('default'), NOW());
could be used to insert a record into the users table described
above. The command INTO is optional and
is not used in the examples below. The PASSWORD function is used to encrypt
passwords. The NOW function is MySQL can
be used to obtain the current date from the server.
- The statement
INSERT users (first_name, last_name, password,
registration_date)
VALUES ('John', 'Doe', PASSWORD('default'), NOW()),
('Jane', 'Doe', PASSWORD('default'), NOW());
could be used to insert two records into the users table. Notice
the comma between the first set of parentheses after the VALUES command.
You can insert more than two records by using additional commas.
- The statement
INSERT users VALUES ('John', 'Doe', NULL, PASSWORD('default'),
NOW());
could be used to insert a record where values for all fields, including those
that may be set to NULL as in the email
field above, must be assigned.
- As in the examples above, string values like the word John should be typed
in single quotes within a SQL statement. Numbers however such as 12 or 45.67
should not be surrounded with single quotes. The value NULL and function
calls (as with PASSWORD and NULL above) should not be enclosed in quotes.
Objective #6: Select data using a SQL SELECT statement.
Objective #7: Sort & limit query results using a WHERE clause
along with conditionals and the keywords AND and OR.
- The WHERE command and conditionals ( < , >, <=, >=, =) can
be used to limit the results of a query.
- The statement
SELECT * FROM users WHERE first_name = 'John';
would only return the records that have 'John' stored in the first_name field.
- The statement
SELECT * FROM users WHERE user_id >= 10;
would return the records that have a user_id field that is greater than or
equal to 10.
- The statement
SELECT * FROM users WHERE (first_name = 'John') OR (user_id >= 10);
would return the records that have first_name of 'John' as well as the records
that have a user_id that is greater than or equal to 10.
- The statement
SELECT * FROM users WHERE (first_name = 'John') AND (user_id >= 10);
would only return records that have a first_name of 'John' and that also
have a user_id that is greater than or equal to 10.
- The statement
SELECT * FROM users WHERE email IS NULL;
would only return records that have NULL stored in the email field.
- The statement
SELECT first_name, last_name FROM users WHERE user_id = 15;
would select the first_name and last_name fields of the record that has user_id
equal to 15.
- Sometimes it is useful to use LIKE or NOT
LIKE as well as wildcard characters to efficiently find records
that contain a range of string values. The statement
SELECT * FROM users WHERE last_name LIKE 'Min%';
can be used to find all users who have a last name that begins with the letters
Min including Minich, Minnich, Minnick, Minick and minich, minnich, minnick
since it is case-insensitive by default. The % symbol is the wildcard character
that matches zero or more characters. The underscore ( _ ) symbol is a wildcard
character that is used to match a single character.
The statement
SELECT * FROM users WHERE last_name NOT LIKE '%s';
can be used find all users who have a last name that does not end with the
letter s. Note how the % is used at the beginning of the string.
The statement
SELECT * FROM users WHERE last_name LIKE 'Minic_';
can be used to find all users who have a last name Minich or Minick.
- Results of a query can be automatically sorted with the ORDER BY command.
The query
SELECT * FROM users ORDER BY registration_date ASC;
will sort the records in ascending order by the registration_date field.
Since ASC is the default sorting order, the ASC command is optional. The
statement
SELECT * FROM users WHERE registration_date >= '2004-07-27' ORDER BY last_name
DESC, first_name DESC;
will sort the records that have registration dates that are later than 7/27/2004
in sorted order first by last_name in descending order (Z to A) and ties
will be broken by the first_name field in descending order. If any NULL values
are found in the field that you are sorting by, those records will be pulled
out first whether you are sorting in ascending or descending order.
- The LIMIT command can be used to limit the number of records that are retrieved.
The statement
SELECT * FROM users LIMIT 10;
only returns the first 10 records in the table. The statement
SELECT * FROM users LIMIT 10, 20;
returns twenty records starting with the 11th one. Note that the first record
is considered to have index position zero. The statement
SELECT * FROM users ORDER BY registration_date ASC LIMIT 1, 1;
would return the first person to register.
Objective #8: Update and delete data in a table using the SQL
commands
UPDATE, DELETE, & DROP TABLE.
- The UPDATE command can be used to modify an existing record.
- The statement
UPDATE users SET last_name = 'Doe';
would change the last_name field in every record to 'Doe'. This would probably
not be desirable. However, the statement
UPDATE users WHERE last_name = 'Doe' AND first_name = 'John' SET email =
'john@comcast.net';
would overwrite John Doe's previous email address with the new email address
of john@comcast.net.
- The DELETE command can be used to permanently delete one or more records.
- The statement
DELETE FROM users;
would delete every record in the users table. This would probably not be
desirable. However, the statement
DELETE FROM users WHERE last_name = 'Doe';
would delete all records with a last name of Doe from the table.
- The DROP command should be used to delete a whole table (and not just the
records in the table) as in
DROP TABLE users;
- The statement
DROP DATABASE wyomissing;
would delete the whole database named wyomissing including all of its tables
and records.
Objective #9: Understand the principles of database normalization.
- Normalization is used to reduce redundancies and other problems that can
affect the integrity of the data stored in a database.
- A relational database system is a database that contains multiple tables
which can have relationships.
- Normalization rules are called First Normal Form, Second Normal Form, and
Third Normal Form. We will mainly be concerned in this course only that our
databases follow the First Normal Form.
- The First Normal Form (1NF) requires that each field contains data that
is indivisible. A field named name should not contain first names for some
records but last names for other records. That is, you should break up any
fields that store different types of data in different records into separate
fields.
- The Second Normal Form (2NF) requires that a database is in 1NF and that
all fields which have repeating values across multiple records be placed
into separate tables. This reduces the redundancy of a database. The act
of pulling such fields out of one table and placing them into another table
creates a natural one-to-many relationship (explained below.)
- The Third Normal Form (3NF) requires that a database is in 2NF and that
every nonkey field is dependent on the primary key.
- Unfortunately, the act of normalizing a database into 3NF causes the database
to be broken up into many tables. This makes it more difficult and slow to
update the database and retrieve data. Normalization is a trade-off between
data integrity and scalability versus simplicity and speed.
Objective #10: Explain how the role of relationships between tables
in a database and perform joins.
- A relationship exists between two tables within a database. Actually, it
is a relationship between one field (i.e. column) of one table (often a primary
key) with a field of another table (usually a foreign key).
- The data entries that are stored in the user_id primary
key field of a table named users (which is denoted as users.user_id)
may also be stored in the user field of a table named quiz_scores (which
is denoted as quiz_scores.user). If there
is exactly one occurrence of any given user_id data
entry in users for every one occurrence of the same data entry in the user
field of quiz_scores then the relationship is called one-to-one.
If there are multiple occurrences of each unique users.user_id entry
in the quiz_scores.user field then the relationship
is called one-to-many. If the two fields are foreign keys
and neither one is a primary key, then the relationship is considered to
be many-to-many if there are multiple occurrences of certain
data entries in the field from one table and there are also multiple occurrences
of the same data entries in a field of the other table.
- You should avoid creating tables with many-to-many relationships because
it leads to data redundancy and data integrity problems.
- A join is a query that cross-references multiple tables
in order to obtain more usable data.
- An inner join is one that retrieves information from the
records from multiple tables where the entries in two corresponding fields
are the same.
- A left join is one that retrieves the information from
the records from multiple (usually two) tables where the entries in two corresponding
fields are the same as well as the rest of the entries in the table that
was listed first (i.e. is "on the left"). A left join can also
called an outer join.
- Joins can be created using any fields, not just key fields.
- You can create a join between tables in different databases using the notation database.table.column but
both databases must be stored on the same server.
- Joins that don't use the WHERE clause
are called full joins. They will return every record from both tables.
- NULL values in one table's column are
not considered to match NULL values in another
table's column through a join.
- An alias can be used after the keyword AS in
order to make it easier to write long queries.
Objective #11: Use and evaluate SQL functions.
- You can use built-in MySQL functions in queries.
- The query
SELECT CONCAT(first_name, ' ', last_name) FROM
users;
will concatenate the first_name fields with a blank space and the last_name
fields.
- Function names are case-insensitive just like SQL keywords (i.e. SELECT, FROM ,
etc.)
- Some common text functions include CONCAT, LENGTH, LEFT, RIGHT, TRIM, UPPER, LOWER,
and SUBSTRING.
- Functions can be used along with INSERT to
insert formatted data into a table.
- Some common numeric functions include ABS, CEILING, FLOOR, FORMAT, MOD, RAND, ROUND, SIGN,
and SQRT.
- The % symbol can be used to perform modulus
instead of using the MOD function.
- Some common date and time functions includeHOUR, MINUTE, SECOND, DAYNAME, DAYOFMONTH, MONTHNAME, MONTH, YEAR, ADDDATE, SUBDATE, CURDATE, CURTIME, NOW,
and UNIX_TIMESTAMP.
- The DATE_FORMAT and TIME_FORMAT functions
are useful to format dates and times.
Objective #12: Use the aggregate functions AVG, MIN, MAX and group
data using the GROUP BY command.
- There are some useful aggregate functions including AVG, MIN, MAX, SUM,
and COUNT.
- The GROUP BY command is used to collect
blocks of information.
- The DISTINCT function is used to eliminate
duplicate values in a column.
- Aggregate functions are used to compute against a "returned column
of numeric data" from your SELECT statement. They basically summarize
the results of a particular column of selected data. We are covering these
here since they are required by the next topic, "GROUP BY". Although
they are required for the "GROUP BY" clause, these functions can
be used without the "GROUP BY" clause. For example:
SELECT AVG(salary)
FROM employee;
Objective #13: Add a column to an existing table.
- You can add a column to an existing table using the ALTER
TABLE command as in
ALTER TABLE users ADD COLUMN last_name VARCHAR(40)
AFTER first_name;
would add the column named last_name to
the table named users in the position just
after the column first_name. The column would have type VARCHAR(40).
The AFTER clause is optional.
- You can change the data type of a column using the clause CHANGE
COLUMN as in
ALTER TABLE users CHANGE COLUMN last_name VARCHAR(20);
- You can delete a column from a table using the DROP
COLUMN clause as in
ALTER TABLE users DROP COLUMN last_name;