MySQL 101 - Referential Integrity

In our last episode we learned how to modify data and table definitions.  This will come in handy as we look at building in referential integrity constraints into our database.  To begin we will need the database definition resulting from last episode's changes. You can download it here »

A word on Storage Engines

Before we can begin we need to understand a little about MySQL Storage Engines.  MySQL actually does only part of the job of parsing SQL, creating query plans, executing them and returning data sets.  Where the data is stored and retrieved there is a Storage Engine at work.  The original storage engine was MyISAM, based on the industry stalwart of ISAM (Indexed Sequential Access Method).  The idea behind MyISAM was (and is) that it is a low overhead, low complexity storage engine.  It gained rapid early adoption because of its simplicity and ease of use.  Since then other storage engines have been developed, with InnoDB being the most commonly used in anything other than simple web apps.  There are others, for instance the CSV engine for interfacing with CSV files, the Blackhole engine for interfacing with nothing and the Federated engine for interfacing with just about anything.

A database may contain tables that use a number of different storage engines, although each table can only have a single storage engine managing its data.

The big difference between MyISAM and InnoDB is that InnoDB is a fully transactional storage engine, and supports more of the ANSI SQL standard related to referential integrity and transactions. Prior to MySQL 5.5 MyISAM was the default storage engine but MySQL 5.5 and later now have InnoDB as the default.  No matter what the default, you can always change it for a particular table. While we will deal with transactions later in the series we need to ensure our tables are using InnoDB.

Using the SHOW CREATE TABLE syntax we can determine what engine our tables are using

SHOW CREATE TABLE `book` \G
*************************** 1. row ***************************
Table: book
Create Table: CREATE TABLE `book` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`title` varchar(200) NOT NULL,
`author` int(11) NOT NULL DEFAULT '0',
`publisher` int(11) NOT NULL DEFAULT '0',
`format` int(11) NOT NULL DEFAULT '0',
`published_date` date DEFAULT NULL,
`isbn` varchar(13) DEFAULT NULL,
`price` decimal(5,2) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `title` (`title`),
KEY `author` (`author`),
KEY `publisher` (`publisher`),
KEY `format` (`format`)
) ENGINE=MyISAM AUTO_INCREMENT=9 DEFAULT CHARSET=utf8
1 row in set (0.00 sec)

Here we used the \G command to the mysql command line client so that it returns the data in a row format rather than the columnar format used by default.  Since we have only one field, this makes the output cleaner.

Most of what is presented will be familiar, what follows the closing bracket is a set of attributes that have been picked up by default (as we didn't specify these when we created the tables).  The first of these, and most important for our current purposes, is the ENGINE attribute.  In the output we can see that it is using MyISAM.  This is not good for data integrity, as MyISAM doesn't support any of the referential integrity constraints, nor does it support ACID transactions.  We need to change this.

ALTER TABLE `book` ENGINE=InnoDB;
Query OK, 8 rows affected (0.34 sec)
Records: 8 Duplicates: 0 Warnings: 0

When changing engines, be aware that it will need to rebuild the data.  For large tables this can quite some time.  We need to do this for all of our tables, so we need to use the ALTER TABLE for author, format and publisher.

 ALTER TABLE `author` ENGINE=InnoDB;
ALTER TABLE `format` ENGINE=InnoDB;
ALTER TABLE `publisher` ENGINE=InnoDB;

I've omitted the output from MySQL to be more concise.  Now we can look at referential integrity.

TIP: In the MySQL command line client on most systems you can use the up-arrow to retrieve the last comand run, and edit it in place, saving having to retype large amounts of text if the new command is similar to the last.

Referential Integrity

We now have data in several tables and that data is interdependent. For instance, if we were to delete an author, then we would have records in the book table that would be orphaned.  We would therefore need logic in our application to be able to ensure the corresponding book records were deleted, or indeed to prevent the deletion of the author record in the first place.  Rather than having to write and test all of this code, we could ask the database to look after this for us.  This ensures our data will always be consistent and follow clearly defined rules that allows our application logic to worry about other things.

To enable these features we need to identify a parent and child relationship between tables, and in the child table we need to build the constraints to control how changes to the parent are reflected in the child.  To do this we need to identify an indexed field in the child that references a unique field in the parent.  We have this situation already in that we have a primary key on each of our tables that satisfies the unique key requirement, and we have indexes on those fields in the book table that reference the primary keys of its parents (the author, publisher and format).  So we need to add a constraint, called a FOREIGN KEY constraint on the book table:

ALTER TABLE `book`
 ADD CONSTRAINT `fk_book_author` FOREIGN KEY (`author`)
 REFERENCES `author` (`id`)
 ON DELETE RESTRICT
 ON UPDATE CASCADE,
 ADD CONSTRAINT `fk_book_publisher` FOREIGN KEY (`publisher`)
 REFERENCES `publisher` (`id`)
 ON DELETE RESTRICT
 ON UPDATE CASCADE,
 ADD CONSTRAINT `fk_book_format` FOREIGN KEY (`format`)
 REFERENCES `format` (`id`)
 ON DELETE RESTRICT
 ON UPDATE CASCADE;
Query OK, 8 rows affected (0.37 sec)
Records: 8 Duplicates: 0 Warnings: 0

Rather than do each alter separately we've done them all in the one ALTER TABLE command.  You will notice that the constraint has a name, fk_book_author, fk_book_publisher and fk_book_format.  The name is not mandatory, the system will generate one for us, but this way ensures we have a name that means something to us and can be used in other ALTER TABLE operations.

The FOREIGN KEY defines a relationship between this table (the child) and the parent table. The bracketed list is the list of fields that form the child end of the relationship.  In our case we only need a single field.  The REFERENCES clause then indicates the table and matching field list for the parent table.  Following this are two clauses, the ON DELETE, and ON UPDATE clauses.   These are what determine the action taken on the parent and child tables on the corresponding action.

I've used the two most common constraints, CASCADE and RESTRICT.  For CASCADE the update or delete does exactly that, cascades down the foreign key chain.  So in the case of author, for instance, if we changed the id of an author, all records in book that reference that author record will have their author field updated with the new value.  For DELETE we've used RESTRICT, what this does is denies a DELETE on any parent record that has children referencing it.  So we would manually need to delete all books to delete an author, for instance.  We could have chosen CASCADE for delete as well, and the book records would be deleted upon delete of an author.  But we'd better try it out and see.

 DELETE FROM `author` WHERE `last_name` = 'Corris';
ERROR 1451 (23000): Cannot delete or update a parent row: a foreign key constraint fails
 (`bookshop`.`book`, CONSTRAINT `fk_book_author` FOREIGN KEY (`author`)
 REFERENCES `author` (`id`) ON UPDATE CASCADE)

Great, we can't delete authors that have books, but what about one that we know doesn't have a book?

DELETE FROM `author` WHERE `first_name` = 'Merrilee' AND `last_name` = 'Moss';
Query OK, 1 row affected (0.00 sec)

Yep, so we know that the RESTRICT works as expected.  Let's take a look at the UPDATE.

 SELECT * FROM `author` WHERE `id` = 1;
+----+------------+-----------+
| id | first_name | last_name |
+----+------------+-----------+
| 1 | Peter | Temple |
+----+------------+-----------+
1 row in set (0.00 sec)

SELECT `title` FROM `book` WHERE `author` = 1;
+------------------+
| title |
+------------------+
| The Broken Shore |
| Shooting Star |
+------------------+
2 rows in set (0.00 sec)

UPDATE `author` SET `id` = 15 WHERE `id` = 1;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 0

SELECT * FROM `author` WHERE `id` = 15;
+----+------------+-----------+
| id | first_name | last_name |
+----+------------+-----------+
| 15 | Peter | Temple |
+----+------------+-----------+
1 row in set (0.00 sec)

SELECT `title` FROM `book` WHERE `author` = 1;
Empty set (0.00 sec)

SELECT `title` FROM `book` WHERE `author` = 15;
+------------------+
| title |
+------------------+
| The Broken Shore |
| Shooting Star |
+------------------+
2 rows in set (0.00 sec)

We can see when updating the author ID, which could potentially break the link between author and book, the corresponding book records are also updated, maintaining the link and ensuring that the integrity of the references are maintained.

In our next episode we'll start to look at extending our tables to support sales, and the use of transactions.

Disclaimer

I am employed by SkySQL Ab, which provides support, training and services for MySQL, MariaDB and Drizzle and is staffed by many ex-MySQL AB employees. While I try to be unbiased in all of my public opinions, and my opinions are all my own and not those of my employer, I work for SkySQL because I believe in what they are doing and therefore cannot be truly unbiased.

MySQL is a registered trademark of Oracle Corporation,  MariaDB is a trademark of MontyProgram, and Drizzle is a trademark.

1 comment

Comment from: SQL dev [Visitor]
SQL devI have rarely seen programming taking advantage of Referential Integrity constraint e.g. CASCADE on UPDATE. It's immensely useful specially in large application where data is subject to change by multiple people. Here is another good example of referential integrity in MySQL http://javarevisited.blogspot.com.ar/2012/12/what-is-referential-integrity-in-database-sql-mysql-example-tutorial.html which explains cascading.
09/01/13 @ 14:30

Leave a comment


Your email address will not be revealed on this site.

Your URL will be displayed.
(Line breaks become <br />)
(Name, email & website)
(Allow users to contact you through a message form (your email will not be revealed.)
This is a captcha-picture. It is used to prevent mass-access by robots.
Please enter the characters from the image above. (case insensitive)