MySQL 101 - Changing data and schema, UPDATE, ALTER

In our last episode we covered sorting, searching and grouping. We found out that using the COUNT(*) can be problematic when we have unexpected NULL data. Now we look at how to resolve data issues by updating the data, and perhaps even the table schema. We'll use the same database we did for the last episode. You can download it here ».

Updating Data

Let's recap.  If we pull the list of books, and authors, we find that "The Broken Shore" from Peter Temple has no price.  Not that it has a zero price, but it has a NULL value.

mysql> SELECT CONCAT_WS(' ',`author`.`first_name`,`author`.`last_name`) AS `author`,
 `book`.`title`, `book`.`price`
 FROM `author` INNER JOIN `book` ON `author`.`id` = `book`.`author`
 ORDER BY `book`.`price`
 LIMIT 1;
+--------------+------------------+-------+
| author | title | price |
+--------------+------------------+-------+
| Peter Temple | The Broken Shore | NULL |
+--------------+------------------+-------+
1 row in set (0.00 sec)

We only have one result here.  This is because I made sure the order was in ascending order of price, and used the LIMIT clause to reduce the number of rows being returned.  This does exactly what its name suggests.  LIMIT 1 returns the first row from the ordered result set.  LIMIT also supports an offset (which is zero by default) followed by the number of rows required.  So LIMIT 1 is equivalent to LIMIT 0, 1.   If we wanted the second and third record from the result set we could use LIMIT 1, 2.

OK, we really need to figure out the record that is the problem if we need to replace or fix it.  Lets see if we can find the record using a simple query.

SELECT * FROM `book` WHERE `price` < 1;
Empty set (0.02 sec)

Huh?  But NULL is less than 1 isn't it?  Not exactly.  No matter what value you use in a comparison, a NULL value will always fail.  This is because NULL is the absence of a value, not a value itself.  So we can't find it by checking against a valid value, what about checking against NULL itself?

SELECT * FROM `book` WHERE `price` = NULL;
Empty set (0.00 sec)

Well, this doesn't seem right.  But then, read again the description above.  Any comparison with any value will fail if the value is NULL.  You cannot compare a value against a non-value.  So how do we actually find the row with a NULL in it?  There is a construct that allows that, the IS NULL and its opposite, IS NOT NULL.

SELECT * FROM `book` WHERE `price` IS NULL;
+----+------------------+--------+-----------+--------+----------------+---------------+-------+
| id | title | author | publisher | format | published_date | isbn | price |
+----+------------------+--------+-----------+--------+----------------+---------------+-------+
| 1 | The Broken Shore | 1 | 3 | 1 | 2010-09-27 | 9781921656774 | NULL |
+----+------------------+--------+-----------+--------+----------------+---------------+-------+
1 row in set (0.00 sec)

Ah, there we go.  OK, we have the record id, which we know to uniquely identify this record, so we can do one of two things.  Either we can delete the record and replace it,  or we can update the record in situ.  Let's have a look at both options.

DELETE FROM `book` WHERE `id` = 1;
Query OK, 1 row affected (0.05 sec)

INSERT INTO `book` VALUES (1, 'The Broken Shore', 1, 3, 1, '2010-09-27',
 '9781921656774', '21.50');
Query OK, 1 row affected (0.00 sec)

SELECT * FROM `book` WHERE `id` = 1;
+----+------------------+--------+-----------+--------+----------------+---------------+-------+
| id | title | author | publisher | format | published_date | isbn | price |
+----+------------------+--------+-----------+--------+----------------+---------------+-------+
| 1 | The Broken Shore | 1 | 3 | 1 | 2010-09-27 | 9781921656774 | 21.50 |
+----+------------------+--------+-----------+--------+----------------+---------------+-------+
1 row in set (0.00 sec)

In this example we delete using a WHERE clause to identify the record.  If we don't provide a WHERE clause, DELETE FROM will delete all data in the table.

This time we are using an INSERT INTO with the full list of fields in the record, replacing everything.  This seems a little over the top, and could result in errors creeping in because of changing so many fields at once.  Fortunately SQL allows us to just update a single field if we need to.

UPDATE `book` SET `price` = '22.50' WHERE `id` = 1;
Query OK, 0 rows affected (0.01 sec)
Rows matched: 1 Changed: 1 Warnings: 0

SELECT * FROM `book` WHERE `id` = 1;
+----+------------------+--------+-----------+--------+----------------+---------------+-------+
| id | title | author | publisher | format | published_date | isbn | price |
+----+------------------+--------+-----------+--------+----------------+---------------+-------+
| 1 | The Broken Shore | 1 | 3 | 1 | 2010-09-27 | 9781921656774 | 22.50 |
+----+------------------+--------+-----------+--------+----------------+---------------+-------+
1 row in set (0.00 sec)

Here we see that we can use UPDATE to just update the field (or fields) that we need to.  This makes it far easier to maintain consistency in our data as we are only changing the data we need to and not touching fields we don't need to.

We can update multiple rows by adding each field = value pair, comma separated, after the SET keyword.  As an example:

UPDATE `book` SET `price` = '22.50', `format` = 1 WHERE `id` = 1;
Query OK, 0 rows affected (0.00 sec)
Rows matched: 1 Changed: 0 Warnings: 0

Note how in this case because we haven't changed anything we see the message from MySQL for the number of rows changed is now 0, while it was 1 in our last example.  Each case shows Rows matched: 1, which indicates that there was one row that matched the WHERE clause.

Altering table definitions.

So we can now update data effectively, but we also had a problem with the isbn field.  Initially we believed that ISBN is a 13 digit number, but when we loaded data we found that there are two types of ISBN.  The newer version is in fact a UPC (Universal Product Code) and matches the 13 character format, but the older version was a 10 digit version, with leading zeroes.  We probably need to do something about that.  We can change the field format to a string format, which allows us to keep the leading zeroes, but we still have those that were truncated when we loaded them.  We need a two pronged approach.  Firstly lets convert the field format

ALTER TABLE `book` CHANGE `isbn` `isbn` VARCHAR(13);
Query OK, 8 rows affected (0.13 sec)
Records: 8 Duplicates: 0 Warnings: 0

SELECT `isbn` FROM `book`;
+---------------+
| isbn |
+---------------+
| 9781921656774 |
| 9781741752236 |
| 732268133 |
| 9781741750966 |
| 207172137 |
| 1863590285 |
| 732276748 |
| 1863252509 |
+---------------+
8 rows in set (0.00 sec)

Note the use of ALTER TABLE.  This takes a table name and then we supply a field name, and its changed definition.  Since the field name is part of the definition, and we aren't changing the name, we need to supply it again.  We have now changed the type to VARCHAR(13) which is a variable length character field that can hold a maximum of 13 characters.  For values that are shorter than this, less space is used than for the fixed length CHAR(13) equivalent. In this case it is debatable which version is best, as all books will have an ISBN, so we don't really save much space using VARCHAR vs CHAR.

ALTER TABLE can pretty much change anything about a table definition, from field definitions right up to storage engine specifications.  It is a very powerful tool in your SQL toolkit.  Well worth looking up the MySQL Reference Manual page.

We've changed the format, now we need to fix those short ones.  We need two string functions available to us with MySQL to accomplish this, one to work out which are the short ones, and the other to fix the problem.

UPDATE `book` SET `isbn` = LPAD(`isbn`, 10, '0')
 WHERE CHAR_LENGTH(`isbn`) < 10;
Query OK, 3 rows affected (0.00 sec)
Rows matched: 3 Changed: 3 Warnings: 0

SELECT `isbn` FROM `book`;
+---------------+
| isbn |
+---------------+
| 9781921656774 |
| 9781741752236 |
| 0732268133 |
| 9781741750966 |
| 0207172137 |
| 1863590285 |
| 0732276748 |
| 1863252509 |
+---------------+
8 rows in set (0.00 sec)

The WHERE clause used the CHAR_LENGTH function, which returns the length of the character string, in characters.  Note that if using a multi-byte character set this will be different to the storage space required.  The LENGTH function in these cases would return a number higher than the CHAR_LENGTH function.  We will cover character sets later in the series.

By using WHERE CHAR_LENGTH(`isbn`) < 10 we find those records that have less than 10 digits in them.  We then need to pad those out to 10 digits with leading zeroes.  This is what LPAD does.  It is short for 'left pad', or padding the start of a string with the required number of digits.  It takes three parameters, the field (or string) to be modified, the legnth to pad out to, and the string to use for padding.

Why couldn't we just LPAD the entire isbn field?  LPAD not only pads, but also truncates to the length provided.  So we would lose information on those values longer than 10 digits.

You can find more information about string (and other data type) functions in the MySQL Reference Manual section 'Fucntions and Operators'.

In our next episode we will look at using the power of the database to control data integrity.

Disclaimer

I am employed by SkySQL Ab, which provides support, training and services for MySQL, MariaDB and Drizzle and is staffed by many ex-MySQL AB employees. While I try to be unbiased in all of my public opinions, and my opinions are all my own and not those of my employer, I work for SkySQL because I believe in what they are doing and therefore cannot be truly unbiased.

MySQL is a registered trademark of Oracle Corporation,  MariaDB is a trademark of MontyProgram, and Drizzle is a trademark.

No feedback yet


Form is loading...