MySQL 101 - Retrieving data: SELECT and JOIN

In our last episode we started building up our online bookshop database, with tables for publishers, authors, formats and books.  At the moment we only have one book in there, so before we go too far, lets add a few more:

INSERT INTO `book` VALUES
( NULL, 'The Big Score', 2, 4, 1, '2007-01-01', 9781741752236, 29.95 ),
( NULL, 'Split', 3, 2, 1, '2003-01-01', 0732268133, 29.95 );

So what is this NULL thing, and why have I used it?  If you remember we set the first field to an auto_increment id.  Because we don't want to supply a value for this, but let the database create the next value, we need to give a value that indicates we want this to happen.  For this instance, NULL is the value to use.  We must supply a value because we didn't restrict our insert by supplying a field list, so we need to supply values for all fields in the table.

Let's have a look at the data in the book table now:

mysql> SELECT * FROM `book`;
+----+------------------+--------+-----------+--------+----------------+---------------+-------+
| id | title | author | publisher | format | published_date | isbn | price |
+----+------------------+--------+-----------+--------+----------------+---------------+-------+
| 1 | The Broken Shore | 1 | 3 | 1 | 2010-09-27 | 9781921656774 | NULL |
| 2 | The Big Score | 2 | 4 | 1 | 2007-01-01 | 9781741752236 | 29.95 |
| 3 | Split | 3 | 2 | 1 | 2003-01-01 | 732268133 | 29.95 |
+----+------------------+--------+-----------+--------+----------------+---------------+-------+

 

Note how the auto_increment id field has been updated to create a sequential number for each new entry.

Hmm, we seem to have a few issues here.  In our last episode we created the Peter Temple book by explicitly naming the fields, and one field we left out was price.  Before we can sell that we will need to update the value, however for the moment it shows that leaving fields out will cause the database to supply a default value for us.  In the case of the price field it was not marked as 'NOT NULL', so NULL values are allowed, and we didn't give a DEFAULT so the default is NULL.  We'll look at updating data next week.

One other thing to note is that our ISBN field was defined as a numeric field - DECIMAL(13), and this has the effect of losing leading zeroes, as for a number, the initial zero does not change the value.  In a future episode we'll look at ways around this.  For now it simply serves to show that to get the database definition correct you need to analyse the data requirements in order to get your table definitions right.

Picking the fields we want

So far we have just been using SELECT * to pull all the available fields out of a table.  Now lets look at getting just the fields we want.  Title, Author and Price would be useful for a price list.

SELECT `title`, `author`, `price` FROM `book`;
+------------------+--------+-------+
| title | author | price |
+------------------+--------+-------+
| The Broken Shore | 1 | NULL |
| The Big Score | 2 | 29.95 |
| Split | 3 | 29.95 |
+------------------+--------+-------+
3 rows in set (0.00 sec)

Note that we can provide the fields we want, in the order we want them.  But 'author' isn't the name of the author, it is the id of the author field.  We can use an alias to make this explicit:

SELECT `title`, `author` AS `author_id`, `price` FROM `book`;
+------------------+-----------+-------+
| title | author_id | price |
+------------------+-----------+-------+
| The Broken Shore | 1 | NULL |
| The Big Score | 2 | 29.95 |
| Split | 3 | 29.95 |
+------------------+-----------+-------+
3 rows in set (0.00 sec)

Same data, but note the column heading.  We've effectively renamed the column for the purposes of our query. We haven't changed anything in the database, but this allows us to provide convenient names for use in clarifying data or simplifying queries.  We'll see more on this later.  But we can't provide a number to people instead of the author's name, that wouldn't look very good or make a lot of sense.

Getting data from multiple tables - JOIN

Fortunately there is a solution.  The JOIN clause.  There are a few of them, and each has a different purpose.  The simplest is the comma operator, also known as an implicit join:

SELECT `book`.`title`, `author`.`last_name` FROM `book`, `author`;
+------------------+-----------+
| title | last_name |
+------------------+-----------+
| Split | Temple |
| The Big Score | Temple |
| The Broken Shore | Temple |
| Split | Corris |
| The Big Score | Corris |
| The Broken Shore | Corris |
| Split | Moss |
| The Big Score | Moss |
| The Broken Shore | Moss |
| Split | Moss |
| The Big Score | Moss |
| The Broken Shore | Moss |
+------------------+-----------+
12 rows in set (0.00 sec)

Wait a minute! That isn't right.  We only have 3 books in the database.  What is going on?

This is one of the problems with an implicit join.  Unless you constrain the join it will be what is called a cartesian product.  That is for every row in each table, every row in every other table listed in the join will be matched, so in our case we have four authors and three books, so we get (4 * 3) = 12 results.  Clearly this isn't what we want, and is one of the reasons the comma operator is frowned upon when developing complex queries - it is far too easy to get it wrong.

Instead lets use the JOIN syntax and provide a join condition.

SELECT `book`.`title`, `author`.`last_name` FROM `book`
 JOIN `author` ON `book`.`author` = `author`.`id`;
+------------------+-----------+
| title | last_name |
+------------------+-----------+
| The Broken Shore | Temple |
| The Big Score | Corris |
| Split | Moss |
+------------------+-----------+
3 rows in set (0.00 sec)

Much better! Now we have each book and the correct author.  Oh, by the way, JOIN is a synonym for INNER JOIN, and the result shows records where records from both tables on the left and right of the JOIN match the join condition. There are other possiblities which we'll look at shortly.

We now have a way of getting the data we want out of the database.  What about where we only want books from one publisher?  This is where the WHERE clause comes in.

SELECT `book`.`title`, `author`.`last_name` FROM `book`
JOIN `author` ON `book`.`author` = `author`.`id` WHERE `publisher` = 2;
+-------+-----------+
| title | last_name |
+-------+-----------+
| Split | Moss |
+-------+-----------+
1 row in set (0.00 sec)

Lets now pull a full list of our books, including publisher and format.  This involves a number of joins, but each one builds on the last, so it isn't really a stretch.

SELECT `title`,
 CONCAT_WS(' ',`author`.`first_name`,`author`.`last_name`) AS `author`,
 `publisher`.`name` AS `publisher`,
 `format`.`name` as `format`,
 `price`, `isbn` FROM `book`
 JOIN `author` ON `book`.`author` = `author`.`id`
 JOIN `publisher` ON `book`.`publisher` = `publisher`.`id`
 JOIN `format` ON `book`.`format` = `format`.`id`;
+------------------+--------------+-----------------+-----------+-------+---------------+
| title | author | publisher | format | price | isbn |
+------------------+--------------+-----------------+-----------+-------+---------------+
| The Broken Shore | Peter Temple | Text | Paperback | NULL | 9781921656774 |
| The Big Score | Peter Corris | Allen and Unwin | Paperback | 29.95 | 9781741752236 |
| Split | Tara Moss | Harper Collins | Paperback | 29.95 | 732268133 |
+------------------+--------------+-----------------+-----------+-------+---------------+
3 rows in set (0.11 sec)

I've taken the liberty of adding a few more features to this query.  So lets go through it.   In all our multi-table queries so far we've used the full `table`.`field` syntax.  This is the preferred syntax and is in fact mandatory if there are similarly named fields in the tables in the query.  For example, we have a `name` field in both the pubisher and format tables, so simply specifying `name` would make it impossible for MySQL to determine which table you were talking about. In this query we've used just `title`, `price` and `isbn` without specifying which table.  We can do this because these fields only exist in one table, so there is no ambiguity.

Rather than pull out the author name as two separate fields, I've used the CONCAT_WS function to create a single output field from the two name fields in the author table.  CONCAT_WS is short for concatenate with separator, where concatenate means to place together.  The first argument is the string to use to place between the output fields, in our case a space, and the following fields are the data fields to concatenate.

Note that we don't have any entry for Merrilee Moss, because we don't have a book entry for her.  This is the way an INNER JOIN works.  What does this mean for getting a list of authors and their books?  Let's take a look at two ways of doing this:

SELECT CONCAT_WS(' ',`author`.`first_name`,`author`.`last_name`) AS `author`,
 `book`.`title`
 FROM `author`
 INNER JOIN `book` ON `author`.`id` = `book`.`author`;
+--------------+------------------+
| author | title |
+--------------+------------------+
| Peter Temple | The Broken Shore |
| Peter Corris | The Big Score |
| Tara Moss | Split |
+--------------+------------------+
3 rows in set (0.00 sec)

SELECT CONCAT_WS(' ',`author`.`first_name`,`author`.`last_name`) AS `author`,
 `book`.`title`
 FROM `author`
 LEFT JOIN `book` ON `author`.`id` = `book`.`author`;
+---------------+------------------+
| author | title |
+---------------+------------------+
| Peter Temple | The Broken Shore |
| Peter Corris | The Big Score |
| Tara Moss | Split |
| Merrilee Moss | NULL |
+---------------+------------------+
4 rows in set (0.00 sec)

Note that in the first version, with INNER JOIN (and remember this is a synonym for JOIN) we are seeing only those authors who have books.   The second version, using LEFT JOIN (which itself is a synonym for LEFT OUTER JOIN) we see all our authors, and those with books have their books listed.  Merrilee Moss is listed with NULL in the book title field, indicating that she has no books listed yet.  LEFT JOIN will return all records from the table to the left of the LEFT JOIN and matching records from the table to the right, but if there are no matching records a record will be created with all fields set to NULL.  There is an analagous RIGHT JOIN operator for compatibility with other systems, however in most cases you can use LEFT JOIN with the tables reversed to achieve the same result.

In our next episode we'll look at searching using WHERE, and ordering and grouping our data.

Disclaimer

I am employed by SkySQL Ab, which provides support, training and services for MySQL, MariaDB and Drizzle and is staffed by many ex-MySQL AB employees. While I try to be unbiased in all of my public opinions, and my opinions are all my own and not those of my employer, I work for SkySQL because I believe in what they are doing and therefore cannot be truly unbiased.

MySQL is a registered trademark of Oracle Corporation,  MariaDB is a trademark of MontyProgram, and Drizzle is a trademark.

4 comments

Comment from: Sheeri [Visitor]
Sheeri

I recommend naming fields in your INSERT. Not doing so saves keystrokes but at the expense of having your code break if you add a field.

Remember that folks will copy and paste your example, so you have to set a good one!

09/09/11 @ 21:51
Comment from: [Member]
aj

Thanks, Sheeri, a very good point. Hopefully I explained both the correct way and why the short version was used in that particular example, however you are correct, people will often just use the example code.

09/09/11 @ 23:09
Comment from: Jose [Visitor]
Jose

Hey,

Great guides for starters (like me) only comment so far is about the website. The link for your ‘last episode’ has an extra http// in it (so it doesn’t work unless you remove it). And on your last episode (creating the database) you don’t seem to have a hyperlink to the next episode - just text saying what will be covered. Thought I’d comment just so people who are looking for these in order will have an easier time (I googled for this page to find it when the hyperlink on the previous wasn’t working).

Thanks again!
Jose

11/02/14 @ 02:53
Comment from: [Member]
aj

Thanks, Jose, I’ve updated the links.

26/02/14 @ 18:33


Form is loading...