MySQL 101 - Sorting and Searching: ORDER BY, WHERE, GROUP BY

In our last episode we were able to select some information from our bookshop database, this time we look at putting this into some semblence of order.  To fully investigate this topic we need a few more entries in our database, so rather than detail them here, I've put together this SQL file you can download and build your database to follow along.

To install the database, unpack the SQL file from its ZIP archive, and use the SOURCE command to pull the data into your database:

SOURCE mysql101_bookshop_20110912.sql;

You can also pass the file to the mysql command line interpreter from the shell:

mysql -uroot -p bookshop < mysql101_bookshop_20110912.sql
Password:

If you need a reminder of how to connect to your MySQL server, check out the second episode in this series.

Ordering data

We have already been able to use SELECT and JOIN to pull together information from our tables to get a composite display.  It would be nice to be able to order the data to make it easier to search.  Lets give that a try, ordering authors and their book titles.  In this example, we are using the CONCAT_WS function outlined in our previous episode to display the author name, but using the separate fields to sort.

SELECT CONCAT_WS(' ',`author`.`first_name`,`author`.`last_name`) AS `author`,
 `book`.`title` FROM `author`
 INNER JOIN `book` ON `author`.`id` = `book`.`author`
 ORDER BY `author`.`last_name`, `author`.`first_name`;
+--------------+--------------------+
| author | title |
+--------------+--------------------+
| Peter Corris | Aftershock |
| Peter Corris | The Azanian Action |
| Peter Corris | Appeal Denied |
| Peter Corris | The Big Score |
| Tara Moss | Hit |
| Tara Moss | Split |
| Peter Temple | Shooting Star |
| Peter Temple | The Broken Shore |
+--------------+--------------------+
8 rows in set (0.00 sec)

We now have a list sorted by author name, sorting first by last_name, then by first_name.  The ORDER BY clause takes the name of a field or comma separated list of fields to sort.  By default the sort order is ascending in the collation of the fields in question (we'll get onto collation and character sets in a later episode).  You can make this explicit by using the ASC keyword.  Alternatively you can use DESC to reverse the sort order.  We can now restructure the list to give the publication date in most recent order first.

SELECT CONCAT_WS(' ',`author`.`first_name`,`author`.`last_name`) AS `author`,
 `book`.`title`,
 YEAR(`book`.`published_date`) AS `published`
 FROM `author` INNER JOIN `book` ON `author`.`id` = `book`.`author`
 ORDER BY `book`.`published_date` DESC;
+--------------+--------------------+-----------+
| author | title | published |
+--------------+--------------------+-----------+
| Peter Temple | The Broken Shore | 2010 |
| Peter Corris | Appeal Denied | 2007 |
| Peter Corris | The Big Score | 2007 |
| Tara Moss | Hit | 2006 |
| Tara Moss | Split | 2003 |
| Peter Temple | Shooting Star | 1999 |
| Peter Corris | Aftershock | 1992 |
| Peter Corris | The Azanian Action | 1991 |
+--------------+--------------------+-----------+
8 rows in set (0.00 sec)

Using the DESC keyword has allowed us to display the data in the reverse order of date.  I've also used a date function YEAR() to just pick the year portion of the date for display.  There are other date functions (and indeed functions for most data types) to provide formatting or selection criteria for your data.  As this series progresses we will introduce many of these.

Aggregation and Grouping

What if we want to present a count of the books of a particular author?  For this we need a function to count the records, which is conveniently called COUNT(), and we need to be able to group the counts, and GROUP BY provides this functionality.  But lets take a look at why we need grouping.

SELECT CONCAT_WS(' ',`author`.`first_name`,`author`.`last_name`) AS `author`,
 COUNT(*) as `number_of_books`
 FROM `author` LEFT JOIN `book` ON `author`.`id` = `book`.`author`;
+--------------+-----------------+
| author | number_of_books |
+--------------+-----------------+
| Peter Temple | 9 |
+--------------+-----------------+
1 row in set (0.00 sec)

That isn't right.  Peter Temple doesn't have 9 books in our database,  and there are no books for Tara Moss or Peter Corris.  This is why we need grouping.  What has happened here is that the COUNT(*) function counts all the rows that meet the requirements.  In our case we have just told it to count all records, we haven't told it when to stop counting or how to associate the count with data in the record.  This is where GROUP BY helps.  It allows us to use functions that result in a summary of information, and correctly correlate it with its associated data.  A picture is worth a thousand words:

SELECT CONCAT_WS(' ',`author`.`first_name`,`author`.`last_name`) AS `author`,
 COUNT(*) as `number_of_books`
 FROM `author` LEFT JOIN `book` ON `author`.`id` = `book`.`author`
 GROUP BY `author`.`last_name`, `author`.`first_name`;
+---------------+-----------------+
| author | number_of_books |
+---------------+-----------------+
| Peter Corris | 4 |
| Merrilee Moss | 1 |
| Tara Moss | 2 |
| Peter Temple | 2 |
+---------------+-----------------+
4 rows in set (0.00 sec)

But wait a minute, this shows Merrilee Moss as having 1 book, but we know we don't have any books for her.  How is this possible?  It turns out that it is a result of two effects.  Firstly we used LEFT JOIN, which we know from our last episode that this will put out a row with all of the book table fields set to NULL.  But there is a record, and therefore COUNT(*) correctly returns that fact.  However, the record represents the absence of a book, not its presence, so we can use another effect to get the correct count.  If we supply a field name to COUNT instead of *, and that field is NULL, COUNT won't count it.  You can verify this using the price field on the book table, which we know has a null value for Peter Temple's "The Broken Shore".

SELECT COUNT(*) FROM `book`;
+----------+
| COUNT(*) |
+----------+
| 8 |
+----------+
1 row in set (0.00 sec)

SELECT COUNT(`price`) FROM `book`;
+----------------+
| COUNT(`price`) |
+----------------+
| 7 |
+----------------+
1 row in set (0.00 sec

This allows us to use a field that should exist in all books, but will be NULL when there is no matching entry in a LEFT JOIN.

SELECT CONCAT_WS(' ',`author`.`first_name`,`author`.`last_name`) AS `author`,
 COUNT(`book`.`id`) as `number_of_books`
 FROM `author` LEFT JOIN `book` ON `author`.`id` = `book`.`author`
 GROUP BY `author`.`last_name`, `author`.`first_name`;
+---------------+-----------------+
| author | number_of_books |
+---------------+-----------------+
| Peter Corris | 4 |
| Merrilee Moss | 0 |
| Tara Moss | 2 |
| Peter Temple | 2 |
+---------------+-----------------+
4 rows in set (0.00 sec)

Much better.  All that we have done is replace the * with `book`.`id` in the COUNT function.  The id field is unique to each book, and is never NULL.  When we do a LEFT JOIN and there is no matching record on the right table, however, we get the effect of having a record generated with all fields set to NULL, so we get the NULL id field which is exactly what we need in this instance.

A short word on GROUP BY.  The SQL standard requires that all fields in the SELECT statement, other than the grouping function, should appear in the GROUP BY statement.  Many other databases enforce this, although MySQL is a little more flexible in this regard.  Regardless it makes sense to stick to the standard if you want to write even remotely portable SQL code.

There are other summary or grouping functions, like AVG, MAX and MIN.  We will see more on these when we get to reporting.

Searching and constraining results

We now have the ability to sort, and group our data.  Now if we want to search, we need to be able to select just those records that match our search term.  We've already seen this in summary in our last episode, but we can explore it a little more.

SELECT CONCAT_WS(' ',`author`.`first_name`,`author`.`last_name`) AS `author`,
 `book`.`title` FROM `author`
 LEFT JOIN `book` ON `author`.`id` = `book`.`author`
 LEFT JOIN `publisher` ON `publisher`.`id` = `book`.`publisher`
 WHERE `publisher`.`name` = 'Bantam';
+--------------+---------------+
| author | title |
+--------------+---------------+
| Peter Corris | Aftershock |
| Peter Temple | Shooting Star |
+--------------+---------------+
2 rows in set (0.00 sec)

Note that to search on a publisher name, we need to make sure it is JOINed in the query, otherwise there is no way we can associate the name with the books from that publisher.   What about an alphabetical listing of titles?  Can we do that?  Sure, we could, for instance find all those books that start with the letter A in their title.

SELECT `book`.`title` FROM `book`
 WHERE `book`.`title` LIKE 'A%'
 ORDER BY `book`.`title`;
+---------------+
| title |
+---------------+
| Aftershock |
| Appeal Denied |
+---------------+
2 rows in set (0.00 sec)

Here we are using the LIKE keyword to find those that have an A as the first character.  The percent sign '%' is used as a wildcard to indicate all matching fields.  When using the % wildcard you need to be aware that if the percent is used at the start, MySQL will not be able to use an index, and will instead need to match against every row (a full table scan).  With a large data set this can be very slow, and should be avoided.

Unless the field you are searching is using a binary collation both LIKE and = perform case insensitive searches.  I.e. both "LIKE 'a%'" and "LIKE 'A%'" will produce identical results.

With GROUP BY we can also use the HAVING keyword to provide constraints.

SELECT CONCAT_WS(' ',`author`.`first_name`,`author`.`last_name`) AS `author`,
 COUNT(`book`.`id`) as `number_of_books`
 FROM `author` LEFT JOIN `book` ON `author`.`id` = `book`.`author`
 GROUP BY `author`.`last_name`, `author`.`first_name`
 HAVING `number_of_books` > 2;
+--------------+-----------------+
| author | number_of_books |
+--------------+-----------------+
| Peter Corris | 4 |
+--------------+-----------------+
1 row in set (0.00 sec)

Note that we could not have put the constraint on the number of books in a where clause, as it would be evaluated before the group by, and hence before we had a count.  Also note that we have used the field alias in the HAVING clause.

That's probably enough for now.  In our next episode we'll look at updating data and changing table definitions.

Disclaimer

I am employed by SkySQL Ab, which provides support, training and services for MySQL, MariaDB and Drizzle and is staffed by many ex-MySQL AB employees. While I try to be unbiased in all of my public opinions, and my opinions are all my own and not those of my employer, I work for SkySQL because I believe in what they are doing and therefore cannot be truly unbiased.

MySQL is a registered trademark of Oracle Corporation,  MariaDB is a trademark of MontyProgram, and Drizzle is a trademark.

4 comments

Comment from: francis tudlong [Visitor] Email
francis tudlongHello i have this millions of records and im having problems with the retrieval of the data because its very slow. even for just fetching a single table what is the problem and what is the solution? any idea thanks in advance
13/09/11 @ 19:52
Comment from: ajdonnison [Member] Email
ajdonnisonFrancis, it is nearly impossible to answer the question without knowing more about your setup, and this probably isn't the right forum. Millions of records should not be an issue, there are plenty of sites out there with many millions of records in MySQL databases not having issues with single table queries.

You really should first check that you are using indexes effectively. This is pre-empting a future episode, however you can use the EXPLAIN syntax to find out how the query is likely to use indexes. The episode "Creating your first database" does try to explain the use of indexes and why you need them. You might want to take another look at that.

If you want more information, there are plenty of places where you can get some interactive help. There is the #mysql channel on irc.freenode.net, and the official forums site at http://forums.mysql.com/.
13/09/11 @ 20:09
Comment from: David [Visitor]
DavidHello,

in the previous episode to this, a SELECT query was made with 3 joins, namely:

SELECT `title`,
CONCAT_WS(' ',`author`.`first_name`,`author`.`last_name`) AS `author`,
`publisher`.`name` AS `publisher`,
`format`.`name` as `format`,
`price`, `isbn` FROM `book`
JOIN `author` ON `book`.`author` = `author`.`id`
JOIN `publisher` ON `book`.`publisher` = `publisher`.`id`
JOIN `format` ON `book`.`format` = `format`.`id`;

As I understand it, the tables `author`, `publisher`, and `format` each have a separate link back to the book table. Is that correct? Now in this episode you have the query:

SELECT CONCAT_WS(' ',`author`.`first_name`,`author`.`last_name`) AS `author`,
`book`.`title` FROM `author`
LEFT JOIN `book` ON `author`.`id` = `book`.`author`
LEFT JOIN `publisher` ON `publisher`.`id` = `book`.`publisher`
WHERE `publisher`.`name` = 'Bantam';

I can see in the first join that the `book` table is linked back to the `author` table. But the second join to `publisher` is a join to the `book` table. I the thought that with multiple joins all tables following JOIN statements had to link back to the initital table referred to in the FROM statement.
08/08/14 @ 10:09
Comment from: ajdonnison [Member] Email
ajdonnisonHi David,

The JOIN statement creates a complex statement that builds on previous components. If you look at the first join (between author and book) you now have effectively a meta-table that includes fields from both. The next join needs only to reference fields from the meta-table, which in this case includes fields from book as well as author.

This is no to say that you couldn't restructure the query to match your supposition, however you'd need to think carefully about the direction (LEFT/RIGHT) of the join as I'm using LEFT JOIN knowing that it will always provide me with a result provided there is a valid author, regardless of if that author has a book or publisher. If I structured the query such that book was before the FROM then I would need a valid book to list an author.

Note that I'm using the term meta-table as a convenience. The underlying database doesn't need to create a separate structure.
11/08/14 @ 16:47

This post has 1 feedback awaiting moderation...

Leave a comment


Your email address will not be revealed on this site.

Your URL will be displayed.
(Line breaks become <br />)
(Name, email & website)
(Allow users to contact you through a message form (your email will not be revealed.)
This is a captcha-picture. It is used to prevent mass-access by robots.
Please enter the characters from the image above. (case insensitive)