MySQL ROW_NUMBER()

Have you ever been in a situation where you are selecting records from a database that need to be ranked, but the column(s) you’re attempting to ORDER BY are not unique. For example, using the Orders table below, how could you display a distinct list of customers and their last order? The problem is that the date field contains only the date, but not the time. Therefore, it’s possible that two orders can be placed by the same customer on the same day. Now if we had a field defined as AUTO_INCREMENT in MySQL or IDENTITY in Microsoft SQL Server and the records were entered sequentially, this would be a simple task.

Orders Table

Example:
Customer OrderDate Amount
Jane 2011-01-05 12
Jane 2011-01-07 15
Jane 2011-01-07 17
John 2011-01-01 11
John 2011-01-02 27
John 2011-01-02 13
Pat 2011-02-05 5
Pat 2011-02-07 34
Pat 2011-02-07 12

This can be solved in MS SQL , and with a little more code in MySQL , as well. In MS SQL Server 2005+, uniquely identifying the above records is a breeze using the ROW_NUMBER() function. If your not familiar withROW_NUMBER(), MSDN defines theT-SQL ROW_NUMBER()as“Returns the sequential number of a row within a partition of a result set, starting at 1 for the first row in each partition.”

So, in order to display a distinct list of customers and uniquely identify their last order, we could write something like:

SELECT  ROW_NUMBER() OVER (PARTITION BY Customer ORDER BY OrderDate DESC) AS RowNumber
       ,Customer
       ,OrderDate
       ,Amount
  FROM Orders
Results:
RowNumber Customer OrderDate Amount
1 Jane 2011-01-07 15
2 Jane 2011-01-07 17
3 Jane 2011-01-05 12
1 John 2011-01-02 27
2 John 2011-01-02 13
3 John 2011-01-01 11
1 Pat 2011-02-07 34
2 Pat 2011-02-07 12
3 Pat 2011-02-05 5

Notice how a unique row number is now apparent on each row within the partition. The next step would be to encompasses the statement in a sub query or Common Table Expression (CTE), and filter out the unwanted records based on the generated row number.

SELECT  Customer
       ,OrderDate
       ,Amount
  FROM
      (
        SELECT  ROW_NUMBER() OVER (PARTITION BY Customer ORDER BY OrderDate DESC) AS RowNumber
               ,Customer
               ,OrderDate
               ,Amount
          FROM Orders
      ) subquery WHERE RowNumber = 1
Results:
Customer OrderDate Amount
Jane 2011-01-07 15
John 2011-01-02 27
Pat 2011-02-07 34

The result is a single record for each customer, even when the customer has more than one order on the same day.


MySQL Implementation

Recently, I ran into a similar situation on a WordPress implementation, but with Horses not Orders/Customers. The requirement was to display each horse’s last workout and next race. MySQL does not have a ROW_NUMBER()function. However, MySQL does allow for inline assignment of variables and the ability to reassign and reference those variables as the query is working its way through the execution. This allows the same functionality thatROW_NUMBER() provides to be achieved in MySQL. Sticking to the same example used above the MySQL solution would be.

SELECT  @row_num := IF(@prev_value=o.Customer,@row_num+1,1) AS RowNumber
       ,o.Customer
       ,o.OrderDate
       ,o.Amount
       ,@prev_value := o.Customer
  FROM Orders o,
      (SELECT @row_num := 1) x,
      (SELECT @prev_value := '') y
  ORDER BY o.Customer, o.OrderDate DESC
Results:
RowNumber Customer OrderDate Amount
1 Jane 2011-01-07 15
2 Jane 2011-01-07 17
3 Jane 2011-01-05 12
1 John 2011-01-02 27
2 John 2011-01-02 13
3 John 2011-01-01 11
1 Pat 2011-02-07 34
2 Pat 2011-02-07 12
3 Pat 2011-02-05 5

A unique row number is now apparent on each row with in the partition. The @row_num variable holds the current row number and the @pev_value variable holds the current value of the partition by field. The variables are defined and assigned a default value with in subqueries. The Orders table and the subqueries are then combined in a single select statement. The @row_num variable is incremented by one until the @prev_value does not equal the Customer and is then reset back to one.

Important

  • The @row_num variable must be set before the @prev_value variable
  • The first field in the ORDER BY must be the field that you are partitioning by
  • The default value assigned to the @prev_value variable must not exist in the partition by field

As we did with the MS SQL ROW_NUMBER() example, we will need to encompass the statement in a sub query in order to filter based on the generated row number.

SELECT  Customer
       ,OrderDate
       ,Amount
  FROM
     (
      SELECT  @row_num := IF(@prev_value=o.Customer,@row_num+1,1) AS RowNumber
             ,o.Customer 
             ,o.OrderDate
             ,o.Amount
             ,@prev_value := o.Customer
        FROM Orders o,
             (SELECT @row_num := 1) x,
             (SELECT @prev_value := '') y
       ORDER BY o.Customer, o.OrderDate DESC
     ) subquery
 WHERE RowNumber = 1
Results:
Customer OrderDate Amount
Jane 2011-01-07 15
John 2011-01-02 27
Pat 2011-02-07 34

The result is a single record for each customer, even when the customer had more than one order on the same day.

The best solution is to avoid being in this situation in the first place. However an existing data schema does not always lend itself to new requirements.