In the world of relational databases, SQL is the go-to language for querying and manipulating data. MySQL provides a rich set of functions and constructs to assist developers in crafting efficient and expressive queries.
One such construct is the ROW_NUMBER()
function combined with the PARTITION BY
clause, which can be immensely useful
when dealing with complex data sets.
In this blog post, we'll delve into how to use ROW_NUMBER()
over PARTITION BY
in MySQL and explore its practical
applications with a real-world example.
ROW_NUMBER()
The ROW_NUMBER()
function in MySQL assigns a unique integer value to each row in the result set. This assigned number is determined by
the order in which rows appear in the query result. It's important to note that ROW_NUMBER()
does not directly modify
the table data but is rather a means to generate a ranking for each row within the result set.
Here is the basic syntax of the ROW_NUMBER()
function:
1ROW_NUMBER() OVER (ORDER BY column_name)
-
ROW_NUMBER()
: The function itself. -
OVER
: A clause that specifies the window frame for the function. -
(ORDER BY column_name)
: The column by which the rows will be ordered to assign the row numbers.
PARTITION BY
While ROW_NUMBER()
can be used on its own to assign a sequential number to each row in the result set, combining it
with the PARTITION BY
clause allows you to reset the
numbering for each distinct value in a specified column. This is particularly useful when you want to group your data
into partitions and assign row numbers within those partitions.
Here is the basic syntax of ROW_NUMBER()
with PARTITION BY
:
1ROW_NUMBER() OVER (PARTITION BY partition_column ORDER BY column_name)
-
PARTITION BY partition_column
: Specifies the column by which to partition the result set. TheROW_NUMBER()
function will reset the numbering for each unique value in this column. -
ORDER BY column_name
: Determines the order within each partition, which is used to assign row numbers.
Real-World Example: Document Versioning
Let's take a real-world example to illustrate the use of ROW_NUMBER()
over PARTITION BY
. Consider a scenario where
you have a database table called document_versions
, which stores multiple versions of documents.
The document_versions
table has the following structure:
1CREATE TABLE `document_versions` (
2 `id` int NOT NULL AUTO_INCREMENT,
3 `document_id` int NOT NULL,
4 `name` varchar(255) NOT NULL,
5 `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
6 PRIMARY KEY (`id`)
7);
The data in the document_versions
table looks like this:
+----+-------------+---------------+---------------------+
| id | document_id | name | created_at |
+----+-------------+---------------+---------------------+
| 1 | 1 | document1.pdf | 2023-09-13 18:07:19 |
| 2 | 2 | document2.pdf | 2023-09-13 18:08:19 |
| 3 | 1 | document1.pdf | 2023-09-13 18:10:19 |
| 4 | 2 | document2.pdf | 2023-09-13 18:12:19 |
| 5 | 2 | document2.pdf | 2023-09-13 18:14:19 |
+----+-------------+---------------+---------------------+
Each document has a unique document_id
, and you want to assign a version number to each version within the context of its associated document. Here's how you can achieve this using ROW_NUMBER()
and PARTITION BY
:
1SELECT
2 id,
3 document_id,
4 name,
5 created_at,
6 ROW_NUMBER() OVER (
7 PARTITION BY document_id ORDER BY created_at
8 ) AS version_number
9FROM
10 document_versions;
In this SQL query:
-
We select the
id
,document_id
,created_at
andname
columns from thedocument_versions
table. -
We use the
ROW_NUMBER()
function withPARTITION BY document_id
to partition the result set by thedocument_id
column. This ensures that the numbering starts fresh for each unique document. -
We specify the
ORDER BY created_at
clause to determine the order of versions within each document, which will be used for assigning version numbers. The oldest version will get the lowest number, the newest version will get the highest number.
Running the query results in the following output:
1SELECT
2 id,
3 document_id,
4 name,
5 created_at,
6 ROW_NUMBER() OVER (
7 PARTITION BY document_id ORDER BY created_at
8 ) AS version_number
9FROM
10 document_versions
11ORDER BY
12 created_at;
+----+-------------+---------------+---------------------+----------------+
| id | document_id | name | created_at | version_number |
+----+-------------+---------------+---------------------+----------------+
| 1 | 1 | document1.pdf | 2023-09-13 18:07:19 | 1 |
| 2 | 2 | document2.pdf | 2023-09-13 18:08:19 | 1 |
| 3 | 1 | document1.pdf | 2023-09-13 18:10:19 | 2 |
| 4 | 2 | document2.pdf | 2023-09-13 18:12:19 | 2 |
| 5 | 2 | document2.pdf | 2023-09-13 18:14:19 | 3 |
+----+-------------+---------------+---------------------+----------------+
5 rows in set (0.00 sec)
Even if you change the ordering of the result set, the version numbers will remain the same:
1SELECT
2 id,
3 document_id,
4 name,
5 created_at,
6 ROW_NUMBER() OVER (
7 PARTITION BY document_id ORDER BY created_at
8 ) AS version_number
9FROM
10 document_versions
11ORDER BY
12 name, created_at
+----+-------------+---------------+---------------------+----------------+
| id | document_id | name | created_at | version_number |
+----+-------------+---------------+---------------------+----------------+
| 1 | 1 | document1.pdf | 2023-09-13 18:07:19 | 1 |
| 3 | 1 | document1.pdf | 2023-09-13 18:10:19 | 2 |
| 2 | 2 | document2.pdf | 2023-09-13 18:08:19 | 1 |
| 4 | 2 | document2.pdf | 2023-09-13 18:12:19 | 2 |
| 5 | 2 | document2.pdf | 2023-09-13 18:14:19 | 3 |
+----+-------------+---------------+---------------------+----------------+
5 rows in set (0.00 sec)
Conclusion
The ROW_NUMBER()
function combined with the PARTITION BY
clause in MySQL is a powerful tool for assigning row
numbers within partitions of data. This construct is particularly valuable when you need to rank or sequence data within
specific groups or categories. In our real-world example of document versioning, it allowed us to easily assign version
numbers to each version of a document within the context of that document. By mastering this SQL construct, you can
unlock new possibilities for analyzing and organizing your data in MySQL databases.
If this post was enjoyable or useful for you, please share it! If you have comments, questions, or feedback, you can email my personal email. To get new posts, subscribe use the RSS feed.