When working with versioned records in a SQL database, it's common to need a query that retrieves only the latest version of each record. Suppose we have a table called post_revisions with the following structure:
CREATE TABLE post_revisions (
    id INT AUTO_INCREMENT PRIMARY KEY,
    post_id INT NOT NULL,
    revision_number INT NOT NULL,
    UNIQUE KEY (post_id, revision_number)
);
Each post_id can have multiple revisions, and we want to retrieve only the id of the record with the highest revision_number for each post_id.
Using a Subquery with JOIN
One of the most efficient ways to achieve this is by using a JOIN with a subquery:
SELECT pr.id
FROM post_revisions pr
JOIN (
    SELECT post_id, MAX(revision_number) AS max_version
    FROM post_revisions
    GROUP BY post_id
) latest
    ON pr.post_id = latest.post_id
    AND pr.revision_number = latest.max_version;
Explanation:
- The subquery (latest) retrieves the highest revision_number for each post_id.
- We then JOIN this result back to the post_revisions table to get the corresponding id.
- The result will contain only the id values of the latest versions for each post.
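To see the JOIN query at work on concrete rows, here is a minimal sketch using Python's built-in sqlite3 module. SQLite, the adapted schema syntax, and the sample rows are all assumptions made for illustration; the query itself is the one from this section, with an ORDER BY added so the output is deterministic.

```python
import sqlite3

# In-memory SQLite database. The schema mirrors post_revisions, adapted to
# SQLite syntax (INTEGER PRIMARY KEY instead of INT AUTO_INCREMENT).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE post_revisions (
        id INTEGER PRIMARY KEY,
        post_id INT NOT NULL,
        revision_number INT NOT NULL,
        UNIQUE (post_id, revision_number)
    )
""")

# Hypothetical sample data: post 1 has three revisions, post 2 has one,
# post 3 has two. The ids are assigned 1..6 in insertion order.
rows = [(1, 1), (1, 2), (1, 3), (2, 1), (3, 1), (3, 2)]
conn.executemany(
    "INSERT INTO post_revisions (post_id, revision_number) VALUES (?, ?)", rows
)

JOIN_QUERY = """
    SELECT pr.id
    FROM post_revisions pr
    JOIN (
        SELECT post_id, MAX(revision_number) AS max_version
        FROM post_revisions
        GROUP BY post_id
    ) latest
        ON pr.post_id = latest.post_id
        AND pr.revision_number = latest.max_version
    ORDER BY pr.id
"""

latest_ids = [row[0] for row in conn.execute(JOIN_QUERY)]
print(latest_ids)  # [3, 4, 6]
```

Each returned id belongs to the row holding the highest revision_number for its post: id 3 for post 1, id 4 for post 2, and id 6 for post 3.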
Alternative: Using a WHERE clause with a correlated subquery
Another approach is to use a correlated subquery in the WHERE clause:
SELECT id
FROM post_revisions pr
WHERE revision_number = (
    SELECT MAX(revision_number)
    FROM post_revisions
    WHERE post_id = pr.post_id
);
Comparison:
- The JOIN approach is often more efficient on large datasets because the maximum is computed once per post_id in a single grouped pass, rather than once per outer row.
- The WHERE clause approach is easier to read but may perform worse in some cases, depending on indexing, dataset size, and how the database's optimizer handles correlated subqueries.
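Whichever form you pick, both queries should return the same rows. The sketch below checks that on generated data, again using Python's sqlite3 module as a stand-in database (an assumption, as are the data shape and the added ORDER BY clauses). It says nothing about relative speed, which depends on your database and its optimizer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE post_revisions (
        id INTEGER PRIMARY KEY,
        post_id INT NOT NULL,
        revision_number INT NOT NULL,
        UNIQUE (post_id, revision_number)
    )
""")

# Hypothetical data: 50 posts, where post p has (p % 5) + 1 revisions.
rows = [
    (post_id, rev)
    for post_id in range(1, 51)
    for rev in range(1, post_id % 5 + 2)
]
conn.executemany(
    "INSERT INTO post_revisions (post_id, revision_number) VALUES (?, ?)", rows
)

JOIN_QUERY = """
    SELECT pr.id
    FROM post_revisions pr
    JOIN (
        SELECT post_id, MAX(revision_number) AS max_version
        FROM post_revisions
        GROUP BY post_id
    ) latest
        ON pr.post_id = latest.post_id
        AND pr.revision_number = latest.max_version
    ORDER BY pr.id
"""

CORRELATED_QUERY = """
    SELECT id
    FROM post_revisions pr
    WHERE revision_number = (
        SELECT MAX(revision_number)
        FROM post_revisions
        WHERE post_id = pr.post_id
    )
    ORDER BY id
"""

join_ids = [row[0] for row in conn.execute(JOIN_QUERY)]
correlated_ids = [row[0] for row in conn.execute(CORRELATED_QUERY)]

# Both forms should pick exactly one row per post, and the same rows.
print(join_ids == correlated_ids, len(join_ids))  # True 50
```

To compare performance on your own system, run each query through your database's EXPLAIN facility against realistic data rather than relying on a toy dataset like this one.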
Alternative: Using a CTE (common table expression)
You can also use a common table expression together with the ROW_NUMBER() window function to get the same result. In MySQL this requires version 8.0 or later:
WITH ranked AS (
    SELECT
        *,
        ROW_NUMBER() OVER (PARTITION BY post_id ORDER BY revision_number DESC) AS rn
    FROM post_revisions
)
SELECT id FROM ranked WHERE rn = 1;
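The window-function approach can be tried the same way. This sketch assumes Python's sqlite3 module backed by SQLite 3.25 or later (the first release with window functions). The CTE is called ranked and the row-number column rn, names picked for this sketch so the CTE cannot be confused with the base table. The hypothetical rows are inserted out of revision order to show that the result depends on revision_number, not on id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE post_revisions (
        id INTEGER PRIMARY KEY,
        post_id INT NOT NULL,
        revision_number INT NOT NULL,
        UNIQUE (post_id, revision_number)
    )
""")

# Hypothetical rows, deliberately inserted out of revision order.
# Resulting ids: (1,3)->1, (1,1)->2, (1,2)->3, (2,1)->4, (3,2)->5, (3,1)->6
rows = [(1, 3), (1, 1), (1, 2), (2, 1), (3, 2), (3, 1)]
conn.executemany(
    "INSERT INTO post_revisions (post_id, revision_number) VALUES (?, ?)", rows
)

WINDOW_QUERY = """
    WITH ranked AS (
        SELECT
            id,
            ROW_NUMBER() OVER (
                PARTITION BY post_id
                ORDER BY revision_number DESC
            ) AS rn
        FROM post_revisions
    )
    SELECT id FROM ranked WHERE rn = 1 ORDER BY id
"""

latest_ids = [row[0] for row in conn.execute(WINDOW_QUERY)]
print(latest_ids)  # [1, 4, 5]
```

Within each post_id partition, the row with the highest revision_number gets rn = 1, so filtering on rn = 1 keeps exactly one row per post.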
Performance Considerations
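These queries benefit from a composite index on (post_id, revision_number). The UNIQUE KEY in the table definition above already creates one, so no additional index is needed for that schema. If your table lacks that constraint, add the index explicitly:
CREATE INDEX idx_post_id_version ON post_revisions (post_id, revision_number);
This allows the database to efficiently look up the latest version for each post_id, improving query performance.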
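Before adding an index, it can be worth checking which ones already exist, since a UNIQUE constraint on (post_id, revision_number) is itself backed by an index. The sketch below does this check in SQLite through Python's sqlite3 module, using SQLite's PRAGMA index_list and PRAGMA index_info. SQLite is an assumption here; in MySQL the equivalent check would be SHOW INDEX FROM post_revisions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE post_revisions (
        id INTEGER PRIMARY KEY,
        post_id INT NOT NULL,
        revision_number INT NOT NULL,
        UNIQUE (post_id, revision_number)
    )
""")

# Map each index name to (is_unique, [column names in index order]).
indexes = {}
index_rows = conn.execute("PRAGMA index_list('post_revisions')").fetchall()
for row in index_rows:
    name, unique = row[1], row[2]
    info = conn.execute(f"PRAGMA index_info('{name}')").fetchall()
    indexes[name] = (bool(unique), [col[2] for col in info])

# Indexes that already cover (post_id, revision_number), in that order.
covering = [
    name
    for name, (_unique, cols) in indexes.items()
    if cols == ["post_id", "revision_number"]
]
print(covering)
```

If the list is non-empty, the lookup is already indexed and a second index on the same columns would only add write overhead.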
Conclusion
When retrieving the latest version of records in a SQL database, using a JOIN with a subquery is often the best approach for performance. For simpler cases, a correlated subquery is easier to read and may work just as well. A composite index on post_id and revision_number, which the UNIQUE KEY in the table definition already provides, keeps these queries fast.
Do you have other approaches or optimizations? Feel free to share your thoughts!
If this post was enjoyable or useful for you, please share it! If you have comments, questions, or feedback, you can email me. To get new posts, subscribe using the RSS feed.