🔗 Don't use NOT IN in PostgreSQL

Don't use NOT IN, or any combination of NOT and IN such as NOT (x IN (select…)).

Why not?

Two reasons:

1. NOT IN behaves in unexpected ways if there is a null present:
select * from foo where col not in (1,null); -- always returns 0 rows
This happens because col IN (1,null) returns TRUE if col=1, and NULL otherwise (i.e. it can never return FALSE). Since NOT (TRUE) is FALSE, but NOT (NULL) is still NULL, there is no way that NOT (col IN (1,null)) (which is the same thing as col NOT IN (1,null)) can return TRUE under any circumstances.

2. Because of point 1 above, NOT IN (SELECT ...) does not optimize very well. In particular, the planner can't transform it into an anti-join, and so it becomes either a hashed Subplan or a plain Subplan. The hashed subplan is fast, but the planner only allows that plan for small result sets; the plain subplan is horrifically slow (in fact O(N²)). This means that the performance can look good in small-scale tests but then slow down by 5 or more orders of magnitude once a size threshold is crossed; you do not want this to happen.

Alternative solution: In most cases, the NULL behavior of NOT IN (SELECT …) is not intentionally desired, and the query can be rewritten using **NOT EXISTS** (SELECT …):
select * from foo where not exists (select from bar where foo.col = bar.x);
When should you?

NOT IN (_list,of,values,..._) is mostly safe unless you might have a null in the list (via a parameter or otherwise). So it's sometimes natural and even advisable to use it when excluding specific constant values from a query result.

continue reading on wiki.postgresql.org

⚠️ This post links to an external website. ⚠️

If this post was enjoyable or useful for you, please share it! If you have comments, questions, or feedback, you can email my personal email. To get new posts, subscribe use the RSS feed.