We can't find the internet
Attempting to reconnect
Something went wrong!
Hang in there while we get back on track
Don't use
NOT IN
, or any combination ofNOT
andIN
such asNOT (x IN (selectβ¦))
.Why not?
Two reasons:
1.
NOT IN
behaves in unexpected ways if there is a null present:select * from foo where col not in (1,null); -- always returns 0 rows
This happens because
col IN (1,null)
returnsTRUE
if col=1, andNULL
otherwise (i.e. it can never returnFALSE
). SinceNOT (TRUE)
isFALSE
, butNOT (NULL)
is stillNULL
, there is no way thatNOT (col IN (1,null))
(which is the same thing ascol NOT IN (1,null)
) can returnTRUE
under any circumstances.2. Because of point 1 above,
NOT IN (SELECT ...)
does not optimize very well. In particular, the planner can't transform it into an anti-join, and so it becomes either a hashed Subplan or a plain Subplan. The hashed subplan is fast, but the planner only allows that plan for small result sets; the plain subplan is horrifically slow (in fact O(NΒ²)). This means that the performance can look good in small-scale tests but then slow down by 5 or more orders of magnitude once a size threshold is crossed; you do not want this to happen.Alternative solution: In most cases, the NULL behavior of
NOT IN (SELECT β¦)
is not intentionally desired, and the query can be rewritten using**NOT EXISTS** (SELECT β¦)
:select * from foo where not exists (select from bar where foo.col = bar.x);
When should you?
NOT IN (_list,of,values,..._)
is mostly safe unless you might have a null in the list (via a parameter or otherwise). So it's sometimes natural and even advisable to use it when excluding specific constant values from a query result.
continue reading on wiki.postgresql.org
β οΈ This post links to an external website. β οΈ
If this post was enjoyable or useful for you, please share it! If you have comments, questions, or feedback, you can email my personal email. To get new posts, subscribe use the RSS feed.