 | | From: | Josh Berkus | | Subject: | IN() Optimization issue in 8.0rc5 | | Date: | Sat, 15 Jan 2005 12:23:10 -0800 |
|
|
 | Tom,
Hmmm ... I'm seeing an issue with IN() optimization -- or rather the lack of it -- in 8.0rc5. It seems to me that this worked better in 7.4, although I've not been able to load this particular database and test
dm=# explain dm-# SELECT personid FROM mr.person_attributes_old dm-# WHERE personid NOT IN (SELECT personid FROM mr.person_attributes); QUERY PLAN ----------------------------------------------------------------------------------- Seq Scan on person_attributes_old (cost=0.00..3226144059.85 rows=235732 width=4) Filter: (NOT (subplan)) SubPlan -> Seq Scan on person_attributes (cost=0.00..12671.07 rows=405807 width=4) (4 rows)
dm=# explain select pao.personid from mr.person_attributes_old pao dm-# left outer join mr.person_attributes p on pao.personid = p.personid dm-# where p.personid is null; QUERY PLAN --------------------------------------------------------------------------------------------------------------- Merge Left Join (cost=0.00..34281.83 rows=471464 width=4) Merge Cond: ("outer".personid = "inner".personid) Filter: ("inner".personid IS NULL) -> Index Scan using idx_opa_person on person_attributes_old pao (cost=0.00..13789.29 rows=471464 width=4) -> Index Scan using idx_pa_person on person_attributes p (cost=0.00..14968.25 rows=405807 width=4) (5 rows)
It seems like the planner ought to recognize that the first form of the query is optimizable into the 2nd form, and that I've seen it do so in 7.4. However, *no* amount of manipulation of query parameters I did on the 1st form of the query were successful in getting the planner to recognize that it could use indexes for the IN() form of the query.
Thoughts?
-- Josh Berkus Aglio Database Solutions San Francisco
---------------------------(end of broadcast)--------------------------- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
|
|
 | | From: | Tom Lane | | Subject: | Re: IN() Optimization issue in 8.0rc5 | | Date: | Sat, 15 Jan 2005 15:53:37 -0500 |
|
|
 | Josh Berkus writes: > dm=# explain > dm-# SELECT personid FROM mr.person_attributes_old > dm-# WHERE personid NOT IN (SELECT > personid FROM mr.person_attributes); > QUERY PLAN > ----------------------------------------------------------------------------------- > Seq Scan on person_attributes_old (cost=0.00..3226144059.85 rows=235732 > width=4) > Filter: (NOT (subplan)) > SubPlan > -> Seq Scan on person_attributes (cost=0.00..12671.07 rows=405807 > width=4) > (4 rows)
Hmm. What you want for a NOT IN is for it to say Filter: (NOT (hashed subplan)) which you are not getting. What's the datatypes of the two personid columns? Is the 400k-row estimate for person_attributes reasonable? Maybe you need to increase work_mem (nee sort_mem) to allow a 400k-row hash table?
regards, tom lane
---------------------------(end of broadcast)--------------------------- TIP 6: Have you searched our list archives?
http://archives.postgresql.org
|
|
 | | From: | Josh Berkus | | Subject: | Re: IN() Optimization issue in 8.0rc5 | | Date: | Sat, 15 Jan 2005 13:32:27 -0800 |
|
|
 | Tom,
> Hmm. What you want for a NOT IN is for it to say > Filter: (NOT (hashed subplan)) > which you are not getting. What's the datatypes of the two personid > columns?
INT
> Is the 400k-row estimate for person_attributes reasonable?
Yes, the estimates are completely accurate.
> Maybe you need to increase work_mem (nee sort_mem) to allow a > 400k-row hash table?
Aha, that's it. I thought I'd already set that, but apparently it was a different session. Fixed. Thanks!
-- Josh Berkus Aglio Database Solutions San Francisco
---------------------------(end of broadcast)--------------------------- TIP 7: don't forget to increase your free space map settings
|
|