| knowledge-database (beta) |
 |
Current group: pgsql.performance
index scan of whole table, can't see why
| Dan Langille | | Ragnar Hafstaš | | Dan Langille | | Stephan Szabo | | Dan Langille | | Dan Langille | | Dan Langille | | Russell Smith | | Stephan Szabo |
|
|
 | | From: | Dan Langille | | Subject: | index scan of whole table, can't see why | | Date: | Wed, 19 Jan 2005 20:37:59 -0500 |
|
|
 | Hi folks,
Running on 7.4.2, recently vacuum analysed the three tables in question.
The query plan in question changes dramatically when a WHERE clause changes from ports.broken to ports.deprecated. I don't see why. Well, I do see why: a sequential scan of a 130,000 rows. The query goes from 13ms to 1100ms because the of this. The full plans are at http://rafb.net/paste/results/v8ccvQ54.html
I have tried some tuning by:
set effective_cache_size to 4000, was 1000 set random_page_cost to 1, was 4
The resulting plan changes, but no speed improvment, are at http://rafb.net/paste/results/rV8khJ18.html
Any suggestions please?
-- Dan Langille : http://www.langille.org/ BSDCan - The Technical BSD Conference - http://www.bsdcan.org/
---------------------------(end of broadcast)--------------------------- TIP 6: Have you searched our list archives?
http://archives.postgresql.org
|
|
 | | From: | Ragnar Hafstaš | | Subject: | Re: index scan of whole table, can't see why | | Date: | Thu, 20 Jan 2005 09:34:29 +0000 |
|
|
 | On Wed, 2005-01-19 at 20:37 -0500, Dan Langille wrote: > Hi folks, > > Running on 7.4.2, recently vacuum analysed the three tables in > question. > > The query plan in question changes dramatically when a WHERE clause > changes from ports.broken to ports.deprecated. I don't see why. > Well, I do see why: a sequential scan of a 130,000 rows. The query > goes from 13ms to 1100ms because the of this. The full plans are at > http://rafb.net/paste/results/v8ccvQ54.html > > I have tried some tuning by: > > set effective_cache_size to 4000, was 1000 > set random_page_cost to 1, was 4 > > The resulting plan changes, but no speed improvment, are at > http://rafb.net/paste/results/rV8khJ18.html >
this just confirms that an indexscan is not always better than a tablescan. by setting random_page_cost to 1, you deceiving the planner into thinking that the indexscan is almost as effective as a tablescan.
> Any suggestions please?
did you try to increase sort_mem ?
gnari
---------------------------(end of broadcast)--------------------------- TIP 5: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faqs/FAQ.html
|
|
 | | From: | Dan Langille | | Subject: | Re: index scan of whole table, can't see why | | Date: | Thu, 20 Jan 2005 06:56:20 -0500 |
|
|
 | On 20 Jan 2005 at 9:34, Ragnar Hafstaš wrote:
> On Wed, 2005-01-19 at 20:37 -0500, Dan Langille wrote: > > Hi folks, > > > > Running on 7.4.2, recently vacuum analysed the three tables in > > question. > > > > The query plan in question changes dramatically when a WHERE clause > > changes from ports.broken to ports.deprecated. I don't see why. > > Well, I do see why: a sequential scan of a 130,000 rows. The query > > goes from 13ms to 1100ms because the of this. The full plans are at > > http://rafb.net/paste/results/v8ccvQ54.html > > > > I have tried some tuning by: > > > > set effective_cache_size to 4000, was 1000 > > set random_page_cost to 1, was 4 > > > > The resulting plan changes, but no speed improvment, are at > > http://rafb.net/paste/results/rV8khJ18.html > > > > this just confirms that an indexscan is not always better than a > tablescan. by setting random_page_cost to 1, you deceiving the > planner into thinking that the indexscan is almost as effective > as a tablescan. > > > Any suggestions please? > > did you try to increase sort_mem ?
I tried sort_mem = 4096 and then 16384. This did not make a difference. See http://rafb.net/paste/results/AVDqEm55.html
Thank you. -- Dan Langille : http://www.langille.org/ BSDCan - The Technical BSD Conference - http://www.bsdcan.org/
---------------------------(end of broadcast)--------------------------- TIP 6: Have you searched our list archives?
http://archives.postgresql.org
|
|
 | | From: | Stephan Szabo | | Subject: | Re: index scan of whole table, can't see why | | Date: | Thu, 20 Jan 2005 06:14:31 -0800 (PST) |
|
|
 | On Wed, 19 Jan 2005, Dan Langille wrote:
> Hi folks, > > Running on 7.4.2, recently vacuum analysed the three tables in > question. > > The query plan in question changes dramatically when a WHERE clause > changes from ports.broken to ports.deprecated. I don't see why. > Well, I do see why: a sequential scan of a 130,000 rows. The query > goes from 13ms to 1100ms because the of this. The full plans are at > http://rafb.net/paste/results/v8ccvQ54.html > > I have tried some tuning by: > > set effective_cache_size to 4000, was 1000 > set random_page_cost to 1, was 4 > > The resulting plan changes, but no speed improvment, are at > http://rafb.net/paste/results/rV8khJ18.html > > Any suggestions please?
As a question, what does it do if enable_hashjoin is false? I'm wondering if it'll pick a nested loop for that step for the element/ports join and what it estimates the cost to be.
---------------------------(end of broadcast)--------------------------- TIP 6: Have you searched our list archives?
http://archives.postgresql.org
|
|
 | | From: | Dan Langille | | Subject: | Re: index scan of whole table, can't see why | | Date: | Thu, 20 Jan 2005 09:40:21 -0500 |
|
|
 | On 20 Jan 2005 at 6:14, Stephan Szabo wrote:
> On Wed, 19 Jan 2005, Dan Langille wrote: > > > Hi folks, > > > > Running on 7.4.2, recently vacuum analysed the three tables in > > question. > > > > The query plan in question changes dramatically when a WHERE clause > > changes from ports.broken to ports.deprecated. I don't see why. > > Well, I do see why: a sequential scan of a 130,000 rows. The query > > goes from 13ms to 1100ms because the of this. The full plans are at > > http://rafb.net/paste/results/v8ccvQ54.html > > > > I have tried some tuning by: > > > > set effective_cache_size to 4000, was 1000 > > set random_page_cost to 1, was 4 > > > > The resulting plan changes, but no speed improvment, are at > > http://rafb.net/paste/results/rV8khJ18.html > > > > Any suggestions please? > > As a question, what does it do if enable_hashjoin is false? I'm wondering > if it'll pick a nested loop for that step for the element/ports join and > what it estimates the cost to be.
With enable_hashjoin = false, no speed improvement. Execution plan at http://rafb.net/paste/results/qtSFVM72.html
thanks -- Dan Langille : http://www.langille.org/ BSDCan - The Technical BSD Conference - http://www.bsdcan.org/
---------------------------(end of broadcast)--------------------------- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
|
|
 | | From: | Dan Langille | | Subject: | Re: index scan of whole table, can't see why | | Date: | Thu, 20 Jan 2005 10:36:04 -0500 |
|
|
 | On 20 Jan 2005 at 7:26, Stephan Szabo wrote:
> On Thu, 20 Jan 2005, Dan Langille wrote: > > > On 20 Jan 2005 at 6:14, Stephan Szabo wrote: > > > > > On Wed, 19 Jan 2005, Dan Langille wrote: > > > > > > > Hi folks, > > > > > > > > Running on 7.4.2, recently vacuum analysed the three tables in > > > > question. > > > > > > > > The query plan in question changes dramatically when a WHERE clause > > > > changes from ports.broken to ports.deprecated. I don't see why. > > > > Well, I do see why: a sequential scan of a 130,000 rows. The query > > > > goes from 13ms to 1100ms because the of this. The full plans are at > > > > http://rafb.net/paste/results/v8ccvQ54.html > > > > > > > > I have tried some tuning by: > > > > > > > > set effective_cache_size to 4000, was 1000 > > > > set random_page_cost to 1, was 4 > > > > > > > > The resulting plan changes, but no speed improvment, are at > > > > http://rafb.net/paste/results/rV8khJ18.html > > > > > > > > Any suggestions please? > > > > > > As a question, what does it do if enable_hashjoin is false? I'm wondering > > > if it'll pick a nested loop for that step for the element/ports join and > > > what it estimates the cost to be. > > > > With enable_hashjoin = false, no speed improvement. Execution plan > > at http://rafb.net/paste/results/qtSFVM72.html > > Honestly I expected it to be slower (which it was), but I figured it's > worth seeing what alternate plans it'll generate (specifically to see how > it cost a nested loop on that join to compare to the fast plan). > Unfortunately, it generated a merge join, so I think it might require both > enable_hashjoin=false and enable_mergejoin=false to get it which is likely > to be even slower in practice but still may be useful to see.
Setting both to false gives a dramatic performance boost. See http://rafb.net/paste/results/b70KAi42.html
This gives suitable speed, but why does the plan vary so much with such a minor change in the WHERE clause? -- Dan Langille : http://www.langille.org/ BSDCan - The Technical BSD Conference - http://www.bsdcan.org/
---------------------------(end of broadcast)--------------------------- TIP 4: Don't 'kill -9' the postmaster
|
|
 | | From: | Dan Langille | | Subject: | Re: index scan of whole table, can't see why | | Date: | Thu, 20 Jan 2005 19:55:20 -0500 |
|
|
 | On 21 Jan 2005 at 8:38, Russell Smith wrote:
> On Fri, 21 Jan 2005 02:36 am, Dan Langille wrote: > > On 20 Jan 2005 at 7:26, Stephan Szabo wrote: > > [snip] > > > Honestly I expected it to be slower (which it was), but I figured > > > it's worth seeing what alternate plans it'll generate > > > (specifically to see how it cost a nested loop on that join to > > > compare to the fast plan). Unfortunately, it generated a merge > > > join, so I think it might require both enable_hashjoin=false and > > > enable_mergejoin=false to get it which is likely to be even slower > > > in practice but still may be useful to see. > > > > Setting both to false gives a dramatic performance boost. See > > http://rafb.net/paste/results/b70KAi42.html > > > -> Materialize (cost=15288.70..15316.36 rows=2766 width=35) > (actual time=0.004..0.596 rows=135 loops=92) > -> Nested Loop (cost=0.00..15288.70 rows=2766 > width=35) (actual time=0.060..9.130 rows=135 loops=1) > > The Planner here has a quite inaccurate guess at the number of rows > that will match in the join. An alternative to turning off join types > is to up the statistics on the Element columns because that's where > the join is happening. Hopefully the planner will get a better idea. > However it may not be able too. 2766 rows vs 135 is quite likely to > choose different plans. As you can see you have had to turn off two > join types to give something you wanted/expected.
Fair comment. However, the statistics on ports.element_id, ports.deprecated, ports.broken, and element.id are both set to 1000.
> > This gives suitable speed, but why does the plan vary so much with > > such a minor change in the WHERE clause? > Plan 1 - broken > -> Nested Loop (cost=0.00..3825.30 rows=495 width=35) (actual > time=0.056..16.161 rows=218 loops=1) > > Plan 2 - deprecated > -> Hash Join (cost=3676.78..10144.06 rows=2767 width=35) > (actual time=7.638..1158.128 rows=135 loops=1) > > The performance difference is when the where is changed, you have a > totally different set of selection options. The Plan 1 and Plan 2 > shown from your paste earlier, report that you are out by a factor of > 2 for plan 1. But for plan 2 its a factor of 20. The planner is > likely to make the wrong choice when the stats are out by that factor. > > Beware what is a small "typing" change does not mean they queries are > anything alight.
Agreed. I just did not expect such a dramatic change which a result set that is similar. Actually, they aren't that similar at all.
Thank you. -- Dan Langille : http://www.langille.org/ BSDCan - The Technical BSD Conference - http://www.bsdcan.org/
---------------------------(end of broadcast)--------------------------- TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
|
|
 | | From: | Russell Smith | | Subject: | Re: index scan of whole table, can't see why | | Date: | Fri, 21 Jan 2005 08:38:19 +1100 |
|
|
 | On Fri, 21 Jan 2005 02:36 am, Dan Langille wrote: > On 20 Jan 2005 at 7:26, Stephan Szabo wrote:
[snip] > > Honestly I expected it to be slower (which it was), but I figured it's > > worth seeing what alternate plans it'll generate (specifically to see how > > it cost a nested loop on that join to compare to the fast plan). > > Unfortunately, it generated a merge join, so I think it might require both > > enable_hashjoin=false and enable_mergejoin=false to get it which is likely > > to be even slower in practice but still may be useful to see. > > Setting both to false gives a dramatic performance boost. See > http://rafb.net/paste/results/b70KAi42.html > -> Materialize (cost=15288.70..15316.36 rows=2766 width=35) (actual time=0.004..0.596 rows=135 loops=92) -> Nested Loop (cost=0.00..15288.70 rows=2766 width=35) (actual time=0.060..9.130 rows=135 loops=1)
The Planner here has a quite inaccurate guess at the number of rows that will match in the join. An alternative to turning off join types is to up the statistics on the Element columns because that's where the join is happening. Hopefully the planner will get a better idea. However it may not be able too. 2766 rows vs 135 is quite likely to choose different plans. As you can see you have had to turn off two join types to give something you wanted/expected.
> This gives suitable speed, but why does the plan vary so much with > such a minor change in the WHERE clause? Plan 1 - broken -> Nested Loop (cost=0.00..3825.30 rows=495 width=35) (actual time=0.056..16.161 rows=218 loops=1)
Plan 2 - deprecated -> Hash Join (cost=3676.78..10144.06 rows=2767 width=35) (actual time=7.638..1158.128 rows=135 loops=1)
The performance difference is when the where is changed, you have a totally different set of selection options. The Plan 1 and Plan 2 shown from your paste earlier, report that you are out by a factor of 2 for plan 1. But for plan 2 its a factor of 20. The planner is likely to make the wrong choice when the stats are out by that factor.
Beware what is a small "typing" change does not mean they queries are anything alight.
Regards
Russell Smith.
---------------------------(end of broadcast)--------------------------- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
|
|
 | | From: | Stephan Szabo | | Subject: | Re: index scan of whole table, can't see why | | Date: | Thu, 20 Jan 2005 07:26:37 -0800 (PST) |
|
|
 | On Thu, 20 Jan 2005, Dan Langille wrote:
> On 20 Jan 2005 at 6:14, Stephan Szabo wrote: > > > On Wed, 19 Jan 2005, Dan Langille wrote: > > > > > Hi folks, > > > > > > Running on 7.4.2, recently vacuum analysed the three tables in > > > question. > > > > > > The query plan in question changes dramatically when a WHERE clause > > > changes from ports.broken to ports.deprecated. I don't see why. > > > Well, I do see why: a sequential scan of a 130,000 rows. The query > > > goes from 13ms to 1100ms because the of this. The full plans are at > > > http://rafb.net/paste/results/v8ccvQ54.html > > > > > > I have tried some tuning by: > > > > > > set effective_cache_size to 4000, was 1000 > > > set random_page_cost to 1, was 4 > > > > > > The resulting plan changes, but no speed improvment, are at > > > http://rafb.net/paste/results/rV8khJ18.html > > > > > > Any suggestions please? > > > > As a question, what does it do if enable_hashjoin is false? I'm wondering > > if it'll pick a nested loop for that step for the element/ports join and > > what it estimates the cost to be. > > With enable_hashjoin = false, no speed improvement. Execution plan > at http://rafb.net/paste/results/qtSFVM72.html
Honestly I expected it to be slower (which it was), but I figured it's worth seeing what alternate plans it'll generate (specifically to see how it cost a nested loop on that join to compare to the fast plan). Unfortunately, it generated a merge join, so I think it might require both enable_hashjoin=false and enable_mergejoin=false to get it which is likely to be even slower in practice but still may be useful to see.
---------------------------(end of broadcast)--------------------------- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
|
|
|
| | |
|
 |