knowledge-database (beta)

Current group: pgsql.performance

Optimizing this count query

Optimizing this count query  
Andrei Bintintan
 Re: Optimizing this count query  
Tom Lane
 Re: Optimizing this count query  
Andrei Bintintan
From:Andrei Bintintan
Subject:Optimizing this count query
Date:Mon, 17 Jan 2005 18:58:09 +0200
This is a multi-part message in MIME format.

------=_NextPart_000_0160_01C4FCC6.7CA02250
Content-Type: text/plain;
charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

Hi to all,=20

I have a query which counts how many elements I have in the database.

SELECT count(o.id) FROM orders o
INNER JOIN report r ON o.id=3Dr.id_order
INNER JOIN status s ON o.id_status=3Ds.id
INNER JOIN contact c ON o.id_ag=3Dc.id
INNER JOIN endkunde e ON o.id_endkunde=3De.id
INNER JOIN zufriden z ON r.id_zufriden=3Dz.id
INNER JOIN plannung v ON v.id=3Do.id_plannung
INNER JOIN mpsworker w ON v.id_worker=3Dw.id
INNER JOIN person p ON p.id =3D w.id_person
WHERE o.id_status>3=20

In the tables are not quite so many rows (~ 100000).

I keep the joins because in the where clause there can be also other =
search elemens which are searched in the other tables.=20
Now the id_status from the orders table (>3) can be 4 or 6. The =
id_status=3D6 has the most bigger percentage (4 =3D 10%, 6 =3D 70% and =
the rest are other statuses < 4). I think this is why the planner uses=20

I'm asking how can I improve the execution time of this query, because =
these tables are always increasing. And this count sometimes takes more =
than 10 secs and I need to run this count very offen.

Best regards,=20
Andy.


The explain:
Aggregate (cost=3D37931.33..37931.33 rows=3D1 width=3D4)
-> Hash Join (cost=3D27277.86..37828.45 rows=3D41154 width=3D4)
Hash Cond: ("outer".id_person =3D "inner".id)
-> Hash Join (cost=3D27269.79..37100.18 rows=3D41153 =
width=3D8)
Hash Cond: ("outer".id_worker =3D "inner".id)
-> Hash Join (cost=3D27268.28..36378.50 rows=3D41152 =
width=3D8)
Hash Cond: ("outer".id_endkunde =3D "inner".id)
-> Hash Join (cost=3D25759.54..33326.98 =
rows=3D41151 width=3D12)
Hash Cond: ("outer".id_ag =3D "inner".id)
-> Hash Join (cost=3D25587.07..32331.51 =
rows=3D41150 width=3D16)
Hash Cond: ("outer".id_status =3D =
"inner".id)
-> Hash Join =
(cost=3D25586.00..31713.18 rows=3D41150 width=3D20)
Hash Cond: ("outer".id_zufriden =
=3D "inner".id)
-> Hash Join =
(cost=3D25584.85..31094.78 rows=3D41150 width=3D24)
Hash Cond: =
("outer".id_plannung =3D "inner".id)
-> Hash Join =
(cost=3D24135.60..27869.53 rows=3D41149 width=3D24)
Hash Cond: ("outer".id =
=3D "inner".id_order)
-> Seq Scan on orders =
o (cost=3D0.00..2058.54 rows=3D42527 width=3D20)
Filter: =
(id_status > 3)
-> Hash =
(cost=3D23860.48..23860.48 rows=3D42848 width=3D8)
-> Seq Scan on =
report r (cost=3D0.00..23860.48 rows=3D42848 width=3D8)
-> Hash =
(cost=3D1050.80..1050.80 rows=3D62180 width=3D8)
-> Seq Scan on =
plannung v (cost=3D0.00..1050.80 rows=3D62180 width=3D8)
-> Hash (cost=3D1.12..1.12 =
rows=3D12 width=3D4)
-> Seq Scan on zufriden z =
(cost=3D0.00..1.12 rows=3D12 width=3D4)
-> Hash (cost=3D1.06..1.06 rows=3D6 =
width=3D4)
-> Seq Scan on status s =
(cost=3D0.00..1.06 rows=3D6 width=3D4)
-> Hash (cost=3D161.57..161.57 rows=3D4357 =
width=3D4)
-> Seq Scan on contact c =
(cost=3D0.00..161.57 rows=3D4357 width=3D4)
-> Hash (cost=3D1245.99..1245.99 rows=3D44299 =
width=3D4)
-> Seq Scan on endkunde e =
(cost=3D0.00..1245.99 rows=3D44299 width=3D4)
-> Hash (cost=3D1.41..1.41 rows=3D41 width=3D8)
-> Seq Scan on mpsworker w (cost=3D0.00..1.41 =
rows=3D41 width=3D8)
-> Hash (cost=3D7.66..7.66 rows=3D166 width=3D4)
-> Seq Scan on person p (cost=3D0.00..7.66 rows=3D166 =
width=3D4)
------=_NextPart_000_0160_01C4FCC6.7CA02250
Content-Type: text/html;
charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable



charset=3Diso-8859-1">




color=3D#000000>Hi to=20
all,

 

I have a query which counts how many elements I have =
in the=20
database.

 

SELECT count(o.id) FROM orders=20
o
      INNER JOIN report r ON=20
o.id=3Dr.id_order
      INNER JOIN =
status s ON=20
o.id_status=3Ds.id
      INNER JOIN =
contact c ON=20
o.id_ag=3Dc.id
      INNER JOIN =
endkunde e ON=20
o.id_endkunde=3De.id
      INNER JOIN =
zufriden z=20
ON r.id_zufriden=3Dz.id
      INNER =
JOIN plannung=20
v ON v.id=3Do.id_plannung
      INNER =
JOIN=20
mpsworker w ON =
v.id_worker=3Dw.id
      INNER=20
JOIN person p ON p.id =3D =
w.id_person
      WHERE=20
o.id_status>3

 

In the tables are not quite so many =
rows (~=20
100000).

 

I keep the joins because in the where =
clause there=20
can be also other search elemens which are searched in the other tables. =


Now the id_status from the orders table =
(>3) can=20
be 4 or 6. The id_status=3D6 has the most bigger percentage (4 =3D =
10%, 6 =3D 70%=20
and the rest are other statuses < 4). I think this is why the planner =
uses=20

 

I'm asking how can I improve the =
execution time of=20
this query, because these tables are always increasing. And this count =
sometimes=20
takes more than 10 secs and I need to run this count very =
offen.

 

Best regards,

Andy.

 

 

The explain:

Aggregate  =
(cost=3D37931.33..37931.33 rows=3D1=20
width=3D4)
  ->  Hash Join  =
(cost=3D27277.86..37828.45=20
rows=3D41154 width=3D4)
        =
Hash Cond:=20
("outer".id_person =3D =
"inner".id)
       =20
->  Hash Join  (cost=3D27269.79..37100.18 rows=3D41153=20
width=3D8)
          =
;   =20
Hash Cond: ("outer".id_worker =3D=20
"inner".id)
         &nbs=
p;   =20
->  Hash Join  (cost=3D27268.28..36378.50 rows=3D41152=20
width=3D8)
          =
;         =20
Hash Cond: ("outer".id_endkunde =3D=20
"inner".id)
         &nbs=
p;         =20
->  Hash Join  (cost=3D25759.54..33326.98 rows=3D41151=20
width=3D12)
         &nbs=
p;            =
;   =20
Hash Cond: ("outer".id_ag =3D=20
"inner".id)
         &nbs=
p;            =
;   =20
->  Hash Join  (cost=3D25587.07..32331.51 rows=3D41150=20
width=3D16)
         &nbs=
p;            =
;         =20
Hash Cond: ("outer".id_status =3D=20
"inner".id)
         &nbs=
p;            =
;         =20
->  Hash Join  (cost=3D25586.00..31713.18 rows=3D41150=20
width=3D20)
         &nbs=
p;            =
;            =
   =20
Hash Cond: ("outer".id_zufriden =3D=20
"inner".id)
         &nbs=
p;            =
;            =
   =20
->  Hash Join  (cost=3D25584.85..31094.78 rows=3D41150=20
width=3D24)
         &nbs=
p;            =
;            =
         =20
Hash Cond: ("outer".id_plannung =3D=20
"inner".id)
         &nbs=
p;            =
;            =
         =20
->  Hash Join  (cost=3D24135.60..27869.53 rows=3D41149=20
width=3D24)
         &nbs=
p;            =
;            =
            &=
nbsp;  =20
Hash Cond: ("outer".id =3D=20
"inner".id_order)
        &nbs=
p;            =
;            =
            &=
nbsp;   =20
->  Seq Scan on orders o  (cost=3D0.00..2058.54 =
rows=3D42527=20
width=3D20)
         &nbs=
p;            =
;            =
            &=
nbsp;        =20
Filter: (id_status >=20
3)
           &=
nbsp;           &n=
bsp;           &nb=
sp;           &nbs=
p; =20
->  Hash  (cost=3D23860.48..23860.48 rows=3D42848=20
width=3D8)
          =
;            =
            &=
nbsp;           &n=
bsp;        =20
->  Seq Scan on report r  (cost=3D0.00..23860.48 =
rows=3D42848=20
width=3D8)
          =
;            =
            &=
nbsp;        =20
->  Hash  (cost=3D1050.80..1050.80 rows=3D62180=20
width=3D8)
          =
;            =
            &=
nbsp;           &n=
bsp;  =20
->  Seq Scan on plannung v  (cost=3D0.00..1050.80 =
rows=3D62180=20
width=3D8)
          =
;            =
            &=
nbsp;  =20
->  Hash  (cost=3D1.12..1.12 rows=3D12=20
width=3D4)
          =
;            =
            &=
nbsp;        =20
->  Seq Scan on zufriden z  (cost=3D0.00..1.12 rows=3D12=20
width=3D4)
          =
;            =
         =20
->  Hash  (cost=3D1.06..1.06 rows=3D6=20
width=3D4)
          =
;            =
            &=
nbsp;  =20
->  Seq Scan on status s  (cost=3D0.00..1.06 rows=3D6=20
width=3D4)
          =
;            =
   =20
->  Hash  (cost=3D161.57..161.57 rows=3D4357=20
width=3D4)
          =
;            =
         =20
->  Seq Scan on contact c  (cost=3D0.00..161.57 rows=3D4357 =

width=3D4)
          =
;         =20
->  Hash  (cost=3D1245.99..1245.99 rows=3D44299=20
width=3D4)
          =
;            =
   =20
->  Seq Scan on endkunde e  (cost=3D0.00..1245.99 =
rows=3D44299=20
width=3D4)
          =
;   =20
->  Hash  (cost=3D1.41..1.41 rows=3D41=20
width=3D8)
          =
;         =20
->  Seq Scan on mpsworker w  (cost=3D0.00..1.41 rows=3D41=20
width=3D8)
        ->  =
Hash =20
(cost=3D7.66..7.66 rows=3D166=20
width=3D4)
          =
;   =20
->  Seq Scan on person p  (cost=3D0.00..7.66 rows=3D166=20
width=3D4)


------=_NextPart_000_0160_01C4FCC6.7CA02250--
From:Tom Lane
Subject:Re: Optimizing this count query
Date:Mon, 17 Jan 2005 12:55:49 -0500
"Andrei Bintintan" writes:
> SELECT count(o.id) FROM orders o
> INNER JOIN report r ON o.id=r.id_order
> INNER JOIN status s ON o.id_status=s.id
> INNER JOIN contact c ON o.id_ag=c.id
> INNER JOIN endkunde e ON o.id_endkunde=e.id
> INNER JOIN zufriden z ON r.id_zufriden=z.id
> INNER JOIN plannung v ON v.id=o.id_plannung
> INNER JOIN mpsworker w ON v.id_worker=w.id
> INNER JOIN person p ON p.id = w.id_person
> WHERE o.id_status>3

> I'm asking how can I improve the execution time of this query, because =
> these tables are always increasing. And this count sometimes takes more =
> than 10 secs and I need to run this count very offen.

Unless you've increased the default value of join_collapse_limit, this
construction will be forcing the join order; see
http://www.postgresql.org/docs/7.4/static/explicit-joins.html

I'm not sure if you can improve the join order at all --- since you only
showed EXPLAIN and not EXPLAIN ANALYZE, it's hard to be sure whether any
of the steps are producing large intermediate results. But it's
something to look into.

You should also ask yourself if you need to be joining so many tables at
all. The planner seems to think that only the o/r join is really going
to affect the result row count. I can't tell if it's right or not, but
if this is a star schema and the other seven tables are just detail
tables, you don't need them in order to obtain a count.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
From:Andrei Bintintan
Subject:Re: Optimizing this count query
Date:Tue, 18 Jan 2005 09:48:47 +0200
I have to do all the joins because in the where cause I can also have other
conditions that are related to the other tables.
For example:
.....WHERE o.id_status>3 AND o.id_ag=72 AND v.id_worker=5 AND z.id=10.

Now if these search functions are IN then the query runs faster.

One thing I could do at this point is not to make the join if that table is
not needed in the where clause.

This is the explain analize for the first query.
Aggregate (cost=37182.56..37182.56 rows=1 width=4) (actual
time=3032.126..3032.126 rows=1 loops=1)
-> Hash Join (cost=27279.22..37079.68 rows=41154 width=4) (actual
time=662.600..2999.845 rows=42835 loops=1)
Hash Cond: ("outer".id_endkunde = "inner".id)
-> Hash Join (cost=25770.48..34068.10 rows=41153 width=8) (actual
time=561.112..2444.574 rows=42835 loops=1)
Hash Cond: ("outer".id_worker = "inner".id)
-> Hash Join (cost=25759.54..33326.98 rows=41151 width=12)
(actual time=560.514..2361.776 rows=42835 loops=1)
Hash Cond: ("outer".id_ag = "inner".id)
-> Hash Join (cost=25587.07..32331.51 rows=41150
width=16) (actual time=551.505..2240.217 rows=42835 loops=1)
Hash Cond: ("outer".id_status = "inner".id)
-> Hash Join (cost=25586.00..31713.18 rows=41150
width=20) (actual time=551.418..2150.224 rows=42835 loops=1)
Hash Cond: ("outer".id_zufriden =
"inner".id)
-> Hash Join (cost=25584.85..31094.78
rows=41150 width=24) (actual time=551.341..2057.142 rows=42835 loops=1)
Hash Cond: ("outer".id_plannung =
"inner".id)
-> Hash Join
(cost=24135.60..27869.53 rows=41149 width=24) (actual time=415.189..1162.429
rows=42835 loops=1)
Hash Cond: ("outer".id =
"inner".id_order)
-> Seq Scan on orders o
(cost=0.00..2058.54 rows=42527 width=20) (actual time=0.046..93.692
rows=42835 loops=1)
Filter: (id_status > 3)
-> Hash
(cost=23860.48..23860.48 rows=42848 width=8) (actual time=414.923..414.923
rows=0 loops=1)
-> Seq Scan on report r
(cost=0.00..23860.48 rows=42848 width=8) (actual time=282.905..371.401
rows=42848 loops=1)
-> Hash (cost=1050.80..1050.80
rows=62180 width=8) (actual time=133.505..133.505 rows=0 loops=1)
-> Seq Scan on plannung v
(cost=0.00..1050.80 rows=62180 width=8) (actual time=0.034..73.048
rows=62180 loops=1)
-> Hash (cost=1.12..1.12 rows=12 width=4)
(actual time=0.048..0.048 rows=0 loops=1)
-> Seq Scan on zufriden z
(cost=0.00..1.12 rows=12 width=4) (actual time=0.027..0.040 rows=12 loops=1)
-> Hash (cost=1.06..1.06 rows=6 width=4) (actual
time=0.045..0.045 rows=0 loops=1)
-> Seq Scan on status s (cost=0.00..1.06
rows=6 width=4) (actual time=0.032..0.037 rows=6 loops=1)
-> Hash (cost=161.57..161.57 rows=4357 width=4)
(actual time=8.973..8.973 rows=0 loops=1)
-> Seq Scan on contact c (cost=0.00..161.57
rows=4357 width=4) (actual time=0.032..5.902 rows=4357 loops=1)
-> Hash (cost=10.84..10.84 rows=42 width=4) (actual
time=0.557..0.557 rows=0 loops=1)
-> Hash Join (cost=1.51..10.84 rows=42 width=4)
(actual time=0.182..0.523 rows=41 loops=1)
Hash Cond: ("outer".id = "inner".id_person)
-> Seq Scan on person p (cost=0.00..7.66
rows=166 width=4) (actual time=0.027..0.216 rows=166 loops=1)
-> Hash (cost=1.41..1.41 rows=41 width=8)
(actual time=0.125..0.125 rows=0 loops=1)
-> Seq Scan on mpsworker w
(cost=0.00..1.41 rows=41 width=8) (actual time=0.038..0.086 rows=41 loops=1)
-> Hash (cost=1245.99..1245.99 rows=44299 width=4) (actual
time=101.257..101.257 rows=0 loops=1)
-> Seq Scan on endkunde e (cost=0.00..1245.99 rows=44299
width=4) (actual time=0.050..59.641 rows=44301 loops=1)
Total runtime: 3033.230 ms

Thanks for help.
Andy.

----- Original Message -----
From: "Tom Lane"
To: "Andrei Bintintan"
Cc:
Sent: Monday, January 17, 2005 7:55 PM
Subject: Re: [PERFORM] Optimizing this count query


> "Andrei Bintintan" writes:
>> SELECT count(o.id) FROM orders o
>> INNER JOIN report r ON o.id=r.id_order
>> INNER JOIN status s ON o.id_status=s.id
>> INNER JOIN contact c ON o.id_ag=c.id
>> INNER JOIN endkunde e ON o.id_endkunde=e.id
>> INNER JOIN zufriden z ON r.id_zufriden=z.id
>> INNER JOIN plannung v ON v.id=o.id_plannung
>> INNER JOIN mpsworker w ON v.id_worker=w.id
>> INNER JOIN person p ON p.id = w.id_person
>> WHERE o.id_status>3
>
>> I'm asking how can I improve the execution time of this query, because =
>> these tables are always increasing. And this count sometimes takes more =
>> than 10 secs and I need to run this count very offen.
>
> Unless you've increased the default value of join_collapse_limit, this
> construction will be forcing the join order; see
> http://www.postgresql.org/docs/7.4/static/explicit-joins.html
>
> I'm not sure if you can improve the join order at all --- since you only
> showed EXPLAIN and not EXPLAIN ANALYZE, it's hard to be sure whether any
> of the steps are producing large intermediate results. But it's
> something to look into.
>
> You should also ask yourself if you need to be joining so many tables at
> all. The planner seems to think that only the o/r join is really going
> to affect the result row count. I can't tell if it's right or not, but
> if this is a star schema and the other seven tables are just detail
> tables, you don't need them in order to obtain a count.
>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
>


---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster
   

Copyright © 2006 knowledge-database   -   All rights reserved