 | --0-625005588-1105713570=:13342 Content-Type: text/plain; charset=us-ascii
Hi All, I have the following query to generate a report grouped by "states". SELECT distinct upper(cd.state) as mystate, SUM(d.amount) as total_amount, SUM(COALESCE(d.fee,0) + COALESCE(mp.seller_fee, 0) + COALESCE(mp.buyer_fee,0)) as total_fee FROM data d left JOIN customerdata cd ON d.uid = cd.uid LEFT JOIN merchant_purchase mp ON d.id = mp.data_id WHERE d.what IN (26,0, 15) AND d.flags IN (1,9,10,12 ) AND lower(cd.country) = 'us' AND date_part('year',d.time)= 2004 GROUP BY myst ate ORDER BY mystate;
mystate | total_amount | total_fee ---------+--------------+----------- | 3695 | 0 AR | 3000 | 0 AZ | 1399 | 0 CA | 113100 | 6242 FL | 121191 | 9796 GA | 34826876 | 478888 GEORGIA | 57990 | 3500 IEIE | 114000 | 4849 MD | 20000 | 1158 MI | 906447 | 0 NY | 8000 | 600 PA | 6200 | 375 SC | 25000 | 600 TN | 1443681 | 1124 | 13300 | 0 (15 rows)
If you notice, my problem in this query is that the records for GA, GEORGIA appear separately. But what I want to do is to have them combined to a single entry with their values summed up . Initially we had accepted both formats as input for the state field. Also, there are some invalid entries for the state field (like the "IEIE" and null values), which appear because the input for state was not validated initially. These entries have to be eliminated from the report.This query did not take a long time to complete, but did not meet the needs for the report. So, the query was rewritten to the following query which takes nearly 7-8 mins to complete on our test database: SELECT (SELECT DISTINCT pc.state FROM postalcode pc WHERE UPPER(cd.state) IN (pc.state, pc.state_code)) as mystate, SUM(d.amount) as total_amount, SUM(COALESCE(d.fee,0) + COALESCE(mp.seller_fee, 0) + COALESCE(mp.buyer_fee,0)) as total_fee FROM data d JOIN customerdata cd ON d.uid = cd.uid LEFT JOIN merchant_purchase mp ON d.id = mp.data_id WHERE d.what IN (26,0, 15) AND d.flags IN (1,9,10,12 ) AND lower(cd.country) = 'us' AND date_part('year', d.time) = 2004 GROUP BY mystate ORDER BY mystate; mystate | total_amount | total_fee ----------------+--------------+----------- ARIZONA | 1399 | 0 ARKANSAS | 3000 | 0 CALIFORNIA | 113100 | 6242 FLORIDA | 121191 | 9796 GEORGIA | 34884866 | 482388 MARYLAND | 20000 | 1158 MICHIGAN | 906447 | 0 NEW YORK | 8000 | 600 PENNSYLVANIA | 6200 | 375 SOUTH CAROLINA | 25000 | 600 TENNESSEE | 1443681 | 1124 | 130995 | 4849
Here is the explain analyze of this query: QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=1226.57..1226.58 rows=1 width=38) (actual time=362355.58..362372.09 rows=12 loops=1) -> Group (cost=1226.57..1226.57 rows=1 width=38) (actual time=362355.54..362367.73 rows=2197 loops=1) -> Sort (cost=1226.57..1226.57 rows=1 width=38) (actual time=362355.53..362356.96 rows=2197 loops=1) Sort Key: (subplan) -> Nested Loop (cost=0.00..1226.56 rows=1 width=38) (actual time=166.11..362321.46 rows=2197 loops=1) -> Nested Loop (cost=0.00..1220.53 rows=1 width=26) (actual time=1.68..361.32 rows=2115 loops=1) -> Seq Scan on customerdata cd (cost=0.00..274.32 rows=31 width=10) (actual time=0.04..29.87 rows=3303 loops=1) Filter: (lower((country)::text) = 'us'::text) -> Index Scan using data_uid_idx on data d (cost=0.00..30.08 rows=1 width=16) (actual time=0.04..0.09 rows=1 loops=3303) Index Cond: (d.uid = "outer".uid) Filter: (((what = 26) OR (what = 0) OR (what = 15)) AND ((flags = 1) OR (flags = 9) OR (flags = 10) OR (flags = 12)) AND (date_part('year'::text, "time") = 2004::double precision)) -> Index Scan using merchant_purchase_data_idx on merchant_purchase mp (cost=0.00..6.01 rows=1 width=12) (actual time=0.05..0.05 rows=1 loops=2115) Index Cond: ("outer".id = mp.data_id) SubPlan -> Unique (cost=2237.12..2243.22 rows=122 width=13) (actual time=161.25..164.68 rows=1 loops=2197) -> Sort (cost=2237.12..2240.17 rows=1220 width=13) (actual time=161.21..161.88 rows=1033 loops=2197) Sort Key: state -> Seq Scan on postalcode pc (cost=0.00..2174.56 rows=1220 width=13) (actual time=35.79..148.33 rows=1033 loops=2197) Filter: ((upper(($0)::text) = (state)::text) OR (upper(($0)::text) = (state_code)::text)) Total runtime: 362372.57 msec The postalcode table is used in the query to validate the states and to combine the entries like GA and GEORGIA.
\d postalcode Table "public.postalcode" Column | Type | Modifiers ------------+-----------------------+------------------------------------------------------------ id | integer | not null default nextval('public.postalcode_id_seq'::text) country | character(2) | state | character varying(30) | zipcode | character varying(20) | city | character varying(50) | city_alias | character varying(20) | state_code | character varying(2) | Indexes: postalcode_country_key unique btree (country, state_code, zipcode), postalcode_state_code_idx btree (state_code), postalcode_state_idx btree (state) The postalcode table has 70328 rows! Can some one please help me optimize this query? Thanks, Saranya
--------------------------------- Do you Yahoo!? Yahoo! Mail - You care about security. So do we. --0-625005588-1105713570=:13342 Content-Type: text/html; charset=us-ascii
Hi All,
I have the following query to generate a report grouped by "states".
SELECT distinct upper(cd.state) as mystate, SUM(d.amount) as total_amount, SUM(COALESCE(d.fee,0) + COALESCE(mp.seller_fee, 0) + COALESCE(mp.buyer_fee,0)) as total_fee FROM data d left JOIN customerdata cd ON d.uid = cd.uid LEFT JOIN merchant_purchase mp ON d.id = mp.data_id WHERE d.what IN (26,0, 15) AND d.flags IN (1,9,10,12 ) AND lower(cd.country) = 'us' AND date_part('year',d.time)= 2004 GROUP BY myst ate ORDER BY mystate;
mystate | total_amount | total_fee ---------+--------------+----------- | 3695 | 0 AR | 3000 | 0 AZ | 1399 | 0 CA | 113100 | 6242 FL | 121191 | 9796 GA | 34826876 | 478888 GEORGIA | 57990 | &nbs p; 3500 IEIE | 114000 | 4849 MD | 20000 | 1158 MI | 906447 | 0 NY | 8000 | 600 PA | 6200 | 375 SC | 25000 | 600 TN | 1443681 | 1124 | 13300 | 0 (15 rows)
If you notice, my problem in this query is that the records for GA, GEORGIA appear separately. But what I want to do is to have them combined to a single entry with their values summed up . Initially we had accepted both formats as input for the state field. Also, there are some invalid entries for the state field (like the "IEIE" and null values), which appear because the input for state was not validated initially. These entries have to be eliminated from the report.This query did not take a long time to complete, but did not meet the needs for the report.
So, the query was rewritten to the following query which takes nearly 7-8 mins to complete on our test database:
SELECT (SELECT DISTINCT pc.state FROM postalcode pc WHERE UPPER(cd.state) IN (pc.state, pc.state_code)) as mystate, SUM(d.amount) as total_amount, SUM(COALESCE(d.fee,0) + COALESCE(mp.seller_fee, 0) + COALESCE(mp.buyer_fee,0)) as total_fee FROM data d JOIN customerdata cd ON d.uid = cd.uid LEFT JOIN merchant_purchase mp ON d.id = mp.data_id WHERE d.what IN (26,0, 15) AND d.flags IN (1,9,10,12 ) AND lower(cd.country) = 'us' AND date_part('year', d.time) = 2004 GROUP BY mystate ORDER BY mystate; mystate | total_amount | total_fee ----------------+--------------+----------- ARIZONA | 1399 | 0 ARKANSAS | 3000 | 0 CALIFORNIA | 113100 | 6242 FLORIDA | 121191 | 9796 GEORGIA | 34884866 | 482388 MARYLAND | 20000 | 1158 MICHIGAN | 906447 | 0 NEW YORK | 8000 | 600 PENNSYLVANIA | 6200 | 375 SOUTH CAROLINA | 25000 | 600 TENNESSEE | 1443681 | 1124 | 130995 | 4849
Here is the explain analyze of this query:
QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=1226.57..1226.58 rows=1 width=38) (actual time=362355.58..362372.09 rows=12 loops=1) -> Group (cost=1226.57..1226.57 rows=1 width=38) (actual time=362355.54..362367.73 rows=2197 loops=1) -> Sort (cost=1226.57..1226.57 rows=1 width=38) (actual time=362355.53..362356.96 rows=2197 loops=1) Sort Key: (subplan) -> Nested Loop (cost=0.00..1226.56 rows=1 width=38) (actual time=166.11..362321.46 rows=2197 loops=1) -> Nested Loop (cost=0.00..1220.53 rows=1 width=26) (actual time=1.68..361.32 rows=2115 loops=1) -> Seq Scan on customerdata cd (cost=0.00..274.32 ro ws=31 width=10) (actual time=0.04..29.87 rows=3303 loops=1) Filter: (lower((country)::text) = 'us'::text) -> Index Scan using data_uid_idx on data d (cost=0.00..30.08 rows=1 width=16) (actual time=0.04..0.09 rows=1 loops=3303) Index Cond: (d.uid = "outer".uid) Filter: (((what = 26) OR (what = 0) OR (what = 15)) AND ((flags = 1) OR (flags = 9) OR (flags = 10) OR (flags = 12)) AND (date_part('year'::text, "time") = 2004::double precision)) -> Index Scan using merchant_purchase_data_idx on merchant_purchase mp (cost=0.00..6.01 rows=1 width=12) (actual time=0.05..0.05 rows=1 loops=2115) Index Cond: ("outer".id = mp.data_id) SubPlan -> Unique (cost=2237.12..2243.22 rows=122 width=13) (actual time=161.25..164.68 rows=1 loops=2197) -> Sort (cost=2237.12..2240.17 rows=1220 width=13) (actual time=161.21..161.88 rows=1033 loops=2197) Sort Key: state -> Seq Scan on postalcode pc (cost=0.00..2174.56 rows=1220 width=13) (actual time=35.79..148.33 rows=1033 loops=2197) Filter: ((upper(($0)::text) = (state)::text) OR (upper(($0)::text) = (state_code)::text)) Total runtime: 362372.57 msec
The postalcode table is used in the query to validate the states and to combine the entries like GA and GEORGIA.
\d postalcode Table "public.postalcode" Column | Type | Modifiers ------------+-----------------------+------------------------------------------------------------ id | integer | not null default nextval('public.postalcode_id_seq'::text) country | character(2) | state | character varying(30) | zipcode | character varying(20) | city | character varying(50) | city_alias | character varying(20) | state_code | character varying(2) | Indexes: postalcode_country_key unique btree (country, state_code, zipcode), postalcode_state_code_idx btree (state_code), postalcode_state_idx btree (state) The postalcode table has 70328 rows! Can some one please help me optimize this query? Thanks, Saranya
Do you Yahoo!? Yahoo! Mail - You care about security. So do we. --0-625005588-1105713570=:13342--
|