knowledge-database (beta)

Current group: pgsql.performance

Index on a function and SELECT DISTINCT

Index on a function and SELECT DISTINCT  
Adrian Holovaty
 Re: Index on a function and SELECT DISTINCT  
Frank Wiles
 Re: Index on a function and SELECT DISTINCT  
Adrian Holovaty
 Re: Index on a function and SELECT DISTINCT  
PFC
From:Adrian Holovaty
Subject:Index on a function and SELECT DISTINCT
Date:Fri, 14 Jan 2005 12:32:12 -0600
If I have this table, function and index in Postgres 7.3.6 ...

"""
CREATE TABLE news_stories (
id serial primary key NOT NULL,
pub_date timestamp with time zone NOT NULL,
...
)
CREATE OR REPLACE FUNCTION get_year_trunc(timestamp with time zone) returns
timestamp with time zone AS 'SELECT date_trunc(\'year\',$1);' LANGUAGE 'SQL'
IMMUTABLE;
CREATE INDEX news_stories_pub_date_year_trunc ON
news_stories( get_year_trunc(pub_date) );
"""

....why does this query not use the index?

db=# EXPLAIN SELECT DISTINCT get_year_trunc(pub_date) FROM news_stories;
QUERY PLAN
---------------------------------------------------------------------------------
Unique (cost=59597.31..61311.13 rows=3768 width=8)
-> Sort (cost=59597.31..60454.22 rows=342764 width=8)
Sort Key: date_trunc('year'::text, pub_date)
-> Seq Scan on news_stories (cost=0.00..23390.55 rows=342764
width=8)
(4 rows)

The query is noticably slow (2 seconds) on a database with 150,000+ records.
How can I speed it up?

Thanks,
Adrian

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly
From:Frank Wiles
Subject:Re: Index on a function and SELECT DISTINCT
Date:Mon, 17 Jan 2005 10:09:42 -0600
On Fri, 14 Jan 2005 12:32:12 -0600
Adrian Holovaty wrote:

> If I have this table, function and index in Postgres 7.3.6 ...
>
> """
> CREATE TABLE news_stories (
> id serial primary key NOT NULL,
> pub_date timestamp with time zone NOT NULL,
> ...
> )
> CREATE OR REPLACE FUNCTION get_year_trunc(timestamp with time zone)
> returns timestamp with time zone AS 'SELECT date_trunc(\'year\',$1);'
> LANGUAGE 'SQL' IMMUTABLE;
> CREATE INDEX news_stories_pub_date_year_trunc ON
> news_stories( get_year_trunc(pub_date) );
> """
>
> ...why does this query not use the index?
>
> db=# EXPLAIN SELECT DISTINCT get_year_trunc(pub_date) FROM
> news_stories;
> QUERY PLAN
> ---------------------------------------------------------------------
> ------------
> Unique (cost=59597.31..61311.13 rows=3768 width=8)
> -> Sort (cost=59597.31..60454.22 rows=342764 width=8)
> Sort Key: date_trunc('year'::text, pub_date)
> -> Seq Scan on news_stories (cost=0.00..23390.55
> rows=342764
> width=8)
> (4 rows)
>
> The query is noticably slow (2 seconds) on a database with 150,000+
> records. How can I speed it up?

It's doing a sequence scan because you're not limiting the query in
the FROM clause. No point in using an index when you're asking for
the entire table. :)

---------------------------------
Frank Wiles
http://www.wiles.org
---------------------------------


---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html
From:Adrian Holovaty
Subject:Re: Index on a function and SELECT DISTINCT
Date:Mon, 17 Jan 2005 11:59:24 -0600
Frank Wiles wrote:
> Adrian Holovaty wrote:
> > If I have this table, function and index in Postgres 7.3.6 ...
> >
> > """
> > CREATE TABLE news_stories (
> > id serial primary key NOT NULL,
> > pub_date timestamp with time zone NOT NULL,
> > ...
> > )
> > CREATE OR REPLACE FUNCTION get_year_trunc(timestamp with time zone)
> > returns timestamp with time zone AS 'SELECT date_trunc(\'year\',$1);'
> > LANGUAGE 'SQL' IMMUTABLE;
> > CREATE INDEX news_stories_pub_date_year_trunc ON
> > news_stories( get_year_trunc(pub_date) );
> > """
> >
> > ...why does this query not use the index?
> >
> > db=# EXPLAIN SELECT DISTINCT get_year_trunc(pub_date) FROM
> > news_stories;
> > QUERY PLAN
> > ---------------------------------------------------------------------
> > ------------
> > Unique (cost=59597.31..61311.13 rows=3768 width=8)
> > -> Sort (cost=59597.31..60454.22 rows=342764 width=8)
> > Sort Key: date_trunc('year'::text, pub_date)
> > -> Seq Scan on news_stories (cost=0.00..23390.55
> > rows=342764
> > width=8)
> > (4 rows)
> >
> > The query is noticably slow (2 seconds) on a database with 150,000+
> > records. How can I speed it up?
>
> It's doing a sequence scan because you're not limiting the query in
> the FROM clause. No point in using an index when you're asking for
> the entire table. :)

Ah, that makes sense. So is there a way to optimize SELECT DISTINCT queries
that have no WHERE clause?

Adrian

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html
From:PFC
Subject:Re: Index on a function and SELECT DISTINCT
Date:Mon, 17 Jan 2005 19:17:57 +0100


Try :

EXPLAIN SELECT get_year_trunc(pub_date) as foo FROM ... GROUP BY foo

Apart from that, you could use a materialized view...

>> > db=# EXPLAIN SELECT DISTINCT get_year_trunc(pub_date) FROM

> Ah, that makes sense. So is there a way to optimize SELECT DISTINCT
> queries
> that have no WHERE clause?
>
> Adrian
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/docs/faqs/FAQ.html
>



---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend
   

Copyright © 2006 knowledge-database   -   All rights reserved