knowledge-database (beta)

Current group: pgsql.performance

Re: PostgreSQL clustering VS MySQL clustering

Re: PostgreSQL clustering VS MySQL clustering  
Simon Riggs
 Re: PostgreSQL clustering VS MySQL clustering  
Tom Lane
 Re: PostgreSQL clustering VS MySQL clustering  
Jim C. Nasby
 Re: PostgreSQL clustering VS MySQL clustering  
Guy Thornley
From:Simon Riggs
Subject:Re: PostgreSQL clustering VS MySQL clustering
Date:Sun, 23 Jan 2005 20:15:52 +0000
On Sat, 2005-01-22 at 16:10 -0500, Tom Lane wrote:
> Tatsuo Ishii writes:
> > Probably VACUUM works well for small to medium size tables, but not
> > for huge ones. I'm considering about to implement "on the spot
> > salvaging dead tuples".
>
> That's impossible on its face, except for the special case where the
> same transaction inserts and deletes a tuple. In all other cases, the
> transaction deleting a tuple cannot know whether it will commit.

Perhaps Tatsuo has an idea...

As Tom says, if you have only a single row version and then you update
that row to create a second version, then we must not remove the first
version, since it is effectively the Undo copy.

However, if there were already 2+ row versions, then as Tatsuo suggests,
it might be possible to use on the spot salvaging of dead tuples. It
might be worth checking the Xid of the earlier row version(s), to see if
they are now expired and could be removed immediately.

However, if you had a high number of concurrent updaters, this extra
effort would not be that useful, since the other row versions might
still be transaction-in-progress versions. That would mean implementing
this idea would be useful often, but not in the case of repeatedly
updated rows.

Changing the idea slightly might be better: if a row update would cause
a block split, then if there is more than one row version then we vacuum
the whole block first, then re-attempt the update. That way we wouldn't
do the row every time, just when it becomes a problem.

I'm suggesting putting a call to vacuum_page() into heap_update(),
immediately before any call to RelationGetBufferForTuple().

We already know that page splitting is an expensive operation, so doing
some work to try to avoid that could frequently pay off. This would be
isolated to updating.

This wouldn't remove the need for vacuuming, but it would act to prevent
severe performance degradation caused by frequent re-updating.

What do you think?

--
Best Regards, Simon Riggs


---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster
From:Tom Lane
Subject:Re: PostgreSQL clustering VS MySQL clustering
Date:Sun, 23 Jan 2005 15:40:03 -0500
Simon Riggs writes:
> Changing the idea slightly might be better: if a row update would cause
> a block split, then if there is more than one row version then we vacuum
> the whole block first, then re-attempt the update.

"Block split"? I think you are confusing tables with indexes.

Chasing down prior versions of the same row is not very practical
anyway, since there is no direct way to find them.

One possibility is, if you tried to insert a row on a given page but
there's not room, to look through the other rows on the same page to see
if any are deletable (xmax below the GlobalXmin event horizon). This
strikes me as a fairly expensive operation though, especially when you
take into account the need to get rid of their index entries first.
Moreover, the check would often be unproductive.

The real issue with any such scheme is that you are putting maintenance
costs into the critical paths of foreground processes that are executing
user queries. I think that one of the primary advantages of the
Postgres storage design is that we keep that work outside the critical
path and delegate it to maintenance processes that can run in the
background. We shouldn't lightly toss away that advantage.

There was some discussion in Toronto this week about storing bitmaps
that would tell VACUUM whether or not there was any need to visit
individual pages of each table. Getting rid of useless scans through
not-recently-changed areas of large tables would make for a significant
reduction in the cost of VACUUM.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly
From:Jim C. Nasby
Subject:Re: PostgreSQL clustering VS MySQL clustering
Date:Sun, 23 Jan 2005 16:21:34 -0600
On Sun, Jan 23, 2005 at 03:40:03PM -0500, Tom Lane wrote:
> There was some discussion in Toronto this week about storing bitmaps
> that would tell VACUUM whether or not there was any need to visit
> individual pages of each table. Getting rid of useless scans through
> not-recently-changed areas of large tables would make for a significant
> reduction in the cost of VACUUM.
FWIW, that's already on the TODO. See also
http://lnk.nu/archives.postgresql.org/142.php.
--
Jim C. Nasby, Database Consultant decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend
From:Guy Thornley
Subject:Re: PostgreSQL clustering VS MySQL clustering
Date:Mon, 24 Jan 2005 13:21:50 +1300
> The real issue with any such scheme is that you are putting maintenance
> costs into the critical paths of foreground processes that are executing
> user queries. I think that one of the primary advantages of the
> Postgres storage design is that we keep that work outside the critical
> path and delegate it to maintenance processes that can run in the
> background. We shouldn't lightly toss away that advantage.

As a rather naive user, I'd consider modifying the FSM so that it has pages
with 'possibly freeable' space on them, as well as those with free space.

This way when the pages of actually free space is depleted, the list of
'possibly freeable' pages could be vacuumed (as a batch for that relation)
then placed on the actually-free list like vacuum currently does

Since there is concern about critical path performance, there could be an
extra backend process that would wake up perodically (or on a signal) and
vacuum the pages, so theyre not processed inline with some transaction. Then
grabbing a page with free space is the same as it is currently.

Actually I was hoping to find some time to investigate this myself, but my
employer is keeping me busy with other tasks ;/. Our particular data
management problems could be mitigated much better with a data partitioning
approach, anyway.

On another note, is anybody investigating backing up the FSM with disk files
so when the FSM size exceeds memory allocated, the appropriate data is
swapped to disk? At least since 7.4 you no longer need a VACUUM when
postgres starts, to learn about free space ;)

- Guy Thornley

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq
   

Copyright © 2006 knowledge-database   -   All rights reserved