|
|
 | | From: | Simon Riggs | | Subject: | Re: PostgreSQL clustering VS MySQL clustering | | Date: | Sun, 23 Jan 2005 20:15:52 +0000 |
|
|
 | On Sat, 2005-01-22 at 16:10 -0500, Tom Lane wrote: > Tatsuo Ishii writes: > > Probably VACUUM works well for small to medium size tables, but not > > for huge ones. I'm considering about to implement "on the spot > > salvaging dead tuples". > > That's impossible on its face, except for the special case where the > same transaction inserts and deletes a tuple. In all other cases, the > transaction deleting a tuple cannot know whether it will commit.
Perhaps Tatsuo has an idea...
As Tom says, if you have only a single row version and then you update that row to create a second version, then we must not remove the first version, since it is effectively the Undo copy.
However, if there were already 2+ row versions, then as Tatsuo suggests, it might be possible to use on the spot salvaging of dead tuples. It might be worth checking the Xid of the earlier row version(s), to see if they are now expired and could be removed immediately.
However, if you had a high number of concurrent updaters, this extra effort would not be that useful, since the other row versions might still be transaction-in-progress versions. That would mean implementing this idea would be useful often, but not in the case of repeatedly updated rows.
Changing the idea slightly might be better: if a row update would cause a block split, then if there is more than one row version then we vacuum the whole block first, then re-attempt the update. That way we wouldn't do the row every time, just when it becomes a problem.
I'm suggesting putting a call to vacuum_page() into heap_update(), immediately before any call to RelationGetBufferForTuple().
We already know that page splitting is an expensive operation, so doing some work to try to avoid that could frequently pay off. This would be isolated to updating.
This wouldn't remove the need for vacuuming, but it would act to prevent severe performance degradation caused by frequent re-updating.
What do you think?
-- Best Regards, Simon Riggs
---------------------------(end of broadcast)--------------------------- TIP 4: Don't 'kill -9' the postmaster
|
|
 | | From: | Tom Lane | | Subject: | Re: PostgreSQL clustering VS MySQL clustering | | Date: | Sun, 23 Jan 2005 15:40:03 -0500 |
|
|
 | Simon Riggs writes: > Changing the idea slightly might be better: if a row update would cause > a block split, then if there is more than one row version then we vacuum > the whole block first, then re-attempt the update.
"Block split"? I think you are confusing tables with indexes.
Chasing down prior versions of the same row is not very practical anyway, since there is no direct way to find them.
One possibility is, if you tried to insert a row on a given page but there's not room, to look through the other rows on the same page to see if any are deletable (xmax below the GlobalXmin event horizon). This strikes me as a fairly expensive operation though, especially when you take into account the need to get rid of their index entries first. Moreover, the check would often be unproductive.
The real issue with any such scheme is that you are putting maintenance costs into the critical paths of foreground processes that are executing user queries. I think that one of the primary advantages of the Postgres storage design is that we keep that work outside the critical path and delegate it to maintenance processes that can run in the background. We shouldn't lightly toss away that advantage.
There was some discussion in Toronto this week about storing bitmaps that would tell VACUUM whether or not there was any need to visit individual pages of each table. Getting rid of useless scans through not-recently-changed areas of large tables would make for a significant reduction in the cost of VACUUM.
regards, tom lane
---------------------------(end of broadcast)--------------------------- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly
|
|
 | | From: | Jim C. Nasby | | Subject: | Re: PostgreSQL clustering VS MySQL clustering | | Date: | Sun, 23 Jan 2005 16:21:34 -0600 |
|
|
 | On Sun, Jan 23, 2005 at 03:40:03PM -0500, Tom Lane wrote: > There was some discussion in Toronto this week about storing bitmaps > that would tell VACUUM whether or not there was any need to visit > individual pages of each table. Getting rid of useless scans through > not-recently-changed areas of large tables would make for a significant > reduction in the cost of VACUUM. FWIW, that's already on the TODO. See also http://lnk.nu/archives.postgresql.org/142.php. -- Jim C. Nasby, Database Consultant decibel@decibel.org Give your computer some brain candy! www.distributed.net Team #1828
Windows: "Where do you want to go today?" Linux: "Where do you want to go tomorrow?" FreeBSD: "Are you guys coming, or what?"
---------------------------(end of broadcast)--------------------------- TIP 8: explain analyze is your friend
|
|
 | | From: | Guy Thornley | | Subject: | Re: PostgreSQL clustering VS MySQL clustering | | Date: | Mon, 24 Jan 2005 13:21:50 +1300 |
|
|
 | > The real issue with any such scheme is that you are putting maintenance > costs into the critical paths of foreground processes that are executing > user queries. I think that one of the primary advantages of the > Postgres storage design is that we keep that work outside the critical > path and delegate it to maintenance processes that can run in the > background. We shouldn't lightly toss away that advantage.
As a rather naive user, I'd consider modifying the FSM so that it has pages with 'possibly freeable' space on them, as well as those with free space.
This way when the pages of actually free space is depleted, the list of 'possibly freeable' pages could be vacuumed (as a batch for that relation) then placed on the actually-free list like vacuum currently does
Since there is concern about critical path performance, there could be an extra backend process that would wake up perodically (or on a signal) and vacuum the pages, so theyre not processed inline with some transaction. Then grabbing a page with free space is the same as it is currently.
Actually I was hoping to find some time to investigate this myself, but my employer is keeping me busy with other tasks ;/. Our particular data management problems could be mitigated much better with a data partitioning approach, anyway.
On another note, is anybody investigating backing up the FSM with disk files so when the FSM size exceeds memory allocated, the appropriate data is swapped to disk? At least since 7.4 you no longer need a VACUUM when postgres starts, to learn about free space ;)
- Guy Thornley
---------------------------(end of broadcast)--------------------------- TIP 5: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq
|
|
|