|
|
 | | From: | Hervé_Piedvache | | Subject: | PostgreSQL clustering VS MySQL clustering | | Date: | Thu, 20 Jan 2005 15:03:31 +0100 |
|
|
 | Dear community,
My company, which I actually represent, is a fervent user of PostgreSQL. We used to make all our applications using PostgreSQL for more than 5 years. We usually do classical client/server applications under Linux, and Web interface (php, perl, C/C++). We used to manage also public web services with 10/15 millions records and up to 8 millions pages view by month.
Now we are in front of a new need, but we do not find any good solution with PostgreSQL. We need to make a sort of directory of millions of data growing about 4/8 millions per month, and to be able to be used by many users from the web. In order to do this, our solution need to be able to run perfectly with many insert and many select access (done before each insert, and done by web site visitors). We will also need to make a search engine for the millions of data (140/150 millions records at the immediate beginning) ... No it's not google, but the kind of volume of data stored in the main table is similar.
Then ... we have made some tests, with the actual servers we have here, like a Bi-Pro Xeon 2.8 Ghz, with 4 Gb of RAM and the result of the cumulative inserts, and select access is slowing down the service really quickly ... (Load average is going up to 10 really quickly on the database).
We were at this moment thinking about a Cluster solution ... We saw on the Internet many solution talking about Cluster solution using MySQL ... but nothing about PostgreSQL ... the idea is to use several servers to make a sort of big virtual server using the disk space of each server as one, and having the ability to use the CPU and RAM of each servers in order to maintain good service performance ...one can imagin it is like a GFS but dedicated to postgreSQL...
Is there any solution with PostgreSQL matching these needs ... ? Do we have to backport our development to MySQL for this kind of problem ? Is there any other solution than a Cluster for our problem ?
Looking for your reply,
Regards, -- Hervé
---------------------------(end of broadcast)--------------------------- TIP 8: explain analyze is your friend
|
|
 | | From: | Christopher Kings-Lynne | | Subject: | Re: PostgreSQL clustering VS MySQL clustering | | Date: | Thu, 20 Jan 2005 14:24:05 +0000 |
|
|
 | > Is there any solution with PostgreSQL matching these needs ... ?
You want: http://www.slony.info/
> Do we have to backport our development to MySQL for this kind of problem ? > Is there any other solution than a Cluster for our problem ?
Well, Slony does replication which is basically what you want :)
Only master->slave though, so you will need to have all inserts go via the master server, but selects can come off any server.
Chris
---------------------------(end of broadcast)--------------------------- TIP 5: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faqs/FAQ.html
|
|
 | | From: | Hervé_Piedvache | | Subject: | Re: PostgreSQL clustering VS MySQL clustering | | Date: | Thu, 20 Jan 2005 15:36:08 +0100 |
|
|
 | Le Jeudi 20 Janvier 2005 15:24, Christopher Kings-Lynne a écrit : > > Is there any solution with PostgreSQL matching these needs ... ? > > You want: http://www.slony.info/ > > > Do we have to backport our development to MySQL for this kind of problem > > ? Is there any other solution than a Cluster for our problem ? > > Well, Slony does replication which is basically what you want :) > > Only master->slave though, so you will need to have all inserts go via > the master server, but selects can come off any server.
Sorry but I don't agree with this ... Slony is a replication solution ... I don't need replication ... what will I do when my database will grow up to 50 Gb ... I'll need more than 50 Gb of RAM on each server ??? This solution is not very realistic for me ...
I need a Cluster solution not a replication one or explain me in details how I will do for managing the scalabilty of my database ...
regards, -- Hervé
---------------------------(end of broadcast)--------------------------- TIP 8: explain analyze is your friend
|
|
 | | From: | William Yu | | Subject: | Re: PostgreSQL clustering VS MySQL clustering | | Date: | Thu, 20 Jan 2005 09:12:01 -0800 |
|
|
 | Hervé Piedvache wrote: > Sorry but I don't agree with this ... Slony is a replication solution ... I > don't need replication ... what will I do when my database will grow up to 50 > Gb ... I'll need more than 50 Gb of RAM on each server ??? > This solution is not very realistic for me ...
Have you confirmed you need a 1:1 RAM:data ratio? Of course more memory gets more speed but often at a diminishing rate of return. Unless every record of your 50GB is used in every query, only the most commonly used elements of your DB needs to be in RAM. This is the very idea of caching.
|
|
 | | From: | Christopher Kings-Lynne | | Subject: | Re: PostgreSQL clustering VS MySQL clustering | | Date: | Thu, 20 Jan 2005 14:38:34 +0000 |
|
|
 | > Sorry but I don't agree with this ... Slony is a replication solution ... I > don't need replication ... what will I do when my database will grow up to 50 > Gb ... I'll need more than 50 Gb of RAM on each server ??? > This solution is not very realistic for me ... > > I need a Cluster solution not a replication one or explain me in details how I > will do for managing the scalabilty of my database ...
Buy Oracle
---------------------------(end of broadcast)--------------------------- TIP 6: Have you searched our list archives?
http://archives.postgresql.org
|
|
 | | From: | Hervé_Piedvache | | Subject: | Re: PostgreSQL clustering VS MySQL clustering | | Date: | Thu, 20 Jan 2005 15:42:06 +0100 |
|
|
 | Le Jeudi 20 Janvier 2005 15:38, Christopher Kings-Lynne a écrit : > > Sorry but I don't agree with this ... Slony is a replication solution ... > > I don't need replication ... what will I do when my database will grow up > > to 50 Gb ... I'll need more than 50 Gb of RAM on each server ??? > > This solution is not very realistic for me ... > > > > I need a Cluster solution not a replication one or explain me in details > > how I will do for managing the scalabilty of my database ... > > Buy Oracle
I think this is not my solution ... sorry I'm talking about finding a PostgreSQL solution ... -- Hervé Piedvache
Elma Ingénierie Informatique 6 rue du Faubourg Saint-Honoré F-75008 - Paris - France Pho. 33-144949901 Fax. 33-144949902
---------------------------(end of broadcast)--------------------------- TIP 8: explain analyze is your friend
|
|
 | | From: | Christopher Kings-Lynne | | Subject: | Re: PostgreSQL clustering VS MySQL clustering | | Date: | Thu, 20 Jan 2005 14:51:21 +0000 |
|
|
 | >>>Sorry but I don't agree with this ... Slony is a replication solution ... >>>I don't need replication ... what will I do when my database will grow up >>>to 50 Gb ... I'll need more than 50 Gb of RAM on each server ??? >>>This solution is not very realistic for me ... >>> >>>I need a Cluster solution not a replication one or explain me in details >>>how I will do for managing the scalabilty of my database ... >> >>Buy Oracle > > > I think this is not my solution ... sorry I'm talking about finding a > PostgreSQL solution ...
My point being is that there is no free solution. There simply isn't. I don't know why you insist on keeping all your data in RAM, but the mysql cluster requires that ALL data MUST fit in RAM all the time.
PostgreSQL has replication, but not partitioning (which is what you want).
So, your only option is Oracle or another very expensive commercial database.
Chris
---------------------------(end of broadcast)--------------------------- TIP 8: explain analyze is your friend
|
|
 | | From: | Stephen Frost | | Subject: | Re: PostgreSQL clustering VS MySQL clustering | | Date: | Thu, 20 Jan 2005 10:08:47 -0500 |
|
|
 | --F+K/njrK4+2xOJ0g Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable
* Christopher Kings-Lynne (chriskl@familyhealth.com.au) wrote: > PostgreSQL has replication, but not partitioning (which is what you want).
It doesn't have multi-server partitioning.. It's got partitioning within a single server (doesn't it? I thought it did, I know it was discussed w/ the guy from Cox Communications and I thought he was using it :).
> So, your only option is Oracle or another very expensive commercial=20 > database.
Or partition the data at the application layer.
Stephen
--F+K/njrK4+2xOJ0g Content-Type: application/pgp-signature; name="signature.asc" Content-Description: Digital signature Content-Disposition: inline
-----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.5 (GNU/Linux)
iD8DBQFB78l/rzgMPqB3kigRAkNkAJ0WDBq6zNz9CAl/2AemXBBskqsX1QCglHZ0 v8spVpSyyvXq0l5yV+noJ/0= =CA1o -----END PGP SIGNATURE-----
--F+K/njrK4+2xOJ0g--
|
|
 | | From: | Jim C. Nasby | | Subject: | Re: PostgreSQL clustering VS MySQL clustering | | Date: | Thu, 20 Jan 2005 19:32:04 -0600 |
|
|
 | On Thu, Jan 20, 2005 at 10:08:47AM -0500, Stephen Frost wrote: > * Christopher Kings-Lynne (chriskl@familyhealth.com.au) wrote: > > PostgreSQL has replication, but not partitioning (which is what you want). > > It doesn't have multi-server partitioning.. It's got partitioning > within a single server (doesn't it? I thought it did, I know it was > discussed w/ the guy from Cox Communications and I thought he was using > it :).
No, PostgreSQL doesn't support any kind of partitioning, unless you write it yourself. I think there's some work being done in this area, though. -- Jim C. Nasby, Database Consultant decibel@decibel.org Give your computer some brain candy! www.distributed.net Team #1828
Windows: "Where do you want to go today?" Linux: "Where do you want to go tomorrow?" FreeBSD: "Are you guys coming, or what?"
---------------------------(end of broadcast)--------------------------- TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
|
|
 | | From: | Hervé_Piedvache | | Subject: | Re: PostgreSQL clustering VS MySQL clustering | | Date: | Thu, 20 Jan 2005 16:02:39 +0100 |
|
|
 | Le Jeudi 20 Janvier 2005 15:51, Christopher Kings-Lynne a écrit : > >>>Sorry but I don't agree with this ... Slony is a replication solution > >>> ... I don't need replication ... what will I do when my database will > >>> grow up to 50 Gb ... I'll need more than 50 Gb of RAM on each server > >>> ??? This solution is not very realistic for me ... > >>> > >>>I need a Cluster solution not a replication one or explain me in details > >>>how I will do for managing the scalabilty of my database ... > >> > >>Buy Oracle > > > > I think this is not my solution ... sorry I'm talking about finding a > > PostgreSQL solution ... > > My point being is that there is no free solution. There simply isn't. > I don't know why you insist on keeping all your data in RAM, but the > mysql cluster requires that ALL data MUST fit in RAM all the time.
I don't insist about have data in RAM .... but when you use PostgreSQL with big database you know that for quick access just for reading the index file for example it's better to have many RAM as possible ... I just want to be able to get a quick access with a growing and growind database ...
> PostgreSQL has replication, but not partitioning (which is what you want).
:o(
> So, your only option is Oracle or another very expensive commercial > database.
That's not a good news ... -- Hervé Piedvache
Elma Ingénierie Informatique 6 rue du Faubourg Saint-Honoré F-75008 - Paris - France Pho. 33-144949901 Fax. 33-144949902
---------------------------(end of broadcast)--------------------------- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
|
|
 | | From: | Darcy Buskermolen | | Subject: | Re: PostgreSQL clustering VS MySQL clustering | | Date: | Thu, 20 Jan 2005 09:33:42 -0800 |
|
|
 | On January 20, 2005 06:51 am, Christopher Kings-Lynne wrote: > >>>Sorry but I don't agree with this ... Slony is a replication solution > >>> ... I don't need replication ... what will I do when my database will > >>> grow up to 50 Gb ... I'll need more than 50 Gb of RAM on each server > >>> ??? This solution is not very realistic for me ... > >>> > >>>I need a Cluster solution not a replication one or explain me in details > >>>how I will do for managing the scalabilty of my database ... > >> > >>Buy Oracle > > > > I think this is not my solution ... sorry I'm talking about finding a > > PostgreSQL solution ... > > My point being is that there is no free solution. There simply isn't. > I don't know why you insist on keeping all your data in RAM, but the > mysql cluster requires that ALL data MUST fit in RAM all the time. > > PostgreSQL has replication, but not partitioning (which is what you want). > > So, your only option is Oracle or another very expensive commercial > database.
Another Option to consider would be pgmemcache. that way you just build the farm out of lots of large memory, diskless boxes for keeping the whole database in memory in the whole cluster. More information on it can be found at: http://people.freebsd.org/~seanc/pgmemcache/
> > Chris > > ---------------------------(end of broadcast)--------------------------- > TIP 8: explain analyze is your friend
-- Darcy Buskermolen Wavefire Technologies Corp. ph: 250.717.0200 fx: 250.763.1759 http://www.wavefire.com
---------------------------(end of broadcast)--------------------------- TIP 8: explain analyze is your friend
|
|
 | | From: | Christopher Browne | | Subject: | Re: PostgreSQL clustering VS MySQL clustering | | Date: | Sun, 23 Jan 2005 00:41:22 -0500 |
|
|
 | In an attempt to throw the authorities off his trail, herve@elma.fr (Hervé Piedvache) transmitted: > Le Jeudi 20 Janvier 2005 15:24, Christopher Kings-Lynne a écrit : >> > Is there any solution with PostgreSQL matching these needs ... ? >> >> You want: http://www.slony.info/ >> >> > Do we have to backport our development to MySQL for this kind of problem >> > ? Is there any other solution than a Cluster for our problem ? >> >> Well, Slony does replication which is basically what you want :) >> >> Only master->slave though, so you will need to have all inserts go via >> the master server, but selects can come off any server. > > Sorry but I don't agree with this ... Slony is a replication > solution ... I don't need replication ... what will I do when my > database will grow up to 50 Gb ... I'll need more than 50 Gb of RAM > on each server ??? This solution is not very realistic for me ...
Huh? Why on earth do you imagine that Slony-I requires a lot of memory?
It doesn't. A fairly _large_ Slony-I process is about 10MB. There will be some demand for memory on the DB servers, but you don't need an enormous quantity of extra memory to run it.
There is a MySQL "replicating/clustering" system that uses an in-memory database which means that if your DB is 50GB in size, you need something like 200GB of RAM. If you're thinking of that, that's not relevant to PostgreSQL or Slony-I...
> I need a Cluster solution not a replication one or explain me in > details how I will do for managing the scalabilty of my database ...
I'm not sure you understand clustering if you imagine it doesn't involve replication.
There are numerous models for clustering, much as there are numerous RAID models.
But the only sorts of clustering cases where you get to NOT do replication are the cases where all you're looking for from clustering is improved speed, and you're willing for any breakage on any host to potentially destroy your cluster.
Perhaps you need to describe what you _think_ you mean by a "cluster solution." It may be that it'll take further thought to determine what you actually need... -- output = ("cbbrowne" "@" "gmail.com") http://www3.sympatico.ca/cbbrowne/postgresql.html "Not me, guy. I read the Bash man page each day like a Jehovah's Witness reads the Bible. No wait, the Bash man page IS the bible. Excuse me..." (More on confusing aliases, taken from comp.os.linux.misc)
|
|
 | | From: | Jeff | | Subject: | Re: PostgreSQL clustering VS MySQL clustering | | Date: | Thu, 20 Jan 2005 09:48:07 -0500 |
|
|
 | On Jan 20, 2005, at 9:36 AM, Hervé Piedvache wrote:
> Sorry but I don't agree with this ... Slony is a replication solution > ... I > don't need replication ... what will I do when my database will grow > up to 50 > Gb ... I'll need more than 50 Gb of RAM on each server ???
Slony doesn't use much ram. The mysql clustering product, ndb I believe it is called, requires all data fit in RAM. (At least, it used to). What you'll need is disk space.
As for a cluster I think you are thinking of multi-master replication.
You should look into what others have said about trying to partiition data among several boxes and then join the results together.
Or you could fork over hundreds of thousands of dollars for Oracle's RAC.
-- Jeff Trout http://www.jefftrout.com/ http://www.stuarthamm.net/
---------------------------(end of broadcast)--------------------------- TIP 8: explain analyze is your friend
|
|
 | | From: | Hervé_Piedvache | | Subject: | Re: PostgreSQL clustering VS MySQL clustering | | Date: | Thu, 20 Jan 2005 15:54:23 +0100 |
|
|
 | Le Jeudi 20 Janvier 2005 15:48, Jeff a écrit : > On Jan 20, 2005, at 9:36 AM, Hervé Piedvache wrote: > > Sorry but I don't agree with this ... Slony is a replication solution > > ... I > > don't need replication ... what will I do when my database will grow > > up to 50 > > Gb ... I'll need more than 50 Gb of RAM on each server ??? > > Slony doesn't use much ram. The mysql clustering product, ndb I believe > it is called, requires all data fit in RAM. (At least, it used to). > What you'll need is disk space.
Slony do not use RAM ... but PostgreSQL will need RAM for accessing a database of 50 Gb ... so having two servers with the same configuration replicated by slony do not slove the problem of the scalability of the database ...
> As for a cluster I think you are thinking of multi-master replication.
No I'm really thinking about a Cluster solution ... having several servers making one big virtual server to have several processors, and many RAM in many boxes ...
> You should look into what others have said about trying to partiition > data among several boxes and then join the results together.
??? Who talk about this ?
> Or you could fork over hundreds of thousands of dollars for Oracle's > RAC.
No please do not talk about this again ... I'm looking about a PostgreSQL solution ... I know RAC ... and I'm not able to pay for a RAC certify hardware configuration plus a RAC Licence.
Regards, -- Hervé Piedvache
Elma Ingénierie Informatique 6 rue du Faubourg Saint-Honoré F-75008 - Paris - France Pho. 33-144949901 Fax. 33-144949902
---------------------------(end of broadcast)--------------------------- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
|
|
 | | From: | Joshua D. Drake | | Subject: | Re: PostgreSQL clustering VS MySQL clustering | | Date: | Thu, 20 Jan 2005 07:03:01 -0800 |
|
|
 | This is a multi-part message in MIME format. --------------010502080809010003020009 Content-Type: text/plain; charset=ISO-8859-15; format=flowed Content-Transfer-Encoding: 7bit
> >No please do not talk about this again ... I'm looking about a PostgreSQL >solution ... I know RAC ... and I'm not able to pay for a RAC certify >hardware configuration plus a RAC Licence. > > What you want does not exist for PostgreSQL. You will either have to build it yourself or pay somebody to build it for you.
Sincerely,
Joshua D. Drake
>Regards, > >
-- Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC Postgresql support, programming shared hosting and dedicated hosting. +1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com PostgreSQL Replicator -- production quality replication for PostgreSQL
--------------010502080809010003020009 Content-Type: text/x-vcard; charset=utf-8; name="jd.vcf" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="jd.vcf"
begin:vcard fn:Joshua Drake n:Drake;Joshua org:Command Prompt, Inc. adr:;;PO Box 215 ;Cascade Locks;OR;97014;US email;internet:jd@commandprompt.com title:Consultant tel;work:503-667-4564 tel;fax:503-210-0334 x-mozilla-html:FALSE url:http://www.commandprompt.com version:2.1 end:vcard
--------------010502080809010003020009 Content-Type: text/plain Content-Disposition: inline Content-Transfer-Encoding: 8bit MIME-Version: 1.0
---------------------------(end of broadcast)--------------------------- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
--------------010502080809010003020009--
|
|
 | | From: | Christopher Kings-Lynne | | Subject: | Re: PostgreSQL clustering VS MySQL clustering | | Date: | Thu, 20 Jan 2005 14:58:42 +0000 |
|
|
 | >>Or you could fork over hundreds of thousands of dollars for Oracle's >>RAC. > > > No please do not talk about this again ... I'm looking about a PostgreSQL > solution ... I know RAC ... and I'm not able to pay for a RAC certify > hardware configuration plus a RAC Licence.
There is absolutely zero PostgreSQL solution...
You may have to split the data yourself onto two independent db servers and combine the results somehow in your application.
Chris
---------------------------(end of broadcast)--------------------------- TIP 4: Don't 'kill -9' the postmaster
|
|
 | | From: | Joshua D. Drake | | Subject: | Re: PostgreSQL clustering VS MySQL clustering | | Date: | Thu, 20 Jan 2005 07:05:25 -0800 |
|
|
 | This is a multi-part message in MIME format. --------------030801040507090704090606 Content-Type: text/plain; charset=ISO-8859-15; format=flowed Content-Transfer-Encoding: 7bit
Christopher Kings-Lynne wrote:
>>> Or you could fork over hundreds of thousands of dollars for Oracle's >>> RAC. >> >> >> >> No please do not talk about this again ... I'm looking about a >> PostgreSQL solution ... I know RAC ... and I'm not able to pay for a >> RAC certify hardware configuration plus a RAC Licence. > > > There is absolutely zero PostgreSQL solution...
I just replied the same thing but then I was thinking. Couldn't he use multiple databases over multiple servers with dblink?
It is not exactly how I would want to do it, but it would provide what he needs I think???
Sincerely,
Joshua D. Drake
> > You may have to split the data yourself onto two independent db > servers and combine the results somehow in your application. > > Chris > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Don't 'kill -9' the postmaster
-- Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC Postgresql support, programming shared hosting and dedicated hosting. +1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com PostgreSQL Replicator -- production quality replication for PostgreSQL
--------------030801040507090704090606 Content-Type: text/x-vcard; charset=utf-8; name="jd.vcf" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="jd.vcf"
begin:vcard fn:Joshua Drake n:Drake;Joshua org:Command Prompt, Inc. adr:;;PO Box 215 ;Cascade Locks;OR;97014;US email;internet:jd@commandprompt.com title:Consultant tel;work:503-667-4564 tel;fax:503-210-0334 x-mozilla-html:FALSE url:http://www.commandprompt.com version:2.1 end:vcard
--------------030801040507090704090606 Content-Type: text/plain Content-Disposition: inline Content-Transfer-Encoding: 8bit MIME-Version: 1.0
---------------------------(end of broadcast)--------------------------- TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
--------------030801040507090704090606--
|
|
 | | From: | Hervé_Piedvache | | Subject: | Re: PostgreSQL clustering VS MySQL clustering | | Date: | Thu, 20 Jan 2005 16:07:51 +0100 |
|
|
 | Le Jeudi 20 Janvier 2005 16:05, Joshua D. Drake a écrit : > Christopher Kings-Lynne wrote: > >>> Or you could fork over hundreds of thousands of dollars for Oracle's > >>> RAC. > >> > >> No please do not talk about this again ... I'm looking about a > >> PostgreSQL solution ... I know RAC ... and I'm not able to pay for a > >> RAC certify hardware configuration plus a RAC Licence. > > > > There is absolutely zero PostgreSQL solution... > > I just replied the same thing but then I was thinking. Couldn't he use > multiple databases > over multiple servers with dblink? > > It is not exactly how I would want to do it, but it would provide what > he needs I think???
Yes seems to be the only solution ... but I'm a little disapointed about this ... could you explain me why there is not this kind of functionnality ... it seems to be a real need for big applications no ?
Thanks all for your answers ... -- Hervé Piedvache
Elma Ingénierie Informatique 6 rue du Faubourg Saint-Honoré F-75008 - Paris - France Pho. 33-144949901 Fax. 33-144949902
---------------------------(end of broadcast)--------------------------- TIP 4: Don't 'kill -9' the postmaster
|
|
 | | From: | Joshua D. Drake | | Subject: | Re: PostgreSQL clustering VS MySQL clustering | | Date: | Thu, 20 Jan 2005 07:12:42 -0800 |
|
|
 | This is a multi-part message in MIME format. --------------020209050909030606010208 Content-Type: text/plain; charset=ISO-8859-15; format=flowed Content-Transfer-Encoding: 7bit
>> then I was thinking. Couldn't he use >>multiple databases >>over multiple servers with dblink? >> >>It is not exactly how I would want to do it, but it would provide what >>he needs I think??? >> >> > >Yes seems to be the only solution ... but I'm a little disapointed about >this ... could you explain me why there is not this kind of >functionnality ... it seems to be a real need for big applications no ? > > Because it is really, really hard to do correctly and hard equals expensive.
Sincerely,
Joshua D. Drake
>Thanks all for your answers ... > >
-- Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC Postgresql support, programming shared hosting and dedicated hosting. +1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com PostgreSQL Replicator -- production quality replication for PostgreSQL
--------------020209050909030606010208 Content-Type: text/x-vcard; charset=utf-8; name="jd.vcf" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="jd.vcf"
begin:vcard fn:Joshua Drake n:Drake;Joshua org:Command Prompt, Inc. adr:;;PO Box 215 ;Cascade Locks;OR;97014;US email;internet:jd@commandprompt.com title:Consultant tel;work:503-667-4564 tel;fax:503-210-0334 x-mozilla-html:FALSE url:http://www.commandprompt.com version:2.1 end:vcard
--------------020209050909030606010208 Content-Type: text/plain Content-Disposition: inline Content-Transfer-Encoding: 8bit MIME-Version: 1.0
---------------------------(end of broadcast)--------------------------- TIP 5: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faqs/FAQ.html
--------------020209050909030606010208--
|
|
 | | From: | Jim C. Nasby | | Subject: | Re: PostgreSQL clustering VS MySQL clustering | | Date: | Thu, 20 Jan 2005 19:39:22 -0600 |
|
|
 | On Thu, Jan 20, 2005 at 07:12:42AM -0800, Joshua D. Drake wrote: > > >>then I was thinking. Couldn't he use > >>multiple databases > >>over multiple servers with dblink? > >> > >>It is not exactly how I would want to do it, but it would provide what > >>he needs I think??? > >> > >> > > > >Yes seems to be the only solution ... but I'm a little disapointed about > >this ... could you explain me why there is not this kind of > >functionnality ... it seems to be a real need for big applications no ? > > > > > Because it is really, really hard to do correctly and hard > equals expensive.
To expand on what Josh said, the expense in this case is development resources. If you look on the developer site you'll see a huge TODO list and a relatively small list of PostgreSQL developers. To develop a cluster solution similar to RAC would probably take the efforts of the entire development team for a year or more, during which time very little else would be done.
I'm glad to see your persistance in wanting to use PostgreSQL, and there might be some kind of limited clustering scheme that could be implemented without a great amount of effort by the core developers. In that case I think there's a good chance you could find people willing to work on it. -- Jim C. Nasby, Database Consultant decibel@decibel.org Give your computer some brain candy! www.distributed.net Team #1828
Windows: "Where do you want to go today?" Linux: "Where do you want to go tomorrow?" FreeBSD: "Are you guys coming, or what?"
---------------------------(end of broadcast)--------------------------- TIP 4: Don't 'kill -9' the postmaster
|
|
 | | From: | Christopher Browne | | Subject: | Re: PostgreSQL clustering VS MySQL clustering | | Date: | Sun, 23 Jan 2005 00:46:51 -0500 |
|
|
 | In the last exciting episode, herve@elma.fr (Hervé Piedvache) wrote: > Le Jeudi 20 Janvier 2005 16:05, Joshua D. Drake a écrit : >> Christopher Kings-Lynne wrote: >> >>> Or you could fork over hundreds of thousands of dollars for Oracle's >> >>> RAC. >> >> >> >> No please do not talk about this again ... I'm looking about a >> >> PostgreSQL solution ... I know RAC ... and I'm not able to pay for a >> >> RAC certify hardware configuration plus a RAC Licence. >> > >> > There is absolutely zero PostgreSQL solution... >> >> I just replied the same thing but then I was thinking. Couldn't he use >> multiple databases >> over multiple servers with dblink? >> >> It is not exactly how I would want to do it, but it would provide what >> he needs I think??? > > Yes seems to be the only solution ... but I'm a little disapointed about > this ... could you explain me why there is not this kind of > functionnality ... it seems to be a real need for big applications no ?
If this is what you actually need, well, it's something that lots of people would sort of like to have, but it's really DIFFICULT to implement it.
Partitioning data onto different servers appears like it ought to be a good idea. Unfortunately, getting _exactly_ the right semantics is hard.
If the data is all truly independent, then it's no big deal; just have one server for one set of data, and another for the other.
But reality normally is that if you _think_ you need a cluster, that's because some of the data needs to be _shared_, which means you need to either:
a) Have queries that run across two databases, or
b) Replicate the shared data between the systems.
We're likely back to the need for replication. -- If this was helpful, rate me http://www3.sympatico.ca/cbbrowne/rdbms.html "It is the user who should parameterize procedures, not their creators." -- Alan Perlis
|
|
 | | From: | Edgars Diebelis | | Subject: | Re: PostgreSQL clustering VS MySQL clustering | | Date: | Thu, 20 Jan 2005 17:24:37 +0200 |
|
|
 | I have no experience with pgCluster, but I found: PGCluster is a multi-master and synchronous replication system that supports load balancing of PostgreSQL. http://www.software-facilities.com/databases-software/pgcluster.php
May be some have some expierience with this tool?
----- Original Message ----- From: "Christopher Kings-Lynne" To: "Hervé Piedvache" Cc: "Jeff" ; Sent: Thursday, January 20, 2005 4:58 PM Subject: [spam] Re: [PERFORM] PostgreSQL clustering VS MySQL clustering
>>>Or you could fork over hundreds of thousands of dollars for Oracle's >>>RAC. >> >> >> No please do not talk about this again ... I'm looking about a PostgreSQL >> solution ... I know RAC ... and I'm not able to pay for a RAC certify >> hardware configuration plus a RAC Licence. > > There is absolutely zero PostgreSQL solution... > > You may have to split the data yourself onto two independent db servers > and combine the results somehow in your application. > > Chris > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Don't 'kill -9' the postmaster >
---------------------------(end of broadcast)--------------------------- TIP 4: Don't 'kill -9' the postmaster
|
|
 | | From: | Greg Stark | | Subject: | Re: PostgreSQL clustering VS MySQL clustering | | Date: | 20 Jan 2005 11:44:20 -0500 |
|
|
 | Steve Wampler writes:
> Hervé Piedvache wrote: > > > No ... as I have said ... how I'll manage a database getting a table of may > > be 250 000 000 records ? I'll need incredible servers ... to get quick access > > or index reading ... no ? > > Probably by carefully partitioning their data. I can't imagine anything > being fast on a single table in 250,000,000 tuple range.
Why are you all so psyched out by the size of the table? That's what indexes are for.
The size of the table really isn't relevant here. The important thing is the size of the working set. Ie, How many of those records are required to respond to queries.
As long as you tune your application so every query can be satisfied by reading a (very) limited number of those records and have indexes to speed access to those records you can have quick response time even if you have terabytes of raw data.
I would start by looking at the plans for the queries you're running and seeing if you have any queries that are reading more than hundred records or so. If so then you have to optimize them or rethink your application design. You might need to restructure your data so you don't have to scan too many records for any query.
No clustering system is going to help you if your application requires reading through too much data. If every query is designed to not have to read more than a hundred or so records then there's no reason you can't have sub-100ms response time even if you had terabytes of raw data.
If the problem is just that each individual query is fast but there's too many coming for a single server then something like slony is all you need. It'll spread the load over multiple machines. If you spread the load in an intelligent way you can even concentrate each server on certain subsets of the data. But that shouldn't even really be necessary, just a nice improvement.
-- greg
---------------------------(end of broadcast)--------------------------- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly
|
|
 | | From: | Jean-Max Reymond | | Subject: | Re: PostgreSQL clustering VS MySQL clustering | | Date: | Thu, 20 Jan 2005 20:35:44 +0100 |
|
|
 | On Thu, 20 Jan 2005 12:13:17 -0700, Steve Wampler wrote: > Mitch Pirtle wrote:
> But that's not enough, because you're going to be running separate > postgresql backends on the different hosts, and there are > definitely consistency issues with trying to do that. So far as > I know (right, experts?) postgresql isn't designed with providing > distributed consistency in mind (isn't shared memory used for > consistency, which restricts all the backends to a single host?).
yes, you're right: you'll need a Distributed Lock Manager and an application to manage it , Postgres ?
---------------------------(end of broadcast)--------------------------- TIP 7: don't forget to increase your free space map settings
|
|
 | | From: | Mitch Pirtle | | Subject: | Re: PostgreSQL clustering VS MySQL clustering | | Date: | Thu, 20 Jan 2005 13:42:25 -0500 |
|
|
 | On Thu, 20 Jan 2005 09:33:42 -0800, Darcy Buskermolen wrote: > > Another Option to consider would be pgmemcache. that way you just build the > farm out of lots of large memory, diskless boxes for keeping the whole > database in memory in the whole cluster. More information on it can be found > at: http://people.freebsd.org/~seanc/pgmemcache/
Which brings up another question: why not just cluster at the hardware layer? Get an external fiberchannel array, and cluster a bunch of dual Opterons, all sharing that storage. In that sense you would be getting one big PostgreSQL 'image' running across all of the servers.
Or is that idea too 90's? ;-)
-- Mitch
---------------------------(end of broadcast)--------------------------- TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
|
|
 | | From: | Steve Wampler | | Subject: | Re: PostgreSQL clustering VS MySQL clustering | | Date: | Thu, 20 Jan 2005 12:13:17 -0700 |
|
|
 | Mitch Pirtle wrote:
> Which brings up another question: why not just cluster at the hardware > layer? Get an external fiberchannel array, and cluster a bunch of dual > Opterons, all sharing that storage. In that sense you would be getting > one big PostgreSQL 'image' running across all of the servers.
This isn't as easy as it sounds. Simply sharing the array among hosts with a 'standard' file system won't work because of cache inconsistencies. So, you need to put a shareable filesystem (such as GFS or Lustre) on it.
But that's not enough, because you're going to be running separate postgresql backends on the different hosts, and there are definitely consistency issues with trying to do that. So far as I know (right, experts?) postgresql isn't designed with providing distributed consistency in mind (isn't shared memory used for consistency, which restricts all the backends to a single host?).
-- Steve Wampler -- swampler@noao.edu The gods that smiled on your birth are now laughing out loud.
---------------------------(end of broadcast)--------------------------- TIP 7: don't forget to increase your free space map settings
|
|
 | | From: | Bruno Almeida do Lago | | Subject: | Re: PostgreSQL clustering VS MySQL clustering | | Date: | Thu, 20 Jan 2005 22:40:02 -0200 |
|
|
 | I was thinking the same! I'd like to know how other databases such as Oracle do it.
-----Original Message----- From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Mitch Pirtle Sent: Thursday, January 20, 2005 4:42 PM To: pgsql-performance@postgresql.org Subject: Re: [PERFORM] PostgreSQL clustering VS MySQL clustering
On Thu, 20 Jan 2005 09:33:42 -0800, Darcy Buskermolen wrote: > > Another Option to consider would be pgmemcache. that way you just build the > farm out of lots of large memory, diskless boxes for keeping the whole > database in memory in the whole cluster. More information on it can be found > at: http://people.freebsd.org/~seanc/pgmemcache/
Which brings up another question: why not just cluster at the hardware layer? Get an external fiberchannel array, and cluster a bunch of dual Opterons, all sharing that storage. In that sense you would be getting one big PostgreSQL 'image' running across all of the servers.
Or is that idea too 90's? ;-)
-- Mitch
---------------------------(end of broadcast)--------------------------- TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
---------------------------(end of broadcast)--------------------------- TIP 7: don't forget to increase your free space map settings
|
|
 | | From: | Dave Cramer | | Subject: | Re: PostgreSQL clustering VS MySQL clustering | | Date: | Thu, 20 Jan 2005 20:04:19 -0500 |
|
|
 | This is a multi-part message in MIME format. --------------010200020507040709090003 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit
This idea won't work with postgresql only one instance can operate on a datastore at a time.
Dave
Bruno Almeida do Lago wrote:
> >I was thinking the same! I'd like to know how other databases such as Oracle >do it. > >-----Original Message----- >From: pgsql-performance-owner@postgresql.org >[mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Mitch Pirtle >Sent: Thursday, January 20, 2005 4:42 PM >To: pgsql-performance@postgresql.org >Subject: Re: [PERFORM] PostgreSQL clustering VS MySQL clustering > >On Thu, 20 Jan 2005 09:33:42 -0800, Darcy Buskermolen > wrote: > > >>Another Option to consider would be pgmemcache. that way you just build >> >> >the > > >>farm out of lots of large memory, diskless boxes for keeping the whole >>database in memory in the whole cluster. More information on it can be >> >> >found > > >>at: http://people.freebsd.org/~seanc/pgmemcache/ >> >> > >Which brings up another question: why not just cluster at the hardware >layer? Get an external fiberchannel array, and cluster a bunch of dual >Opterons, all sharing that storage. In that sense you would be getting >one big PostgreSQL 'image' running across all of the servers. > >Or is that idea too 90's? ;-) > >-- Mitch > >---------------------------(end of broadcast)--------------------------- >TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org > > >---------------------------(end of broadcast)--------------------------- >TIP 7: don't forget to increase your free space map settings > > > >
--------------010200020507040709090003 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit
This idea won't work with postgresql only one instance can operate on a datastore at a time.
Dave
Bruno Almeida do Lago wrote:
type="cite"> I was thinking the same! I'd like to know how other databases such as Oracle do it.
-----Original Message----- From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Mitch Pirtle Sent: Thursday, January 20, 2005 4:42 PM To: pgsql-performance@postgresql.org Subject: Re: [PERFORM] PostgreSQL clustering VS MySQL clustering
On Thu, 20 Jan 2005 09:33:42 -0800, Darcy Buskermolen <darcy@wavefire.com> wrote: Another Option to consider would be pgmemcache. that way you just build
the farm out of lots of large memory, diskless boxes for keeping the whole database in memory in the whole cluster. More information on it can be
found at: http://people.freebsd.org/~seanc/pgmemcache/
Which brings up another question: why not just cluster at the hardware layer? Get an external fiberchannel array, and cluster a bunch of dual Opterons, all sharing that storage. In that sense you would be getting one big PostgreSQL 'image' running across all of the servers.
Or is that idea too 90's? ;-)
-- Mitch
---------------------------(end of broadcast)--------------------------- TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
---------------------------(end of broadcast)--------------------------- TIP 7: don't forget to increase your free space map settings
--------------010200020507040709090003--
|
|
 | | From: | Jim C. Nasby | | Subject: | Re: PostgreSQL clustering VS MySQL clustering | | Date: | Thu, 20 Jan 2005 19:30:40 -0600 |
|
|
 | On Thu, Jan 20, 2005 at 10:40:02PM -0200, Bruno Almeida do Lago wrote: > > I was thinking the same! I'd like to know how other databases such as Oracle > do it. > In a nutshell, in a clustered environment (which iirc in oracle means shared disks), they use a set of files for locking and consistency across machines. So you better have fast access to the drive array, and the array better have caching of some kind. -- Jim C. Nasby, Database Consultant decibel@decibel.org Give your computer some brain candy! www.distributed.net Team #1828
Windows: "Where do you want to go today?" Linux: "Where do you want to go tomorrow?" FreeBSD: "Are you guys coming, or what?"
---------------------------(end of broadcast)--------------------------- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly
|
|
 | | From: | Iain | | Subject: | Re: PostgreSQL clustering VS MySQL clustering | | Date: | Fri, 21 Jan 2005 11:14:59 +0900 |
|
|
 | Oracle's RAC is good, but I think it's best to view it as a step in the high availability direction rather than a performance enhancer. While it can help your application scale up, that depends on the usage pattern. Also it's not 100% transparent to the application for example you can't depend on a sequence numbers being allocated uniquely as there can be delays propagating them to all nodes. So in clusters where insert rates are high this means you should explicitly check for unique key violations and try again. Dealing with propagation delays comes with the clustering technology I guess. Nonetheless, I would love to see this kind of functionality in postgres.
Regards Iain
----- Original Message ----- From: "Jim C. Nasby" To: "Bruno Almeida do Lago" Cc: "'Mitch Pirtle'" ;
Sent: Friday, January 21, 2005 10:30 AM Subject: Re: [PERFORM] PostgreSQL clustering VS MySQL clustering
> On Thu, Jan 20, 2005 at 10:40:02PM -0200, Bruno Almeida do Lago wrote: >> >> I was thinking the same! I'd like to know how other databases such as >> Oracle >> do it. >> > In a nutshell, in a clustered environment (which iirc in oracle means > shared disks), they use a set of files for locking and consistency > across machines. So you better have fast access to the drive array, and > the array better have caching of some kind. > -- > Jim C. Nasby, Database Consultant decibel@decibel.org > Give your computer some brain candy! www.distributed.net Team #1828 > > Windows: "Where do you want to go today?" > Linux: "Where do you want to go tomorrow?" > FreeBSD: "Are you guys coming, or what?" > > ---------------------------(end of broadcast)--------------------------- > TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly
---------------------------(end of broadcast)--------------------------- TIP 8: explain analyze is your friend
|
|
 | | From: | Josh Berkus | | Subject: | Re: PostgreSQL clustering VS MySQL clustering | | Date: | Thu, 20 Jan 2005 17:25:41 -0800 |
|
|
 | Bruno,
> Which brings up another question: why not just cluster at the hardware > layer? Get an external fiberchannel array, and cluster a bunch of dual > Opterons, all sharing that storage. In that sense you would be getting > one big PostgreSQL 'image' running across all of the servers. > > Or is that idea too 90's? ;-)
No, it just doesn't work. Multiple postmasters can't share one database.
LinuxLabs (as I've gathered) tried to go one better by using a tool that allows shared memory to bridge multple networked servers -- in other words, one postmaster controlling 4 or 5 servers. The problem is that IPC via this method is about 1,000 times slower than IPC on a single machine, wiping out all of the scalability gains from having the cluster in the first place.
-- --Josh
Josh Berkus Aglio Database Solutions San Francisco
---------------------------(end of broadcast)--------------------------- TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
|
|
 | | From: | Darcy Buskermolen | | Subject: | Re: PostgreSQL clustering VS MySQL clustering | | Date: | Thu, 20 Jan 2005 11:07:23 -0800 |
|
|
 | On January 20, 2005 10:42 am, Mitch Pirtle wrote: > On Thu, 20 Jan 2005 09:33:42 -0800, Darcy Buskermolen > > wrote: > > Another Option to consider would be pgmemcache. that way you just build > > the farm out of lots of large memory, diskless boxes for keeping the > > whole database in memory in the whole cluster. More information on it > > can be found at: http://people.freebsd.org/~seanc/pgmemcache/ > > Which brings up another question: why not just cluster at the hardware > layer? Get an external fiberchannel array, and cluster a bunch of dual > Opterons, all sharing that storage. In that sense you would be getting > one big PostgreSQL 'image' running across all of the servers.
It dosn't quite work that way, thanks to shared memory, and kernel disk cache. (among other things) > > Or is that idea too 90's? ;-) > > -- Mitch > > ---------------------------(end of broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
-- Darcy Buskermolen Wavefire Technologies Corp. ph: 250.717.0200 fx: 250.763.1759 http://www.wavefire.com
---------------------------(end of broadcast)--------------------------- TIP 5: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faqs/FAQ.html
|
|
 | | From: | Joshua D. Drake | | Subject: | Re: PostgreSQL clustering VS MySQL clustering | | Date: | Thu, 20 Jan 2005 06:44:16 -0800 |
|
|
 | This is a multi-part message in MIME format. --------------040708050609000701040803 Content-Type: text/plain; charset=ISO-8859-15; format=flowed Content-Transfer-Encoding: 8bit
Hervé Piedvache wrote:
>Dear community, > >My company, which I actually represent, is a fervent user of PostgreSQL. >We used to make all our applications using PostgreSQL for more than 5 years. >We usually do classical client/server applications under Linux, and Web >interface (php, perl, C/C++). We used to manage also public web services with >10/15 millions records and up to 8 millions pages view by month. > > Depending on your needs either:
Slony: www.slony.info
or
Replicator: www.commandprompt.com
Will both do what you want. Replicator is easier to setup but Slony is free.
Sincerely,
Joshua D. Drake
>Now we are in front of a new need, but we do not find any good solution with >PostgreSQL. >We need to make a sort of directory of millions of data growing about 4/8 >millions per month, and to be able to be used by many users from the web. In >order to do this, our solution need to be able to run perfectly with many >insert and many select access (done before each insert, and done by web site >visitors). We will also need to make a search engine for the millions of data >(140/150 millions records at the immediate beginning) ... No it's not google, >but the kind of volume of data stored in the main table is similar. > >Then ... we have made some tests, with the actual servers we have here, like a >Bi-Pro Xeon 2.8 Ghz, with 4 Gb of RAM and the result of the cumulative >inserts, and select access is slowing down the service really quickly ... >(Load average is going up to 10 really quickly on the database). > >We were at this moment thinking about a Cluster solution ... We saw on the >Internet many solution talking about Cluster solution using MySQL ... but >nothing about PostgreSQL ... the idea is to use several servers to make a >sort of big virtual server using the disk space of each server as one, and >having the ability to use the CPU and RAM of each servers in order to >maintain good service performance ...one can imagin it is like a GFS but >dedicated to postgreSQL... > >Is there any solution with PostgreSQL matching these needs ... ? >Do we have to backport our development to MySQL for this kind of problem ? >Is there any other solution than a Cluster for our problem ? > >Looking for your reply, > >Regards, > >
-- Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC Postgresql support, programming shared hosting and dedicated hosting. +1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com PostgreSQL Replicator -- production quality replication for PostgreSQL
--------------040708050609000701040803 Content-Type: text/x-vcard; charset=utf-8; name="jd.vcf" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="jd.vcf"
begin:vcard fn:Joshua Drake n:Drake;Joshua org:Command Prompt, Inc. adr:;;PO Box 215 ;Cascade Locks;OR;97014;US email;internet:jd@commandprompt.com title:Consultant tel;work:503-667-4564 tel;fax:503-210-0334 x-mozilla-html:FALSE url:http://www.commandprompt.com version:2.1 end:vcard
--------------040708050609000701040803 Content-Type: text/plain Content-Disposition: inline Content-Transfer-Encoding: 8bit MIME-Version: 1.0
---------------------------(end of broadcast)--------------------------- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly
--------------040708050609000701040803--
|
|
 | | From: | Hervé_Piedvache | | Subject: | Re: PostgreSQL clustering VS MySQL clustering | | Date: | Thu, 20 Jan 2005 16:00:47 +0100 |
|
|
 | Joshua,
Le Jeudi 20 Janvier 2005 15:44, Joshua D. Drake a écrit : > Hervé Piedvache wrote: > > > >My company, which I actually represent, is a fervent user of PostgreSQL. > >We used to make all our applications using PostgreSQL for more than 5 > > years. We usually do classical client/server applications under Linux, > > and Web interface (php, perl, C/C++). We used to manage also public web > > services with 10/15 millions records and up to 8 millions pages view by > > month. > > Depending on your needs either: > > Slony: www.slony.info > > or > > Replicator: www.commandprompt.com > > Will both do what you want. Replicator is easier to setup but > Slony is free.
No ... as I have said ... how I'll manage a database getting a table of may be 250 000 000 records ? I'll need incredible servers ... to get quick access or index reading ... no ?
So what we would like to get is a pool of small servers able to make one virtual server ... for that is called a Cluster ... no ?
I know they are not using PostgreSQL ... but how a company like Google do to get an incredible database in size and so quick access ?
regards, -- Hervé Piedvache
Elma Ingénierie Informatique 6 rue du Faubourg Saint-Honoré F-75008 - Paris - France Pho. 33-144949901 Fax. 33-144949902
---------------------------(end of broadcast)--------------------------- TIP 6: Have you searched our list archives?
http://archives.postgresql.org
|
|
 | | From: | Steve Wampler | | Subject: | Re: PostgreSQL clustering VS MySQL clustering | | Date: | Thu, 20 Jan 2005 08:14:28 -0700 |
|
|
 | Hervé Piedvache wrote:
> > No ... as I have said ... how I'll manage a database getting a table of may be > 250 000 000 records ? I'll need incredible servers ... to get quick access or > index reading ... no ? > > So what we would like to get is a pool of small servers able to make one > virtual server ... for that is called a Cluster ... no ? > > I know they are not using PostgreSQL ... but how a company like Google do to > get an incredible database in size and so quick access ?
Probably by carefully partitioning their data. I can't imagine anything being fast on a single table in 250,000,000 tuple range. Nor can I really imagine any database that efficiently splits a single table across multiple machines (or even inefficiently unless some internal partitioning is being done).
So, you'll have to do some work at your end and not just hope that a "magic bullet" is available.
Once you've got the data partitioned, the question becomes one of how to inhance performance/scalability. Have you considered RAIDb?
-- Steve Wampler -- swampler@noao.edu The gods that smiled on your birth are now laughing out loud.
---------------------------(end of broadcast)--------------------------- TIP 7: don't forget to increase your free space map settings
|
|
 | | From: | Hervé_Piedvache | | Subject: | Re: PostgreSQL clustering VS MySQL clustering | | Date: | Thu, 20 Jan 2005 16:23:17 +0100 |
|
|
 | Le Jeudi 20 Janvier 2005 16:14, Steve Wampler a écrit : > Once you've got the data partitioned, the question becomes one of > how to inhance performance/scalability. Have you considered RAIDb?
No but I'll seems to be very interesting ... close to the explanation of Joshua ... but automaticly done ...
Thanks ! -- Hervé Piedvache
Elma Ingénierie Informatique 6 rue du Faubourg Saint-Honoré F-75008 - Paris - France Pho. 33-144949901 Fax. 33-144949902
---------------------------(end of broadcast)--------------------------- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
|
|
 | | From: | Christopher Kings-Lynne | | Subject: | Re: PostgreSQL clustering VS MySQL clustering | | Date: | Thu, 20 Jan 2005 15:57:43 +0000 |
|
|
 | > Probably by carefully partitioning their data. I can't imagine anything > being fast on a single table in 250,000,000 tuple range. Nor can I > really imagine any database that efficiently splits a single table > across multiple machines (or even inefficiently unless some internal > partitioning is being done).
Ah, what about partial indexes - those might help. As a kind of 'semi-partition'.
Chris
---------------------------(end of broadcast)--------------------------- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly
|
|
 | | From: | Joshua D. Drake | | Subject: | Re: PostgreSQL clustering VS MySQL clustering | | Date: | Thu, 20 Jan 2005 08:04:04 -0800 |
|
|
 | This is a multi-part message in MIME format. --------------090906040804070208060205 Content-Type: text/plain; charset=ISO-8859-15; format=flowed Content-Transfer-Encoding: 7bit
Christopher Kings-Lynne wrote:
>> Probably by carefully partitioning their data. I can't imagine anything >> being fast on a single table in 250,000,000 tuple range. Nor can I >> really imagine any database that efficiently splits a single table >> across multiple machines (or even inefficiently unless some internal >> partitioning is being done). > > > Ah, what about partial indexes - those might help. As a kind of > 'semi-partition'.
He could also you schemas to partition out the information within the same database.
J
> > Chris
-- Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC Postgresql support, programming shared hosting and dedicated hosting. +1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com PostgreSQL Replicator -- production quality replication for PostgreSQL
--------------090906040804070208060205 Content-Type: text/x-vcard; charset=utf-8; name="jd.vcf" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="jd.vcf"
begin:vcard fn:Joshua Drake n:Drake;Joshua org:Command Prompt, Inc. adr:;;PO Box 215 ;Cascade Locks;OR;97014;US email;internet:jd@commandprompt.com title:Consultant tel;work:503-667-4564 tel;fax:503-210-0334 x-mozilla-html:FALSE url:http://www.commandprompt.com version:2.1 end:vcard
--------------090906040804070208060205 Content-Type: text/plain Content-Disposition: inline Content-Transfer-Encoding: 8bit MIME-Version: 1.0
---------------------------(end of broadcast)--------------------------- TIP 4: Don't 'kill -9' the postmaster
--------------090906040804070208060205--
|
|
 | | From: | Joshua D. Drake | | Subject: | Re: PostgreSQL clustering VS MySQL clustering | | Date: | Thu, 20 Jan 2005 07:04:19 -0800 |
|
|
 | This is a multi-part message in MIME format. --------------000206010106060001020400 Content-Type: text/plain; charset=ISO-8859-15; format=flowed Content-Transfer-Encoding: 7bit
>So what we would like to get is a pool of small servers able to make one >virtual server ... for that is called a Cluster ... no ? > >I know they are not using PostgreSQL ... but how a company like Google do to >get an incredible database in size and so quick access ? > > You could use dblink with multiple servers across data partitions within PostgreSQL but I don't know how fast that would be.
J
>regards, > >
-- Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC Postgresql support, programming shared hosting and dedicated hosting. +1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com PostgreSQL Replicator -- production quality replication for PostgreSQL
--------------000206010106060001020400 Content-Type: text/x-vcard; charset=utf-8; name="jd.vcf" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="jd.vcf"
begin:vcard fn:Joshua Drake n:Drake;Joshua org:Command Prompt, Inc. adr:;;PO Box 215 ;Cascade Locks;OR;97014;US email;internet:jd@commandprompt.com title:Consultant tel;work:503-667-4564 tel;fax:503-210-0334 x-mozilla-html:FALSE url:http://www.commandprompt.com version:2.1 end:vcard
--------------000206010106060001020400 Content-Type: text/plain Content-Disposition: inline Content-Transfer-Encoding: 8bit MIME-Version: 1.0
---------------------------(end of broadcast)--------------------------- TIP 5: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faqs/FAQ.html
--------------000206010106060001020400--
|
|
 | | From: | Stephen Frost | | Subject: | Re: PostgreSQL clustering VS MySQL clustering | | Date: | Thu, 20 Jan 2005 10:07:37 -0500 |
|
|
 | --CkVcT5DPb0HurK+K Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable
* Herv? Piedvache (herve@elma.fr) wrote: > I know they are not using PostgreSQL ... but how a company like Google do= to=20 > get an incredible database in size and so quick access ?
They segment their data across multiple machines and have an algorithm which tells the application layer which machine to contact for what data.
Stephen
--CkVcT5DPb0HurK+K Content-Type: application/pgp-signature; name="signature.asc" Content-Description: Digital signature Content-Disposition: inline
-----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.5 (GNU/Linux)
iD8DBQFB78k5rzgMPqB3kigRAqgCAJ94nkR30iQ2ZyqOcUkwLCXOnuIH8gCeOXfl znbL8QGVsfXj3DJ5kvEQ+eY= =Euv0 -----END PGP SIGNATURE-----
--CkVcT5DPb0HurK+K--
|
|
 | | From: | Dave Cramer | | Subject: | Re: PostgreSQL clustering VS MySQL clustering | | Date: | Thu, 20 Jan 2005 10:23:34 -0500 |
|
|
 | This is a multi-part message in MIME format. --------------000002000804000801020902 Content-Type: text/plain; charset=ISO-8859-15; format=flowed Content-Transfer-Encoding: 8bit
Google uses something called the google filesystem, look it up in google. It is a distributed file system.
Dave
Hervé Piedvache wrote:
>Joshua, > >Le Jeudi 20 Janvier 2005 15:44, Joshua D. Drake a écrit : > > >>Hervé Piedvache wrote: >> >> >>>My company, which I actually represent, is a fervent user of PostgreSQL. >>>We used to make all our applications using PostgreSQL for more than 5 >>>years. We usually do classical client/server applications under Linux, >>>and Web interface (php, perl, C/C++). We used to manage also public web >>>services with 10/15 millions records and up to 8 millions pages view by >>>month. >>> >>> >>Depending on your needs either: >> >>Slony: www.slony.info >> >>or >> >>Replicator: www.commandprompt.com >> >>Will both do what you want. Replicator is easier to setup but >>Slony is free. >> >> > >No ... as I have said ... how I'll manage a database getting a table of may be >250 000 000 records ? I'll need incredible servers ... to get quick access or >index reading ... no ? > >So what we would like to get is a pool of small servers able to make one >virtual server ... for that is called a Cluster ... no ? > >I know they are not using PostgreSQL ... but how a company like Google do to >get an incredible database in size and so quick access ? > >regards, > >
-- Dave Cramer http://www.postgresintl.com 519 939 0336 ICQ#14675561
--------------000002000804000801020902 Content-Type: text/html; charset=ISO-8859-15 Content-Transfer-Encoding: 8bit
http-equiv="Content-Type">
Google uses something called the google filesystem, look it up in google. It is a distributed file system.
Dave
Hervé Piedvache wrote:
Joshua,
Le Jeudi 20 Janvier 2005 15:44, Joshua D. Drake a écrit : Hervé Piedvache wrote: My company, which I actually represent, is a fervent user of PostgreSQL. We used to make all our applications using PostgreSQL for more than 5 years. We usually do classical client/server applications under Linux, and Web interface (php, perl, C/C++). We used to manage also public web services with 10/15 millions records and up to 8 millions pages view by month.
Depending on your needs either:
Slony: www.slony.info
or
Replicator: www.commandprompt.com
Will both do what you want. Replicator is easier to setup but Slony is free.
No ... as I have said ... how I'll manage a database getting a table of may be 250 000 000 records ? I'll need incredible servers ... to get quick access or index reading ... no ?
So what we would like to get is a pool of small servers able to make one virtual server ... for that is called a Cluster ... no ?
I know they are not using PostgreSQL ... but how a company like Google do to get an incredible database in size and so quick access ?
regards,
-- Dave Cramer http://www.postgresintl.com 519 939 0336 ICQ#14675561
--------------000002000804000801020902--
|
|
 | | From: | Hervé_Piedvache | | Subject: | Re: PostgreSQL clustering VS MySQL clustering | | Date: | Thu, 20 Jan 2005 16:32:27 +0100 |
|
|
 | Le Jeudi 20 Janvier 2005 16:23, Dave Cramer a écrit : > Google uses something called the google filesystem, look it up in > google. It is a distributed file system.
Yes that's another point I'm working on ... make a cluster of server using GFS ... and making PostgreSQL running with it ...
But I have not finished my test ... and may be people could have experience with this ...
Regards, -- Hervé Piedvache
Elma Ingénierie Informatique 6 rue du Faubourg Saint-Honoré F-75008 - Paris - France Pho. 33-144949901 Fax. 33-144949902
---------------------------(end of broadcast)--------------------------- TIP 8: explain analyze is your friend
|
|
 | | From: | Holger Hoffstaette | | Subject: | Re: PostgreSQL clustering VS MySQL clustering | | Date: | Thu, 20 Jan 2005 17:55:56 +0100 |
|
|
 | On Thu, 20 Jan 2005 16:32:27 +0100, Hervé Piedvache wrote:
> Le Jeudi 20 Janvier 2005 16:23, Dave Cramer a écrit : >> Google uses something called the google filesystem, look it up in >> google. It is a distributed file system. > > Yes that's another point I'm working on ... make a cluster of server using > GFS ... and making PostgreSQL running with it ...
Did you read the GFS whitepaper? It really works differently from other filesystems with regard to latency and consistency. You'll probably have better success with Lustre (http://www.clusterfs.com/) or RedHat's Global File System (http://www.redhat.com/software/rha/gfs/). If you're looking for a 'cheap, free and easy' solution you can just as well stop right now. :-)
-h
---------------------------(end of broadcast)--------------------------- TIP 4: Don't 'kill -9' the postmaster
|
|
 | | From: | Steve Wampler | | Subject: | Re: PostgreSQL clustering VS MySQL clustering | | Date: | Thu, 20 Jan 2005 08:40:04 -0700 |
|
|
 | Hervé Piedvache wrote: > Le Jeudi 20 Janvier 2005 16:23, Dave Cramer a écrit : > >>Google uses something called the google filesystem, look it up in >>google. It is a distributed file system. > > > Yes that's another point I'm working on ... make a cluster of server using > GFS ... and making PostgreSQL running with it ...
A few years ago I played around with GFS, but not for postgresql.
I don't think it's going to help - logically there's no difference between putting PG on GFS and putting PG on NFS - in both cases the filesystem doesn't provide any support for distributing the task at hand - and a PG database server isn't written to be distributed across hosts regardless of the distribution of the data across filesystems.
-- Steve Wampler -- swampler@noao.edu The gods that smiled on your birth are now laughing out loud.
---------------------------(end of broadcast)--------------------------- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
|
|
 | | From: | amrit at health2.moph.go.th | | Subject: | Re: PostgreSQL clustering VS MySQL clustering | | Date: | Thu, 20 Jan 2005 23:10:03 +0700 |
|
|
 | What you want is some kind of huge pararell computing , isn't it? I have heard from many groups of Japanese Pgsql developer did it but they are talking in japanese website and of course in Japanese. I can name one of them " Asushi Mitani" and his website http://www.csra.co.jp/~mitani/jpug/pgcluster/en/index.html and you may directly contact him.
Amrit Thailand
---------------------------(end of broadcast)--------------------------- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly
|
|
 | | From: | Christopher Browne | | Subject: | Re: PostgreSQL clustering VS MySQL clustering | | Date: | Sun, 23 Jan 2005 00:58:28 -0500 |
|
|
 | After a long battle with technology, herve@elma.fr (Hervé Piedvache), an earthling, wrote: > Joshua, > > Le Jeudi 20 Janvier 2005 15:44, Joshua D. Drake a écrit : >> Hervé Piedvache wrote: >> > >> >My company, which I actually represent, is a fervent user of PostgreSQL. >> >We used to make all our applications using PostgreSQL for more than 5 >> > years. We usually do classical client/server applications under Linux, >> > and Web interface (php, perl, C/C++). We used to manage also public web >> > services with 10/15 millions records and up to 8 millions pages view by >> > month. >> >> Depending on your needs either: >> >> Slony: www.slony.info >> >> or >> >> Replicator: www.commandprompt.com >> >> Will both do what you want. Replicator is easier to setup but >> Slony is free. > > No ... as I have said ... how I'll manage a database getting a table > of may be 250 000 000 records ? I'll need incredible servers ... to > get quick access or index reading ... no ? > > So what we would like to get is a pool of small servers able to make > one virtual server ... for that is called a Cluster ... no ?
The term "cluster" simply indicates the use of multiple servers.
There are numerous _DIFFERENT_ forms of "clusters," so that for someone to say "I want a cluster" commonly implies that since they didn't realize the need to specify things further, they really don't know what they want in a usefully identifiable way.
> I know they are not using PostgreSQL ... but how a company like > Google do to get an incredible database in size and so quick access > ?
Google has built a specialized application that evidently falls into the category known as "embarrassingly parallel."
There are classes of applications that are amenable to parallelization.
Those tend to be applications completely different from those implemented atop transactional data stores like PostgreSQL.
If your problem is "embarrassingly parallel," then I'd bet lunch that PostgreSQL (and all other SQL databases) are exactly the _wrong_ tool for implementing its data store.
If your problem is _not_ "embarrassingly parallel," then you'll almost certainly discover that the cheapest way to make it fast involves fitting all the data onto _one_ computer so that you do not have to pay the costs of transmitting data over slow inter-computer communications links. -- let name="cbbrowne" and tld="gmail.com" in String.concat "@" [name;tld];; http://www.ntlug.org/~cbbrowne/ It isn't that physicists enjoy physics more than they enjoy , its that they enjoy more when they are thinking of physics.
|
|
 | | From: | Rod Taylor | | Subject: | Re: PostgreSQL clustering VS MySQL clustering | | Date: | Thu, 20 Jan 2005 11:02:58 -0500 |
|
|
 | On Thu, 2005-01-20 at 15:36 +0100, Hervé Piedvache wrote: > Le Jeudi 20 Janvier 2005 15:24, Christopher Kings-Lynne a écrit : > > > Is there any solution with PostgreSQL matching these needs ... ? > > > > You want: http://www.slony.info/ > > > > > Do we have to backport our development to MySQL for this kind of problem > > > ? Is there any other solution than a Cluster for our problem ? > > > > Well, Slony does replication which is basically what you want :) > > > > Only master->slave though, so you will need to have all inserts go via > > the master server, but selects can come off any server. > > Sorry but I don't agree with this ... Slony is a replication solution ... I > don't need replication ... what will I do when my database will grow up to 50 > Gb ... I'll need more than 50 Gb of RAM on each server ??? > This solution is not very realistic for me ...
Slony has some other issues with databases > 200GB in size as well (well, it hates long running transactions -- and pg_dump is a regular long running transaction)
However, you don't need RAM one each server for this, you simply need enough disk space.
Have a Master which takes writes, a "replicator" which you can consider to be a hot-backup of the master, have N slaves replicate off of the otherwise untouched "replicator" machine.
For your next trick, have the application send read requests for Clients A-C to slave 1, D-F to slave 2, ...
You need enough memory to hold the index sections for clients A-C on slave 1. The rest of the index can remain on disk. It's available should it be required (D-F box crashed, so your application is now feeding those read requests to the A-C machine)...
Go to more slaves and smaller segments as you require. Use the absolute cheapest hardware you can find for the slaves that gives reasonable performance. They don't need to be reliable, so RAID 0 on IDE drives is perfectly acceptable.
PostgreSQL can do the replication portion quite nicely. You need to implement the "cluster" part in the application side. --
---------------------------(end of broadcast)--------------------------- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
|
|
 | | From: | Stephen Frost | | Subject: | Re: PostgreSQL clustering VS MySQL clustering | | Date: | Thu, 20 Jan 2005 09:30:45 -0500 |
|
|
 | --8xrkxCw8NdvQ1PPj Content-Type: text/plain; charset=us-ascii Content-Disposition: inline
* Herv? Piedvache (herve@elma.fr) wrote: > Is there any solution with PostgreSQL matching these needs ... ?
You might look into pg_pool. Another possibility would be slony, though I'm not sure it's to the point you need it at yet, depends on if you can handle some delay before an insert makes it to the slave select systems.
> Do we have to backport our development to MySQL for this kind of problem ?
Well, hopefully not. :)
> Is there any other solution than a Cluster for our problem ?
Bigger server, more CPUs/disks in one box. Try to partition up your data some way such that it can be spread across multiple machines, then if you need to combine the data have it be replicated using slony to a big box that has a view which joins all the tables and do your big queries against that.
Just some thoughts.
Stephen
--8xrkxCw8NdvQ1PPj Content-Type: application/pgp-signature; name="signature.asc" Content-Description: Digital signature Content-Disposition: inline
-----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.5 (GNU/Linux)
iD8DBQFB78CVrzgMPqB3kigRAtYCAJwI/YJlqadWGNSNJgdYLXQhoHwRJACfbEPm UEWaQ8XGRcGInTJxXiFvAEQ= =gNO+ -----END PGP SIGNATURE-----
--8xrkxCw8NdvQ1PPj--
|
|
 | | From: | Hervé_Piedvache | | Subject: | Re: PostgreSQL clustering VS MySQL clustering | | Date: | Thu, 20 Jan 2005 15:39:49 +0100 |
|
|
 | Le Jeudi 20 Janvier 2005 15:30, Stephen Frost a écrit : > * Herv? Piedvache (herve@elma.fr) wrote: > > Is there any solution with PostgreSQL matching these needs ... ? > > You might look into pg_pool. Another possibility would be slony, though > I'm not sure it's to the point you need it at yet, depends on if you can > handle some delay before an insert makes it to the slave select systems.
I think not ... pgpool or slony are replication solutions ... but as I have said to Christopher Kings-Lynne how I'll manage the scalabilty of the database ? I'll need several servers able to load a database growing and growing to get good speed performance ...
> > Do we have to backport our development to MySQL for this kind of problem > > ? > > Well, hopefully not. :)
I hope so ;o)
> > Is there any other solution than a Cluster for our problem ? > > Bigger server, more CPUs/disks in one box. Try to partition up your > data some way such that it can be spread across multiple machines, then > if you need to combine the data have it be replicated using slony to a > big box that has a view which joins all the tables and do your big > queries against that.
But I'll arrive to limitation of a box size quickly I thing a 4 processors with 64 Gb of RAM ... and after ?
regards, -- Hervé
---------------------------(end of broadcast)--------------------------- TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
|
| |
|
|