I've noticed in recent testing that current versions of both RedHat and SuSE default to encodings of "en_US.UTF-8" in the US. Presumably they do something corresponding in other countries.
I think this means that in the basic install-from-source instructions, we need to warn people to compile with --locale='C'. I know it caught me by surprise when indexing for LIKE stopped working, and I wasted a lot of time hunting for a PG bug when it was my SuSE upgrade at fault.
-- Josh Berkus Aglio Database Solutions San Francisco
---------------------------(end of broadcast)--------------------------- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
Josh Berkus wrote: > I've noticed in recent testing that current versions of both RedHat > and SuSE default to encodings of "en_US.UTF-8" in the US. > Presumably they do something corresponding in other countries. > > I think this means that in the basic install-from-source > instructions, we need to warn people to compile with --locale='C'.
I think not.
-- Peter Eisentraut http://developer.postgresql.org/~petere/
---------------------------(end of broadcast)--------------------------- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Josh Berkus wrote: > Otherwise we're set for 12,000 e-mails to the SQL list on "why > doesn't this query use and index, it did on my old machine".
Linux distributions have been shipping with non-C locale settings for a long, long time, so that complaint would be invalid. And even back then we figured that recommending that people turn locale off was not an acceptable "solution".
-- Peter Eisentraut http://developer.postgresql.org/~petere/
---------------------------(end of broadcast)--------------------------- TIP 7: don't forget to increase your free space map settings
> Linux distributions have been shipping with non-C locale settings for a > long, long time, so that complaint would be invalid.
In US distributions it's a recent thing. The switch to non-C locales is a recent thing; RH Enterprise 3.0, and SuSE 9.0. But I expect in Europe you've been using non-C locales for a while.
I'd like to have an explanation of this somewhere else newbies are liable to read it, *before* their first production "LIKE" query doesn't use an index. Where would be appropriate?
And, for English speakers, what exactly is wrong with using 'C' locale instead of the environment one?
-- --Josh
Josh Berkus Aglio Database Solutions San Francisco
---------------------------(end of broadcast)--------------------------- TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
Josh Berkus wrote: > In US distributions it's a recent thing. The switch to non-C > locales is a recent thing; RH Enterprise 3.0, and SuSE 9.0.
I have it on record that Red Hat has set a non-C locale by default at least since Red Hat 6.1 as distributed in North America (aren't they the same anyway?) in 1999. I know that because we had this exact discussion back then.
> I'd like to have an explanation of this somewhere else newbies are > liable to read it, *before* their first production "LIKE" query > doesn't use an index. Where would be appropriate?
Near the documentation of "LIKE".
> And, for English speakers, what exactly is wrong with using 'C' > locale instead of the environment one?
It makes it difficult to write a résumé, to name one thing.
-- Peter Eisentraut http://developer.postgresql.org/~petere/
---------------------------(end of broadcast)--------------------------- TIP 4: Don't 'kill -9' the postmaster
Peter Eisentraut writes: > Josh Berkus wrote: >> I'd like to have an explanation of this somewhere else newbies are >> liable to read it, *before* their first production "LIKE" query >> doesn't use an index. Where would be appropriate?
> Near the documentation of "LIKE".
I think it would be fair to mention this somewhere near the discussion of creating a database cluster, too. The existing documentation does warn you that sort order may be affected by your choice, but there is nothing anywhere near that section to suggest that LIKE performance might be affected. A para in the "Locale Support" section (in charset.sgml) would probably be appropriate, and maybe another word or two in the place that link to it in runtime.sgml and ref/initdb.sgml.
regards, tom lane
---------------------------(end of broadcast)--------------------------- TIP 6: Have you searched our list archives?
Tom Lane wrote: > Peter Eisentraut writes: > > Josh Berkus wrote: > >> I'd like to have an explanation of this somewhere else newbies are > >> liable to read it, *before* their first production "LIKE" query > >> doesn't use an index. Where would be appropriate? > > > Near the documentation of "LIKE". > > I think it would be fair to mention this somewhere near the discussion > of creating a database cluster, too. The existing documentation does > warn you that sort order may be affected by your choice, but there is > nothing anywhere near that section to suggest that LIKE performance > might be affected. A para in the "Locale Support" section (in > charset.sgml) would probably be appropriate, and maybe another word or > two in the place that link to it in runtime.sgml and ref/initdb.sgml. > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org >
-- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
! Locale support influences in particular the following features:
--- 189,198 ----
! Behavior>
! Locale support influences the following features:
*************** *** 204,209 **** --- 204,216 ----
+ The ability to use indexes with LIKE> clauses + LIKE>and locales> + + + + + The to_char> family of functions
*************** *** 211,219 ****
! The only severe drawback of using the locale support in ! PostgreSQL> is its speed. So use locales only if ! you actually need them.
--- 218,228 ----
! The drawback of using locales other than C> or ! POSIX> in PostgreSQL> is its performance ! impact. It slows character handling and prevents ordinary indexes ! from being used by LIKE>. For this reason use locales ! only if you actually need them.
Index: doc/src/sgml/runtime.sgml =================================================================== RCS file: /cvsroot/pgsql/doc/src/sgml/runtime.sgml,v retrieving revision 1.299 diff -c -c -r1.299 runtime.sgml *** doc/src/sgml/runtime.sgml 26 Dec 2004 23:06:56 -0000 1.299 --- doc/src/sgml/runtime.sgml 4 Jan 2005 00:02:56 -0000 *************** *** 144,152 **** that can be found in . The sort order used within a particular database cluster is set by initdb and cannot be changed later, short of ! dumping all data, rerunning initdb, and ! reloading the data. So it's important to make this choice correctly ! the first time.
--- 144,153 ---- that can be found in . The sort order used within a particular database cluster is set by initdb and cannot be changed later, short of ! dumping all data, rerunning initdb, and reloading ! the data. There is also a performance impact for using locales ! other than C> or POSIX>. Therefore, it is ! important to make this choice correctly the first time.
! initdb initializes the database cluster's ! default locale and character set encoding. Some locale categories ! are fixed for the lifetime of the cluster, so it is important to ! make the right choice when running initdb. ! Other locale categories can be changed later when the server is ! started. initdb will write those locale ! settings into the postgresql.conf ! configuration file so they are the default, but they can be changed ! by editing that file. To set the locale that ! initdb uses, see the description of the ! option. The character set encoding can be set separately for each database as it is created. initdb determines the encoding for the template1 database, which will serve as the ! default for all other databases. To alter the default encoding use the option.
--- 54,75 ----
! initdb initializes the database cluster's default ! locale and character set encoding. Some locale categories are fixed ! for the lifetime of the cluster. There is also a performance impact ! in using locales other than C> or POSIX>. ! Therefore it is important to make the right choice when running ! initdb. Other locale categories can be changed ! later when the server is started. initdb will ! write those locale settings into the ! postgresql.conf configuration file so they are ! the default, but they can be changed by editing that file. To set the ! locale that initdb uses, see the description of ! the option. The character set encoding can be set separately for each database as it is created. initdb determines the encoding for the template1 database, which will serve as the ! default for all other databases. To alter the default encoding use the option.
Bruce Momjian wrote: > I have applied the following patch to mention non-C locales affect > LIKE.
I don't think it's accurate to say that there is a performance impact. The only impact is that you need to create a different kind of index.
-- Peter Eisentraut http://developer.postgresql.org/~petere/
---------------------------(end of broadcast)--------------------------- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
Peter Eisentraut wrote: > Bruce Momjian wrote: > > I have applied the following patch to mention non-C locales affect > > LIKE. > > I don't think it's accurate to say that there is a performance impact. > The only impact is that you need to create a different kind of index.
Well, there is a performance impact to using the locale functions, and you can't use ordinary indexes for LIKE, just special ones. Is there better text I should use?
I used "performance impact" in the reference text to suggest there is an issue and they might want to read the locale section.
-- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
---------------------------(end of broadcast)--------------------------- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly
On Monday 06 December 2004 13:39, Josh Berkus wrote: > I'd like to have an explanation of this somewhere else newbies are liable > to read it, *before* their first production "LIKE" query doesn't use an > index. Where would be appropriate? >
It's buried in the faq -> http://www.postgresql.org/docs/faqs/FAQ.html#4.8
-- Robert Treat Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
---------------------------(end of broadcast)--------------------------- TIP 7: don't forget to increase your free space map settings