knowledge-database (beta)

Current group: pgsql.docs

Doc patch needed: encodings?

Doc patch needed: encodings?  
Josh Berkus
 Re: Doc patch needed: encodings?  
Peter Eisentraut
 Re: Doc patch needed: encodings?  
Josh Berkus
 Re: Doc patch needed: encodings?  
Peter Eisentraut
 Re: Doc patch needed: encodings?  
Josh Berkus
 Re: Doc patch needed: encodings?  
Peter Eisentraut
 Re: Doc patch needed: encodings?  
Tom Lane
 Re: Doc patch needed: encodings?  
Bruce Momjian
 Re: Doc patch needed: encodings?  
Peter Eisentraut
 Re: Doc patch needed: encodings?  
Bruce Momjian
 Re: Doc patch needed: encodings?  
Robert Treat
From:Josh Berkus
Subject:Doc patch needed: encodings?
Date:Sun, 5 Dec 2004 15:46:53 -0800
Folks,

I've noticed in recent testing that current versions of both RedHat and SuSE
default to encodings of "en_US.UTF-8" in the US. Presumably they do
something corresponding in other countries.

I think this means that in the basic install-from-source instructions, we need
to warn people to compile with --locale='C'. I know it caught me by
surprise when indexing for LIKE stopped working, and I wasted a lot of time
hunting for a PG bug when it was my SuSE upgrade at fault.

--
Josh Berkus
Aglio Database Solutions
San Francisco

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
From:Peter Eisentraut
Subject:Re: Doc patch needed: encodings?
Date:Mon, 6 Dec 2004 02:00:51 +0100
Josh Berkus wrote:
> I've noticed in recent testing that current versions of both RedHat
> and SuSE default to encodings of "en_US.UTF-8" in the US.
> Presumably they do something corresponding in other countries.
>
> I think this means that in the basic install-from-source
> instructions, we need to warn people to compile with --locale='C'.

I think not.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match
From:Josh Berkus
Subject:Re: Doc patch needed: encodings?
Date:Mon, 6 Dec 2004 09:23:02 -0800
Peter,

> > I think this means that in the basic install-from-source
> > instructions, we need to warn people to compile with --locale='C'.  
>
> I think not.

Care to explain that? Why shouldn't we warn people?

Otherwise we're set for 12,000 e-mails to the SQL list on "why doesn't this
query use and index, it did on my old machine".

--
Josh Berkus
Aglio Database Solutions
San Francisco

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings
From:Peter Eisentraut
Subject:Re: Doc patch needed: encodings?
Date:Mon, 6 Dec 2004 19:19:21 +0100
Josh Berkus wrote:
> Otherwise we're set for 12,000 e-mails to the SQL list on "why
> doesn't this query use and index, it did on my old machine".

Linux distributions have been shipping with non-C locale settings for a
long, long time, so that complaint would be invalid. And even back
then we figured that recommending that people turn locale off was not
an acceptable "solution".

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings
From:Josh Berkus
Subject:Re: Doc patch needed: encodings?
Date:Mon, 6 Dec 2004 10:39:17 -0800
Peter,

> Linux distributions have been shipping with non-C locale settings for a
> long, long time, so that complaint would be invalid.

In US distributions it's a recent thing. The switch to non-C locales is a
recent thing; RH Enterprise 3.0, and SuSE 9.0. But I expect in Europe you've
been using non-C locales for a while.

I'd like to have an explanation of this somewhere else newbies are liable to
read it, *before* their first production "LIKE" query doesn't use an index.
Where would be appropriate?

And, for English speakers, what exactly is wrong with using 'C' locale instead
of the environment one?

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
From:Peter Eisentraut
Subject:Re: Doc patch needed: encodings?
Date:Mon, 6 Dec 2004 20:02:43 +0100
Josh Berkus wrote:
> In US distributions it's a recent thing. The switch to non-C
> locales is a recent thing; RH Enterprise 3.0, and SuSE 9.0.

I have it on record that Red Hat has set a non-C locale by default at
least since Red Hat 6.1 as distributed in North America (aren't they
the same anyway?) in 1999. I know that because we had this exact
discussion back then.

> I'd like to have an explanation of this somewhere else newbies are
> liable to read it, *before* their first production "LIKE" query
> doesn't use an index. Where would be appropriate?

Near the documentation of "LIKE".

> And, for English speakers, what exactly is wrong with using 'C'
> locale instead of the environment one?

It makes it difficult to write a résumé, to name one thing.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster
From:Tom Lane
Subject:Re: Doc patch needed: encodings?
Date:Mon, 06 Dec 2004 14:13:28 -0500
Peter Eisentraut writes:
> Josh Berkus wrote:
>> I'd like to have an explanation of this somewhere else newbies are
>> liable to read it, *before* their first production "LIKE" query
>> doesn't use an index. Where would be appropriate?

> Near the documentation of "LIKE".

I think it would be fair to mention this somewhere near the discussion
of creating a database cluster, too. The existing documentation does
warn you that sort order may be affected by your choice, but there is
nothing anywhere near that section to suggest that LIKE performance
might be affected. A para in the "Locale Support" section (in
charset.sgml) would probably be appropriate, and maybe another word or
two in the place that link to it in runtime.sgml and ref/initdb.sgml.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org
From:Bruce Momjian
Subject:Re: Doc patch needed: encodings?
Date:Mon, 3 Jan 2005 19:05:38 -0500 (EST)
--ELM1104797138-1052-0_
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=US-ASCII


I have applied the following patch to mention non-C locales affect LIKE.

---------------------------------------------------------------------------

Tom Lane wrote:
> Peter Eisentraut writes:
> > Josh Berkus wrote:
> >> I'd like to have an explanation of this somewhere else newbies are
> >> liable to read it, *before* their first production "LIKE" query
> >> doesn't use an index. Where would be appropriate?
>
> > Near the documentation of "LIKE".
>
> I think it would be fair to mention this somewhere near the discussion
> of creating a database cluster, too. The existing documentation does
> warn you that sort order may be affected by your choice, but there is
> nothing anywhere near that section to suggest that LIKE performance
> might be affected. A para in the "Locale Support" section (in
> charset.sgml) would probably be appropriate, and maybe another word or
> two in the place that link to it in runtime.sgml and ref/initdb.sgml.
>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
> http://archives.postgresql.org
>

--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

--ELM1104797138-1052-0_
Content-Transfer-Encoding: 7bit
Content-Type: text/plain
Content-Disposition: inline; filename="/bjm/diff"

Index: doc/src/sgml/charset.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/charset.sgml,v
retrieving revision 2.47
diff -c -c -r2.47 charset.sgml
*** doc/src/sgml/charset.sgml 27 Dec 2004 22:30:10 -0000 2.47
--- doc/src/sgml/charset.sgml 4 Jan 2005 00:02:40 -0000
***************
*** 189,198 ****



! Benefits</><br> <br> <para><br>! Locale support influences in particular the following features:<br> <br> <itemizedlist><br> <listitem><br>--- 189,198 ----<br> </sect2><br> <br> <sect2><br>! <title>Behavior</><br> <br> <para><br>! Locale support influences the following features:<br> <br> <itemizedlist><br> <listitem><br>***************<br>*** 204,209 ****<br>--- 204,216 ----<br> <br> <listitem><br> <para><br>+ The ability to use indexes with <literal>LIKE</> clauses<br>+ <indexterm><primary>LIKE</><secondary>and locales</></indexterm><br>+ </para><br>+ </listitem><br>+ <br>+ <listitem><br>+ <para><br> The <function>to_char</> family of functions<br> </para><br> </listitem><br>***************<br>*** 211,219 ****<br> </para><br> <br> <para><br>! The only severe drawback of using the locale support in<br>! <productname>PostgreSQL</> is its speed. So use locales only if<br>! you actually need them.<br> </para><br> </sect2><br> <br>--- 218,228 ----<br> </para><br> <br> <para><br>! The drawback of using locales other than <literal>C</> or<br>! <literal>POSIX</> in <productname>PostgreSQL</> is its performance<br>! impact. It slows character handling and prevents ordinary indexes<br>! from being used by <literal>LIKE</>. For this reason use locales<br>! only if you actually need them.<br> </para><br> </sect2><br> <br>Index: doc/src/sgml/runtime.sgml<br>===================================================================<br>RCS file: /cvsroot/pgsql/doc/src/sgml/runtime.sgml,v<br>retrieving revision 1.299<br>diff -c -c -r1.299 runtime.sgml<br>*** doc/src/sgml/runtime.sgml 26 Dec 2004 23:06:56 -0000 1.299<br>--- doc/src/sgml/runtime.sgml 4 Jan 2005 00:02:56 -0000<br>***************<br>*** 144,152 ****<br> that can be found in <xref linkend="locale">. The sort order used<br> within a particular database cluster is set by<br> <command>initdb</command> and cannot be changed later, short of<br>! dumping all data, rerunning <command>initdb</command>, and<br>! reloading the data. So it's important to make this choice correctly<br>! the first time.<br> </para><br> </sect1><br> <br>--- 144,153 ----<br> that can be found in <xref linkend="locale">. The sort order used<br> within a particular database cluster is set by<br> <command>initdb</command> and cannot be changed later, short of<br>! dumping all data, rerunning <command>initdb</command>, and reloading<br>! the data. There is also a performance impact for using locales<br>! other than <literal>C</> or <literal>POSIX</>. Therefore, it is<br>! important to make this choice correctly the first time.<br> </para><br> </sect1><br> <br>Index: doc/src/sgml/ref/initdb.sgml<br>===================================================================<br>RCS file: /cvsroot/pgsql/doc/src/sgml/ref/initdb.sgml,v<br>retrieving revision 1.32<br>diff -c -c -r1.32 initdb.sgml<br>*** doc/src/sgml/ref/initdb.sgml 1 Aug 2004 06:19:18 -0000 1.32<br>--- doc/src/sgml/ref/initdb.sgml 4 Jan 2005 00:02:57 -0000<br>***************<br>*** 54,74 ****<br> </para><br> <br> <para><br>! <command>initdb</command> initializes the database cluster's<br>! default locale and character set encoding. Some locale categories<br>! are fixed for the lifetime of the cluster, so it is important to<br>! make the right choice when running <command>initdb</command>.<br>! Other locale categories can be changed later when the server is<br>! started. <command>initdb</command> will write those locale<br>! settings into the <filename>postgresql.conf</filename><br>! configuration file so they are the default, but they can be changed<br>! by editing that file. To set the locale that<br>! <command>initdb</command> uses, see the description of the<br>! <option>--locale</option> option. The character set encoding can<br> be set separately for each database as it is created.<br> <command>initdb</command> determines the encoding for the<br> <literal>template1</literal> database, which will serve as the<br>! default for all other databases. To alter the default encoding use<br> the <option>--encoding</option> option.<br> </para><br> <br>--- 54,75 ----<br> </para><br> <br> <para><br>! <command>initdb</command> initializes the database cluster's default<br>! locale and character set encoding. Some locale categories are fixed<br>! for the lifetime of the cluster. There is also a performance impact<br>! in using locales other than <literal>C</> or <literal>POSIX</>.<br>! Therefore it is important to make the right choice when running<br>! <command>initdb</command>. Other locale categories can be changed<br>! later when the server is started. <command>initdb</command> will<br>! write those locale settings into the<br>! <filename>postgresql.conf</filename> configuration file so they are<br>! the default, but they can be changed by editing that file. To set the<br>! locale that <command>initdb</command> uses, see the description of<br>! the <option>--locale</option> option. The character set encoding can<br> be set separately for each database as it is created.<br> <command>initdb</command> determines the encoding for the<br> <literal>template1</literal> database, which will serve as the<br>! default for all other databases. To alter the default encoding use<br> the <option>--encoding</option> option.<br> </para><br> <br><br>--ELM1104797138-1052-0_<br>Content-Type: text/plain<br>Content-Disposition: inline<br>Content-Transfer-Encoding: 8bit<br>MIME-Version: 1.0<br><br><br>---------------------------(end of broadcast)---------------------------<br>TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org<br><br>--ELM1104797138-1052-0_--<br></td></tr></table></td></tr><tr><td> <table cellpadding=0 cellspacing=0 class=txt width=100%><tr><td><img src=/0.gif width=320 height=1></td><td bgcolor=#e0e0e0 width=100%><table class=txt><tr><td><a name=a8><b>From:</b></a></td><td><b>Peter Eisentraut</b></td></tr><tr><td><b>Subject:</b></td><td><b>Re: Doc patch needed: encodings?</b></td></tr><tr><td><b>Date:</b></td><td><b>Tue, 4 Jan 2005 01:12:31 +0100</b></td></tr></table></tr></table></td></tr><tr><td> <table cellpadding=0 cellspacing=0 class=txt><tr><td><img src=http:/0.gif width=320 height=1></td><td>Bruce Momjian wrote:<br>> I have applied the following patch to mention non-C locales affect<br>> LIKE.<br><br>I don't think it's accurate to say that there is a performance impact. <br>The only impact is that you need to create a different kind of index.<br><br>-- <br>Peter Eisentraut<br>http://developer.postgresql.org/~petere/<br><br>---------------------------(end of broadcast)---------------------------<br>TIP 2: you can get off all lists at once with the unregister command<br> (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)<br></td></tr></table></td></tr><tr><td> <table cellpadding=0 cellspacing=0 class=txt width=100%><tr><td><img src=/0.gif width=360 height=1></td><td bgcolor=#e0e0e0 width=100%><table class=txt><tr><td><a name=a9><b>From:</b></a></td><td><b>Bruce Momjian</b></td></tr><tr><td><b>Subject:</b></td><td><b>Re: Doc patch needed: encodings?</b></td></tr><tr><td><b>Date:</b></td><td><b>Mon, 3 Jan 2005 19:16:39 -0500 (EST)</b></td></tr></table></tr></table></td></tr><tr><td> <table cellpadding=0 cellspacing=0 class=txt><tr><td><img src=http:/0.gif width=360 height=1></td><td>Peter Eisentraut wrote:<br>> Bruce Momjian wrote:<br>> > I have applied the following patch to mention non-C locales affect<br>> > LIKE.<br>> <br>> I don't think it's accurate to say that there is a performance impact. <br>> The only impact is that you need to create a different kind of index.<br><br>Well, there is a performance impact to using the locale functions, and<br>you can't use ordinary indexes for LIKE, just special ones. Is there<br>better text I should use?<br><br>I used "performance impact" in the reference text to suggest there is an<br>issue and they might want to read the locale section.<br><br>-- <br> Bruce Momjian | http://candle.pha.pa.us<br> pgman@candle.pha.pa.us | (610) 359-1001<br> + If your life is a hard drive, | 13 Roberts Road<br> + Christ can be your backup. | Newtown Square, Pennsylvania 19073<br><br>---------------------------(end of broadcast)---------------------------<br>TIP 3: if posting/reading through Usenet, please send an appropriate<br> subscribe-nomail command to majordomo@postgresql.org so that your<br> message can get through to the mailing list cleanly<br></td></tr></table></td></tr><tr><td> <table cellpadding=0 cellspacing=0 class=txt width=100%><tr><td><img src=/0.gif width=200 height=1></td><td bgcolor=#e0e0e0 width=100%><table class=txt><tr><td><a name=a10><b>From:</b></a></td><td><b>Robert Treat</b></td></tr><tr><td><b>Subject:</b></td><td><b>Re: Doc patch needed: encodings?</b></td></tr><tr><td><b>Date:</b></td><td><b>Mon, 6 Dec 2004 13:48:53 -0500</b></td></tr></table></tr></table></td></tr><tr><td> <table cellpadding=0 cellspacing=0 class=txt><tr><td><img src=http:/0.gif width=200 height=1></td><td>On Monday 06 December 2004 13:39, Josh Berkus wrote:<br>> I'd like to have an explanation of this somewhere else newbies are liable<br>> to read it, *before* their first production "LIKE" query doesn't use an<br>> index. Where would be appropriate?<br>><br><br>It's buried in the faq -> http://www.postgresql.org/docs/faqs/FAQ.html#4.8<br><br>-- <br>Robert Treat<br>Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL<br><br>---------------------------(end of broadcast)---------------------------<br>TIP 7: don't forget to increase your free space map settings<br></td></tr></table></td></tr></table></td></tr></table></td><td>  </td><td valign=top align=right> <!-- google adsense --></td></tr></table><br> </td> <td><img src=0.gif width=1 height=300></td> </tr> </table> <table cellpadding=0 cellspacing=0><tr><td><img src=/0.gif height=3></td></tr></table> <table cellpadding=0 cellspacing=0 border=0 bgcolor=#a0a000 width=100%><tr><td><img src=/0.gif></td></tr></table> <table class=stxt width=100% border=0 bgcolor=#f4f5c2 cellspacing=0> <tr><td> Copyright © 2006 knowledge-database   -   All rights reserved </td></tr> </table> <table cellpadding=0 cellspacing=0 border=0 bgcolor=#a0a000 width=100%><tr><td><img src=/0.gif></td></tr></table> </body> </html>