BUG #7913: TO_CHAR Function & Turkish collate
a***@hotmail.com
2013-03-02 12:46:14 UTC
The following bug has been logged on the website:

Bug reference: 7913
Logged by: TO_CHAR Function & Turkish collate
Email address: ***@hotmail.com
PostgreSQL version: 9.2.0
Operating system: Linux
Description:

prod=# SELECT TO_CHAR('2013-03-01'::date,'DAY');
to_char
----------
FRİDAY
(1 row)
But it should return FRIDAY.
Our database's lc_collate is tr_TR.UTF-8 and the encoding is UTF8.

Best regards,
Adnan DURSUN
Ankara/TURKEY
--
Sent via pgsql-bugs mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
Tom Lane
2013-03-03 15:42:59 UTC
Post by a***@hotmail.com
prod=# SELECT TO_CHAR('2013-03-01'::date,'DAY');
to_char
----------
FRİDAY
(1 row)
But it should return FRIDAY.
Our database's lc_collate is tr_TR.UTF-8 and the encoding is UTF8.
It looks like the cause of this is that the result is computed as
str_toupper("Friday"), and str_toupper() applies a collation-sensitive
upcasing rule.

I think the use of str_toupper() is appropriate when processing the
locale-specific string for a TMDAY specification; but plain DAY is not
supposed to be locale-dependent, so we probably should use an ASCII-only
upcasing rule in the non-TM code path.
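
The difference between the two upcasing rules can be sketched in a few lines of C. This is only an illustration, not PostgreSQL's code: `ascii_upper` and `turkish_upper_utf8` are hypothetical names, and the second function hard-codes the one Turkish rule that matters here (lowercase `i` becomes U+0130, encoded in UTF-8 as the bytes 0xC4 0xB0) rather than consulting a real locale.

```c
#include <stddef.h>

/* ASCII-only upcasing: only 'a'..'z' change, whatever the locale says.
 * Hypothetical helper, standing in for the rule wanted for plain DAY. */
static void
ascii_upper(const char *src, char *dst, size_t dstlen)
{
    size_t i = 0;

    if (dstlen == 0)
        return;
    for (; src[i] != '\0' && i + 1 < dstlen; i++)
    {
        unsigned char ch = (unsigned char) src[i];

        if (ch >= 'a' && ch <= 'z')
            ch = (unsigned char) (ch - 'a' + 'A');
        dst[i] = (char) ch;
    }
    dst[i] = '\0';
}

/* Hypothetical stand-in for a Turkish-aware upcasing of UTF-8 text.
 * Assumption: only ASCII letters are handled; 'i' maps to U+0130
 * (0xC4 0xB0), all other bytes are copied or ASCII-upcased. */
static void
turkish_upper_utf8(const char *src, char *dst, size_t dstlen)
{
    size_t o = 0;

    if (dstlen == 0)
        return;
    for (size_t i = 0; src[i] != '\0'; i++)
    {
        unsigned char ch = (unsigned char) src[i];

        if (ch == 'i')
        {
            if (o + 2 >= dstlen)        /* two bytes plus the NUL */
                break;
            dst[o++] = (char) 0xC4;
            dst[o++] = (char) 0xB0;
        }
        else
        {
            if (o + 1 >= dstlen)        /* one byte plus the NUL */
                break;
            if (ch >= 'a' && ch <= 'z')
                ch = (unsigned char) (ch - 'a' + 'A');
            dst[o++] = (char) ch;
        }
    }
    dst[o] = '\0';
}
```

Applied to "Friday", the first function gives FRIDAY and the second gives FRİDAY, which matches the output in the report.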

Anybody have an opinion on whether to back-patch such a fix? It seems
conceivable that somebody out there is relying on the current behavior.
OTOH, I believe that only Turkish UTF8 locales exhibit this behavior
(the single-byte-encoding code path in str_toupper acts differently for
historical reasons). So it's pretty inconsistent as it stands.

regards, tom lane
Peter Eisentraut
2013-03-04 02:28:52 UTC
Post by Tom Lane
I think the use of str_toupper() is appropriate when processing the
locale-specific string for a TMDAY specification; but plain DAY is not
supposed to be locale-dependent, so we probably should use an
ASCII-only upcasing rule in the non-TM code path.
Agreed.
Post by Tom Lane
Anybody have an opinion on whether to back-patch such a fix?
I think it's a bug that should be backpatched.
Tom Lane
2013-03-05 18:08:09 UTC
Post by Peter Eisentraut
Post by Tom Lane
Anybody have an opinion on whether to back-patch such a fix?
I think it's a bug that should be backpatched.
Done. In addition to day/month names, I found that there were
case-folding hazards for timezone abbreviations ('tz' format)
and Roman numerals for numbers ('rn' format) ... though, curiously,
not for Roman numerals for months.
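
The same hazard exists in the lowercase direction, which is what the 'tz' and 'rn' formats exercise: under Turkish rules an uppercase `I` lowers to the dotless ı (U+0131, UTF-8 bytes 0xC4 0xB1) instead of `i`. The sketch below is an assumption-laden illustration, not the patched code: `ascii_lower` and `turkish_lower_utf8` are hypothetical names, and the Turkish rule is hard-coded rather than taken from a locale.

```c
#include <stddef.h>

/* ASCII-only downcasing: only 'A'..'Z' change, regardless of locale. */
static void
ascii_lower(const char *src, char *dst, size_t dstlen)
{
    size_t i = 0;

    if (dstlen == 0)
        return;
    for (; src[i] != '\0' && i + 1 < dstlen; i++)
    {
        unsigned char ch = (unsigned char) src[i];

        if (ch >= 'A' && ch <= 'Z')
            ch = (unsigned char) (ch - 'A' + 'a');
        dst[i] = (char) ch;
    }
    dst[i] = '\0';
}

/* Hypothetical stand-in for Turkish-aware downcasing of UTF-8 text.
 * Assumption: only ASCII letters are handled; 'I' maps to U+0131
 * (0xC4 0xB1), all other bytes are copied or ASCII-downcased. */
static void
turkish_lower_utf8(const char *src, char *dst, size_t dstlen)
{
    size_t o = 0;

    if (dstlen == 0)
        return;
    for (size_t i = 0; src[i] != '\0'; i++)
    {
        unsigned char ch = (unsigned char) src[i];

        if (ch == 'I')
        {
            if (o + 2 >= dstlen)        /* two bytes plus the NUL */
                break;
            dst[o++] = (char) 0xC4;
            dst[o++] = (char) 0xB1;
        }
        else
        {
            if (o + 1 >= dstlen)        /* one byte plus the NUL */
                break;
            if (ch >= 'A' && ch <= 'Z')
                ch = (unsigned char) (ch - 'A' + 'a');
            dst[o++] = (char) ch;
        }
    }
    dst[o] = '\0';
}
```

For the Roman numeral VIII, the ASCII rule yields viii, while the Turkish rule yields vııı, which is the kind of result an ASCII-only fold avoids.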

regards, tom lane
Devrim GÜNDÜZ
2013-03-14 18:56:47 UTC
Hi,
Post by Tom Lane
Post by Peter Eisentraut
I think it's a bug that should be backpatched.
Done. In addition to day/month names, I found that there were
case-folding hazards for timezone abbreviations ('tz' format)
and Roman numerals for numbers ('rn' format) ... though, curiously,
not for Roman numerals for months.
Thanks!

Regards,
--
Devrim GÜNDÜZ
Principal Systems Engineer @ EnterpriseDB: http://www.enterprisedb.com
PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer
Community: devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
http://www.gunduz.org Twitter: http://twitter.com/devrimgunduz
Euler Taveira
2013-03-05 04:30:03 UTC
Post by Tom Lane
Anybody have an opinion on whether to back-patch such a fix? It seems
conceivable that somebody out there is relying on the current behavior.
OTOH, I believe that only Turkish UTF8 locales exhibit this behavior
(the single-byte-encoding code path in str_toupper acts differently for
historical reasons). So it's pretty inconsistent as it stands.
Nope. I wasn't aware of the unusual Turkish case rules. Mea culpa. :(

As you suggested, s/str_toupper/pg_toupper/ in the else block (no TM) is the
right fix. I'm not aware of another locale that would break if we apply such a
change in a stable branch. Do you want me to post a fix?
--
Euler Taveira de Oliveira - Timbira http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
Tom Lane
2013-03-05 15:06:44 UTC
Post by Euler Taveira
As you suggested, s/str_toupper/pg_toupper/ in the else block (no TM) is the
right fix. I'm not aware of another locale that would break if we apply such a
change in a stable branch. Do you want me to post a fix?
Thanks, but I have a fix mostly written already.

regards, tom lane