s***@gmail.com
2013-03-27 10:32:57 UTC
The following bug has been logged on the website:
Bug reference: 7999
Logged by: david
Email address: ***@gmail.com
PostgreSQL version: 9.1.8
Operating system: linux
Description:
\y and \Y do not behave correctly next to
multibyte UTF-8 characters - they seem to invert their senses:
Proper behaviour with ASCII e
'es'~$$\y[eɛ]s$$ => t
Inverted behaviour with epsilon
'ɛs'~$$\y[eɛ]s$$ => f
'ɛs'~$$[eɛ]\ys$$ => t
'ɛs'~$$[eɛ]\Ys$$ => f
This seems to be a case of UTF-8 characters not being recognised as
word-forming:
'ɛ'~$$\w$$ => f
I've checked a few other characters which take more than one byte in UTF-8.
U+00F0 counts as \w, but nothing I've tried above U+00FF matches. I wonder if
it's something to do with code points above 255?
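The byte-length versus code-point question can be checked outside the database. Here is a minimal sketch in Python (not part of the original report), assuming the ɛ used above is U+025B LATIN SMALL LETTER OPEN E. The helper name `describe` is made up for illustration. It reports the code point and UTF-8 byte length of each character tested:

```python
# Compare code point and UTF-8 byte length for the characters in the report.
# Assumption: the report's "ɛ" is U+025B (LATIN SMALL LETTER OPEN E).

def describe(ch):
    """Return (U+XXXX label, code point, UTF-8 byte length) for one character."""
    cp = ord(ch)
    return (f"U+{cp:04X}", cp, len(ch.encode("utf-8")))

if __name__ == "__main__":
    for ch in ("e", "\u00f0", "\u025b"):
        label, cp, nbytes = describe(ch)
        print(f"{label}: code point {cp}, {nbytes} byte(s) in UTF-8, "
              f"above 255: {cp > 255}")
```

Both U+00F0 and ɛ take two bytes in UTF-8, so byte length alone does not separate the character that matches \w from the one that does not. The code point does. This only shows the observations are consistent with a cutoff at 255. It does not confirm where in the regex code such a cutoff would come from.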
In case anyone else hits this bug, replacing \y with
(^|$|\s|[[:punct:]]) seems to work for me, although it's ugly.
--
Sent via pgsql-bugs mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs