
Closed

Lower casing non-ASCII characters does not work

description

Interestingly, upper() works but lower() doesn't. I tested this with some Scandinavian letters but not with other scripts.
 
To reproduce:
 
IronPython 2.7.3 (2.7.0.40) on .NET 4.0.30319.269 (32-bit)
Type "help", "copyright", "credits" or "license" for more information.
>>> a = u'\xe4' # Character 'ä'
>>> A = a.upper()
>>> A
u'\xc4' # Correctly produces 'Ä'
>>> a.islower()
True
>>> A.islower()
False
>>> A.lower()
u'\xc4' # Wrong! Still 'Ä'
>>> A.lower() == a
False
>>> A.lower() == A
True
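A small regression check along the same lines might look like the sketch below. It is not part of the original report: the names PAIRS and case_roundtrip_failures are hypothetical, and the letter pairs are just the Scandinavian letters mentioned above, written as escapes so the file stays ASCII-only. On CPython it prints OK. Based on the session above, IronPython 2.7.3 would be expected to fail the lower() half.

```python
# Uppercase/lowercase pairs for a few Scandinavian letters (hypothetical test data).
PAIRS = [(u'\xc4', u'\xe4'),  # A with diaeresis
         (u'\xc5', u'\xe5'),  # A with ring above
         (u'\xc6', u'\xe6'),  # AE ligature
         (u'\xd6', u'\xf6'),  # O with diaeresis
         (u'\xd8', u'\xf8')]  # O with stroke

def case_roundtrip_failures(pairs):
    """Return the pairs for which lower() or upper() gives the wrong letter."""
    failures = []
    for upper, lower in pairs:
        if upper.lower() != lower or lower.upper() != upper:
            failures.append((upper, lower))
    return failures

failures = case_roundtrip_failures(PAIRS)
print('OK' if not failures else 'FAILED: %r' % (failures,))  # prints OK on CPython
```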

comments

pekkaklarck wrote Sep 19, 2012 at 9:28 AM

Interestingly, swapcase() does work:
>>> a = u'\xe4'
>>> A = a.upper()
>>> A.swapcase() == a
True
>>> a.swapcase() == A
True
This allowed me to create the following workaround function:
def lower(string):
    # Swap the case of uppercase characters only; everything else is kept as-is.
    return ''.join(c if not c.isupper() else c.swapcase() for c in string)
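To see that the swapcase() trick gives the same result as a working lower() for the letters in this report, one could compare the two on a correct implementation such as CPython. This is a sketch, and the sample strings are made up for illustration. Note that the trick only changes characters for which isupper() is True, so titlecase letters (for example U+01C5) are left unchanged even though lower() would convert them.

```python
def lower(string):
    # Swap the case of uppercase characters only; lowercase and uncased
    # characters (digits, punctuation, spaces) pass through unchanged.
    return u''.join(c.swapcase() if c.isupper() else c for c in string)

# Hypothetical Scandinavian samples, written as escapes to keep the file ASCII-only.
samples = [u'\xc4',
           u'\xc5NGSTR\xd6M',
           u'Sm\xf6rg\xe5sBORD',
           u'\xc6\xd8\xc5 123']

for s in samples:
    # Compare against the built-in lower(), which is correct on CPython.
    print(repr(lower(s)), lower(s) == s.lower())
```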

pekkaklarck wrote Sep 19, 2012 at 10:02 AM

My workaround function was unfortunately slow, but luckily it could be optimized for the common cases (including when not running on IronPython). Here's the final code:


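import sys

if sys.platform != 'cli':
    # Not IronPython: the built-in lower() works correctly.
    def lower(string):
        return string.lower()
else:
    # IronPython: lower() fails for non-ASCII uppercase letters.
    def lower(string):
        # Fast path: nothing to convert.
        if string.islower():
            return string
        # Fast path: ASCII-only strings are handled correctly by lower().
        if not _has_non_ascii_chars(string):
            return string.lower()
        # Slow path: swap the case of uppercase characters one by one.
        return ''.join(c if not c.isupper() else c.swapcase() for c in string)

def _has_non_ascii_chars(string):
    for c in string:
        if c >= u'\x80':
            return True
    return False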
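The benefit of the fast paths can be checked roughly with timeit. The sketch below is not the original measurement. It runs on CPython, where lower() is correct, so it only shows the relative cost of walking the string character by character versus calling lower() directly. The names lower_swapcase and lower_fast, the test strings, and the iteration count are all hypothetical, and actual timings will vary by machine and interpreter.

```python
import timeit

def lower_swapcase(string):
    # The per-character workaround: always walks the whole string.
    return u''.join(c.swapcase() if c.isupper() else c for c in string)

def lower_fast(string):
    # Same fast paths as the IronPython branch of the final code.
    if string.islower():
        return string
    if not any(c >= u'\x80' for c in string):
        return string.lower()
    return lower_swapcase(string)

ascii_text = u'Hello World ' * 100      # common case: ASCII only
mixed_text = u'\xc4 \xd6 Abc ' * 100    # non-ASCII uppercase: slow path

for name, text in [('ascii', ascii_text), ('mixed', mixed_text)]:
    for func in (lower_swapcase, lower_fast):
        seconds = timeit.timeit(lambda: func(text), number=200)
        print(name, func.__name__, round(seconds, 4))
```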

paweljasinski wrote Dec 31, 2012 at 2:23 PM

Given what I have seen in the implementation, there is another workaround: prepend an extra ASCII uppercase character to the string to be converted and strip it off afterwards.
>>> ("A"+"Ä").lower()[1:]
u'\xe4'
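Wrapped in a helper, that workaround might look like the sketch below. The function name lower_with_sentinel is hypothetical. The sketch only relies on the behaviour described in the comment above, which cannot be verified on CPython, where lower() already works. Slicing off the first character is safe because 'A'.lower() is always the single character 'a'.

```python
def lower_with_sentinel(string):
    # Prepend an ASCII uppercase sentinel, as in the workaround above,
    # lowercase the whole thing, then drop the sentinel ('A' -> 'a').
    return (u'A' + string).lower()[1:]

print(lower_with_sentinel(u'\xc4') == u'\xe4')  # prints True on CPython
```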

jdhardy wrote May 11 at 6:23 AM

Fixed in c02becc.