8-bit strings can't contain characters > 0x80

description

In the latest version of IronPython (2.0A6), I noticed some weird behaviour with 8-bit strings: 0x7F and 0x80 come back as unicode strings, and 0x81 raises an exception.

IronPython console: IronPython 2.0A6 (2.0.11102.00) on .NET 2.0.50727.1378
Copyright (c) Microsoft Corporation. All rights reserved.
>>> str("\x7e")
'~'
>>> str("\x7f")
u'\x7f'
>>> str("\x80")
u'\x80'
>>> str("\x81")
Traceback (most recent call last):
File , line 0, in ##23
File mscorlib, line unknown, in GetString
File mscorlib, line unknown, in GetChars
File mscorlib, line unknown, in Fallback
File mscorlib, line unknown, in Throw
UnicodeDecodeError: Unable to translate bytes [81] at index 0 from
specified code page to Unicode.

This appears to be a bug in IronPython, since the CPython interpreter will allow 8-bit strings to contain characters all the way up to 0xFF.
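To check the whole byte range rather than a few hand-picked values, the session above can be turned into a small sweep. This is only a sketch: the helper name `first_failing_byte` is made up for illustration, and it assumes the failure always surfaces as a `UnicodeError` subclass, as in the traceback above.

```python
def first_failing_byte():
    """Return the first value in 0x00..0xFF for which str(chr(value)) raises,
    or None if every value is accepted.

    Hypothetical helper, not part of IronPython or CPython.
    """
    for value in range(256):
        try:
            str(chr(value))
        except UnicodeError:
            # UnicodeDecodeError (seen in the traceback) is a UnicodeError subclass.
            return value
    return None


if __name__ == "__main__":
    result = first_failing_byte()
    if result is None:
        print("all byte values 0x00-0xFF accepted")
    else:
        print("first failing value: 0x%02x" % result)
```

On CPython the function returns `None`. Going by the session above, the affected IronPython build would presumably stop at 0x81, but that has not been verified here.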

comments

slide_o_mix wrote Feb 9 at 6:53 PM

It's failing here:

IronPython.Runtime.PythonAsciiEncoding.GetCharCount(Byte[] bytes, Int32 index, Int32 count)

There is a check there for whether the byte value is > 0x7f.
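For anyone following along, here is a rough Python 3 sketch of the kind of check described, next to the byte-for-byte mapping that CPython's 8-bit strings effectively give you. It is not the IronPython source: `ascii_char_count` and `latin1_decode` are hypothetical names, and only the `> 0x7f` comparison comes from the comment above.

```python
def ascii_char_count(data, index, count):
    """Strict-ASCII style check (hypothetical stand-in, not IronPython code).

    Raises UnicodeDecodeError for the first byte above 0x7f in the range,
    otherwise returns the number of characters, which equals the byte count.
    """
    for offset in range(index, index + count):
        if data[offset] > 0x7F:
            raise UnicodeDecodeError(
                "ascii", bytes(data), offset, offset + 1,
                "ordinal not in range(128)",
            )
    return count


def latin1_decode(data):
    """Map every byte 0x00..0xFF to the code point with the same value."""
    return bytes(data).decode("latin-1")


if __name__ == "__main__":
    sample = bytearray(b"\x7e\x81")
    print(repr(latin1_decode(sample)))
    try:
        ascii_char_count(sample, 0, len(sample))
    except UnicodeDecodeError as exc:
        print("strict check rejected the byte at index", exc.start)
```

A decoder built on the second mapping would accept every value up to 0xFF, which is the behaviour the report expects from 8-bit strings.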