Bug me not

I spent the greater part of yesterday putting together a patch for the unicode_internal bug I discovered in CPython. I had to make use of the dreadful-looking unicode_decode_call_errorhandler function in unicodeobject.c, which takes 13 (!) arguments and mutates some of them in undocumented, non-obvious ways. This was one of those rare occasions where I had to print out source code on paper and read it out loud to myself to understand what was going on.

After submitting the patch, it suddenly struck me that I had overlooked another very similar bug right under my nose (again, this is only on UCS-4 builds):

>>> from array import array
>>> array("u", "\x88\x88\x88\x88")
Segmentation fault

D’oh. Now I’ll have to fix that one, too …
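
A rough sketch of why those particular bytes are troublesome. This assumes the array constructor copies the raw bytes straight into the character buffer without validating them, which is a guess and not something confirmed in the CPython source. Read as a single 32-bit unit, the way a UCS-4 build stores one character, the four bytes give a value far beyond the highest code point Unicode defines. The names `raw`, `code_point` and `MAX_UNICODE` are just illustrative.

```python
import struct

# The four bytes from the crashing example.
raw = b"\x88\x88\x88\x88"

# A UCS-4 build stores each character as one native 32-bit unit.
# All four bytes are identical, so byte order makes no difference here.
(code_point,) = struct.unpack("=I", raw)

# Highest code point defined by the Unicode standard.
MAX_UNICODE = 0x10FFFF

print(hex(code_point))
print(code_point > MAX_UNICODE)  # True: not a valid Unicode character
```

Any code that later uses such a value as if it were a valid character is working with data it was never designed to handle.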

On a related note: If Unicode is still a mystery to you, AMK has just posted an excellent Unicode HOWTO geared towards Python.
