Bug me not
I spent the greater part of yesterday putting together a patch for the unicode_internal bug I discovered in CPython. I had to make use of the dreadful-looking unicode_decode_call_errorhandler function in unicodeobject.c, which takes 13 (!) arguments and mutates some of them in undocumented, non-obvious ways. This was one of those rare occasions where I had to print out source code on paper and read it out loud to myself to understand what was going on.
After submitting the patch, it suddenly struck me that I had overlooked another very similar bug right under my nose (again, this is only on UCS-4 builds):
>>> from array import array
>>> array("u", "\x88\x88\x88\x88")
Segmentation fault
D’oh. Now I’ll have to fix that one, too …
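For the curious, here is a rough sketch of why those four bytes are trouble. It assumes (my reading, not something checked against the C source) that on a UCS-4 build each group of four bytes is taken as one 32-bit code unit, so "\x88\x88\x88\x88" becomes the value 0x88888888, far above the highest Unicode code point, 0x10FFFF. The helper name out_of_range_units is made up for illustration and is not part of CPython or the array module.

```python
import struct

MAX_CODE_POINT = 0x10FFFF  # highest code point defined by Unicode


def out_of_range_units(raw):
    """Return the 32-bit units in raw that exceed MAX_CODE_POINT.

    Hypothetical helper: it treats raw as a sequence of native-endian
    32-bit values, which is an assumption about how a UCS-4 build
    would interpret the buffer.
    """
    if len(raw) % 4:
        raise ValueError("length must be a multiple of 4 bytes")
    count = len(raw) // 4
    # "=" selects native byte order with standard sizes, so "I" is 4 bytes.
    units = struct.unpack("=%dI" % count, raw)
    return [u for u in units if u > MAX_CODE_POINT]


bad = out_of_range_units(b"\x88\x88\x88\x88")
print(bad == [0x88888888])
```

A check along these lines before the buffer is handed to the Unicode machinery would turn the crash into an ordinary exception; whether the real fix looks anything like this is a separate question.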
On a related note: if Unicode is still a mystery to you, AMK has just posted an excellent Unicode HOWTO geared towards Python.