[vtk-developers] Python 3 and unicode

David Gobbi david.gobbi at gmail.com
Mon Aug 24 11:42:36 EDT 2015


On Mon, Aug 24, 2015 at 9:10 AM, Ben Boeckel <ben.boeckel at kitware.com>
wrote:

> On Mon, Aug 24, 2015 at 08:48:54 -0600, David Gobbi wrote:
> > On Windows, kwsys and most IO classes still do filesystem operations
> > using the local 8-bit encoding.  Which is silly, I know, considering
> > that Windows provided unicode APIs 22 years ago. But I don't see anyone
> > volunteering to fix this, which means that, right now, some people will
> > have to use encodings other than utf-8 for their filenames.
>
> FWIW, POSIX filenames are arbitrary bytestrings, so it's not
> *technically* different there either. And there's no indication or place
> to query what encoding should be used either since it's more a property
> of the one who wrote the file than the one who is reading it (so LANG
> might not help).
>

It's quite different.  At the file-system level, Windows always uses UTF-16.
So on Windows, if you always use the unicode (wide char) APIs, then as a
developer you are always safe.  The user can't screw you up by switching
the code page.
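
A rough Python sketch of the code page problem, for illustration only. The
file name and the helper `encodable` are made up for this example, and the
Python codecs merely stand in for what the 8-bit and wide-char APIs can
represent:

```python
# Hypothetical file name with characters outside Western European code pages.
name = "データ.vtk"

def encodable(text, encoding):
    """Return True if text survives a round trip through the given encoding."""
    try:
        return text.encode(encoding).decode(encoding) == text
    except UnicodeError:
        return False

utf16_ok = encodable(name, "utf-16-le")   # stands in for the wide-char APIs
cp1252_ok = encodable(name, "cp1252")     # a Western European 8-bit code page
cp932_ok = encodable(name, "cp932")       # the Japanese 8-bit code page

print(utf16_ok, cp1252_ok, cp932_ok)
```

Whether the 8-bit route works depends on which code page happens to be
active, while the UTF-16 route does not.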


> In any case, won't just handing over raw bytes for invalid utf-8
> sequences be fine at that point (if we tried to normalize, I could see a
> problem though)?


That's my gut feeling.  And we won't normalize (in the wrappers) because
that would screw things up.  Thankfully, the python encoders/decoders don't
normalize, either, so round-trip conversions between utf-8 and unicode
work.
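
A small sketch of both points in plain Python 3. This only shows what the
standard codecs do. The `surrogateescape` error handler is one way Python
itself passes invalid bytes through, and using it here is my assumption for
the example, not a statement about how the wrappers are implemented:

```python
import unicodedata

nfc = unicodedata.normalize("NFC", "caf\u00e9")  # accented e as one code point
nfd = unicodedata.normalize("NFD", "caf\u00e9")  # e plus a combining accent

# The utf-8 codec keeps whichever form it is given.
nfc_round_trip = nfc.encode("utf-8").decode("utf-8")
nfd_round_trip = nfd.encode("utf-8").decode("utf-8")

# Bytes that are not valid utf-8 (here, a Latin-1 encoded name).
raw = "caf\u00e9".encode("latin-1")
# surrogateescape maps each undecodable byte to a lone surrogate code point...
text = raw.decode("utf-8", errors="surrogateescape")
# ...and maps it back to the original byte when encoding.
restored = text.encode("utf-8", errors="surrogateescape")
```

Both forms come back unchanged, and the invalid byte sequence is restored
byte for byte.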

Normalization can definitely be an issue with filenames, however, because
1) OS X uses NFD, 2) Windows prefers NFC, 3) POSIX doesn't care.
I was quite surprised the first time I did a non-ascii directory listing in
OS X, and the accented characters were decomposed.
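
To make the NFD/NFC difference concrete, here is a minimal sketch using the
standard `unicodedata` module. The file name and the helper `same_filename`
are hypothetical, and choosing NFC as the common form for the comparison is
an assumption of the example:

```python
import unicodedata

def same_filename(a, b):
    """Compare two names after converting both to NFC."""
    nfc_a = unicodedata.normalize("NFC", a)
    nfc_b = unicodedata.normalize("NFC", b)
    return nfc_a == nfc_b

# Composed form, as a user would typically type it.
typed = "r\u00e9sum\u00e9.vtk"
# Decomposed form, like the directory listing described above.
listed = unicodedata.normalize("NFD", typed)

print(typed == listed)               # direct comparison of the two forms
print(same_filename(typed, listed))  # comparison after normalizing both
print(len(typed), len(listed))       # the decomposed form has more code points
```

Normalizing only for the comparison leaves the stored name untouched, which
matters because changing the bytes could make the file impossible to find on
a file system that doesn't care.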

 - David