[vtkusers] VTK and Unicode

David Gobbi david.gobbi at gmail.com
Fri Jun 6 14:34:47 EDT 2014


Using short filenames is a clever work-around.

When you allocate wchFilename, use length + 1, or else there is
no room for the terminating null character that you are appending.
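
A minimal sketch of that length + 1 allocation, assuming a std::u16string stands in for the QString contents (the code units QString::utf16() would provide) and assuming Windows, where wchar_t is 16 bits wide. The helper name makeWideBuffer is hypothetical, and a std::vector replaces the raw new[]/delete[] so there is nothing to free by hand:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of VTK or Qt): copy UTF-16 code units into
// a null-terminated wchar_t buffer.
// Assumption: wchar_t holds one UTF-16 code unit, as it does on Windows.
// On platforms with a 32-bit wchar_t this copies code units unchanged and
// does not combine surrogate pairs.
std::vector<wchar_t> makeWideBuffer(const std::u16string& utf16)
{
    // size() + 1: one extra slot for the terminating null character.
    std::vector<wchar_t> buffer(utf16.size() + 1);
    for (std::size_t i = 0; i < utf16.size(); ++i) {
        buffer[i] = static_cast<wchar_t>(utf16[i]);
    }
    // The index equals the number of characters copied, so the write
    // stays inside the allocation.
    buffer[utf16.size()] = L'\0';
    return buffer;
}
```

The buffer's data() pointer could then be passed wherever a const wchar_t* is expected, for example as the first argument of GetShortPathNameW().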




On Fri, Jun 6, 2014 at 11:56 AM, Maarten Beek <beekmaarten at yahoo.com> wrote:
> For completeness:
> The following code enables me to load a file with a non-Canadian character
> into vtkImageReader2 on a Canadian computer. Except the last line needed to
> be commented out to avoid a crash, but this might be a Qt issue.... Of
> course this code is not multi-platform and it won't work if the short path
> feature is turned off for the hard drive.
>
>
> QString qFilename = <bla-bla>;
>
> wchar_t* wchFilename = new wchar_t[qFilename.length()];
> int len = qFilename.toWCharArray(wchFilename);
> wchFilename[len] = '\0';
>
> ulong len2 = GetShortPathNameW(wchFilename, NULL, 0);
> wchar_t* wchShortFilename = new wchar_t[len2];
> GetShortPathNameW(wchFilename, wchShortFilename, len2);
>
> int len3 = WideCharToMultiByte(CP_ACP, 0, wchShortFilename,
> lstrlenW(wchShortFilename)+1, NULL, 0, NULL, NULL);
> char* chFilename = new char[len3];
> WideCharToMultiByte(CP_ACP, 0, wchShortFilename,
> lstrlenW(wchShortFilename)+1, chFilename, len3, NULL, NULL);
>
> loadFile(chFilename);
>
> delete [] chFilename;
> delete [] wchShortFilename;
> //delete [] wchFilename; // crash
>
>
> Maarten
>
>
> On Friday, June 6, 2014 1:15:09 PM, David Gobbi <david.gobbi at gmail.com>
> wrote:
>
>
> That's a pretty good summary of the situation. Since
> there is no option in VTK to use unicode filenames,
> you're at the mercy of whatever codepage is active
> on the user's computer.
>
>   David
>
>
> On Fri, Jun 6, 2014 at 11:03 AM, Maarten Beek <beekmaarten at yahoo.com> wrote:
>> So a program I make in Canada would work normally in China? However, as
>> soon as I use Chinese characters or the user in China uses Russian
>> characters, things will break in Windows?
>>
>> So before I try to read a file, I'll have to guess the code page first?
>> In other words, apply WideCharToMultiByte() with different codepages
>> until I can successfully read the file?
>>
>> Now I have to figure out how to set a codepage...
>>
>> Maarten
>>
>>
>>
>> On Friday, June 6, 2014 12:30:15 PM, David Gobbi <david.gobbi at gmail.com>
>> wrote:
>>
>>
>> Well, in China people use the Chinese edition of Windows which defaults
>> to a Chinese codepage... so everything works fine for them as long as
>> they stick to Chinese.
>>
>> Things only get complicated when you try to mix several languages on the
>> same computer.  Working with one language in isolation is never a problem.
>>
>> OS X was very smart to standardize on utf-8.  Linux has pretty much done
>> the same.  But Windows still uses a different 8-bit encoding for every
>> language.
>>
>>  David
>>
>>
>> On Fri, Jun 6, 2014 at 10:08 AM, Maarten Beek <beekmaarten at yahoo.com>
>> wrote:
>>> Hi David,
>>>
>>> You're the only one replying to this thread, and as long as you give me
>>> new suggestions I'll keep poking ;-)
>>>
>>> I started using WideCharToMultiByte() directly as well, but this returns
>>> a '?' in place of my Chinese character. vtkImageReader2 (and I think all
>>> other readers as well) cannot find a file with this file name.
>>>
>>> How do people use VTK in China (or Russia, Greece...)?
>>>
>>> Maarten
>>>
>>>
>>> On Friday, June 6, 2014 11:41:22 AM, David Gobbi <david.gobbi at gmail.com>
>>> wrote:
>>>
>>>
>>> Hi Maarten,
>>>
>>> I've never used ToNarrow() myself, so I think you're relying a bit too
>>> much on my expertise ;)
>>>
>>> However, I have used WideCharToMultiByte() directly, and if given
>>> CP_UTF8 as the first argument, it converts to UTF8.  It simply does
>>> the conversion according to the codepage passed in the first argument.
>>>
>>> - David
>>>
>>> On Fri, Jun 6, 2014 at 9:14 AM, Maarten Beek <beekmaarten at yahoo.com>
>>> wrote:
>>>> Hi David,
>>>>
>>>> After copying the ushort string into a wchar_t string, ::ToNarrow()
>>>> would return nonsense.
>>>>
>>>> I gave the variable KWSYS_ENCODING_DEFAULT_CODEPAGE the value CP_UTF8
>>>> (it defaults to CP_ACP in the vtksys CMakeLists.txt if nothing is
>>>> defined) by using the Add Entry button in CMake.
>>>> After rebuilding VTK, this didn't solve the problem.
>>>>
>>>> Would the codepage setting of the computer have to be set to CP_UTF8 at
>>>> the moment of file creation for this to work? Although this wouldn't
>>>> affect my problem of getting the correct char* into the SetFileName()
>>>> function, would it?
>>>>
>>>> Maarten
>>>>
>>>>
>>>> On Thursday, June 5, 2014 6:45:38 PM, David Gobbi
>>>> <david.gobbi at gmail.com>
>>>> wrote:
>>>>
>>>>
>>>> Hi Maarten,
>>>>
>>>> Yes, vtksys::Encoding::ToNarrow() requires a wchar_t *.  If you like to
>>>> live dangerously, you could just use a reinterpret_cast to convert your
>>>> ushort * to a wchar_t *, but to be safe you should allocate a wchar_t
>>>> string of the right size and then copy your ushort string into it.
>>>>
>>>> VTK doesn't have any cmake options related to unicode.  All file-level
>>>> stuff in VTK always uses 8-bit strings no matter what build options you
>>>> choose.
>>>>
>>>>  David
>>>>
>>>>
>>>> On Thu, Jun 5, 2014 at 4:20 PM, Maarten Beek <beekmaarten at yahoo.com>
>>>> wrote:
>>>>> It's complaining about the ToNarrow() function.
>>>>> (... symbol "__declspec(dllimport) <bla-bla> __cdecl
>>>>> vtksys::Encoding::ToNarrow(<bla-bla>) referenced in function <bla-bla>)
>>>>>
>>>>> I got the VS solution from svn before my unicode changes and added the
>>>>> ToNarrow() function and everything builds fine.
>>>>> I guess if I build my app with unicode, I should do this with VTK as
>>>>> well (how? don't see such an option in cmake...).
>>>>>
>>>>>
>>>>> However the app crashes in Debug, but not in Release.
>>>>> In another app, I need to set wchar_t as a built-in type to avoid the
>>>>> unresolved external symbol error, but then a ushort* (returned by
>>>>> QString::utf16()) cannot be cast to a wchar_t*... I guess
>>>>> ::ToNarrow(const wchar_t*) needs a ::ToNarrow(const ushort*) overload?
>>>>>
>>>>> Maarten
>>>>>
>>>>>
>>>>>
>>>>> On Thursday, June 5, 2014 5:03:56 PM, David Gobbi
>>>>> <david.gobbi at gmail.com>
>>>>> wrote:
>>>>>
>>>>>
>>>>> Hi Maarten,
>>>>>
>>>>> I use the functions in vtksys all the time.  Not specifically the
>>>>> Encoding functions, but they shouldn't be any different.
>>>>>
>>>>> You said it reported an unresolved symbol error... what symbol
>>>>> was it complaining about?
>>>>>
>>>>>  David
>>>>>
>>>>>
>>>>> On Thu, Jun 5, 2014 at 2:56 PM, Maarten Beek <beekmaarten at yahoo.com>
>>>>> wrote:
>>>>>> I am linking my app with vtksys.lib.
>>>>>>
>>>>>> But your answer also tells me I can use this function in my app.
>>>>>>
>>>>>> So it must be something else... maybe the fact that my vtk is built
>>>>>> with multibyte and the app with unicode?
>>>>>> Should find a function in vtksys that is unrelated to this to test.
>>>>>>
>>>>>> Maarten
>>>>>>
>>>>>>
>>>>>> On Thursday, June 5, 2014 4:31:20 PM, David Gobbi
>>>>>> <david.gobbi at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> An unresolved symbol just means that you have to link to the library,
>>>>>> e.g. if you have a target_link_libraries() call in your
>>>>>> CMakeLists.txt, make sure that "vtksys" is listed.
>>>>>>
>>>>>> On Thu, Jun 5, 2014 at 2:20 PM, Maarten Beek <beekmaarten at yahoo.com>
>>>>>> wrote:
>>>>>>> Hi David,
>>>>>>>
>>>>>>> Thanks for the quick reply.
>>>>>>>
>>>>>>> Sounds complicated, but I'll browse through the windows docs.
>>>>>>>
>>>>>>> In my search for a solution I also bumped into
>>>>>>> vtksys::Encoding::ToNarrow(); however, this gives me an 'unresolved
>>>>>>> external symbol' link error. I have never used stuff in vtksys (and
>>>>>>> similar libs like vtkpng, vtktiff) before, so I am not really sure I
>>>>>>> am supposed to, i.e., whether these are just functions used by cmake
>>>>>>> and/or in just the vtk build.
>>>>>>>
>>>>>>> Maarten
>>>>>>>
>>>>>>>
>>>>>>> On Thursday, June 5, 2014 3:44:46 PM, David Gobbi
>>>>>>> <david.gobbi at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>> Hi Maarten,
>>>>>>>
>>>>>>> VTK just uses C++ streams in most of its readers/writers.  On OS X,
>>>>>>> you'll find that you can use utf-8 filenames just fine.  Same for
>>>>>>> linux, for the most part at least.
>>>>>>>
>>>>>>> For Windows, you'd have to set the codepage to 65001 (utf-8).  I've
>>>>>>> done this successfully for console I/O (via SetConsoleOutputCP()),
>>>>>>> but you'll have to look through the Windows docs to see what function
>>>>>>> is needed to change the codepage used by CreateFileA.
>>>>>>>
>>>>>>> - David
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Jun 5, 2014 at 1:17 PM, Maarten Beek <beekmaarten at yahoo.com>
>>>>>>> wrote:
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I was wondering if there is a multi-platform way of loading a file
>>>>>>>> (e.g. tif, stl) with Chinese or Russian characters in the file path
>>>>>>>> in VTK.
>>>>>>>> How would I build VTK with Unicode characters?
>>>>>>>> Can I use a non-unicode VTK in a unicode app?
>>>>>>>>
>>>>>>>> Thanks - Maarten
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>