MantisBT - ITK
View Issue Details
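ID: 0009623
Project: ITK
View Status: public
Date Submitted: 2009-09-30 10:15
Last Update: 2012-02-17 09:47
Reporter: Benjamin Tourne
Assigned To: Brad King
Priority: normal
Severity: major
Reproducibility: always
Status: assigned
Resolution: open
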
0009623: ITK cannot use Unicode filenames on Visual Studio
Hi all,

It seems that the ITK API does not support Unicode filenames in a Visual Studio environment.

Here is a patch for ITK with a new test called itkImageFileWriterUnicodeTest.
This test tries to create an ImageFileWriter object with a filename that contains the Greek symbol 'α' (lowercase alpha). This is done in 4 different ways:

1 Using the SetFileName(std::string filename, ...) function
2 Using the SetFileName(char * filename, ...) function
3 Using SetFileName with a UTF-8 encoded char array

(The last test is for Visual Studio users only:)
4 Using SetFileName with an array encoded with the Windows system local code page.

On Visual Studio, all 4 tests fail: either the file cannot be created, or it is created with an incorrect name.
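
To make the third case concrete, here is a minimal sketch of a filename that carries 'α' as an explicit UTF-8 byte sequence (U+03B1 encodes as 0xCE 0xB1). The helper name MakeUtf8AlphaFileName and the file stem are hypothetical and are not taken from the attached patch.

```cpp
#include <string>

// Hypothetical helper (not from the attached patch): returns a filename
// that contains the Greek lowercase alpha, spelled as explicit UTF-8 bytes
// so the result does not depend on the encoding of the source file.
std::string MakeUtf8AlphaFileName()
{
  // U+03B1 (alpha) is the two-byte sequence 0xCE 0xB1 in UTF-8.
  const char alpha[] = { '\xCE', '\xB1', '\0' };
  return std::string("test_") + alpha + ".png";
}
```

On Linux a name built this way can usually be handed to SetFileName unchanged. On Windows the narrow-character runtime functions interpret the bytes in the active code page, which is consistent with the failures described above.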

A good solution to this problem would be to add an overloaded function
SetFileName(wchar_t* filename) or SetFileName(std::wstring filename).

Best regards,

Benjamin Tourne.
The patch with the new test is attached to this report.
Unicode, visual
patch itk-unicodewritetest-2009-09-30.patch (8,153) 2009-09-30 10:15
https://public.kitware.com/Bug/file/2500/itk-unicodewritetest-2009-09-30.patch
zip utfcpptest.zip (11,535) 2009-10-20 13:22
https://public.kitware.com/Bug/file/2574/utfcpptest.zip
patch itk-msvc-unicode-2009-10-26.patch (70,177) 2009-10-26 15:56
https://public.kitware.com/Bug/file/2601/itk-msvc-unicode-2009-10-26.patch
zip itkUnicodeIOTest.zip (13,460) 2009-10-27 14:46
https://public.kitware.com/Bug/file/2604/itkUnicodeIOTest.zip
zip itkUnicodeIOTest-2009-11-02.zip (5,097) 2009-11-02 13:04
https://public.kitware.com/Bug/file/2621/itkUnicodeIOTest-2009-11-02.zip
cxx itkUnicodeIOTest.cxx (13,128) 2009-11-11 08:38
https://public.kitware.com/Bug/file/2649/itkUnicodeIOTest.cxx
patch itk-unicodeio-2010-01-12.patch (19,782) 2010-01-12 06:38
https://public.kitware.com/Bug/file/2757/itk-unicodeio-2010-01-12.patch
Issue History
2009-09-30 10:15  Benjamin Tourne  New Issue
2009-09-30 10:15  Benjamin Tourne  File Added: itk-unicodewritetest-2009-09-30.patch
2009-09-30 10:16  Benjamin Tourne  Tag Attached: visual
2009-09-30 10:16  Benjamin Tourne  Tag Attached: Unicode
2009-09-30 12:05  Tom Vercauteren  Note Added: 0017844
2009-10-20 13:22  Tom Vercauteren  File Added: utfcpptest.zip
2009-10-26 15:56  Tom Vercauteren  File Added: itk-msvc-unicode-2009-10-26.patch
2009-10-26 15:58  Tom Vercauteren  Note Added: 0018243
2009-10-27 10:26  Brad King  Note Added: 0018248
2009-10-27 10:28  Brad King  Note Added: 0018249
2009-10-27 14:46  Tom Vercauteren  File Added: itkUnicodeIOTest.zip
2009-10-27 14:47  Tom Vercauteren  Note Added: 0018251
2009-10-27 15:13  Brad King  Note Added: 0018252
2009-10-28 10:59  Tom Vercauteren  Note Added: 0018256
2009-11-02 13:04  Tom Vercauteren  File Added: itkUnicodeIOTest-2009-11-02.zip
2009-11-02 13:16  Tom Vercauteren  Note Added: 0018318
2009-11-02 13:26  Brad King  Note Added: 0018319
2009-11-02 14:01  Tom Vercauteren  Note Added: 0018320
2009-11-03 03:22  Tom Vercauteren  Note Added: 0018326
2009-11-10 20:19  Tom Vercauteren  File Added: itkUnicodeIOTest.cxx
2009-11-11 08:38  Tom Vercauteren  File Deleted: itkUnicodeIOTest.cxx
2009-11-11 08:38  Tom Vercauteren  File Added: itkUnicodeIOTest.cxx
2010-01-12 06:38  Tom Vercauteren  File Added: itk-unicodeio-2010-01-12.patch
2010-01-12 06:43  Tom Vercauteren  Note Added: 0019094
2010-01-12 09:06  Brad King  Note Added: 0019097
2010-01-12 09:58  Tom Vercauteren  Note Added: 0019099
2010-02-15 01:20  edice  Note Added: 0019527
2010-11-07 09:01  Hans Johnson  Status: new => assigned
2010-11-07 09:01  Hans Johnson  Assigned To: => Brad King

Notes
(0017844)
Tom Vercauteren   
2009-09-30 12:05   
For the record, on Linux this unit test (itk-unicodewritetest-2009-09-30.patch) passes without issue.
(0018243)
Tom Vercauteren   
2009-10-26 15:58   
I have attached for review a preliminary patch (itk-msvc-unicode-2009-10-26.patch) that allows the use of UTF-8 encoded filenames on Windows for the following formats:
- jpeg
- png
- meta (mhd and mha)
- tiff

Feedback is welcome!
(0018248)
Brad King   
2009-10-27 10:26   
I applied itk-msvc-unicode-2009-10-26.patch locally and scrolled through the changes. I think blocks like

+#ifdef _MSC_VER
+ // Convert to utf16

should test _WIN32 instead...we want to convert to utf16 and use the wide character *windows* API. TIFFOpenW does this already.
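
As an illustration of the "convert to utf16" step, here is a self-contained sketch of a UTF-8 to UTF-16 decoder. It is not the code from the patch: the function name Utf8ToUtf16 is hypothetical, it uses the C++11 std::u16string type, and it does not reject overlong forms or encoded surrogates. On Windows one would more likely call the Win32 function MultiByteToWideChar with CP_UTF8 and pass the result to the wide-character API.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Illustrative decoder (not the code from the patch): converts a UTF-8
// byte string to UTF-16 code units. Throws on truncated or malformed input.
std::u16string Utf8ToUtf16(const std::string & in)
{
  std::u16string out;
  std::size_t i = 0;
  while (i < in.size())
  {
    const unsigned char lead = static_cast<unsigned char>(in[i]);
    unsigned long cp = 0;    // decoded code point
    std::size_t extra = 0;   // number of continuation bytes expected
    if (lead < 0x80)                { cp = lead;        extra = 0; }
    else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
    else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
    else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
    else { throw std::invalid_argument("invalid UTF-8 lead byte"); }

    if (i + extra >= in.size())
    {
      throw std::invalid_argument("truncated UTF-8 sequence");
    }
    for (std::size_t k = 1; k <= extra; ++k)
    {
      const unsigned char c = static_cast<unsigned char>(in[i + k]);
      if ((c & 0xC0) != 0x80)
      {
        throw std::invalid_argument("invalid UTF-8 continuation byte");
      }
      cp = (cp << 6) | (c & 0x3F);
    }
    i += extra + 1;

    if (cp > 0x10FFFF)
    {
      throw std::invalid_argument("code point out of Unicode range");
    }
    if (cp < 0x10000)
    {
      out.push_back(static_cast<char16_t>(cp));
    }
    else
    {
      // Code points above the BMP become a surrogate pair.
      cp -= 0x10000;
      out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
      out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
    }
  }
  return out;
}
```

For example, the two bytes 0xCE 0xB1 decode to the single code unit 0x03B1, which is the 'α' used in the test filenames.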
(0018249)
Brad King   
2009-10-27 10:28   
For reference, here is the mailing list thread in which this bug is discussed:

  http://www.itk.org/mailman/private/insight-developers/2009-October/013464.html
(0018251)
Tom Vercauteren   
2009-10-27 14:47   
Apparently things are a bit more complex than I thought.

* cygwin (latest stable version) has no unicode support at all:
  * _wfopen and _wunlink are NOT available
  * std::ofstream and std::ifstream have NO open(wchar_t * filename) function

* mingw (latest stable version) has partial unicode support:
  * _wfopen and _wunlink are available
  * std::ofstream and std::ifstream have NO open(wchar_t * filename) function

My proposal fully works only on MSVC. Making it work on MinGW will require more changes to MetaIO, i.e. moving from std::ofstream and std::ifstream to a FILE*-based approach.

The attached test project (itkUnicodeIOTest.zip) shows the results of my experiments.
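
A rough sketch of what such a FILE*-based approach could look like is given below. The helper names OpenUtf8File and Widen are hypothetical and this is not the code in the attached project. It assumes that _wfopen is available whenever _WIN32 is defined (true for MSVC and MinGW according to the findings above) and that UTF-8 names can be passed directly to fopen everywhere else.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

#ifdef _WIN32
#  include <wchar.h>
#  include <windows.h>

// Windows only: UTF-8 -> UTF-16 through the Win32 API.
static std::wstring Widen(const std::string & utf8)
{
  // With a length of -1 the returned count includes the terminating NUL.
  const int n = MultiByteToWideChar(CP_UTF8, 0, utf8.c_str(), -1, NULL, 0);
  if (n <= 0) { return std::wstring(); }
  std::wstring wide(static_cast<std::size_t>(n), L'\0');
  MultiByteToWideChar(CP_UTF8, 0, utf8.c_str(), -1, &wide[0], n);
  wide.resize(static_cast<std::size_t>(n) - 1); // drop the trailing NUL
  return wide;
}
#endif

// Hypothetical helper: opens a file whose name is UTF-8 encoded.
// Returns NULL on failure, like fopen.
std::FILE * OpenUtf8File(const std::string & utf8Name, const std::string & mode)
{
#ifdef _WIN32
  // Use the wide-character runtime so the name is not reinterpreted
  // in the active code page.
  return _wfopen(Widen(utf8Name).c_str(), Widen(mode).c_str());
#else
  // Assumption: the platform accepts UTF-8 filenames as plain bytes.
  return std::fopen(utf8Name.c_str(), mode.c_str());
#endif
}
```

The caller gets an ordinary FILE* back, so existing stdio-based reading and writing code would not need to change.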
(0018252)
Brad King   
2009-10-27 15:13   
FYI, I was able to read a unicode filename with the GNU compiler and C++ streams on cygwin like this:

$ cat myfile.txt
hello, world
$ cat stdio_filebuf.cxx
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/fcntl.h>
#include <ext/stdio_filebuf.h>
#include <iostream>
#include <io.h>

int main()
{
  int fd = _wopen(L"myfile.txt", O_RDONLY);
  __gnu_cxx::stdio_filebuf<char> ibuf(fd, std::ios::in);
  std::istream in(&ibuf);
  std::cout << in.rdbuf();
  return 0;
}
$ g++ -mno-cygwin stdio_filebuf.cxx
$ ./a.exe
hello, world

I think it also works with stdio.h-style C FILE* buffers. However, if there is no _wfopen then that may not be an option.
(0018256)
Tom Vercauteren   
2009-10-28 10:59   
Thanks for the information Brad!

I managed to use this code with MinGW but not with Cygwin (with gcc 4). On Cygwin I get:
  '_wopen' was not declared in this scope
(This is exactly the same as what I got for _wfopen).

This seems to be related to the -mno-cygwin flag:
  http://www.delorie.com/howto/cygwin/mno-cygwin-howto.html
However, adding the -mno-cygwin flag leads to
  g++: The -mno-cygwin flag has been removed; use a mingw-targeted cross-compiler.

This is apparently a known issue of cygwin's gcc-4:
  http://cygwin.com/ml/cygwin/2009-10/msg00061.html

Anyhow, I also found an alternative to __gnu_cxx::stdio_filebuf that consists of a single header file and is apparently portable to the platforms that we target:
  http://www.josuttis.com/cppcode/fdstream.html

More experimenting is required, I'll keep information coming on the bug tracker when I get some more time to work on it.
(0018318)
Tom Vercauteren   
2009-11-02 13:16   
I have been experimenting with fdstream. fdstream allows the creation of an istream or ostream from a file descriptor. It seems to work just fine on all platforms I tried (linux 32 bit with gcc, windows with MSVC, cygwin's gcc and mingw).

Therefore, if a file with a Unicode-encoded filename can be opened, performing IO operations on a stream should work.

The attached test (itkUnicodeIOTest-2009-11-02.zip) showed that IO operations on files with Unicode filenames work on:
* linux
* windows with MSVC
* windows with MinGW
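
To show the idea behind a descriptor-backed stream, here is a minimal, unbuffered sketch for the output side. It is not the fdstream header itself: the class names FdOutBuf and FdOStream are hypothetical, and the code assumes a POSIX system where write() is declared in <unistd.h>.

```cpp
#include <cstddef>
#include <ostream>
#include <streambuf>
#include <unistd.h> // write (POSIX assumption)

// Minimal unbuffered stream buffer that forwards characters to a
// file descriptor. The descriptor is not owned and is never closed here.
class FdOutBuf : public std::streambuf
{
public:
  explicit FdOutBuf(int fd) : m_Fd(fd) {}

protected:
  // Called for every single character because no put area is set up.
  int_type overflow(int_type c) override
  {
    if (!traits_type::eq_int_type(c, traits_type::eof()))
    {
      const char ch = traits_type::to_char_type(c);
      if (::write(m_Fd, &ch, 1) != 1)
      {
        return traits_type::eof();
      }
    }
    return traits_type::not_eof(c);
  }

  // Called for character sequences; loops until everything is written.
  std::streamsize xsputn(const char * s, std::streamsize n) override
  {
    std::streamsize done = 0;
    while (done < n)
    {
      const ssize_t w = ::write(m_Fd, s + done, static_cast<std::size_t>(n - done));
      if (w <= 0) { break; }
      done += w;
    }
    return done;
  }

private:
  int m_Fd;
};

// Output stream that owns an FdOutBuf for the given descriptor.
class FdOStream : public std::ostream
{
public:
  explicit FdOStream(int fd) : std::ostream(nullptr), m_Buf(fd)
  {
    rdbuf(&m_Buf); // attach the buffer once it has been constructed
  }

private:
  FdOutBuf m_Buf;
};
```

With a class like this, a descriptor obtained from a platform-specific open call can be used through the ordinary operator<< interface.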


-----
Note that cygwin doesn't work. This does not contradict Brad's experiment because adding the -mno-cygwin flag to cygwin's compiler essentially turns the compiler into the mingw compiler as __MINGW32__ becomes defined and __CYGWIN__ becomes undefined:

/cygdrive/c/cygwin/bin/gcc-3.exe -mno-cygwin -dM -E- < /dev/null | sort

#define WIN32 1
#define WINNT 1
#define _WIN32 1
#define _X86_ 1
#define __CHAR_BIT__ 8
#define __DBL_DENORM_MIN__ 4.9406564584124654e-324
#define __DBL_DIG__ 15
#define __DBL_EPSILON__ 2.2204460492503131e-16
#define __DBL_HAS_INFINITY__ 1
#define __DBL_HAS_QUIET_NAN__ 1
#define __DBL_MANT_DIG__ 53
#define __DBL_MAX_10_EXP__ 308
#define __DBL_MAX_EXP__ 1024
#define __DBL_MAX__ 1.7976931348623157e+308
#define __DBL_MIN_10_EXP__ (-307)
#define __DBL_MIN_EXP__ (-1021)
#define __DBL_MIN__ 2.2250738585072014e-308
#define __DECIMAL_DIG__ 21
#define __FINITE_MATH_ONLY__ 0
#define __FLT_DENORM_MIN__ 1.40129846e-45F
#define __FLT_DIG__ 6
#define __FLT_EPSILON__ 1.19209290e-7F
#define __FLT_EVAL_METHOD__ 2
#define __FLT_HAS_INFINITY__ 1
#define __FLT_HAS_QUIET_NAN__ 1
#define __FLT_MANT_DIG__ 24
#define __FLT_MAX_10_EXP__ 38
#define __FLT_MAX_EXP__ 128
#define __FLT_MAX__ 3.40282347e+38F
#define __FLT_MIN_10_EXP__ (-37)
#define __FLT_MIN_EXP__ (-125)
#define __FLT_MIN__ 1.17549435e-38F
#define __FLT_RADIX__ 2
#define __GNUC_MINOR__ 4
#define __GNUC_PATCHLEVEL__ 4
#define __GNUC__ 3
#define __GXX_ABI_VERSION 1002
#define __INT_MAX__ 2147483647
#define __LDBL_DENORM_MIN__ 3.64519953188247460253e-4951L
#define __LDBL_DIG__ 18
#define __LDBL_EPSILON__ 1.08420217248550443401e-19L
#define __LDBL_HAS_INFINITY__ 1
#define __LDBL_HAS_QUIET_NAN__ 1
#define __LDBL_MANT_DIG__ 64
#define __LDBL_MAX_10_EXP__ 4932
#define __LDBL_MAX_EXP__ 16384
#define __LDBL_MAX__ 1.18973149535723176502e+4932L
#define __LDBL_MIN_10_EXP__ (-4931)
#define __LDBL_MIN_EXP__ (-16381)
#define __LDBL_MIN__ 3.36210314311209350626e-4932L
#define __LONG_LONG_MAX__ 9223372036854775807LL
#define __LONG_MAX__ 2147483647L
#define __MINGW32__ 1
#define __MSVCRT__ 1
#define __NO_INLINE__ 1
#define __PTRDIFF_TYPE__ int
#define __REGISTER_PREFIX__
#define __SCHAR_MAX__ 127
#define __SHRT_MAX__ 32767
#define __SIZE_TYPE__ unsigned int
#define __STDC_HOSTED__ 1
#define __USER_LABEL_PREFIX__ _
#define __USING_SJLJ_EXCEPTIONS__ 1
#define __VERSION__ "3.4.4 (cygming special, gdc 0.12, using dmd 0.125)"
#define __WCHAR_MAX__ 65535U
#define __WCHAR_TYPE__ short unsigned int
#define __WIN32 1
#define __WIN32__ 1
#define __WINT_TYPE__ unsigned int
#define __cdecl __attribute__((__cdecl__))
#define __declspec(x) __attribute__((x))
#define __fastcall __attribute__((__fastcall__))
#define __i386 1
#define __i386__ 1
#define __stdcall __attribute__((__stdcall__))
#define __tune_i686__ 1
#define __tune_pentiumpro__ 1
#define _cdecl __attribute__((__cdecl__))
#define _fastcall __attribute__((__fastcall__))
#define _stdcall __attribute__((__stdcall__))
#define i386 1
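
Based on the macros listed above, a compile-time check of the toolchain could be sketched as follows. The function name DetectToolchain and the returned strings are hypothetical; the sketch only relies on the predefined macros __CYGWIN__, __MINGW32__, _MSC_VER and _WIN32.

```cpp
// Illustrative only: reports which toolchain family the translation unit
// was compiled with, using predefined macros.
inline const char * DetectToolchain()
{
#if defined(__CYGWIN__)
  return "cygwin";
#elif defined(__MINGW32__)
  return "mingw";
#elif defined(_MSC_VER)
  return "msvc";
#elif defined(_WIN32)
  return "other-windows";
#else
  return "posix";
#endif
}
```

Compiled with cygwin's gcc-3 and -mno-cygwin, this would take the "mingw" branch, since the listing shows __MINGW32__ defined and no __CYGWIN__.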
(0018319)
Brad King   
2009-11-02 13:26   
Does "-mwin32" help on cygwin?
(0018320)
Tom Vercauteren   
2009-11-02 14:01   
Unfortunately, adding the "-mwin32" flag does not help on cygwin. As far as I understand it, this is really a cygwin limitation that cannot be overcome. See also this (old) email thread:
http://www.mail-archive.com/cygwin@cygwin.com/msg66767.html
(0018326)
Tom Vercauteren   
2009-11-03 03:22   
Good news for cygwin. The new 1.7 version that is currently in beta gets closer to the linux/mac behavior. Namely, the default encoding for filenames is set to utf-8 and things work out of the box (as on linux and mac).
http://cygwin.com/1.7/cygwin-ug-net/ov-new1.7.html
(0019094)
Tom Vercauteren   
2010-01-12 06:43   
In an attempt to move a little further on this issue, I would like to put all the helper functions from my unit test
  http://www.itk.org/cgi-bin/viewcvs.cgi/Testing/Code/IO/itkUnicodeIOTest.cxx?root=Insight&sortby=date&view=markup
into one header file. I was thinking of using Code/Common/itkI18nIOHelpers.h and putting the functions in the itk::I18n namespace:
  https://public.kitware.com/Bug/file/2757/itk-unicodeio-2010-01-12.patch

Thoughts?
(0019097)
Brad King   
2010-01-12 09:06   
Fine with me.

BTW, I noticed the use of the "boost" namespace in "itkExtHdrs/fdstream.hpp". When the header was only included in .cxx files that was okay. Now that it may be included through a header we need to be more careful about conflicts. If an application really tries to use boost or has its own version of that header it may conflict. Can you move the code into an itk namespace?
(0019099)
Tom Vercauteren   
2010-01-12 09:58   
fdstream.hpp is not part of boost (http://www.josuttis.com/cppcode/fdstream.html). It has been proposed to boost but was not accepted. Anyway, changing the namespace to itk also seems cleaner to me. I'll commit it together with itkI18nIOHelpers.h tomorrow if I don't get any negative feedback from the itk list.
(0019527)
edice   
2010-02-15 01:20   
Can this work be extended to add Unicode support to all of the VTK file readers, e.g. vtkPNGReader?

It would be good to be able to pass a std::wstring.
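
One way a std::wstring entry point could sit on top of UTF-8 based file handling is to encode the wide string as UTF-8 first. The sketch below is an assumption about how that could be done, not an existing ITK or VTK function. The name WideToUtf8 is hypothetical; the function accepts both 16-bit wchar_t (UTF-16, as on Windows) and 32-bit wchar_t (as on Linux), and passes unpaired surrogates through without validation.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: encodes a wide string as UTF-8.
std::string WideToUtf8(const std::wstring & in)
{
  std::string out;
  for (std::size_t i = 0; i < in.size(); ++i)
  {
    unsigned long cp = static_cast<unsigned long>(in[i]);

    // Combine a UTF-16 surrogate pair into one code point.
    if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size())
    {
      const unsigned long lo = static_cast<unsigned long>(in[i + 1]);
      if (lo >= 0xDC00 && lo <= 0xDFFF)
      {
        cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
        ++i; // the low surrogate has been consumed
      }
    }

    if (cp < 0x80)
    {
      out += static_cast<char>(cp);
    }
    else if (cp < 0x800)
    {
      out += static_cast<char>(0xC0 | (cp >> 6));
      out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else if (cp < 0x10000)
    {
      out += static_cast<char>(0xE0 | (cp >> 12));
      out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
      out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else
    {
      out += static_cast<char>(0xF0 | (cp >> 18));
      out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
      out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
      out += static_cast<char>(0x80 | (cp & 0x3F));
    }
  }
  return out;
}
```

A reader class could then keep a single UTF-8 code path internally and offer the wide overload as a thin wrapper around it.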