ID | Project | Category | View Status | Date Submitted | Last Update | ||||||||
0009623 | ITK | public | 2009-09-30 10:15 | 2012-02-17 09:47 | |||||||||
Reporter | Benjamin Tourne | ||||||||||||
Assigned To | Brad King | ||||||||||||
Priority | normal | Severity | major | Reproducibility | always | ||||||||
Status | assigned | Resolution | open | ||||||||||
Platform | OS | OS Version | |||||||||||
Product Version | |||||||||||||
Target Version | Fixed in Version | ||||||||||||
Summary | 0009623: ITK cannot use Unicode filenames on Visual Studio | ||||||||||||
Description | Hi all, It seems that the ITK API does not support Unicode filenames in the Visual Studio environment. Here is a patch for ITK with a new test called itkImageFileWriterUnicodeTest. This test tries to write an image with an ImageFileWriter whose file name contains the Greek letter 'α' (lowercase alpha, U+03B1). This is done in 4 different ways: 1. Using the SetFileName(std::string filename, ...) function 2. Using the SetFileName(char * filename, ...) function 3. Using SetFileName with a UTF-8 encoded char array (The last test is for Visual Studio users only:) 4. Using SetFileName with an array encoded in the Windows system local code page. On Visual Studio, all 4 tests fail: the file cannot be created, or the file is created with an incorrect name. A good solution to this problem would be to add an overloaded function SetFileName(wchar_t* filename) or SetFileName(std::wstring filename). Best regards, Benjamin Tourne. | ||||||||||||
Additional Information | The patch with the new test is attached to this report. | ||||||||||||
Tags | Unicode, visual | ||||||||||||
Resolution Date | |||||||||||||
Sprint | |||||||||||||
Sprint Status | |||||||||||||
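For illustration only, the wide-character approach suggested in the description can be sketched as a small standalone helper. This is not ITK code and not the attached patch: OpenUtf8 is a hypothetical name, and the sketch assumes the caller always passes a UTF-8 encoded file name. On Windows it converts the name to UTF-16 with MultiByteToWideChar and calls _wfopen; elsewhere it hands the bytes to fopen unchanged, which is enough on file systems that store names as UTF-8.

```cpp
#include <cstdio>
#include <cstring>
#include <string>

#ifdef _WIN32
#  include <windows.h>
#  include <cwchar>
#endif

// Hypothetical helper (not part of ITK): open a file whose name is
// assumed to be UTF-8 encoded. Returns NULL on failure, like fopen.
std::FILE * OpenUtf8(const std::string & utf8Name, const char * mode)
{
#ifdef _WIN32
  // First call: ask for the required UTF-16 length, including the
  // terminating null. Invalid UTF-8 input makes the call fail.
  const int length = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                         utf8Name.c_str(), -1, NULL, 0);
  if (length <= 0)
    {
    return NULL;
    }

  // Second call: perform the conversion into the wide string buffer.
  std::wstring wideName(static_cast<std::size_t>(length), L'\0');
  MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                      utf8Name.c_str(), -1, &wideName[0], length);

  // The mode string is plain ASCII ("rb", "wb", ...), so widening each
  // character individually is sufficient.
  const std::wstring wideMode(mode, mode + std::strlen(mode));

  return _wfopen(wideName.c_str(), wideMode.c_str());
#else
  // Non-Windows platforms: pass the UTF-8 bytes through untouched.
  return std::fopen(utf8Name.c_str(), mode);
#endif
}
```

With a helper of this kind, a name built from the bytes 0xCE 0xB1 followed by ".mha" is expected to open α.mha independently of the active ANSI code page, which is the behaviour the four test cases in the report are probing.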
Attached Files | itk-unicodewritetest-2009-09-30.patch (8,153 bytes) 2009-09-30 10:15
Index: Testing/Code/IO/CMakeLists.txt
===================================================================
RCS file: /cvsroot/Insight/Insight/Testing/Code/IO/CMakeLists.txt,v
retrieving revision 1.229
diff -u -r1.229 CMakeLists.txt
--- Testing/Code/IO/CMakeLists.txt 11 Aug 2009 12:41:15 -0000 1.229
+++ Testing/Code/IO/CMakeLists.txt 30 Sep 2009 09:21:05 -0000
@@ -68,6 +68,7 @@
 itkImageFileReaderDimensionsTest.cxx
 itkImageFileReaderStreamingTest.cxx
 itkImageFileWriterTest.cxx
+itkImageFileWriterUnicodeTest.cxx
 itkImageFileWriterTest2.cxx
 itkImageFileWriterPastingTest1.cxx
 itkImageFileWriterPastingTest2.cxx
@@ -220,6 +221,7 @@
 ADD_TEST(itkIOHeaderTest ${IO_HEADER_TEST})
 ADD_TEST(itkPolygonGroupSpatialObjectXMLFileTest ${IO_TESTS} itkPolygonGroupSpatialObjectXMLFileTest ${TEMP})
 ADD_TEST(itkImageFileWriterTest ${IO_TESTS} itkImageFileWriterTest ${TEMP}/test.png)
+ADD_TEST(itkImageFileWriterUnicodeTest ${IO_TESTS} itkImageFileWriterUnicodeTest)
 ADD_EXECUTABLE(itkIOHeaderTest itkIOHeaderTest.cxx)
Index: Testing/Code/IO/itkIOTests.cxx
===================================================================
RCS file: /cvsroot/Insight/Insight/Testing/Code/IO/itkIOTests.cxx,v
retrieving revision 1.83
diff -u -r1.83 itkIOTests.cxx
--- Testing/Code/IO/itkIOTests.cxx 11 Aug 2009 12:41:13 -0000 1.83
+++ Testing/Code/IO/itkIOTests.cxx 30 Sep 2009 09:21:05 -0000
@@ -49,6 +49,7 @@
 REGISTER_TEST(itkImageFileReaderTest1);
 REGISTER_TEST(itkImageFileReaderDimensionsTest);
 REGISTER_TEST(itkImageFileWriterTest);
+ REGISTER_TEST(itkImageFileWriterUnicodeTest);
 REGISTER_TEST(itkImageFileWriterTest2);
 REGISTER_TEST(itkImageFileWriterPastingTest1);
 REGISTER_TEST(itkImageFileWriterPastingTest2);
Index: Testing/Code/IO/itkImageFileWriterUnicodeTest.cxx
===================================================================
RCS file: Testing/Code/IO/itkImageFileWriterUnicodeTest.cxx
diff -N Testing/Code/IO/itkImageFileWriterUnicodeTest.cxx --- /dev/null 1 Jan 1970 00:00:00 -0000 +++ Testing/Code/IO/itkImageFileWriterUnicodeTest.cxx 30 Sep 2009 09:21:05 -0000 @@ -0,0 +1,232 @@ +/*========================================================================= + + Program: Insight Segmentation & Registration Toolkit + Module: $RCSfile: itkImageFileWriterTest.cxx,v $ + Language: C++ + Date: $Date: 2008-04-18 20:43:13 $xgoto-l + + Version: $Revision: 1.4 $ + + Copyright (c) 2002 Insight Consortium. All rights reserved. + See ITKCopyright.txt or http://www.itk.org/HTML/Copyright.htm for details. + + This software is distributed WITHOUT ANY WARRANTY; without even + the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR + PURPOSE. See the above copyright notices for more information. + +=========================================================================*/ +#if defined(_MSC_VER) +#pragma warning ( disable : 4786 ) +#endif +#include "itkImage.h" +#include "itkImageFileWriter.h" + +// Check if alpha.mha exists using a wstring on MSVC and fopen with UTF-8 char * otherwise +bool checkAlphaExists() +{ +#ifdef _MSC_VER + const std::wstring wstr( L"\u03B1.mha" ); + return _wfopen(wstr.c_str(), L"r")!=0; + +#else + char utf8_ch[7]; + utf8_ch[0]=0xCE; + utf8_ch[1]=0xB1; + utf8_ch[2]='.'; + utf8_ch[3]='m'; + utf8_ch[4]='h'; + utf8_ch[5]='a'; + utf8_ch[6]=0; + + return fopen(utf8_ch, "r")!=0; +#endif +} + +// Try to delete alpha.mha using a wstring on MSVC and unlink with UTF-8 char * otherwise +void removeAlpha() +{ +#ifdef _MSC_VER + const std::wstring wstr( L"\u03B1.mha" ); + _wunlink(wstr.c_str()); + +#else + char utf8_ch[7]; + utf8_ch[0]=0xCE; + utf8_ch[1]=0xB1; + utf8_ch[2]='.'; + utf8_ch[3]='m'; + utf8_ch[4]='h'; + utf8_ch[5]='a'; + utf8_ch[6]=0; + + unlink(utf8_ch); +#endif + +} + +int itkImageFileWriterUnicodeTest(int ac, char* av[]) +{ + if (ac != 1) + { + std::cout << "usage: itkIOTests itkImageFileWriterUnicodeTest. 
This tests create an empty mha image, the file name contains greek lettres alpha, beta, gamma." << std::endl; + return EXIT_FAILURE; + } + + typedef itk::Image<short,2> ImageNDType; + typedef itk::ImageFileWriter<ImageNDType> WriterType; + + ImageNDType::Pointer image = ImageNDType::New(); + ImageNDType::RegionType region; + ImageNDType::IndexType index; + ImageNDType::SizeType size; + + + size.Fill(5); + index.Fill(0); + region.SetSize(size); + region.SetIndex(index); + + image->SetRegions(region); + image->Allocate(); + + int nberr = 0; + + removeAlpha(); + // Check if unicode works with std::string version of SetFileName + try + { + WriterType::Pointer writer = WriterType::New(); + writer->SetInput(image); + // \u03B1 : lowercase alpha + + std::string str = "\u03B1.mha"; + writer->SetFileName(str); + writer->Update(); + + if (!checkAlphaExists()) + { + std::cout << "Writing str failed." << std::endl; + ++nberr; + } + else + { + removeAlpha(); + } + } + catch (itk::ExceptionObject &ex) + { + std::cout << "------------------ Caught exception while writing str!" << std::endl; + std::cout << ex; + ++nberr; + } + + // Check if unicode works with char * version of SetFileName (using default encoding from std::string) + try + { + WriterType::Pointer writer = WriterType::New(); + writer->SetInput(image); + + std::string str = "\u03B1.mha"; + writer->SetFileName(str.c_str()); + writer->Update(); + + if (!checkAlphaExists()) + { + std::cout << "Writing str.c_str() failed." << std::endl; + ++nberr; + } + else + { + removeAlpha(); + } + } + catch (itk::ExceptionObject &ex) + { + std::cout << "------------------ Caught exception while writing str.c_str()!" 
<< std::endl; + std::cout << ex; + ++nberr; + } + + // Check if unicode works with char * version of SetFileName (using UTF-8 encoding) + try + { + WriterType::Pointer writer = WriterType::New(); + writer->SetInput(image); + + // 0xCE 0xB1 : UTF-8 encoding for lowercase alpha + char utf8_ch[7]; + utf8_ch[0]=0xCE; + utf8_ch[1]=0xB1; + utf8_ch[2]='.'; + utf8_ch[3]='m'; + utf8_ch[4]='h'; + utf8_ch[5]='a'; + utf8_ch[6]=0; + std::cout << "utf8_ch(" << utf8_ch <<")" << std::endl; + + writer->SetFileName(utf8_ch); + writer->Update(); + + if (!checkAlphaExists()) + { + std::cout << "Writing utf8_ch failed." << std::endl; + ++nberr; + } + else + { + removeAlpha(); + } + } + catch (itk::ExceptionObject &ex) + { + std::cout << "------------------ Caught exception while writing utf8_ch!" << std::endl; + std::cout << ex; + ++nberr; + } + + +#ifdef _MSC_VER + // Check if unicode works with char * version of SetFileName (using local codepage encoding) + // We use MSDN's function WideCharToMultiByte to convert wide string into ANSI Code Page (CP_ACP) + try + { + WriterType::Pointer writer = WriterType::New(); + writer->SetInput(image); + + const std::wstring wstr( L"\u03B1.mha" ); + + char localcp_ch[6]; + WideCharToMultiByte(CP_ACP, 0, wstr.c_str(), -1, localcp_ch, 6, NULL, NULL); + + std::cout << "localcp_ch(" << localcp_ch <<")" << std::endl; + writer->SetFileName(localcp_ch); + writer->Update(); + + if (!checkAlphaExists()) + { + std::cout << "Writing localcp_ch failed." << std::endl; + ++nberr; + } + else + { + removeAlpha(); + } + } + catch (itk::ExceptionObject &ex) + { + std::cout << "------------------ Caught exception while writing localcp_ch!" << std::endl; + std::cout << ex; + ++nberr; + } +#endif + + if (nberr) + { + std::cout << "Failed test. "<< nberr << " error(s)." 
<< std::endl; + return EXIT_FAILURE; + } + + + return EXIT_SUCCESS; + +} utfcpptest.zip [^] (11,535 bytes) 2009-10-20 13:22 itk-msvc-unicode-2009-10-26.patch [^] (70,177 bytes) 2009-10-26 15:56 [Show Content] [Hide Content] Index: CMakeLists.txt =================================================================== RCS file: /cvsroot/Insight/Insight/CMakeLists.txt,v retrieving revision 1.355 diff -u -r1.355 CMakeLists.txt --- CMakeLists.txt 22 Oct 2009 16:49:40 -0000 1.355 +++ CMakeLists.txt 26 Oct 2009 12:51:22 -0000 @@ -227,6 +227,22 @@ ENDIF(ITK_USE_REVIEW_STATISTICS) #----------------------------------------------------------------------------- +# ITK use experimental UTF8 encoding in strings +OPTION( ITK_USE_REVIEW_UTF8_STRINGS "Use experimental utf8 encoding of strings." OFF) +MARK_AS_ADVANCED( ITK_USE_REVIEW_UTF8_STRINGS ) +IF(ITK_USE_REVIEW_UTF8_STRINGS) + IF(NOT ITK_USE_REVIEW) + MESSAGE(FATAL_ERROR "ITK_USE_REVIEW is currently OFF but it should be ON if you want to use the experimental utf8 encoding of strings.") + ENDIF(NOT ITK_USE_REVIEW) + + # Warn the user about the implications of using the new statistics framework. + SET(msg "Attention: You have chosen to use utf8 encoding instead of the default one.") + SET(msg "${msg} You will have to provide utf8 strings at all time, this may require some changes") + SET(msg "${msg} in the code that calls ITK if you were relying on local codepage encoding before.") + MESSAGE("${msg}") +ENDIF(ITK_USE_REVIEW_UTF8_STRINGS) + +#----------------------------------------------------------------------------- # ITK turn on experimental version of accelerated image registration OPTION(ITK_USE_CONSOLIDATED_MORPHOLOGY "Turn on the experimental consolidated morphology." 
OFF) MARK_AS_ADVANCED(ITK_USE_CONSOLIDATED_MORPHOLOGY) Index: itkConfigure.h.in =================================================================== RCS file: /cvsroot/Insight/Insight/itkConfigure.h.in,v retrieving revision 1.33 diff -u -r1.33 itkConfigure.h.in --- itkConfigure.h.in 16 Jun 2009 07:58:46 -0000 1.33 +++ itkConfigure.h.in 23 Oct 2009 15:22:36 -0000 @@ -80,6 +80,7 @@ #cmakedefine ITK_USE_MINC2 #cmakedefine ITK_USE_OPTIMIZED_REGISTRATION_METHODS #cmakedefine ITK_USE_REVIEW_STATISTICS +#cmakedefine ITK_USE_REVIEW_UTF8_STRINGS #cmakedefine ITK_USE_CONSOLIDATED_MORPHOLOGY #cmakedefine ITK_USE_TRANSFORM_IO_FACTORIES #cmakedefine ITK_USE_ORIENTED_IMAGE_DIRECTION Index: Code/Common/itkMacro.h =================================================================== RCS file: /cvsroot/Insight/Insight/Code/Common/itkMacro.h,v retrieving revision 1.97 diff -u -r1.97 itkMacro.h --- Code/Common/itkMacro.h 16 Jun 2009 07:58:46 -0000 1.97 +++ Code/Common/itkMacro.h 23 Oct 2009 15:57:32 -0000 @@ -268,6 +268,30 @@ /** Set character string. Creates member Set"name"() * (e.g., SetFilename(char *)). The macro assumes that * the class member (name) is declared a type std::string. */ +#ifdef ITK_USE_REVIEW_UTF8_STRINGS + +#define itkSetStringMacro(name) \ + virtual void Set##name (const char* _arg) \ + { \ + Utf8::ThrowIfNotUtf8(_arg); \ + if ( _arg && (_arg == this->m_##name) ) { return;} \ + if (_arg) \ + { \ + this->m_##name = _arg;\ + } \ + else \ + { \ + this->m_##name = ""; \ + } \ + this->Modified(); \ + } \ + virtual void Set##name (const std::string & _arg) \ + { \ + this->Set##name( _arg.c_str() ); \ + } \ + +#else + #define itkSetStringMacro(name) \ virtual void Set##name (const char* _arg) \ { \ @@ -287,6 +311,8 @@ this->Set##name( _arg.c_str() ); \ } \ +#endif + /** Get character string. Creates member Get"name"() * (e.g., SetFilename(char *)). 
The macro assumes that @@ -1044,4 +1070,10 @@ itkAssertInDebugOrThrowInReleaseMacro( msgstr.str().c_str() ); \ } + + +#ifdef ITK_USE_REVIEW_UTF8_STRINGS +# include "itkUtf8.h" +#endif + #endif //end of itkMacro.h Index: Code/IO/itkImageIOFactory.cxx =================================================================== RCS file: /cvsroot/Insight/Insight/Code/IO/itkImageIOFactory.cxx,v retrieving revision 1.37 diff -u -r1.37 itkImageIOFactory.cxx --- Code/IO/itkImageIOFactory.cxx 25 Feb 2008 02:41:03 -0000 1.37 +++ Code/IO/itkImageIOFactory.cxx 26 Oct 2009 19:46:32 -0000 @@ -113,13 +113,17 @@ // Factory because AnalyzeImageIO->CanRead() has been instrumented to // reject .hdr/.img files with the nifti magic number tags. ObjectFactoryBase::RegisterFactory( AnalyzeImageIOFactory::New()); - ObjectFactoryBase::RegisterFactory( NiftiImageIOFactory::New()); + //Nifti may crash on unicode filenames. Register it last to avoid issues + //ObjectFactoryBase::RegisterFactory( NiftiImageIOFactory::New()); ObjectFactoryBase::RegisterFactory( StimulateImageIOFactory::New()); ObjectFactoryBase::RegisterFactory( JPEGImageIOFactory::New()); ObjectFactoryBase::RegisterFactory( TIFFImageIOFactory::New()); ObjectFactoryBase::RegisterFactory( NrrdImageIOFactory::New() ); ObjectFactoryBase::RegisterFactory( BMPImageIOFactory::New() ); ObjectFactoryBase::RegisterFactory( DICOMImageIO2Factory::New() ); + + // Nifti may crash on unicode filenames. 
Register it last to avoid issues + ObjectFactoryBase::RegisterFactory( NiftiImageIOFactory::New()); firstTime = false; } } Index: Code/IO/itkJPEGImageIO.cxx =================================================================== RCS file: /cvsroot/Insight/Insight/Code/IO/itkJPEGImageIO.cxx,v retrieving revision 1.27 diff -u -r1.27 itkJPEGImageIO.cxx --- Code/IO/itkJPEGImageIO.cxx 23 Apr 2009 15:21:52 -0000 1.27 +++ Code/IO/itkJPEGImageIO.cxx 26 Oct 2009 13:16:31 -0000 @@ -26,6 +26,10 @@ #include <stdio.h> #include <itksys/SystemTools.hxx> +#ifdef ITK_USE_REVIEW_UTF8_STRINGS +# include "itkUtf8.h" +#endif + extern "C" { // The regular jpeg lossy lib is the 8bits one: @@ -86,7 +90,11 @@ public: JPEGFileWrapper(const char * const fname, const char * const openMode):m_FilePointer(NULL) { +#ifdef ITK_USE_REVIEW_UTF8_STRINGS + m_FilePointer = itk::Utf8::fopen(fname, openMode); +#else m_FilePointer = fopen(fname, openMode); +#endif } virtual ~JPEGFileWrapper() { Index: Code/IO/itkPNGImageIO.cxx =================================================================== RCS file: /cvsroot/Insight/Insight/Code/IO/itkPNGImageIO.cxx,v retrieving revision 1.70 diff -u -r1.70 itkPNGImageIO.cxx --- Code/IO/itkPNGImageIO.cxx 30 Sep 2008 22:01:48 -0000 1.70 +++ Code/IO/itkPNGImageIO.cxx 26 Oct 2009 16:56:28 -0000 @@ -52,7 +52,11 @@ public: PNGFileWrapper(const char * const fname, const char * const openMode):m_FilePointer(NULL) { +#ifdef ITK_USE_REVIEW_UTF8_STRINGS + m_FilePointer = itk::Utf8::fopen(fname, openMode); +#else m_FilePointer = fopen(fname, openMode); +#endif } virtual ~PNGFileWrapper() { Index: Code/IO/itkTIFFImageIO.cxx =================================================================== RCS file: /cvsroot/Insight/Insight/Code/IO/itkTIFFImageIO.cxx,v retrieving revision 1.67 diff -u -r1.67 itkTIFFImageIO.cxx --- Code/IO/itkTIFFImageIO.cxx 3 Jul 2009 18:41:50 -0000 1.67 +++ Code/IO/itkTIFFImageIO.cxx 26 Oct 2009 19:38:30 -0000 @@ -76,7 +76,14 @@ return 0; } +#if defined(_MSC_VER) 
&& defined(ITK_USE_REVIEW_UTF8_STRINGS) + std::string str(filename); + std::wstring str_utf16; + utf8::utf8to16(str.begin(), str.end(), std::back_inserter(str_utf16)); + this->m_Image = TIFFOpenW(str_utf16.c_str(), "r"); +#else this->m_Image = TIFFOpen(filename, "r"); +#endif if ( !this->m_Image) { this->Clean(); @@ -1697,7 +1704,13 @@ int predictor; +#if defined(_MSC_VER) && defined(ITK_USE_REVIEW_UTF8_STRINGS) + std::wstring str_utf16; + utf8::utf8to16(m_FileName.begin(), m_FileName.end(), std::back_inserter(str_utf16)); + TIFF *tif = TIFFOpenW(str_utf16.c_str(), "w"); +#else TIFF *tif = TIFFOpen(m_FileName.c_str(), "w"); +#endif if ( !tif ) { itkExceptionMacro("Error while trying to open file for writing: " Index: Testing/Code/IO/CMakeLists.txt =================================================================== RCS file: /cvsroot/Insight/Insight/Testing/Code/IO/CMakeLists.txt,v retrieving revision 1.229 diff -u -r1.229 CMakeLists.txt --- Testing/Code/IO/CMakeLists.txt 11 Aug 2009 12:41:15 -0000 1.229 +++ Testing/Code/IO/CMakeLists.txt 23 Oct 2009 15:54:04 -0000 @@ -68,6 +68,7 @@ itkImageFileReaderDimensionsTest.cxx itkImageFileReaderStreamingTest.cxx itkImageFileWriterTest.cxx +itkImageFileWriterUnicodeTest.cxx itkImageFileWriterTest2.cxx itkImageFileWriterPastingTest1.cxx itkImageFileWriterPastingTest2.cxx @@ -220,6 +221,7 @@ ADD_TEST(itkIOHeaderTest ${IO_HEADER_TEST}) ADD_TEST(itkPolygonGroupSpatialObjectXMLFileTest ${IO_TESTS} itkPolygonGroupSpatialObjectXMLFileTest ${TEMP}) ADD_TEST(itkImageFileWriterTest ${IO_TESTS} itkImageFileWriterTest ${TEMP}/test.png) +ADD_TEST(itkImageFileWriterUnicodeTest ${IO_TESTS} itkImageFileWriterUnicodeTest ) ADD_EXECUTABLE(itkIOHeaderTest itkIOHeaderTest.cxx) Index: Testing/Code/IO/itkIOTests.cxx =================================================================== RCS file: /cvsroot/Insight/Insight/Testing/Code/IO/itkIOTests.cxx,v retrieving revision 1.83 diff -u -r1.83 itkIOTests.cxx --- Testing/Code/IO/itkIOTests.cxx 11 Aug 
2009 12:41:13 -0000 1.83 +++ Testing/Code/IO/itkIOTests.cxx 23 Oct 2009 15:55:22 -0000 @@ -49,6 +49,7 @@ REGISTER_TEST(itkImageFileReaderTest1); REGISTER_TEST(itkImageFileReaderDimensionsTest); REGISTER_TEST(itkImageFileWriterTest); + REGISTER_TEST(itkImageFileWriterUnicodeTest); REGISTER_TEST(itkImageFileWriterTest2); REGISTER_TEST(itkImageFileWriterPastingTest1); REGISTER_TEST(itkImageFileWriterPastingTest2); Index: Utilities/MetaIO/CMakeLists.txt =================================================================== RCS file: /cvsroot/Insight/Insight/Utilities/MetaIO/CMakeLists.txt,v retrieving revision 1.48 diff -u -r1.48 CMakeLists.txt --- Utilities/MetaIO/CMakeLists.txt 11 Jun 2009 20:54:05 -0000 1.48 +++ Utilities/MetaIO/CMakeLists.txt 26 Oct 2009 17:36:27 -0000 @@ -82,6 +82,10 @@ SET_TARGET_PROPERTIES(${METAIO_NAMESPACE} PROPERTIES ${ITK_LIBRARY_PROPERTIES}) ENDIF(ITK_LIBRARY_PROPERTIES) + + IF(ITK_USE_REVIEW_UTF8_STRINGS) + ADD_DEFINITIONS(-DMETAIO_USE_REVIEW_UTF8_STRINGS) + ENDIF(ITK_USE_REVIEW_UTF8_STRINGS) ELSE(METAIO_FOR_ITK) TARGET_LINK_LIBRARIES(${METAIO_NAMESPACE} ${ZLIB_LIBRARIES} ${KWSYS_NAMESPACE}) Index: Utilities/MetaIO/metaImage.cxx =================================================================== RCS file: /cvsroot/Insight/Insight/Utilities/MetaIO/metaImage.cxx,v retrieving revision 1.119 diff -u -r1.119 metaImage.cxx --- Utilities/MetaIO/metaImage.cxx 20 Jul 2009 15:34:20 -0000 1.119 +++ Utilities/MetaIO/metaImage.cxx 26 Oct 2009 19:07:48 -0000 @@ -23,6 +23,10 @@ #include <stdlib.h> // for atoi #include <math.h> +#if defined(_MSC_VER) && defined(METAIO_USE_REVIEW_UTF8_STRINGS) +#include "itkExtHdrs\utf8.h" +#endif + #if defined (__BORLANDC__) && (__BORLANDC__ >= 0x0580) #include <mem.h> #endif @@ -43,10 +47,93 @@ #include <signal.h> /* sigprocmask */ #endif + #if (METAIO_USE_NAMESPACE) namespace METAIO_NAMESPACE { #endif +namespace { + +METAIO_STREAM::ifstream * createReadStream(const char * filename) +{ +#if defined(_MSC_VER) && 
defined(METAIO_USE_REVIEW_UTF8_STRINGS) + std::string str(filename); + std::wstring str_utf16; + utf8::utf8to16(str.begin(), str.end(), std::back_inserter(str_utf16)); + FILE * file = _wfopen(str_utf16.c_str(), L"rb"); + return new METAIO_STREAM::ifstream(file); +#else + METAIO_STREAM::ifstream * tmpReadStream = new METAIO_STREAM::ifstream; + +#ifdef __sgi + tmpReadStream->open(filename, METAIO_STREAM::ios::in); +#else + tmpReadStream->open(filename, METAIO_STREAM::ios::binary | + METAIO_STREAM::ios::in); + + return tmpReadStream; +#endif +#endif +} + +METAIO_STREAM::ofstream * createWriteStream(const char * filename, const bool append) +{ +#ifdef __sgi + METAIO_STREAM::ofstream * tmpWriteStream = new METAIO_STREAM::ofstream; + + // Some older sgi compilers have a error in the ofstream constructor + // that requires a file to exist for output + { + METAIO_STREAM::ofstream tFile(filename, METAIO_STREAM::ios::out); + tFile.close(); + } + + if(!append) + { + tmpWriteStream->open(filename, METAIO_STREAM::ios::out); + } + else + { + tmpWriteStream->open(filename, METAIO_STREAM::ios::app | + METAIO_STREAM::ios::out); + } + + return tmpWriteStream; +#elif defined(_MSC_VER) && defined(METAIO_USE_REVIEW_UTF8_STRINGS) + std::string str(filename); + std::wstring str_utf16; + utf8::utf8to16(str.begin(), str.end(), std::back_inserter(str_utf16)); + + if(!append) + { + FILE * file = _wfopen(str_utf16.c_str(), L"wb"); + return new METAIO_STREAM::ofstream(file); + } + else + { + FILE * file = _wfopen(str_utf16.c_str(), L"wab"); + return new METAIO_STREAM::ofstream(file); + } +#else + METAIO_STREAM::ofstream * tmpWriteStream = new METAIO_STREAM::ofstream; + + if(!append) + { + tmpWriteStream->open(filename, METAIO_STREAM::ios::binary | + METAIO_STREAM::ios::out); + } + else + { + tmpWriteStream->open(filename, METAIO_STREAM::ios::binary | + METAIO_STREAM::ios::app | + METAIO_STREAM::ios::out); + } + + return tmpWriteStream; +#endif +} + +} // anonymous namespace // // MetaImage 
Constructors @@ -1109,16 +1196,9 @@ } // Now check the file content - METAIO_STREAM::ifstream inputStream; + METAIO_STREAM::ifstream * inputStream = createReadStream(fname.c_str()); -#ifdef __sgi - inputStream.open( fname.c_str(), METAIO_STREAM::ios::in ); -#else - inputStream.open( fname.c_str(), METAIO_STREAM::ios::in | - METAIO_STREAM::ios::binary ); -#endif - - if( inputStream.fail() ) + if( inputStream->fail() ) { return false; } @@ -1128,13 +1208,14 @@ usePath = MET_GetFilePath(_headerName, pathName); char* buf = new char[8001]; - inputStream.read(buf,8000); - unsigned long fileSize = inputStream.gcount(); + inputStream->read(buf,8000); + unsigned long fileSize = inputStream->gcount(); buf[fileSize] = 0; METAIO_STL::string header(buf); header.resize(fileSize); delete [] buf; - inputStream.close(); + inputStream->close(); + delete inputStream; stringPos = header.find("NDims"); if( stringPos == METAIO_STL::string::npos ) @@ -1163,14 +1244,7 @@ M_PrepareNewReadStream(); - METAIO_STREAM::ifstream * tmpReadStream = new METAIO_STREAM::ifstream; - -#ifdef __sgi - tmpReadStream->open(m_FileName, METAIO_STREAM::ios::in); -#else - tmpReadStream->open(m_FileName, METAIO_STREAM::ios::binary | - METAIO_STREAM::ios::in); -#endif + METAIO_STREAM::ifstream * tmpReadStream = createReadStream(m_FileName); if(!tmpReadStream->rdbuf()->is_open()) { @@ -1271,7 +1345,6 @@ fileImageDim = m_NDims-1; } char s[1024]; - METAIO_STREAM::ifstream* readStreamTemp = new METAIO_STREAM::ifstream; int elementSize; MET_SizeOfType(m_ElementType, &elementSize); elementSize *= m_ElementNumberOfChannels; @@ -1298,13 +1371,8 @@ { strcpy(fName, s); } + METAIO_STREAM::ifstream * readStreamTemp = createReadStream(fName); -#ifdef __sgi - readStreamTemp->open(fName, METAIO_STREAM::ios::in); -#else - readStreamTemp->open(fName, METAIO_STREAM::ios::binary | - METAIO_STREAM::ios::in); -#endif if(!readStreamTemp->rdbuf()->is_open()) { METAIO_STREAM::cerr << "MetaImage: Read: cannot open slice" @@ -1316,9 
+1384,9 @@ elementSize]), m_SubQuantity[fileImageDim]); readStreamTemp->close(); + delete readStreamTemp; } } - delete readStreamTemp; } else if(strstr(m_ElementDataFileName, "%")) { @@ -1332,7 +1400,6 @@ int maxV = m_DimSize[m_NDims-1]; int stepV = 1; char s[255]; - METAIO_STREAM::ifstream* readStreamTemp = new METAIO_STREAM::ifstream; MET_StringToWordArray(m_ElementDataFileName, &nWrds, &wrds); if(nWrds >= 2) { @@ -1360,12 +1427,8 @@ { strcpy(fName, s); } -#ifdef __sgi - readStreamTemp->open(fName, METAIO_STREAM::ios::in); -#else - readStreamTemp->open(fName, METAIO_STREAM::ios::binary - | METAIO_STREAM::ios::in); -#endif + METAIO_STREAM::ifstream * readStreamTemp = createReadStream(fName); + if(!readStreamTemp->rdbuf()->is_open()) { METAIO_STREAM::cerr << "MetaImage: Read: cannot construct file" @@ -1380,8 +1443,8 @@ cnt++; readStreamTemp->close(); + delete readStreamTemp; } - delete readStreamTemp; for(i=0; i<nWrds; i++) { delete [] wrds[i]; @@ -1399,14 +1462,7 @@ strcpy(fName, m_ElementDataFileName); } - METAIO_STREAM::ifstream* readStreamTemp = new METAIO_STREAM::ifstream; - -#ifdef __sgi - readStreamTemp->open(fName, METAIO_STREAM::ios::in); -#else - readStreamTemp->open(fName, METAIO_STREAM::ios::binary | - METAIO_STREAM::ios::in); -#endif + METAIO_STREAM::ifstream * readStreamTemp = createReadStream(fName); if(!readStreamTemp->rdbuf()->is_open()) { @@ -1504,37 +1560,7 @@ } } - METAIO_STREAM::ofstream * tmpWriteStream = new METAIO_STREAM::ofstream; - -// Some older sgi compilers have a error in the ofstream constructor -// that requires a file to exist for output -#ifdef __sgi - { - METAIO_STREAM::ofstream tFile(m_FileName, METAIO_STREAM::ios::out); - tFile.close(); - } -#endif - - if(!_append) - { -#ifdef __sgi - tmpWriteStream->open(m_FileName, METAIO_STREAM::ios::out); -#else - tmpWriteStream->open(m_FileName, METAIO_STREAM::ios::binary | - METAIO_STREAM::ios::out); -#endif - } - else - { -#ifdef __sgi - tmpWriteStream->open(m_FileName, 
METAIO_STREAM::ios::app | - METAIO_STREAM::ios::out); -#else - tmpWriteStream->open(m_FileName, METAIO_STREAM::ios::binary | - METAIO_STREAM::ios::app | - METAIO_STREAM::ios::out); -#endif - } + METAIO_STREAM::ofstream * tmpWriteStream = createWriteStream(m_FileName, _append); if(!tmpWriteStream->rdbuf()->is_open()) { @@ -1675,9 +1701,7 @@ } // Find the start of the data - METAIO_STREAM::ifstream * readStream = new METAIO_STREAM::ifstream; - readStream->open( m_FileName, METAIO_STREAM::ios::binary | - METAIO_STREAM::ios::in); + METAIO_STREAM::ifstream * readStream = createReadStream(m_FileName); // File must be readable if( !MetaObject::ReadStream( m_NDims, readStream ) ) @@ -1860,37 +1884,7 @@ } } - METAIO_STREAM::ofstream * tmpWriteStream = new METAIO_STREAM::ofstream; - - // Some older sgi compilers have a error in the ofstream constructor - // that requires a file to exist for output - #ifdef __sgi - { - METAIO_STREAM::ofstream tFile(m_FileName, METAIO_STREAM::ios::out); - tFile.close(); - } - #endif - - if(!_append) - { - #ifdef __sgi - tmpWriteStream->open(m_FileName, METAIO_STREAM::ios::out); - #else - tmpWriteStream->open(m_FileName, METAIO_STREAM::ios::binary | - METAIO_STREAM::ios::out); - #endif - } - else - { - #ifdef __sgi - tmpWriteStream->open(m_FileName, METAIO_STREAM::ios::app | - METAIO_STREAM::ios::out); - #else - tmpWriteStream->open(m_FileName, METAIO_STREAM::ios::binary | - METAIO_STREAM::ios::app | - METAIO_STREAM::ios::out); - #endif - } + METAIO_STREAM::ofstream * tmpWriteStream = createWriteStream(m_FileName, _append); if(!tmpWriteStream->rdbuf()->is_open()) { @@ -1923,43 +1917,11 @@ { m_WriteStream = NULL; tmpWriteStream->close(); + delete tmpWriteStream; dataPos = 0; - // Some older sgi compilers have a error in the ofstream constructor - // that requires a file to exist for output - #ifdef __sgi - { - METAIO_STREAM::ofstream tFile( m_ElementDataFileName, - METAIO_STREAM::ios::out ); - tFile.close(); - } - #endif - - if( !_append ) - { - 
#ifdef __sgi - tmpWriteStream->open( m_ElementDataFileName, - METAIO_STREAM::ios::out ); - #else - tmpWriteStream->open( m_ElementDataFileName, - METAIO_STREAM::ios::binary | - METAIO_STREAM::ios::out ); - #endif - } - else - { - #ifdef __sgi - tmpWriteStream->open( m_ElementDataFileName, - METAIO_STREAM::ios::app | - METAIO_STREAM::ios::out ); - #else - tmpWriteStream->open( m_ElementDataFileName, - METAIO_STREAM::ios::binary | - METAIO_STREAM::ios::app | - METAIO_STREAM::ios::out ); - #endif - } + tmpWriteStream = createWriteStream(m_ElementDataFileName, _append); m_WriteStream = tmpWriteStream; } @@ -2561,23 +2523,11 @@ METAIO_STL::streamsize elementNumberOfBytes = elementSize*m_ElementNumberOfChannels; METAIO_STL::streamsize sliceNumberOfBytes = m_SubQuantity[m_NDims-1]*elementNumberOfBytes; - METAIO_STREAM::ofstream* writeStreamTemp = new METAIO_STREAM::ofstream; for(i=1; i<=m_DimSize[m_NDims-1]; i++) { sprintf(fName, dataFileName, i); -// Some older sgi compilers have a error in the ofstream constructor -// that requires a file to exist for output -#ifdef __sgi - { - METAIO_STREAM::ofstream tFile(fName, METAIO_STREAM::ios::out); - tFile.close(); - } - writeStreamTemp->open(fName, METAIO_STREAM::ios::out); -#else - writeStreamTemp->open(fName, METAIO_STREAM::ios::binary | - METAIO_STREAM::ios::out); -#endif + METAIO_STREAM::ofstream * writeStreamTemp = createWriteStream(fName, false); if(!m_CompressedData) { @@ -2605,26 +2555,13 @@ } writeStreamTemp->close(); - } - delete writeStreamTemp; + delete writeStreamTemp; + } } else // write the image in one unique other file { -// Some older sgi compilers have a error in the ofstream constructor -// that requires a file to exist for output - METAIO_STREAM::ofstream* writeStreamTemp = new METAIO_STREAM::ofstream; - -#ifdef __sgi - { - METAIO_STREAM::ofstream tFile(dataFileName, METAIO_STREAM::ios::out); - tFile.close(); - } - writeStreamTemp->open(dataFileName, METAIO_STREAM::ios::out); -#else - 
writeStreamTemp->open(dataFileName, METAIO_STREAM::ios::binary | - METAIO_STREAM::ios::out); -#endif + METAIO_STREAM::ofstream * writeStreamTemp = createWriteStream(dataFileName, false); MetaImage::M_WriteElementData(writeStreamTemp, _data, _dataQuantity); @@ -2699,14 +2636,7 @@ M_PrepareNewReadStream(); - METAIO_STREAM::ifstream * tmpReadStream = new METAIO_STREAM::ifstream; - -#ifdef __sgi - tmpReadStream->open(m_FileName, METAIO_STREAM::ios::in); -#else - tmpReadStream->open(m_FileName, METAIO_STREAM::ios::binary | - METAIO_STREAM::ios::in); -#endif + METAIO_STREAM::ifstream * tmpReadStream = createReadStream(m_FileName); if(!tmpReadStream->rdbuf()->is_open()) { @@ -2809,7 +2739,6 @@ fileImageDim = m_NDims-1; } char s[1024]; - METAIO_STREAM::ifstream* readStreamTemp = new METAIO_STREAM::ifstream; int elementSize; MET_SizeOfType(m_ElementType, &elementSize); elementSize *= m_ElementNumberOfChannels; @@ -2844,12 +2773,7 @@ strcpy(fName, s); } -#ifdef __sgi - readStreamTemp->open(fName, METAIO_STREAM::ios::in); -#else - readStreamTemp->open(fName, METAIO_STREAM::ios::binary | - METAIO_STREAM::ios::in); -#endif + METAIO_STREAM::ifstream * readStreamTemp = createReadStream(fName); if(!readStreamTemp->rdbuf()->is_open()) { METAIO_STREAM::cerr << "MetaImage: Read: cannot open slice" @@ -2879,9 +2803,9 @@ cnt++; readStreamTemp->close(); + delete readStreamTemp; } } - delete readStreamTemp; } else if(strstr(m_ElementDataFileName, "%")) { @@ -2895,7 +2819,6 @@ int maxV = m_DimSize[m_NDims-1]; int stepV = 1; char s[255]; - METAIO_STREAM::ifstream* readStreamTemp = new METAIO_STREAM::ifstream; MET_StringToWordArray(m_ElementDataFileName, &nWrds, &wrds); if(nWrds >= 2) { @@ -2929,13 +2852,7 @@ strcpy(fName, s); } - -#ifdef __sgi - readStreamTemp->open(fName, METAIO_STREAM::ios::in); -#else - readStreamTemp->open(fName, METAIO_STREAM::ios::binary - | METAIO_STREAM::ios::in); -#endif + METAIO_STREAM::ifstream * readStreamTemp = createReadStream(fName); 
if(!readStreamTemp->rdbuf()->is_open()) { METAIO_STREAM::cerr << "MetaImage: Read: cannot construct file" @@ -2968,6 +2885,8 @@ delete [] indexMax; readStreamTemp->close(); + + delete readStreamTemp; } for(i=0; i<nWrds; i++) @@ -2975,8 +2894,6 @@ delete [] wrds[i]; } delete [] wrds; - - delete readStreamTemp; } else { @@ -2989,14 +2906,7 @@ strcpy(fName, m_ElementDataFileName); } - METAIO_STREAM::ifstream* readStreamTemp = new METAIO_STREAM::ifstream; - -#ifdef __sgi - readStreamTemp->open(fName, METAIO_STREAM::ios::in); -#else - readStreamTemp->open(fName, METAIO_STREAM::ios::binary | - METAIO_STREAM::ios::in); -#endif + METAIO_STREAM::ifstream * readStreamTemp = createReadStream(fName); if(!readStreamTemp->rdbuf()->is_open()) { --- Code/Common/itkUtf8.h +++ Code/Common/itkUtf8.h @@ -0,0 +1,59 @@ +/*========================================================================= + + Program: Insight Segmentation & Registration Toolkit + Module: $RCSfile: itkUtf8.h,v $ + Language: C++ + Date: $Date: 2009-10-23 16:12:43 $ + Version: $Revision: 1.0 $ + + Copyright (c) Insight Software Consortium. All rights reserved. + See ITKCopyright.txt or http://www.itk.org/HTML/Copyright.htm for details. + + Portions of this code are covered under the VTK copyright. + See VTKCopyright.txt or http://www.kitware.com/VTKCopyright.htm for details. + + This software is distributed WITHOUT ANY WARRANTY; without even + the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR + PURPOSE. See the above copyright notices for more information. 
+ +=========================================================================*/ + +#ifndef __itkUtf8_h +#define __itkUtf8_h + +#include "itkExtHdrs/utf8.h" + +namespace itk +{ +namespace Utf8 +{ + +inline bool IsValidUtf8(const std::string & str) +{ + return ( str.end() == utf8::find_invalid(str.begin(), str.end()) ); +} + +inline void ThrowIfNotUtf8(const std::string & str) +{ + if ( !IsValidUtf8(str) ) + { + itkGenericExceptionMacro( << "A non-utf8 string was used." ); + } +} + +inline FILE * fopen ( const std::string & str, const std::string & mode ) +{ +#ifdef _MSC_VER + // Convert to utf16 + std::wstring str_utf16, mode_utf16; + utf8::utf8to16(str.begin(), str.end(), std::back_inserter(str_utf16)); + utf8::utf8to16(mode.begin(), mode.end(), std::back_inserter(mode_utf16)); + return _wfopen(str_utf16.c_str(), mode_utf16.c_str()); +#else + return ::fopen(str.c_str(), mode.c_str()); // qualified: an unqualified call would resolve to this overload and recurse +#endif +} + +} // end namespace Utf8 +} // end namespace itk +#endif // end of itkUtf8.h --- Testing/Code/IO/itkImageFileWriterUnicodeTest.cxx +++ Testing/Code/IO/itkImageFileWriterUnicodeTest.cxx @@ -0,0 +1,161 @@ +/*========================================================================= + + Program: Insight Segmentation & Registration Toolkit + Module: $RCSfile: itkImageFileWriterTest.cxx,v $ + Language: C++ + Date: $Date: 2008-04-18 20:43:13 $ + + Version: $Revision: 1.4 $ + + Copyright (c) 2002 Insight Consortium. All rights reserved. + See ITKCopyright.txt or http://www.itk.org/HTML/Copyright.htm for details. + + This software is distributed WITHOUT ANY WARRANTY; without even + the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR + PURPOSE. See the above copyright notices for more information. 
+ +=========================================================================*/ +#if defined(_MSC_VER) +#pragma warning ( disable : 4786 ) +#endif +#include "itkImage.h" +#include "itkImageFileWriter.h" +#include "itkUtf8.h" + +// Check if alpha.xxx exists using a wstring on MSVC and fopen with UTF-8 char * otherwise +bool checkAlphaExists(const std::string & extention) +{ +#ifdef _MSC_VER + std::wstring extention_utf16; + utf8::utf8to16(extention.begin(), extention.end(), std::back_inserter(extention_utf16)); + std::wstring wstr( L"\u03B1." ); + wstr += extention_utf16; + FILE * f = _wfopen(wstr.c_str(), L"r"); const bool found = (f != 0); if (found) { fclose(f); } return found; // close the handle so the file can be removed afterwards + +#else + std::string utf8_str; + utf8_str.push_back(0xCE); + utf8_str.push_back(0xB1); + utf8_str += "."; + utf8_str += extention; + + FILE * f = fopen(utf8_str.c_str(), "r"); const bool found = (f != 0); if (found) { fclose(f); } return found; // do not leak the handle +#endif +} + +// Try to delete alpha.xxx using a wstring on MSVC and unlink with UTF-8 char * otherwise +void removeAlpha(const std::string & extention) +{ +#ifdef _MSC_VER + std::wstring extention_utf16; + utf8::utf8to16(extention.begin(), extention.end(), std::back_inserter(extention_utf16)); + std::wstring wstr( L"\u03B1." ); + wstr += extention_utf16; + + _wunlink(wstr.c_str()); + +#else + std::string utf8_str; + utf8_str.push_back(0xCE); + utf8_str.push_back(0xB1); + utf8_str += "."; + utf8_str += extention; + + unlink(utf8_str.c_str()); +#endif +} + +int itkImageFileWriterUnicodeTest(int ac, char* av[]) +{ + if (ac != 1) + { + std::cout << "usage: itkIOTests itkImageFileWriterUnicodeTest. This test creates an empty image whose file name contains the Greek letter alpha." 
<< std::endl; + return EXIT_FAILURE; + } + + typedef itk::Image<unsigned char,2> ImageNDType; + typedef itk::ImageFileWriter<ImageNDType> WriterType; + + ImageNDType::Pointer image = ImageNDType::New(); + ImageNDType::RegionType region; + ImageNDType::IndexType index; + ImageNDType::SizeType size; + + + size.Fill(5); + index.Fill(0); + region.SetSize(size); + region.SetIndex(index); + + image->SetRegions(region); + image->Allocate(); + image->FillBuffer(0); + + int nberr = 0; + + std::vector<std::string> extentions; + extentions.push_back("jpeg"); + extentions.push_back("jpg"); + extentions.push_back("png"); + extentions.push_back("mha"); + extentions.push_back("mhd"); + extentions.push_back("tiff"); + + for ( unsigned int i=0; i<extentions.size(); ++i ) + { + removeAlpha(extentions[i]); + + // Check if unicode works with std::string version of SetFileName + try + { + WriterType::Pointer writer = WriterType::New(); + writer->SetInput(image); + + // lowercase alpha + std::string str; + str.push_back(0xCE); + str.push_back(0xB1); + str += "."; + str += extentions[i]; + + writer->SetFileName(str); + writer->Update(); + + if (!checkAlphaExists(extentions[i])) + { + std::cout << "Writing str failed." << std::endl; + ++nberr; + } + else + { + removeAlpha(extentions[i]); + } + } + catch (itk::ExceptionObject &ex) + { + std::cout << "------------------ Caught itk exception while writing str!" << std::endl; + std::cout << ex; + ++nberr; + } + catch (std::exception &ex) + { + std::cout << "------------------ Caught std exception while writing str!" << std::endl; + std::cout << ex.what(); + ++nberr; + } + catch (...) + { + std::cout << "------------------ Caught unknown exception while writing str!" << std::endl; + ++nberr; + } + } + + if (nberr) + { + std::cout << "Failed test. "<< nberr << " error(s)." 
<< std::endl; + return EXIT_FAILURE; + } + + std::cout << "Test passed."<< std::endl; + return EXIT_SUCCESS; +} --- Utilities/itkExtHdrs/utf8.h +++ Utilities/itkExtHdrs/utf8.h @@ -0,0 +1,34 @@ +// Copyright 2006 Nemanja Trifunovic + +/* +Permission is hereby granted, free of charge, to any person or organization +obtaining a copy of the software and accompanying documentation covered by +this license (the "Software") to use, reproduce, display, distribute, +execute, and transmit the Software, and to prepare derivative works of the +Software, and to permit third-parties to whom the Software is furnished to +do so, all subject to the following: + +The copyright notices in the Software and this entire statement, including +the above license grant, this restriction and the following disclaimer, +must be included in all copies of the Software, in whole or in part, and +all derivative works of the Software, unless such copies or derivative +works are solely in the form of machine-executable object code generated by +a source language processor. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT +SHALL THE COPYRIGHT HOLDERS OR ANYONE DISTRIBUTING THE SOFTWARE BE LIABLE +FOR ANY DAMAGES OR OTHER LIABILITY, WHETHER IN CONTRACT, TORT OR OTHERWISE, +ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER +DEALINGS IN THE SOFTWARE. 
+*/ + + +#ifndef UTF8_FOR_CPP_2675DCD0_9480_4c0c_B92A_CC14C027B731 +#define UTF8_FOR_CPP_2675DCD0_9480_4c0c_B92A_CC14C027B731 + +#include "utf8/checked.h" +#include "utf8/unchecked.h" + +#endif // header guard --- Utilities/itkExtHdrs/utf8/checked.h +++ Utilities/itkExtHdrs/utf8/checked.h @@ -0,0 +1,319 @@ +// Copyright 2006 Nemanja Trifunovic + +/* +Permission is hereby granted, free of charge, to any person or organization +obtaining a copy of the software and accompanying documentation covered by +this license (the "Software") to use, reproduce, display, distribute, +execute, and transmit the Software, and to prepare derivative works of the +Software, and to permit third-parties to whom the Software is furnished to +do so, all subject to the following: + +The copyright notices in the Software and this entire statement, including +the above license grant, this restriction and the following disclaimer, +must be included in all copies of the Software, in whole or in part, and +all derivative works of the Software, unless such copies or derivative +works are solely in the form of machine-executable object code generated by +a source language processor. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT +SHALL THE COPYRIGHT HOLDERS OR ANYONE DISTRIBUTING THE SOFTWARE BE LIABLE +FOR ANY DAMAGES OR OTHER LIABILITY, WHETHER IN CONTRACT, TORT OR OTHERWISE, +ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER +DEALINGS IN THE SOFTWARE. +*/ + + +#ifndef UTF8_FOR_CPP_CHECKED_H_2675DCD0_9480_4c0c_B92A_CC14C027B731 +#define UTF8_FOR_CPP_CHECKED_H_2675DCD0_9480_4c0c_B92A_CC14C027B731 + +#include "core.h" +#include <stdexcept> + +namespace utf8 +{ + // Exceptions that may be thrown from the library functions. 
+ class invalid_code_point : public std::exception { + uint32_t cp; + public: + invalid_code_point(uint32_t cp) : cp(cp) {} + virtual const char* what() const throw() { return "Invalid code point"; } + uint32_t code_point() const {return cp;} + }; + + class invalid_utf8 : public std::exception { + uint8_t u8; + public: + invalid_utf8 (uint8_t u) : u8(u) {} + virtual const char* what() const throw() { return "Invalid UTF-8"; } + uint8_t utf8_octet() const {return u8;} + }; + + class invalid_utf16 : public std::exception { + uint16_t u16; + public: + invalid_utf16 (uint16_t u) : u16(u) {} + virtual const char* what() const throw() { return "Invalid UTF-16"; } + uint16_t utf16_word() const {return u16;} + }; + + class not_enough_room : public std::exception { + public: + virtual const char* what() const throw() { return "Not enough space"; } + }; + + /// The library API - functions intended to be called by the users + + template <typename octet_iterator, typename output_iterator> + output_iterator replace_invalid(octet_iterator start, octet_iterator end, output_iterator out, uint32_t replacement) + { + while (start != end) { + octet_iterator sequence_start = start; + internal::utf_error err_code = internal::validate_next(start, end); + switch (err_code) { + case internal::UTF8_OK : + for (octet_iterator it = sequence_start; it != start; ++it) + *out++ = *it; + break; + case internal::NOT_ENOUGH_ROOM: + throw not_enough_room(); + case internal::INVALID_LEAD: + append (replacement, out); + ++start; + break; + case internal::INCOMPLETE_SEQUENCE: + case internal::OVERLONG_SEQUENCE: + case internal::INVALID_CODE_POINT: + append (replacement, out); + ++start; + // just one replacement mark for the sequence + while (start != end && internal::is_trail(*start)) + ++start; + break; + } + } + return out; + } + + template <typename octet_iterator, typename output_iterator> + inline output_iterator replace_invalid(octet_iterator start, octet_iterator end, output_iterator out) + { 
+ static const uint32_t replacement_marker = internal::mask16(0xfffd); + return replace_invalid(start, end, out, replacement_marker); + } + + template <typename octet_iterator> + octet_iterator append(uint32_t cp, octet_iterator result) + { + if (!internal::is_code_point_valid(cp)) + throw invalid_code_point(cp); + + if (cp < 0x80) // one octet + *(result++) = static_cast<uint8_t>(cp); + else if (cp < 0x800) { // two octets + *(result++) = static_cast<uint8_t>((cp >> 6) | 0xc0); + *(result++) = static_cast<uint8_t>((cp & 0x3f) | 0x80); + } + else if (cp < 0x10000) { // three octets + *(result++) = static_cast<uint8_t>((cp >> 12) | 0xe0); + *(result++) = static_cast<uint8_t>(((cp >> 6) & 0x3f) | 0x80); + *(result++) = static_cast<uint8_t>((cp & 0x3f) | 0x80); + } + else { // four octets + *(result++) = static_cast<uint8_t>((cp >> 18) | 0xf0); + *(result++) = static_cast<uint8_t>(((cp >> 12) & 0x3f) | 0x80); + *(result++) = static_cast<uint8_t>(((cp >> 6) & 0x3f) | 0x80); + *(result++) = static_cast<uint8_t>((cp & 0x3f) | 0x80); + } + return result; + } + + template <typename octet_iterator> + uint32_t next(octet_iterator& it, octet_iterator end) + { + uint32_t cp = 0; + internal::utf_error err_code = internal::validate_next(it, end, &cp); + switch (err_code) { + case internal::UTF8_OK : + break; + case internal::NOT_ENOUGH_ROOM : + throw not_enough_room(); + case internal::INVALID_LEAD : + case internal::INCOMPLETE_SEQUENCE : + case internal::OVERLONG_SEQUENCE : + throw invalid_utf8(*it); + case internal::INVALID_CODE_POINT : + throw invalid_code_point(cp); + } + return cp; + } + + template <typename octet_iterator> + uint32_t peek_next(octet_iterator it, octet_iterator end) + { + return next(it, end); + } + + template <typename octet_iterator> + uint32_t prior(octet_iterator& it, octet_iterator start) + { + octet_iterator end = it; + while (internal::is_trail(*(--it))) + if (it < start) + throw invalid_utf8(*it); // error - no lead byte in the sequence + 
octet_iterator temp = it; + return next(temp, end); + } + + /// Deprecated in versions that include "prior" + template <typename octet_iterator> + uint32_t previous(octet_iterator& it, octet_iterator pass_start) + { + octet_iterator end = it; + while (internal::is_trail(*(--it))) + if (it == pass_start) + throw invalid_utf8(*it); // error - no lead byte in the sequence + octet_iterator temp = it; + return next(temp, end); + } + + template <typename octet_iterator, typename distance_type> + void advance (octet_iterator& it, distance_type n, octet_iterator end) + { + for (distance_type i = 0; i < n; ++i) + next(it, end); + } + + template <typename octet_iterator> + typename std::iterator_traits<octet_iterator>::difference_type + distance (octet_iterator first, octet_iterator last) + { + typename std::iterator_traits<octet_iterator>::difference_type dist; + for (dist = 0; first < last; ++dist) + next(first, last); + return dist; + } + + template <typename u16bit_iterator, typename octet_iterator> + octet_iterator utf16to8 (u16bit_iterator start, u16bit_iterator end, octet_iterator result) + { + while (start != end) { + uint32_t cp = internal::mask16(*start++); + // Take care of surrogate pairs first + if (internal::is_lead_surrogate(cp)) { + if (start != end) { + uint32_t trail_surrogate = internal::mask16(*start++); + if (internal::is_trail_surrogate(trail_surrogate)) + cp = (cp << 10) + trail_surrogate + internal::SURROGATE_OFFSET; + else + throw invalid_utf16(static_cast<uint16_t>(trail_surrogate)); + } + else + throw invalid_utf16(static_cast<uint16_t>(cp)); + + } + // Lone trail surrogate + else if (internal::is_trail_surrogate(cp)) + throw invalid_utf16(static_cast<uint16_t>(cp)); + + result = append(cp, result); + } + return result; + } + + template <typename u16bit_iterator, typename octet_iterator> + u16bit_iterator utf8to16 (octet_iterator start, octet_iterator end, u16bit_iterator result) + { + while (start != end) { + uint32_t cp = next(start, end); + 
if (cp > 0xffff) { //make a surrogate pair + *result++ = static_cast<uint16_t>((cp >> 10) + internal::LEAD_OFFSET); + *result++ = static_cast<uint16_t>((cp & 0x3ff) + internal::TRAIL_SURROGATE_MIN); + } + else + *result++ = static_cast<uint16_t>(cp); + } + return result; + } + + template <typename octet_iterator, typename u32bit_iterator> + octet_iterator utf32to8 (u32bit_iterator start, u32bit_iterator end, octet_iterator result) + { + while (start != end) + result = append(*(start++), result); + + return result; + } + + template <typename octet_iterator, typename u32bit_iterator> + u32bit_iterator utf8to32 (octet_iterator start, octet_iterator end, u32bit_iterator result) + { + while (start < end) + (*result++) = next(start, end); + + return result; + } + + // The iterator class + template <typename octet_iterator> + class iterator : public std::iterator <std::bidirectional_iterator_tag, uint32_t> { + octet_iterator it; + octet_iterator range_start; + octet_iterator range_end; + public: + iterator () {}; + explicit iterator (const octet_iterator& octet_it, + const octet_iterator& range_start, + const octet_iterator& range_end) : + it(octet_it), range_start(range_start), range_end(range_end) + { + if (it < range_start || it > range_end) + throw std::out_of_range("Invalid utf-8 iterator position"); + } + // the default "big three" are OK + octet_iterator base () const { return it; } + uint32_t operator * () const + { + octet_iterator temp = it; + return next(temp, range_end); + } + bool operator == (const iterator& rhs) const + { + if (range_start != rhs.range_start || range_end != rhs.range_end) + throw std::logic_error("Comparing utf-8 iterators defined with different ranges"); + return (it == rhs.it); + } + bool operator != (const iterator& rhs) const + { + return !(operator == (rhs)); + } + iterator& operator ++ () + { + next(it, range_end); + return *this; + } + iterator operator ++ (int) + { + iterator temp = *this; + next(it, range_end); + return temp; + } + 
iterator& operator -- () + { + prior(it, range_start); + return *this; + } + iterator operator -- (int) + { + iterator temp = *this; + prior(it, range_start); + return temp; + } + }; // class iterator + +} // namespace utf8 + +#endif //header guard + + --- Utilities/itkExtHdrs/utf8/core.h +++ Utilities/itkExtHdrs/utf8/core.h @@ -0,0 +1,346 @@ +// Copyright 2006 Nemanja Trifunovic + +/* +Permission is hereby granted, free of charge, to any person or organization +obtaining a copy of the software and accompanying documentation covered by +this license (the "Software") to use, reproduce, display, distribute, +execute, and transmit the Software, and to prepare derivative works of the +Software, and to permit third-parties to whom the Software is furnished to +do so, all subject to the following: + +The copyright notices in the Software and this entire statement, including +the above license grant, this restriction and the following disclaimer, +must be included in all copies of the Software, in whole or in part, and +all derivative works of the Software, unless such copies or derivative +works are solely in the form of machine-executable object code generated by +a source language processor. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT +SHALL THE COPYRIGHT HOLDERS OR ANYONE DISTRIBUTING THE SOFTWARE BE LIABLE +FOR ANY DAMAGES OR OTHER LIABILITY, WHETHER IN CONTRACT, TORT OR OTHERWISE, +ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER +DEALINGS IN THE SOFTWARE. +*/ + + +#ifndef UTF8_FOR_CPP_CORE_H_2675DCD0_9480_4c0c_B92A_CC14C027B731 +#define UTF8_FOR_CPP_CORE_H_2675DCD0_9480_4c0c_B92A_CC14C027B731 + +#include <iterator> + +namespace utf8 +{ + // The typedefs for 8-bit, 16-bit and 32-bit unsigned integers + // You may need to change them to match your system. 
+ // These typedefs have the same names as ones from cstdint, or boost/cstdint + typedef unsigned char uint8_t; + typedef unsigned short uint16_t; + typedef unsigned int uint32_t; + +// Helper code - not intended to be directly called by the library users. May be changed at any time +namespace internal +{ + // Unicode constants + // Leading (high) surrogates: 0xd800 - 0xdbff + // Trailing (low) surrogates: 0xdc00 - 0xdfff + const uint16_t LEAD_SURROGATE_MIN = 0xd800u; + const uint16_t LEAD_SURROGATE_MAX = 0xdbffu; + const uint16_t TRAIL_SURROGATE_MIN = 0xdc00u; + const uint16_t TRAIL_SURROGATE_MAX = 0xdfffu; + const uint16_t LEAD_OFFSET = LEAD_SURROGATE_MIN - (0x10000 >> 10); + const uint32_t SURROGATE_OFFSET = 0x10000u - (LEAD_SURROGATE_MIN << 10) - TRAIL_SURROGATE_MIN; + + // Maximum valid value for a Unicode code point + const uint32_t CODE_POINT_MAX = 0x0010ffffu; + + template<typename octet_type> + inline uint8_t mask8(octet_type oc) + { + return static_cast<uint8_t>(0xff & oc); + } + template<typename u16_type> + inline uint16_t mask16(u16_type oc) + { + return static_cast<uint16_t>(0xffff & oc); + } + template<typename octet_type> + inline bool is_trail(octet_type oc) + { + return ((mask8(oc) >> 6) == 0x2); + } + + template <typename u16> + inline bool is_lead_surrogate(u16 cp) + { + return (cp >= LEAD_SURROGATE_MIN && cp <= LEAD_SURROGATE_MAX); + } + + template <typename u16> + inline bool is_trail_surrogate(u16 cp) + { + return (cp >= TRAIL_SURROGATE_MIN && cp <= TRAIL_SURROGATE_MAX); + } + + template <typename u16> + inline bool is_surrogate(u16 cp) + { + return (cp >= LEAD_SURROGATE_MIN && cp <= TRAIL_SURROGATE_MAX); + } + + template <typename u32> + inline bool is_code_point_valid(u32 cp) + { + return (cp <= CODE_POINT_MAX && !is_surrogate(cp) && cp != 0xfffe && cp != 0xffff); + } + + template <typename octet_iterator> + inline typename std::iterator_traits<octet_iterator>::difference_type + sequence_length(octet_iterator lead_it) + { + uint8_t lead = 
mask8(*lead_it); + if (lead < 0x80) + return 1; + else if ((lead >> 5) == 0x6) + return 2; + else if ((lead >> 4) == 0xe) + return 3; + else if ((lead >> 3) == 0x1e) + return 4; + else + return 0; + } + + inline bool is_overlong_sequence(uint32_t cp, int length) + { + if (cp < 0x80) { + if (length != 1) + return true; + } + else if (cp < 0x800) { + if (length != 2) + return true; + } + else if (cp < 0x10000) { + if (length != 3) + return true; + } + + return false; + } + + enum utf_error {UTF8_OK, NOT_ENOUGH_ROOM, INVALID_LEAD, INCOMPLETE_SEQUENCE, OVERLONG_SEQUENCE, INVALID_CODE_POINT}; + + /// get_sequence_x functions decode utf-8 sequences of the length x + + template <typename octet_iterator> + utf_error get_sequence_1(octet_iterator& it, octet_iterator end, uint32_t* code_point) + { + if (it != end) { + if (code_point) + *code_point = mask8(*it); + return UTF8_OK; + } + return NOT_ENOUGH_ROOM; + } + + template <typename octet_iterator> + utf_error get_sequence_2(octet_iterator& it, octet_iterator end, uint32_t* code_point) + { + utf_error ret_code = NOT_ENOUGH_ROOM; + + if (it != end) { + uint32_t cp = mask8(*it); + if (++it != end) { + if (is_trail(*it)) { + cp = ((cp << 6) & 0x7ff) + ((*it) & 0x3f); + + if (code_point) + *code_point = cp; + ret_code = UTF8_OK; + } + else + ret_code = INCOMPLETE_SEQUENCE; + } + else + ret_code = NOT_ENOUGH_ROOM; + } + + return ret_code; + } + + template <typename octet_iterator> + utf_error get_sequence_3(octet_iterator& it, octet_iterator end, uint32_t* code_point) + { + utf_error ret_code = NOT_ENOUGH_ROOM; + + if (it != end) { + uint32_t cp = mask8(*it); + if (++it != end) { + if (is_trail(*it)) { + cp = ((cp << 12) & 0xffff) + ((mask8(*it) << 6) & 0xfff); + if (++it != end) { + if (is_trail(*it)) { + cp += (*it) & 0x3f; + + if (code_point) + *code_point = cp; + ret_code = UTF8_OK; + } + else + ret_code = INCOMPLETE_SEQUENCE; + } + else + ret_code = NOT_ENOUGH_ROOM; + } + else + ret_code = INCOMPLETE_SEQUENCE; + } + else + 
ret_code = NOT_ENOUGH_ROOM; + } + + return ret_code; + } + + template <typename octet_iterator> + utf_error get_sequence_4(octet_iterator& it, octet_iterator end, uint32_t* code_point) + { + utf_error ret_code = NOT_ENOUGH_ROOM; + + if (it != end) { + uint32_t cp = mask8(*it); + if (++it != end) { + if (is_trail(*it)) { + cp = ((cp << 18) & 0x1fffff) + ((mask8(*it) << 12) & 0x3ffff); + if (++it != end) { + if (is_trail(*it)) { + cp += (mask8(*it) << 6) & 0xfff; + if (++it != end) { + if (is_trail(*it)) { + cp += (*it) & 0x3f; + + if (code_point) + *code_point = cp; + ret_code = UTF8_OK; + } + else + ret_code = INCOMPLETE_SEQUENCE; + } + else + ret_code = NOT_ENOUGH_ROOM; + } + else + ret_code = INCOMPLETE_SEQUENCE; + } + else + ret_code = NOT_ENOUGH_ROOM; + } + else + ret_code = INCOMPLETE_SEQUENCE; + } + else + ret_code = NOT_ENOUGH_ROOM; + } + + return ret_code; + } + + template <typename octet_iterator> + utf_error validate_next(octet_iterator& it, octet_iterator end, uint32_t* code_point) + { + // Save the original value of it so we can go back in case of failure + // Of course, it does not make much sense with i.e. stream iterators + octet_iterator original_it = it; + + uint32_t cp = 0; + // Determine the sequence length based on the lead octet + typedef typename std::iterator_traits<octet_iterator>::difference_type octet_difference_type; + octet_difference_type length = sequence_length(it); + if (length == 0) + return INVALID_LEAD; + + // Now that we have a valid sequence length, get trail octets and calculate the code point + utf_error err = UTF8_OK; + switch (length) { + case 1: + err = get_sequence_1(it, end, &cp); + break; + case 2: + err = get_sequence_2(it, end, &cp); + break; + case 3: + err = get_sequence_3(it, end, &cp); + break; + case 4: + err = get_sequence_4(it, end, &cp); + break; + } + + if (err == UTF8_OK) { + // Decoding succeeded. Now, security checks... + if (is_code_point_valid(cp)) { + if (!is_overlong_sequence(cp, length)){ + // Passed! 
Return here. + if (code_point) + *code_point = cp; + ++it; + return UTF8_OK; + } + else + err = OVERLONG_SEQUENCE; + } + else + err = INVALID_CODE_POINT; + } + + // Failure branch - restore the original value of the iterator + it = original_it; + return err; + } + + template <typename octet_iterator> + inline utf_error validate_next(octet_iterator& it, octet_iterator end) { + return validate_next(it, end, 0); + } + +} // namespace internal + + /// The library API - functions intended to be called by the users + + // Byte order mark + const uint8_t bom[] = {0xef, 0xbb, 0xbf}; + + template <typename octet_iterator> + octet_iterator find_invalid(octet_iterator start, octet_iterator end) + { + octet_iterator result = start; + while (result != end) { + internal::utf_error err_code = internal::validate_next(result, end); + if (err_code != internal::UTF8_OK) + return result; + } + return result; + } + + template <typename octet_iterator> + inline bool is_valid(octet_iterator start, octet_iterator end) + { + return (find_invalid(start, end) == end); + } + + template <typename octet_iterator> + inline bool is_bom (octet_iterator it) + { + return ( + (internal::mask8(*it++)) == bom[0] && + (internal::mask8(*it++)) == bom[1] && + (internal::mask8(*it)) == bom[2] + ); + } +} // namespace utf8 + +#endif // header guard + + --- Utilities/itkExtHdrs/utf8/unchecked.h +++ Utilities/itkExtHdrs/utf8/unchecked.h @@ -0,0 +1,228 @@ +// Copyright 2006 Nemanja Trifunovic + +/* +Permission is hereby granted, free of charge, to any person or organization +obtaining a copy of the software and accompanying documentation covered by +this license (the "Software") to use, reproduce, display, distribute, +execute, and transmit the Software, and to prepare derivative works of the +Software, and to permit third-parties to whom the Software is furnished to +do so, all subject to the following: + +The copyright notices in the Software and this entire statement, including +the above license grant, 
this restriction and the following disclaimer, +must be included in all copies of the Software, in whole or in part, and +all derivative works of the Software, unless such copies or derivative +works are solely in the form of machine-executable object code generated by +a source language processor. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT +SHALL THE COPYRIGHT HOLDERS OR ANYONE DISTRIBUTING THE SOFTWARE BE LIABLE +FOR ANY DAMAGES OR OTHER LIABILITY, WHETHER IN CONTRACT, TORT OR OTHERWISE, +ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER +DEALINGS IN THE SOFTWARE. +*/ + + +#ifndef UTF8_FOR_CPP_UNCHECKED_H_2675DCD0_9480_4c0c_B92A_CC14C027B731 +#define UTF8_FOR_CPP_UNCHECKED_H_2675DCD0_9480_4c0c_B92A_CC14C027B731 + +#include "core.h" + +namespace utf8 +{ + namespace unchecked + { + template <typename octet_iterator> + octet_iterator append(uint32_t cp, octet_iterator result) + { + if (cp < 0x80) // one octet + *(result++) = static_cast<uint8_t>(cp); + else if (cp < 0x800) { // two octets + *(result++) = static_cast<uint8_t>((cp >> 6) | 0xc0); + *(result++) = static_cast<uint8_t>((cp & 0x3f) | 0x80); + } + else if (cp < 0x10000) { // three octets + *(result++) = static_cast<uint8_t>((cp >> 12) | 0xe0); + *(result++) = static_cast<uint8_t>(((cp >> 6) & 0x3f) | 0x80); + *(result++) = static_cast<uint8_t>((cp & 0x3f) | 0x80); + } + else { // four octets + *(result++) = static_cast<uint8_t>((cp >> 18) | 0xf0); + *(result++) = static_cast<uint8_t>(((cp >> 12) & 0x3f)| 0x80); + *(result++) = static_cast<uint8_t>(((cp >> 6) & 0x3f) | 0x80); + *(result++) = static_cast<uint8_t>((cp & 0x3f) | 0x80); + } + return result; + } + + template <typename octet_iterator> + uint32_t next(octet_iterator& it) + { + uint32_t cp = internal::mask8(*it); + typename 
std::iterator_traits<octet_iterator>::difference_type length = utf8::internal::sequence_length(it); + switch (length) { + case 1: + break; + case 2: + it++; + cp = ((cp << 6) & 0x7ff) + ((*it) & 0x3f); + break; + case 3: + ++it; + cp = ((cp << 12) & 0xffff) + ((internal::mask8(*it) << 6) & 0xfff); + ++it; + cp += (*it) & 0x3f; + break; + case 4: + ++it; + cp = ((cp << 18) & 0x1fffff) + ((internal::mask8(*it) << 12) & 0x3ffff); + ++it; + cp += (internal::mask8(*it) << 6) & 0xfff; + ++it; + cp += (*it) & 0x3f; + break; + } + ++it; + return cp; + } + + template <typename octet_iterator> + uint32_t peek_next(octet_iterator it) + { + return next(it); + } + + template <typename octet_iterator> + uint32_t prior(octet_iterator& it) + { + while (internal::is_trail(*(--it))) ; + octet_iterator temp = it; + return next(temp); + } + + // Deprecated in versions that include prior, but only for the sake of consistency (see utf8::previous) + template <typename octet_iterator> + inline uint32_t previous(octet_iterator& it) + { + return prior(it); + } + + template <typename octet_iterator, typename distance_type> + void advance (octet_iterator& it, distance_type n) + { + for (distance_type i = 0; i < n; ++i) + next(it); + } + + template <typename octet_iterator> + typename std::iterator_traits<octet_iterator>::difference_type + distance (octet_iterator first, octet_iterator last) + { + typename std::iterator_traits<octet_iterator>::difference_type dist; + for (dist = 0; first < last; ++dist) + next(first); + return dist; + } + + template <typename u16bit_iterator, typename octet_iterator> + octet_iterator utf16to8 (u16bit_iterator start, u16bit_iterator end, octet_iterator result) + { + while (start != end) { + uint32_t cp = internal::mask16(*start++); + // Take care of surrogate pairs first + if (internal::is_lead_surrogate(cp)) { + uint32_t trail_surrogate = internal::mask16(*start++); + cp = (cp << 10) + trail_surrogate + internal::SURROGATE_OFFSET; + } + result = append(cp, 
result); + } + return result; + } + + template <typename u16bit_iterator, typename octet_iterator> + u16bit_iterator utf8to16 (octet_iterator start, octet_iterator end, u16bit_iterator result) + { + while (start != end) { + uint32_t cp = next(start); + if (cp > 0xffff) { //make a surrogate pair + *result++ = static_cast<uint16_t>((cp >> 10) + internal::LEAD_OFFSET); + *result++ = static_cast<uint16_t>((cp & 0x3ff) + internal::TRAIL_SURROGATE_MIN); + } + else + *result++ = static_cast<uint16_t>(cp); + } + return result; + } + + template <typename octet_iterator, typename u32bit_iterator> + octet_iterator utf32to8 (u32bit_iterator start, u32bit_iterator end, octet_iterator result) + { + while (start != end) + result = append(*(start++), result); + + return result; + } + + template <typename octet_iterator, typename u32bit_iterator> + u32bit_iterator utf8to32 (octet_iterator start, octet_iterator end, u32bit_iterator result) + { + while (start < end) + (*result++) = next(start); + + return result; + } + + // The iterator class + template <typename octet_iterator> + class iterator : public std::iterator <std::bidirectional_iterator_tag, uint32_t> { + octet_iterator it; + public: + iterator () {}; + explicit iterator (const octet_iterator& octet_it): it(octet_it) {} + // the default "big three" are OK + octet_iterator base () const { return it; } + uint32_t operator * () const + { + octet_iterator temp = it; + return next(temp); + } + bool operator == (const iterator& rhs) const + { + return (it == rhs.it); + } + bool operator != (const iterator& rhs) const + { + return !(operator == (rhs)); + } + iterator& operator ++ () + { + std::advance(it, internal::sequence_length(it)); + return *this; + } + iterator operator ++ (int) + { + iterator temp = *this; + std::advance(it, internal::sequence_length(it)); + return temp; + } + iterator& operator -- () + { + prior(it); + return *this; + } + iterator operator -- (int) + { + iterator temp = *this; + prior(it); + return temp; + 
} + }; // class iterator + + } // namespace utf8::unchecked +} // namespace utf8 + + +#endif // header guard + itkUnicodeIOTest.zip [^] (13,460 bytes) 2009-10-27 14:46 itkUnicodeIOTest-2009-11-02.zip [^] (5,097 bytes) 2009-11-02 13:04 itkUnicodeIOTest.cxx [^] (13,128 bytes) 2009-11-11 08:38 itk-unicodeio-2010-01-12.patch [^] (19,782 bytes) 2010-01-12 06:38 [Show Content] [Hide Content] Index: Code/Common/itkI18nIOHelpers.h =================================================================== RCS file: Code/Common/itkI18nIOHelpers.h diff -N Code/Common/itkI18nIOHelpers.h --- /dev/null 1 Jan 1970 00:00:00 -0000 +++ Code/Common/itkI18nIOHelpers.h 12 Jan 2010 11:34:12 -0000 @@ -0,0 +1,262 @@ +/*========================================================================= + + Program: Insight Segmentation & Registration Toolkit + Module: $RCSfile: itkI18nIOHelpers.h,v $ + Language: C++ + Date: $Date: 2010-01-12 14:59:04 $ + Version: $Revision: 1.0 $ + + Copyright (c) Insight Software Consortium. All rights reserved. + See ITKCopyright.txt or http://www.itk.org/HTML/Copyright.htm for details. + + This software is distributed WITHOUT ANY WARRANTY; without even + the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR + PURPOSE. See the above copyright notices for more information. + +=========================================================================*/ +#ifndef __itkI18nIOHelpers_h +#define __itkI18nIOHelpers_h + +#include "itkConfigure.h" + +#ifdef ITK_HAVE_UNISTD_H +# include <unistd.h> // for unlink +#else +# include <io.h> +#endif + +#include <stdio.h> // Borland needs this (cstdio does not work easy) +#include <fcntl.h> +#include <iostream> +#include <string> +#include <sys/stat.h> + +// Find out how to handle unicode filenames: +// * VS>=8.0 has _wopen and _wfopen and can open a (i/o)fstream using a wide string +// * cygwin has NO _wopen an NO _wfopen. 
If you really need unicode +// filenames on cygwin, just use cygwin >= 1.7 for now, it works with utf8 +// natively. Alternatively, we could try and use pure win32 functions such as +// CreateFileW and convert the win32 file handle using _open_osfhandle and _fdopen +// * VS6.0 has _wopen and _wfopen but cannot open a (i/o)fstream using a wide string +// nor can it compile fdstream => disable unicode filename support +// * Borland c++, VS7.x and MinGW have _wopen and _wfopen but cannot open a +// (i/o)fstream using a wide string. They can however compile fdstream + +#if defined(ITK_SUPPORTS_WCHAR_T_FILENAME_CSTYLEIO) \ + && ( defined(ITK_SUPPORTS_WCHAR_T_FILENAME_IOSTREAMS_CONSTRUCTORS) || defined(ITK_SUPPORTS_FDSTREAM_HPP) ) +# define LOCAL_USE_WIN32_WOPEN 1 +# include <windows.h> // required by winnls.h +# include <winnls.h> // for MultiByteToWideChar +#else +# define LOCAL_USE_WIN32_WOPEN 0 +#endif + +#if (LOCAL_USE_WIN32_WOPEN && defined(ITK_SUPPORTS_WCHAR_T_FILENAME_IOSTREAMS_CONSTRUCTORS)) \ + || (!LOCAL_USE_WIN32_WOPEN) +# define LOCAL_USE_FDSTREAM 0 +# include <fstream> +#else +# define LOCAL_USE_FDSTREAM 1 +# include "fdstream.hpp" +#endif + + +namespace itk +{ +namespace I18n +{ + +// Check if the string is correctly encoded +inline bool IsStringEncodingValid(const std::string & str) +{ +#if LOCAL_USE_WIN32_WOPEN + // Check if the string is really encoded in utf-8 using windows API + // MultiByteToWideChar returns 0 if there was a problem during conversion + // when given the MB_ERR_INVALID_CHARS flag + const int utf16_size = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, str.c_str(), + static_cast<int>(str.length()), 0, 0); + return (utf16_size != 0); +#else + return true; +#endif +} + +#if LOCAL_USE_WIN32_WOPEN +// Convert a utf8 encoded std::string to a utf16 encoded wstring on windows +inline std::wstring Utf8StringToWString( const std::string & str ) +{ + // We do not set the MB_ERR_INVALID_CHARS to do an approximate conversion when non + // utf8 
characters are found. An alternative would be to throw an exception + + // First get the size + const int utf16_size = MultiByteToWideChar(CP_UTF8, 0, str.c_str(), + static_cast<int>(str.length()), 0, 0); + + // Now do the conversion + std::wstring wstr; + wstr.resize(utf16_size); + MultiByteToWideChar(CP_UTF8, 0, str.c_str(), + static_cast<int>(str.length()), &wstr[0], utf16_size); + + return wstr; +} + +#endif + +// Get a file descriptor from a filename (using utf8 to wstring +// on windows if requested) without specifying any specific permissions +inline int i18nopen( const std::string & str, const int & flags ) +{ +#if LOCAL_USE_WIN32_WOPEN + // cygwin has NO _wopen but mingw has + // If you really need unicode filenames on cygwin, just use cygwin >= 1.7 + // Convert to utf16 + const std::wstring str_utf16 = Utf8StringToWString( str ); + return _wopen(str_utf16.c_str(), flags); +#else + return open(str.c_str(), flags); +#endif +} + +// Get a file descriptor from a filename (using utf8 to wstring +// on windows if requested) +inline int i18nopen( const std::string & str, const int & flags, const int & mode ) +{ +#if LOCAL_USE_WIN32_WOPEN + // cygwin has NO _wopen but mingw has + // If you really need unicode filenames on cygwin, just use cygwin >= 1.7 + // Convert to utf16 + const std::wstring str_utf16 = Utf8StringToWString( str ); + return _wopen(str_utf16.c_str(), flags, mode); +#else + return open(str.c_str(), flags, mode); +#endif +} + +// Reading wrapper around i18nopen to avoid explicitely specifying the flags +inline int i18nopenforreading( const std::string & str ) +{ +#if LOCAL_USE_WIN32_WOPEN + return i18nopen(str, _O_RDONLY | _O_BINARY ); +#else + ///\todo check if cygwin has and needs the O_BINARY flag + return i18nopen(str, O_RDONLY ); +#endif +} + +// Writting wrapper around i18nopen to avoid explicitely specifying the flags +inline int i18nopenforwritting( const std::string & str, const bool append = false ) +{ +#if LOCAL_USE_WIN32_WOPEN + if 
(!append) return i18nopen(str, _O_WRONLY | _O_CREAT | _O_BINARY, _S_IREAD | _S_IWRITE ); + else return i18nopen(str, _O_WRONLY | _O_CREAT | _O_APPEND | _O_BINARY, _S_IREAD | _S_IWRITE ); +#else + ///\todo check if cygwin has and needs the O_BINARY flag + if (!append) return i18nopen(str, O_WRONLY | O_CREAT, S_IREAD | S_IWRITE ); + else return i18nopen(str, O_WRONLY | O_CREAT | O_APPEND, S_IREAD | S_IWRITE ); +#endif +} + +// Get a FILE * pointer from a filename (using utf8 to wstring +// on windows if requested) +inline FILE * i18nfopen( const std::string & str, const std::string & mode ) +{ +#if LOCAL_USE_WIN32_WOPEN + // cygwin has NO _wfopen but mingw has + // If you really need unicode filenames on cygwin, just use cygwin >= 1.7 + // Convert to utf16 + const std::wstring str_utf16 = Utf8StringToWString( str ); + const std::wstring mode_utf16 = Utf8StringToWString( mode ); + return _wfopen(str_utf16.c_str(), mode_utf16.c_str()); +#else + return fopen(str.c_str(), mode.c_str()); +#endif +} + +#if LOCAL_USE_FDSTREAM +class i18nofstream : public std::ostream +{ +public: + i18nofstream( const char * str, + std::ios_base::openmode mode = std::ios_base::out ) + : std::ostream(0) + , m_fd( i18nopenforwritting( str, (mode & std::ios::app)?true:false ) ) + , m_buf( m_fd ) + { + ///\todo better handle mode flag + this->rdbuf(&m_buf); + } + + ~i18nofstream() { this->close(); } + + bool is_open() { return (m_fd!=-1); } + + void close() + { + if ( m_fd!=-1 ) ::close( m_fd ); + m_fd = -1; + } + +private: + int m_fd; + boost::fdoutbuf m_buf; +}; + +class i18nifstream : public std::istream +{ +public: + i18nifstream( const char * str, + std::ios_base::openmode mode = std::ios_base::in ) + : std::istream(0) + , m_fd( i18nopenforreading( str ) ) + , m_buf( m_fd ) + { + ///\todo better handle mode flag + this->rdbuf(&m_buf); + } + + ~i18nifstream() { this->close(); } + + bool is_open() { return (m_fd!=-1); } + + void close() + { + if ( m_fd!=-1 ) ::close( m_fd ); + m_fd = -1; + } 
+ +private: + int m_fd; + boost::fdinbuf m_buf; +}; +#elif LOCAL_USE_WIN32_WOPEN +class i18nofstream : public std::ofstream +{ +public: + i18nofstream( const char * str, std::ios_base::openmode mode = std::ios_base::out ) + : std::ofstream( Utf8StringToWString(str).c_str(), mode ) + { + } +}; + +class i18nifstream : public std::ifstream +{ +public: + i18nifstream( const char * str, std::ios_base::openmode mode = std::ios_base::in ) + : std::ifstream( Utf8StringToWString(str).c_str(), mode ) + { + } +}; +#else +typedef std::ofstream i18nofstream; +typedef std::ifstream i18nifstream; +#endif + +} // end namespace +} // end namespace + + +#undef LOCAL_USE_WIN32_WOPEN +#undef LOCAL_USE_FDSTREAM + +#endif /* __itkI18nIOHelpers_h */ Index: Testing/Code/IO/itkUnicodeIOTest.cxx =================================================================== RCS file: /cvsroot/Insight/Insight/Testing/Code/IO/itkUnicodeIOTest.cxx,v retrieving revision 1.11 diff -u -r1.11 itkUnicodeIOTest.cxx --- Testing/Code/IO/itkUnicodeIOTest.cxx 24 Nov 2009 15:16:01 -0000 1.11 +++ Testing/Code/IO/itkUnicodeIOTest.cxx 12 Jan 2010 11:34:14 -0000 @@ -15,245 +15,19 @@ PURPOSE. See the above copyright notices for more information. =========================================================================*/ -#include "itkConfigure.h" +#include "itkI18nIOHelpers.h" #include <cstdlib> // for EXIT_FAILURE and EXIT_SUCCESS #include <string.h> // for strcmp (cstring cannot be used on both Sun and VS6) -#ifdef ITK_HAVE_UNISTD_H -# include <unistd.h> // for unlink -#else -# include <io.h> -#endif - -#include <stdio.h> // Borland needs this (cstdio does not work easy) -#include <fcntl.h> -#include <iostream> -#include <string> -#include <sys/stat.h> - -// Find out how to handle unicode filenames: -// * VS>=8.0 has _wopen and _wfopen and can open a (i/o)fstream using a wide string -// * cygwin has NO _wopen an NO _wfopen. 
If you really need unicode -// filenames on cygwin, just use cygwin >= 1.7 for now, it works with utf8 -// natively. Alternatively, we could try and use pure win32 functions such as -// CreateFileW and convert the win32 file handle using _open_osfhandle and _fdopen -// * VS6.0 has _wopen and _wfopen but cannot open a (i/o)fstream using a wide string -// nor can it compile fdstream => disable unicode filename support -// * Borland c++, VS7.x and MinGW have _wopen and _wfopen but cannot open a -// (i/o)fstream using a wide string. They can however compile fdstream - +// Some utility functions for the test #if defined(ITK_SUPPORTS_WCHAR_T_FILENAME_CSTYLEIO) \ && ( defined(ITK_SUPPORTS_WCHAR_T_FILENAME_IOSTREAMS_CONSTRUCTORS) || defined(ITK_SUPPORTS_FDSTREAM_HPP) ) # define LOCAL_USE_WIN32_WOPEN 1 -# include <windows.h> // required by winnls.h -# include <winnls.h> // for MultiByteToWideChar #else # define LOCAL_USE_WIN32_WOPEN 0 #endif -#if (LOCAL_USE_WIN32_WOPEN && defined(ITK_SUPPORTS_WCHAR_T_FILENAME_IOSTREAMS_CONSTRUCTORS)) \ - || (!LOCAL_USE_WIN32_WOPEN) -# define LOCAL_USE_FDSTREAM 0 -# include <fstream> -#else -# define LOCAL_USE_FDSTREAM 1 -# include "fdstream.hpp" -#endif - - -namespace itk -{ -namespace Utf8 -{ - -#if LOCAL_USE_WIN32_WOPEN - -// Check if the string is really encoded in utf-8 using windows API -inline bool IsValidUtf8(const std::string & str) -{ - // MultiByteToWideChar returns 0 if there was a problem during conversion - // when given the MB_ERR_INVALID_CHARS flag - const int utf16_size = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, str.c_str(), - static_cast<int>(str.length()), 0, 0); - return (utf16_size != 0); -} - -// Convert a utf8 encoded std::string to a utf16 encoded wstring on windows -inline std::wstring Utf8StringToWString( const std::string & str ) -{ - // We do not set the MB_ERR_INVALID_CHARS to do an approximate conversion when non - // utf8 characters are found. 
An alternative would be to throw an exception - - // First get the size - const int utf16_size = MultiByteToWideChar(CP_UTF8, 0, str.c_str(), - static_cast<int>(str.length()), 0, 0); - - // Now do the conversion - std::wstring wstr; - wstr.resize(utf16_size); - MultiByteToWideChar(CP_UTF8, 0, str.c_str(), - static_cast<int>(str.length()), &wstr[0], utf16_size); - - return wstr; -} - -#endif - -// Get a file descriptor from a utf8 encoded filename -// without specifying any specific permissions -inline int utf8open( const std::string & str, const int & flags ) -{ -#if LOCAL_USE_WIN32_WOPEN - // cygwin has NO _wopen but mingw has - // If you really need unicode filenames on cygwin, just use cygwin >= 1.7 - // Convert to utf16 - const std::wstring str_utf16 = Utf8StringToWString( str ); - return _wopen(str_utf16.c_str(), flags); -#else - return open(str.c_str(), flags); -#endif -} - -// Get a file descriptor from a utf8 encoded filename -inline int utf8open( const std::string & str, const int & flags, const int & mode ) -{ -#if LOCAL_USE_WIN32_WOPEN - // cygwin has NO _wopen but mingw has - // If you really need unicode filenames on cygwin, just use cygwin >= 1.7 - // Convert to utf16 - const std::wstring str_utf16 = Utf8StringToWString( str ); - return _wopen(str_utf16.c_str(), flags, mode); -#else - return open(str.c_str(), flags, mode); -#endif -} - -// Reading wrapper around utf8open to avoid explicitely specifying the flags -inline int utf8openforreading( const std::string & str ) -{ -#if LOCAL_USE_WIN32_WOPEN - return utf8open(str, _O_RDONLY | _O_BINARY ); -#else - ///\todo check if cygwin has and needs the O_BINARY flag - return utf8open(str, O_RDONLY ); -#endif -} - -// Writting wrapper around utf8open to avoid explicitely specifying the flags -inline int utf8openforwritting( const std::string & str, const bool append = false ) -{ -#if LOCAL_USE_WIN32_WOPEN - if (!append) return utf8open(str, _O_WRONLY | _O_CREAT | _O_BINARY, _S_IREAD | _S_IWRITE ); - else 
return utf8open(str, _O_WRONLY | _O_CREAT | _O_APPEND | _O_BINARY, _S_IREAD | _S_IWRITE ); -#else - ///\todo check if cygwin has and needs the O_BINARY flag - if (!append) return utf8open(str, O_WRONLY | O_CREAT, S_IREAD | S_IWRITE ); - else return utf8open(str, O_WRONLY | O_CREAT | O_APPEND, S_IREAD | S_IWRITE ); -#endif -} - -// Get a FILE * pointer from a utf8 encoded filename -inline FILE * utf8fopen( const std::string & str, const std::string & mode ) -{ -#if LOCAL_USE_WIN32_WOPEN - // cygwin has NO _wfopen but mingw has - // If you really need unicode filenames on cygwin, just use cygwin >= 1.7 - // Convert to utf16 - const std::wstring str_utf16 = Utf8StringToWString( str ); - const std::wstring mode_utf16 = Utf8StringToWString( mode ); - return _wfopen(str_utf16.c_str(), mode_utf16.c_str()); -#else - return fopen(str.c_str(), mode.c_str()); -#endif -} - -#if LOCAL_USE_FDSTREAM -class utf8ofstream : public std::ostream -{ -public: - utf8ofstream( const char * str, - std::ios_base::openmode mode = std::ios_base::out ) - : std::ostream(0) - , m_fd( utf8openforwritting( str, (mode & std::ios::app)?true:false ) ) - , m_buf( m_fd ) - { - ///\todo better handle mode flag - this->rdbuf(&m_buf); - } - - ~utf8ofstream() { this->close(); } - - bool is_open() { return (m_fd!=-1); } - - void close() - { - if ( m_fd!=-1 ) ::close( m_fd ); - m_fd = -1; - } - -private: - int m_fd; - boost::fdoutbuf m_buf; -}; - -class utf8ifstream : public std::istream -{ -public: - utf8ifstream( const char * str, - std::ios_base::openmode mode = std::ios_base::in ) - : std::istream(0) - , m_fd( utf8openforreading( str ) ) - , m_buf( m_fd ) - { - ///\todo better handle mode flag - this->rdbuf(&m_buf); - } - - ~utf8ifstream() { this->close(); } - - bool is_open() { return (m_fd!=-1); } - - void close() - { - if ( m_fd!=-1 ) ::close( m_fd ); - m_fd = -1; - } - -private: - int m_fd; - boost::fdinbuf m_buf; -}; -#elif LOCAL_USE_WIN32_WOPEN -class utf8ofstream : public std::ofstream -{ -public: 
- utf8ofstream( const char * str, std::ios_base::openmode mode = std::ios_base::out ) - : std::ofstream( Utf8StringToWString(str).c_str(), mode ) - { - } -}; - -class utf8ifstream : public std::ifstream -{ -public: - utf8ifstream( const char * str, std::ios_base::openmode mode = std::ios_base::in ) - : std::ifstream( Utf8StringToWString(str).c_str(), mode ) - { - } -}; -#else -typedef std::ofstream utf8ofstream; -typedef std::ifstream utf8ifstream; -#endif - -} // end namespace -} // end namespace - - - -// Some utility functions for the test - // Check if alpha.txt exists using _wfopen with a wstring on MSVC and mingw // and fopen with a UTF-8 char * otherwise bool checkAlphaExists() @@ -321,21 +95,21 @@ utf8_str.append(1, (char)(0xCE)); utf8_str.append(1, (char)(0xB1)); utf8_str += ".txt"; - -#if LOCAL_USE_WIN32_WOPEN - // Check if we actually find it is a valid utf-8 string - if ( !itk::Utf8::IsValidUtf8(utf8_str) ) + + // Check if we actually find it is a valid string + if ( !itk::I18n::IsStringEncodingValid(utf8_str) ) { std::cout << "Wrongly detected invalid utf8 string." << std::endl; ++nberror; } - // Check that the conversion worked +#if LOCAL_USE_WIN32_WOPEN + // Check that the string to wide string conversion works // Borland does not understand L"\u03B1.txt" std::wstring utf16_str; utf16_str.push_back((wchar_t)(0x03B1)); utf16_str += L".txt"; - const std::wstring fromutf8_utf16_str = itk::Utf8::Utf8StringToWString( utf8_str ); + const std::wstring fromutf8_utf16_str = itk::I18n::Utf8StringToWString( utf8_str ); if ( fromutf8_utf16_str != utf16_str ) { @@ -350,7 +124,7 @@ bad_utf8_str += ".txt"; // Check if we actually find it is a non-valid utf-8 string - if ( itk::Utf8::IsValidUtf8(bad_utf8_str) ) + if ( itk::I18n::IsStringEncodingValid(bad_utf8_str) ) { std::cout << "Did not detect invalid utf8 string using windows API." 
<< std::endl; ++nberror; @@ -364,7 +138,7 @@ // Create alpha.txt using utf8fopen - FILE * wfile = itk::Utf8::utf8fopen(utf8_str, "wb"); + FILE * wfile = itk::I18n::i18nfopen(utf8_str, "wb"); if (!checkAlphaExists()) { @@ -383,7 +157,7 @@ ++nberror; } - FILE * rfile = itk::Utf8::utf8fopen(utf8_str, "rb"); + FILE * rfile = itk::I18n::i18nfopen(utf8_str, "rb"); if (rfile!=NULL) { @@ -401,7 +175,7 @@ } else { - std::cout << "Could not read from file after utf8fopen." << std::endl; + std::cout << "Could not read from file after i18nfopen." << std::endl; ++nberror; } fclose(rfile); @@ -414,16 +188,16 @@ if (!removeAlpha()) { - std::cout << "Could not remove alpha.txt after utf8fopen." << std::endl; + std::cout << "Could not remove alpha.txt after i18nfopen." << std::endl; ++nberror; } // Create alpha.txt using open and write to it using streams - itk::Utf8::utf8ofstream wstream(utf8_str.c_str(), std::ios::binary | std::ios::out ); + itk::I18n::i18nofstream wstream(utf8_str.c_str(), std::ios::binary | std::ios::out ); if (!checkAlphaExists()) { - std::cout << "alpha.txt does not exist after utf8ofstream creation." << std::endl; + std::cout << "alpha.txt does not exist after i18nofstream creation." << std::endl; ++nberror; } @@ -439,7 +213,7 @@ wstream.close(); - itk::Utf8::utf8ifstream rstream(utf8_str.c_str(), std::ios::binary | std::ios::in ); + itk::I18n::i18nifstream rstream(utf8_str.c_str(), std::ios::binary | std::ios::in ); if (rstream.is_open()) { @@ -464,7 +238,7 @@ if (!removeAlpha()) { - std::cout << "Could not remove alpha.txt after utf8ofstreamcreation." << std::endl; + std::cout << "Could not remove alpha.txt after i18nofstreamcreation." << std::endl; ++nberror; } | ||||||||||||
Relationships | |
Notes | |
(0017844) Tom Vercauteren (developer) 2009-09-30 12:05 |
For the record, on linux, this unit test ( itk-unicodewritetest-2009-09-30.patch) passes without issue. |
(0018243) Tom Vercauteren (developer) 2009-10-26 15:58 |
I have attached for review a preliminary patch (itk-msvc-unicode-2009-10-26.patch) that allows the use of UTF-8 encoded filenames on Windows for the following formats:
- jpeg
- png
- meta (mhd and mha)
- tiff
Feedback is welcome! |
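The helpers in the attached itk-unicodeio patch only validate the encoding through the Windows API (IsStringEncodingValid simply returns true elsewhere). A portable check could look roughly like the sketch below. IsValidUtf8 here is a hypothetical standalone function written for illustration, not code from the patch or from ITK.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not ITK API): returns true if s is well-formed UTF-8.
// Rejects stray continuation bytes, truncated sequences, overlong encodings,
// UTF-16 surrogate code points and values above U+10FFFF.
bool IsValidUtf8(const std::string & s)
{
  std::size_t i = 0;
  while (i < s.size())
  {
    const unsigned char c = static_cast<unsigned char>(s[i]);
    std::size_t len = 0;     // number of bytes in this sequence
    std::uint32_t cp = 0;    // decoded code point
    std::uint32_t min = 0;   // smallest code point allowed for this length
    if (c < 0x80)                { len = 1; cp = c;        min = 0; }
    else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; min = 0x80; }
    else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; min = 0x800; }
    else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; min = 0x10000; }
    else return false;           // continuation byte or invalid lead byte

    if (i + len > s.size()) return false;  // truncated sequence

    for (std::size_t k = 1; k < len; ++k)
    {
      const unsigned char t = static_cast<unsigned char>(s[i + k]);
      if ((t & 0xC0) != 0x80) return false;  // not a continuation byte
      cp = (cp << 6) | (t & 0x3F);
    }

    if (cp < min) return false;                       // overlong encoding
    if (cp > 0x10FFFF) return false;                  // outside Unicode
    if (cp >= 0xD800 && cp <= 0xDFFF) return false;   // surrogate range
    i += len;
  }
  return true;
}
```

With this sketch, the alpha.txt name used by the unit test (bytes 0xCE 0xB1 followed by ".txt") is accepted, while a name starting with a lone 0xB1 byte is rejected.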
(0018248) Brad King (manager) 2009-10-27 10:26 |
I applied itk-msvc-unicode-2009-10-26.patch locally and scrolled through the changes. I think blocks like
+#ifdef _MSC_VER
+ // Convert to utf16
should test _WIN32 instead... we want to convert to utf16 and use the wide character *windows* API. TIFFOpenW does this already. |
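The suggestion above can be sketched as follows: the conversion is keyed on _WIN32 rather than _MSC_VER, so any Windows compiler whose C runtime provides _wfopen takes the wide-character path. This is only a minimal sketch. OpenUtf8, Utf8ToWide and WriteThenRead are hypothetical names chosen for illustration, and it assumes the runtime declares _wfopen and _wremove.

```cpp
#include <cstdio>
#include <string>
#ifdef _WIN32
#  include <wchar.h>    // _wfopen, _wremove (assumed to be declared here)
#  include <windows.h>  // MultiByteToWideChar, CP_UTF8
#endif

#ifdef _WIN32
// Hypothetical helper: UTF-8 std::string -> UTF-16 std::wstring on Windows.
static std::wstring Utf8ToWide(const std::string & s)
{
  if (s.empty()) return std::wstring();
  const int n = MultiByteToWideChar(CP_UTF8, 0, s.c_str(),
                                    static_cast<int>(s.size()), NULL, 0);
  std::wstring w(static_cast<std::size_t>(n), L'\0');
  MultiByteToWideChar(CP_UTF8, 0, s.c_str(),
                      static_cast<int>(s.size()), &w[0], n);
  return w;
}
#endif

// Hypothetical wrapper: open a file whose name is UTF-8 encoded.
std::FILE * OpenUtf8(const std::string & utf8Name, const std::string & mode)
{
#ifdef _WIN32
  // Wide-character Windows API, regardless of which compiler is used.
  return _wfopen(Utf8ToWide(utf8Name).c_str(), Utf8ToWide(mode).c_str());
#else
  // Elsewhere the narrow API is assumed to accept UTF-8 names directly.
  return std::fopen(utf8Name.c_str(), mode.c_str());
#endif
}

// Writes "ok" to the file, reads it back, removes the file.
// Returns true only if every step succeeded.
bool WriteThenRead(const std::string & utf8Name)
{
  std::FILE * w = OpenUtf8(utf8Name, "wb");
  if (!w) return false;
  std::fputs("ok", w);
  std::fclose(w);

  std::FILE * r = OpenUtf8(utf8Name, "rb");
  if (!r) return false;
  char buf[3] = { 0, 0, 0 };
  const std::size_t n = std::fread(buf, 1, 2, r);
  std::fclose(r);

#ifdef _WIN32
  _wremove(Utf8ToWide(utf8Name).c_str());
#else
  std::remove(utf8Name.c_str());
#endif
  return n == 2 && std::string(buf) == "ok";
}
```

Calling WriteThenRead("\xCE\xB1_sketch.txt") exercises the same kind of alpha-prefixed name as the unit test attached to this report.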
(0018249) Brad King (manager) 2009-10-27 10:28 |
For reference, here is the mailing list thread in which this bug is discussed: http://www.itk.org/mailman/private/insight-developers/2009-October/013464.html |
(0018251) Tom Vercauteren (developer) 2009-10-27 14:47 |
Apparently things are a bit more complex than I thought.
* cygwin (latest stable version) has no unicode support at all:
  * _wfopen and _wunlink are NOT available
  * std::ofstream and std::ifstream have NO open(wchar_t * filename) function
* mingw (latest stable version) has partial unicode support:
  * _wfopen and _wunlink are available
  * std::ofstream and std::ifstream have NO open(wchar_t * filename) function
My proposal fully works only on MSVC. Making it work on mingw will require more changes to metaio, i.e. moving from std::ofstream and std::ifstream to FILE * approaches. The attached test project (itkUnicodeIOTest.zip) shows the results of my experiments. |
(0018252) Brad King (manager) 2009-10-27 15:13 |
FYI, I was able to read a unicode filename with the GNU compiler and C++ streams on cygwin like this:
$ cat myfile.txt
hello, world
$ cat stdio_filebuf.cxx
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/fcntl.h>
#include <ext/stdio_filebuf.h>
#include <iostream>
#include <io.h>
int main()
{
  int fd = _wopen(L"myfile.txt", O_RDONLY);
  __gnu_cxx::stdio_filebuf<char> ibuf(fd, std::ios::in);
  std::istream in(&ibuf);
  std::cout << in.rdbuf();
  return 0;
}
$ g++ -mno-cygwin stdio_filebuf.cxx
$ ./a.exe
hello, world
I think it also works with stdio.h-style C FILE* buffers. However, if there is no _wfopen then that may not be an option. |
(0018256) Tom Vercauteren (developer) 2009-10-28 10:59 |
Thanks for the information Brad! I managed to use this code with MinGW but not with Cygwin (with gcc 4). On Cygwin I get:
'_wopen' was not declared in this scope
(This is exactly the same as what I got for _wfopen). This seems to be related to the -mno-cygwin flag: http://www.delorie.com/howto/cygwin/mno-cygwin-howto.html
However, adding the -mno-cygwin flag leads to:
g++: The -mno-cygwin flag has been removed; use a mingw-targeted cross-compiler.
This is apparently a known issue of cygwin's gcc-4: http://cygwin.com/ml/cygwin/2009-10/msg00061.html
Anyhow, I also found an alternative to __gnu_cxx::stdio_filebuf that consists of a single header file and is apparently portable to the platforms that we target: http://www.josuttis.com/cppcode/fdstream.html
More experimenting is required. I'll keep information coming on the bug tracker when I get some more time to work on it. |
(0018318) Tom Vercauteren (developer) 2009-11-02 13:16 |
I have been experimenting with fdstream. fdstream allows the creation of an istream or ostream from a file descriptor. It seems to work just fine on all platforms I tried (linux 32 bit with gcc, windows with MSVC, cygwin's gcc and mingw). Therefore, if a file with a unicode encoded filename can be opened, performing IO operations on a stream should work. The attached test (itkUnicodeIOTest-2009-11-02.zip) showed that IO operations on files with unicode filenames work on:
* linux
* windows with MSVC
* windows with MinGW
-----
Note that cygwin doesn't work. This does not contradict Brad's experiment because adding the -mno-cygwin flag to cygwin's compiler essentially turns the compiler into the mingw compiler as __MINGW32__ becomes defined and __CYGWIN__ becomes undefined:
/cygdrive/c/cygwin/bin/gcc-3.exe -mno-cygwin -dM -E - < /dev/null | sort
#define WIN32 1
#define WINNT 1
#define _WIN32 1
#define _X86_ 1
#define __CHAR_BIT__ 8
#define __DBL_DENORM_MIN__ 4.9406564584124654e-324
#define __DBL_DIG__ 15
#define __DBL_EPSILON__ 2.2204460492503131e-16
#define __DBL_HAS_INFINITY__ 1
#define __DBL_HAS_QUIET_NAN__ 1
#define __DBL_MANT_DIG__ 53
#define __DBL_MAX_10_EXP__ 308
#define __DBL_MAX_EXP__ 1024
#define __DBL_MAX__ 1.7976931348623157e+308
#define __DBL_MIN_10_EXP__ (-307)
#define __DBL_MIN_EXP__ (-1021)
#define __DBL_MIN__ 2.2250738585072014e-308
#define __DECIMAL_DIG__ 21
#define __FINITE_MATH_ONLY__ 0
#define __FLT_DENORM_MIN__ 1.40129846e-45F
#define __FLT_DIG__ 6
#define __FLT_EPSILON__ 1.19209290e-7F
#define __FLT_EVAL_METHOD__ 2
#define __FLT_HAS_INFINITY__ 1
#define __FLT_HAS_QUIET_NAN__ 1
#define __FLT_MANT_DIG__ 24
#define __FLT_MAX_10_EXP__ 38
#define __FLT_MAX_EXP__ 128
#define __FLT_MAX__ 3.40282347e+38F
#define __FLT_MIN_10_EXP__ (-37)
#define __FLT_MIN_EXP__ (-125)
#define __FLT_MIN__ 1.17549435e-38F
#define __FLT_RADIX__ 2
#define __GNUC_MINOR__ 4
#define __GNUC_PATCHLEVEL__ 4
#define __GNUC__ 3
#define __GXX_ABI_VERSION 1002
#define __INT_MAX__ 2147483647
#define __LDBL_DENORM_MIN__ 3.64519953188247460253e-4951L
#define __LDBL_DIG__ 18
#define __LDBL_EPSILON__ 1.08420217248550443401e-19L
#define __LDBL_HAS_INFINITY__ 1
#define __LDBL_HAS_QUIET_NAN__ 1
#define __LDBL_MANT_DIG__ 64
#define __LDBL_MAX_10_EXP__ 4932
#define __LDBL_MAX_EXP__ 16384
#define __LDBL_MAX__ 1.18973149535723176502e+4932L
#define __LDBL_MIN_10_EXP__ (-4931)
#define __LDBL_MIN_EXP__ (-16381)
#define __LDBL_MIN__ 3.36210314311209350626e-4932L
#define __LONG_LONG_MAX__ 9223372036854775807LL
#define __LONG_MAX__ 2147483647L
#define __MINGW32__ 1
#define __MSVCRT__ 1
#define __NO_INLINE__ 1
#define __PTRDIFF_TYPE__ int
#define __REGISTER_PREFIX__
#define __SCHAR_MAX__ 127
#define __SHRT_MAX__ 32767
#define __SIZE_TYPE__ unsigned int
#define __STDC_HOSTED__ 1
#define __USER_LABEL_PREFIX__ _
#define __USING_SJLJ_EXCEPTIONS__ 1
#define __VERSION__ "3.4.4 (cygming special, gdc 0.12, using dmd 0.125)"
#define __WCHAR_MAX__ 65535U
#define __WCHAR_TYPE__ short unsigned int
#define __WIN32 1
#define __WIN32__ 1
#define __WINT_TYPE__ unsigned int
#define __cdecl __attribute__((__cdecl__))
#define __declspec(x) __attribute__((x))
#define __fastcall __attribute__((__fastcall__))
#define __i386 1
#define __i386__ 1
#define __stdcall __attribute__((__stdcall__))
#define __tune_i686__ 1
#define __tune_pentiumpro__ 1
#define _cdecl __attribute__((__cdecl__))
#define _fastcall __attribute__((__fastcall__))
#define _stdcall __attribute__((__stdcall__))
#define i386 1 |
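The idea of building a stream on top of an already opened file descriptor can be illustrated with a small sketch. This is not the fdstream.hpp code itself: FdOutBuf and WriteViaDescriptor are hypothetical names, the buffer is unbuffered for brevity, and the sketch uses the POSIX open/write/close calls only (on MSVC the _open/_write/_close equivalents would be needed).

```cpp
#include <fcntl.h>     // open, O_* flags
#include <unistd.h>    // write, close
#include <cstddef>
#include <cstdio>      // std::remove
#include <fstream>
#include <iterator>
#include <ostream>
#include <streambuf>
#include <string>

// Hypothetical minimal output buffer: forwards every write to a descriptor.
class FdOutBuf : public std::streambuf
{
public:
  explicit FdOutBuf(int fd) : m_fd(fd) {}

protected:
  // Single characters: no internal put area, so each one ends up here.
  int_type overflow(int_type c) override
  {
    if (!traits_type::eq_int_type(c, traits_type::eof()))
    {
      const char ch = traits_type::to_char_type(c);
      if (::write(m_fd, &ch, 1) != 1) return traits_type::eof();
    }
    return traits_type::not_eof(c);
  }

  // Character sequences: written with a single write call.
  std::streamsize xsputn(const char * s, std::streamsize n) override
  {
    const ssize_t written = ::write(m_fd, s, static_cast<std::size_t>(n));
    return written < 0 ? 0 : static_cast<std::streamsize>(written);
  }

private:
  int m_fd;
};

// Opens path as a descriptor, writes text through a std::ostream bound to it,
// then reads the file back with a regular ifstream and removes it.
std::string WriteViaDescriptor(const std::string & path, const std::string & text)
{
  const int fd = ::open(path.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0644);
  if (fd == -1) return std::string();
  {
    FdOutBuf buf(fd);
    std::ostream out(&buf);
    out << text;
  }
  ::close(fd);

  std::ifstream in(path.c_str(), std::ios::binary);
  const std::string readBack((std::istreambuf_iterator<char>(in)),
                             std::istreambuf_iterator<char>());
  in.close();
  std::remove(path.c_str());
  return readBack;
}
```

The point of the design is that how the descriptor was obtained no longer matters to the stream: on Windows it could come from _wopen with a UTF-16 name, while the stream code stays the same.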
(0018319) Brad King (manager) 2009-11-02 13:26 |
Does "-mwin32" help on cygwin? |
(0018320) Tom Vercauteren (developer) 2009-11-02 14:01 |
Unfortunately, adding the "-mwin32" flag does not help on cygwin. As far as I understand it, this is really a cygwin limitation that cannot be overcome. See also this (old) email thread: http://www.mail-archive.com/cygwin@cygwin.com/msg66767.html |
(0018326) Tom Vercauteren (developer) 2009-11-03 03:22 |
Good news for cygwin. The new 1.7 version that is currently in beta gets closer to the linux/mac behavior. Namely, the default encoding for filenames is set to utf-8 and things work out of the box (as on linux and mac). http://cygwin.com/1.7/cygwin-ug-net/ov-new1.7.html |
(0019094) Tom Vercauteren (developer) 2010-01-12 06:43 |
In an attempt to move a little further on this issue, I would like to put all the helper functions from my unit test http://www.itk.org/cgi-bin/viewcvs.cgi/Testing/Code/IO/itkUnicodeIOTest.cxx?root=Insight&sortby=date&view=markup into one header file. I was thinking of using Code/Common/itkI18nIOHelpers.h and putting the functions in the itk::I18n namespace: https://public.kitware.com/Bug/file/2757/itk-unicodeio-2010-01-12.patch Thoughts? |
(0019097) Brad King (manager) 2010-01-12 09:06 |
Fine with me. BTW, I noticed the use of the "boost" namespace in "itkExtHdrs/fdstream.hpp". When the header was only included in .cxx files that was okay. Now that it may be included through a header we need to be more careful about conflicts. If an application really tries to use boost or has its own version of that header it may conflict. Can you move the code into an itk namespace? |
(0019099) Tom Vercauteren (developer) 2010-01-12 09:58 |
fdstream.hpp is not part of boost (http://www.josuttis.com/cppcode/fdstream.html). It has been proposed to boost but was not accepted. Anyway, changing the namespace to itk also seems cleaner to me. I'll commit it together with itkI18nIOHelpers.h tomorrow if I don't get any negative feedback from the itk list. |
(0019527) edice (reporter) 2010-02-15 01:20 |
Can this work be extended to add unicode support to all of the VTK file readers, e.g. vtkPNGReader? It would be good to be able to pass a std::wstring. |
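One way a std::wstring entry point could be layered on top of the UTF-8 based approach discussed in this report is to convert the wide string to UTF-8 and forward it to the existing narrow-string functions. The sketch below only shows that conversion. WideToUtf8 and AppendUtf8 are hypothetical names, not ITK or VTK API, and the code assumes wchar_t holds UTF-16 when it is 16 bits wide and UTF-32 otherwise.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: append one code point to out as UTF-8.
static void AppendUtf8(std::uint32_t cp, std::string & out)
{
  if (cp < 0x80)
  {
    out += static_cast<char>(cp);
  }
  else if (cp < 0x800)
  {
    out += static_cast<char>((cp >> 6) | 0xC0);
    out += static_cast<char>((cp & 0x3F) | 0x80);
  }
  else if (cp < 0x10000)
  {
    out += static_cast<char>((cp >> 12) | 0xE0);
    out += static_cast<char>(((cp >> 6) & 0x3F) | 0x80);
    out += static_cast<char>((cp & 0x3F) | 0x80);
  }
  else
  {
    out += static_cast<char>((cp >> 18) | 0xF0);
    out += static_cast<char>(((cp >> 12) & 0x3F) | 0x80);
    out += static_cast<char>(((cp >> 6) & 0x3F) | 0x80);
    out += static_cast<char>((cp & 0x3F) | 0x80);
  }
}

// Hypothetical helper: std::wstring -> UTF-8 encoded std::string.
std::string WideToUtf8(const std::wstring & w)
{
  std::string out;
  for (std::size_t i = 0; i < w.size(); ++i)
  {
    std::uint32_t cp = static_cast<std::uint32_t>(w[i]);
    // With a 16-bit wchar_t (Windows), combine a lead/trail surrogate pair
    // into a single code point before encoding it.
    if (sizeof(wchar_t) == 2 && cp >= 0xD800 && cp <= 0xDBFF && i + 1 < w.size())
    {
      const std::uint32_t trail = static_cast<std::uint32_t>(w[i + 1]);
      if (trail >= 0xDC00 && trail <= 0xDFFF)
      {
        cp = 0x10000 + ((cp - 0xD800) << 10) + (trail - 0xDC00);
        ++i;
      }
    }
    AppendUtf8(cp, out);
  }
  return out;
}
```

A wide-string overload could then simply call the narrow one with WideToUtf8(filename), leaving the actual file opening to the UTF-8 helpers.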
Issue History | |||
Date Modified | Username | Field | Change |
2009-09-30 10:15 | Benjamin Tourne | New Issue | |
2009-09-30 10:15 | Benjamin Tourne | File Added: itk-unicodewritetest-2009-09-30.patch | |
2009-09-30 10:16 | Benjamin Tourne | Tag Attached: visual | |
2009-09-30 10:16 | Benjamin Tourne | Tag Attached: Unicode | |
2009-09-30 12:05 | Tom Vercauteren | Note Added: 0017844 | |
2009-10-20 13:22 | Tom Vercauteren | File Added: utfcpptest.zip | |
2009-10-26 15:56 | Tom Vercauteren | File Added: itk-msvc-unicode-2009-10-26.patch | |
2009-10-26 15:58 | Tom Vercauteren | Note Added: 0018243 | |
2009-10-27 10:26 | Brad King | Note Added: 0018248 | |
2009-10-27 10:28 | Brad King | Note Added: 0018249 | |
2009-10-27 14:46 | Tom Vercauteren | File Added: itkUnicodeIOTest.zip | |
2009-10-27 14:47 | Tom Vercauteren | Note Added: 0018251 | |
2009-10-27 15:13 | Brad King | Note Added: 0018252 | |
2009-10-28 10:59 | Tom Vercauteren | Note Added: 0018256 | |
2009-11-02 13:04 | Tom Vercauteren | File Added: itkUnicodeIOTest-2009-11-02.zip | |
2009-11-02 13:16 | Tom Vercauteren | Note Added: 0018318 | |
2009-11-02 13:26 | Brad King | Note Added: 0018319 | |
2009-11-02 14:01 | Tom Vercauteren | Note Added: 0018320 | |
2009-11-03 03:22 | Tom Vercauteren | Note Added: 0018326 | |
2009-11-10 20:19 | Tom Vercauteren | File Added: itkUnicodeIOTest.cxx | |
2009-11-11 08:38 | Tom Vercauteren | File Deleted: itkUnicodeIOTest.cxx | |
2009-11-11 08:38 | Tom Vercauteren | File Added: itkUnicodeIOTest.cxx | |
2010-01-12 06:38 | Tom Vercauteren | File Added: itk-unicodeio-2010-01-12.patch | |
2010-01-12 06:43 | Tom Vercauteren | Note Added: 0019094 | |
2010-01-12 09:06 | Brad King | Note Added: 0019097 | |
2010-01-12 09:58 | Tom Vercauteren | Note Added: 0019099 | |
2010-02-15 01:20 | edice | Note Added: 0019527 | |
2010-11-07 09:01 | Hans Johnson | Status | new => assigned |
2010-11-07 09:01 | Hans Johnson | Assigned To | => Brad King |