ITK/Policy and Procedures for Internationalization


Latest revision as of 01:27, 11 February 2012

Background

In software, internationalization, or i18n, deals with the adaptation of software to the various languages and customs used throughout the world. Since ITK is used worldwide for biomedical imaging, segmentation and registration, it is important that itk applications produce consistent results regardless of regional differences.

Wikipedia describes four areas to consider for i18n software engineering: Language, Culture, Writing Conventions and Regulatory Compliance. Language issues and writing conventions have the biggest impact in itk.

Language Issues

Computer encoded text is an important i18n issue since it affects the internal representation of strings. Since itk has no user interface components, the primary i18n impact is on filenames.

Writing Conventions

Writing conventions affect date/time formats, time zones and how numbers are formatted. Number format is critical to itk applications since many file formats contain numbers in their headers. These numbers represent critical image information such as origin, spacing and direction. If they are improperly decoded or encoded, incorrect image information may be produced. For example, if the spacing of an image is stored as 1.7, a change in the locale of an application could cause 1.7 to be read as 1.0, since many regions of the world use "," as the decimal separator rather than ".". This is not a theoretical problem. Here is a recent (edited) question on the vtk users mailing list:

This looks weird to me. Spacing of vtkDICOMImageReader depends on
whether I use wx or not.

Can anyone reproduce?

It fails in vtk-5.2.1 and vtk-5.4.2 on linux, python-2.6.2 and wxpython-2.8.10.1
It works ok in XP with python-2.5.2, vtk-5.2.0 and wx-2.8.7.1
...
$ ./test2.py   -- with wx
(0, 511, 0, 511, 0, 370)
(0.0, 0.0, 0.0)

$ ./test3.py  -- without wx
(0, 511, 0, 511, 0, 370)
(0.48828125, 0.48828125, 0.329986572265625)

A later post sheds light on the issue:

DICOMParser is using internally sscanf to decode the spacing. It
fails to work when LC_NUMERIC is not compatible with "C" style. I
believe by default GTK (underlying implementation of wx on linux) sets
the LC_NUMERIC corresponding to your locales.

So, the application made a global change to the way numeric values are interpreted. itk needs to protect against these global changes.
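
One way to guard a single sscanf call against such a global change is sketched below. This is only an illustration under stated assumptions, not ITK's implementation: `NumericLocaleGuard` and `ParseSpacing` are hypothetical names, the code assumes a C++11 compiler, and because `setlocale` changes process-wide state the approach is not safe if other threads parse or format numbers at the same time.

```cpp
#include <clocale>
#include <cstdio>
#include <string>

// Hypothetical helper (not an ITK class): forces LC_NUMERIC to "C" for the
// lifetime of the object and restores the previous setting on destruction.
class NumericLocaleGuard
{
public:
  NumericLocaleGuard()
  {
    // setlocale(category, nullptr) only queries the current setting. Copy the
    // result, since a later setlocale call may overwrite the returned buffer.
    const char * current = std::setlocale(LC_NUMERIC, nullptr);
    if (current != nullptr)
    {
      m_Saved = current;
    }
    std::setlocale(LC_NUMERIC, "C");
  }

  ~NumericLocaleGuard()
  {
    if (!m_Saved.empty())
    {
      std::setlocale(LC_NUMERIC, m_Saved.c_str());
    }
  }

  NumericLocaleGuard(const NumericLocaleGuard &) = delete;
  NumericLocaleGuard & operator=(const NumericLocaleGuard &) = delete;

private:
  std::string m_Saved;
};

// Hypothetical helper: parses a header value such as "0.48828125" with "."
// as the decimal separator, whatever LC_NUMERIC the application selected.
// Returns false if no number could be read.
bool ParseSpacing(const char * text, double & value)
{
  NumericLocaleGuard guard;
  return std::sscanf(text, "%lf", &value) == 1;
}
```

A reader would call `ParseSpacing` on each numeric header field. The application's own locale is back in effect as soon as the function returns.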

I18N and ITK

As of release 3.16, itk does not have facilities to handle non-ASCII filenames or regional variations in the decimal separator character. A recent experiment shows that if the default locale is changed to "french", 49 out of 240 tests fail in Testing/Code/IO. There are two efforts to resolve these issues and move itk towards being a robust i18n toolkit.

  1. Unicode filenames and I/O.
  2. Locale management

Approaches to Handle Non-Ascii Filenames

Approaches to Handle Regional Numeric Encoding

Both C and C++ provide mechanisms to deal with number encoding (as well as other issues such as time/date). C locale changes made with setlocale are process-wide and affect calls to sscanf, printf, and atof. C++ locale changes can be applied to individual streams, leaving the global locale untouched.
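
The per-stream C++ mechanism can be sketched as follows. The helper names `ReadDoubleClassic` and `WriteDoubleClassic` are hypothetical and are not part of ITK. Each helper imbues its own string stream with `std::locale::classic()`, so "." is used as the decimal separator for that stream only.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: reads one double from text using the classic "C"
// locale on this stream only. Returns false if no number could be read.
bool ReadDoubleClassic(const std::string & text, double & value)
{
  std::istringstream stream(text);
  stream.imbue(std::locale::classic());
  stream >> value;
  return !stream.fail();
}

// Hypothetical helper: formats a double with "." as the decimal separator,
// using up to 17 significant digits.
std::string WriteDoubleClassic(double value)
{
  std::ostringstream stream;
  stream.imbue(std::locale::classic());
  stream.precision(17);
  stream << value;
  return stream.str();
}
```

Because the locale is attached to the stream rather than to the process, this style needs no save-and-restore step and does not interfere with the locale an application has chosen for its user interface.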