[vtkusers] VTK 6.1 no longer supports ASCII characters > 128? (5.10.1 was ok)

David Gobbi david.gobbi at gmail.com
Tue Jan 20 13:28:48 EST 2015


Hi Serge,

I'm not sure if this will be useful to you, but here is a function that I
wrote a while ago for converting Latin-1 to UTF-8. A warning: even if UTF-8
works in some parts of VTK, it definitely doesn't work in all parts of VTK.
In particular, I suspect that most readers and writers do not support UTF-8,
or only support it on certain operating systems.

Also, the following Unicode library (ICU) is preinstalled on many systems these days:
http://site.icu-project.org/

 - David


#include <cstddef>
#include <string>

// Convert a Latin-1 (ISO-8859-1) buffer of l bytes to a UTF-8 std::string.
std::string ConvertLatin1ToUtf8(const char *text, std::size_t l)
{
  // compute expected size of the utf8 string and allocate it
  const char *cp = text;
  std::size_t m = l;
  for (std::size_t n = 0; n < l; n++)
    {
    // add one byte for each 8-bit character in the string
    m += static_cast<unsigned char>(*cp++) >> 7;
    }
  std::string s;
  s.resize(m);

  // encode each latin1 character as utf8
  cp = text;
  std::size_t i = 0;
  while (i < m)
    {
    while (i < m && (*cp & 0x80) == 0)
      {
      // convert 7-bit character to one utf8 byte
      s[i++] = *cp++;
      }
    if (i < m)
      {
      // convert 8-bit character to two utf8 bytes
      int code = static_cast<unsigned char>(*cp++);
      s[i++] = static_cast<char>(0xC0 | (code >> 6));
      s[i++] = static_cast<char>(0x80 | (code & 0x3F));
      }
    }

  return s;
}
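For comparison, here is a minimal sketch of the same Latin-1 to UTF-8 mapping written with std::string::push_back, so the output size does not have to be computed up front. The name Latin1ToUtf8 is hypothetical (not part of VTK), and the sketch assumes the input really is Latin-1, where every byte value equals its Unicode code point (U+0000 to U+00FF).

```cpp
#include <string>

// Hypothetical helper (not a VTK API): convert Latin-1 bytes to UTF-8.
// Bytes below 0x80 are copied unchanged; bytes 0x80..0xFF become two
// UTF-8 bytes, 110000xx 10xxxxxx.
std::string Latin1ToUtf8(const std::string &latin1)
{
  std::string out;
  out.reserve(latin1.size() * 2);  // worst case: every byte needs two
  for (char ch : latin1)
  {
    unsigned char c = static_cast<unsigned char>(ch);
    if (c < 0x80)
    {
      out.push_back(ch);  // 7-bit character: same byte in UTF-8
    }
    else
    {
      out.push_back(static_cast<char>(0xC0 | (c >> 6)));    // lead byte
      out.push_back(static_cast<char>(0x80 | (c & 0x3F)));  // continuation
    }
  }
  return out;
}
```

With this sketch, Latin1ToUtf8("J (A/m\xB2)") yields "J (A/m\xC2\xB2)", i.e. the superscript two becomes the byte pair C2 B2 discussed further down the thread.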

On Tue, Jan 20, 2015 at 10:41 AM, Serge Lalonde <serge at infolytica.com> wrote:

>  I realized that I was passing a TrueType font file instead of a FreeType one.
> So I downloaded the DejaVu fonts as recommended here:
> http://vtkusers.public.kitware.narkive.com/5fy8hSRK/unicode-support
> But it made no difference.
>
> I also looked at the TestChartUnicode and TextContextUnicode tests to see
> how it's done there and I'm doing the same thing.
>
> So I don't see what's wrong. Anyone have any suggestions?
>
> Thanks.
>
>
> On 1/20/2015 12:03 PM, Serge Lalonde wrote:
>
> Hi Sean,
>
> Thanks for the tips on encoding. I knew that I probably wasn't using the
> right terminology and I'll read up on those soon.
>
> In the meantime, I'm no closer to being able to display "J (A/m²)" as the
> title of a vtkScalarBarActor.
>
> I tried using these APIs on the TitleTextProperty of the vtkScalarBarActor:
>
>    m_VTKScalarBarActor->GetTitleTextProperty()->SetFontFamily(VTK_FONT_FILE);
>    m_VTKScalarBarActor->GetTitleTextProperty()->SetFontFile("C:\\Windows\\winsxs\\amd64_microsoft-windows-font-truetype-arial_31bf3856ad364e35_6.1.7601.18528_none_d0a29012c3ff391b\\arial.ttf");
>
> and then encoding the string to be a sequence of hex escape sequences like
> the example here:
>
>    http://marc.info/?l=vtkusers&m=138868987612759&w=2
>
> and it just spits out the string as-is, that is something like
> "\x4A\x20\x28\x41\x2F\x6D\xC2\xB2\x29".
>
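The thread does not show how that escaped title was built, so this is only a guess at the cause: \xHH sequences are decoded by the C++ compiler inside an ordinary string literal, but text that arrives at run time (from a file, a UI field, or a raw literal) keeps its backslashes as plain characters, which would then be drawn as-is. The sketch contrasts the two; the names kDecoded, kUndecoded and LooksUndecoded are hypothetical.

```cpp
#include <string>

// Ordinary literal: the compiler decodes the escapes, giving 9 bytes,
// with the superscript two stored as the UTF-8 pair C2 B2.
const std::string kDecoded = "\x4A\x20\x28\x41\x2F\x6D\xC2\xB2\x29";

// Raw literal (behaves like text read at run time): backslashes survive,
// so 9 escapes x 4 characters = 36 plain ASCII characters.
const std::string kUndecoded = R"(\x4A\x20\x28\x41\x2F\x6D\xC2\xB2\x29)";

// Hypothetical check: true if the string still contains "\x" escapes.
bool LooksUndecoded(const std::string &s)
{
  return s.find("\\x") != std::string::npos;
}
```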
> Is there a tutorial or a wiki that shows how to use Unicode strings in
> VTK? Ideally with the system fonts? What's the magic formula? ;-)
>
> Thanks.
>
> On 1/20/2015 11:05 AM, Sean McBride wrote:
>
> On Tue, 20 Jan 2015 10:18:44 -0500, Serge Lalonde said:
>
>
>  I'm upgrading from VTK 5.10.1 to VTK 6.1. All went smoothly until I ran
> into an error rendering a vtkScalarBarActor whose title was set to "J (A/m²)".
>
> This worked fine in 5.10.1, but in 6.1, the vtkutf8::is_valid() method
> called from vtkUnicodeString::from_utf8() returns false because the
> value of "²" is 0xB2 (in the extended ASCII range) but the vtkutf8 code
> internally stops at 0x80 (sequence_length() in core.h returns 0). That
> in turn causes a debug message "vtkUnicodeString::from_utf8(): not a
> valid UTF-8 string." to appear and then other problems with vtkTextActor
> not being able to calculate its bounds and so on.
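The rejection described above follows from the UTF-8 byte layout: 0xB2 is 10110010 in binary, and a byte of the form 10xxxxxx may only appear as a continuation byte, never at the start of a character. The sketch below is a simplified structural check written for illustration; it is not VTK's vtkutf8 code, the function name is hypothetical, and unlike a full validator it does not reject overlong forms or surrogates.

```cpp
#include <cstddef>
#include <string>

// Hypothetical, simplified check of UTF-8 lead/continuation structure.
bool IsStructurallyValidUtf8(const std::string &s)
{
  std::size_t i = 0;
  while (i < s.size())
  {
    unsigned char lead = static_cast<unsigned char>(s[i]);
    std::size_t len;
    if (lead < 0x80)
      len = 1;                 // 0xxxxxxx: ASCII
    else if ((lead & 0xE0) == 0xC0)
      len = 2;                 // 110xxxxx
    else if ((lead & 0xF0) == 0xE0)
      len = 3;                 // 1110xxxx
    else if ((lead & 0xF8) == 0xF0)
      len = 4;                 // 11110xxx
    else
      return false;            // e.g. a lone 0xB2 (10110010)
    if (i + len > s.size())
      return false;            // sequence truncated at end of string
    for (std::size_t k = 1; k < len; ++k)
    {
      unsigned char c = static_cast<unsigned char>(s[i + k]);
      if ((c & 0xC0) != 0x80)
        return false;          // continuation bytes must be 10xxxxxx
    }
    i += len;
  }
  return true;
}
```

Under this check, the Latin-1 title "J (A/m\xB2)" fails, while its UTF-8 form "J (A/m\xC2\xB2)" passes.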
>
>  I think you are a bit confused about character encodings (it's a confusing thing!).
>
> First, there are no ASCII characters above 127. ASCII is a 7-bit code. What you mean is ISO-8859-1, aka Latin-1. In that encoding, the 'superscript two' character is indeed 0xB2:
> <https://en.wikipedia.org/wiki/ISO/IEC_8859-1>
>
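> ISO-8859-1 is not the same as UTF-8. Its 256 characters coincide with the first 256 Unicode code points (U+0000 to U+00FF), but UTF-8 encodes everything above U+007F with two or more bytes.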
>
> There are many online Unicode tools, e.g. <http://utf8-chartable.de>, where you can see that the Unicode code point for 'superscript two' is U+00B2, which is encoded in UTF-8 as 'c2 b2' in hex.
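The arithmetic behind 'c2 b2': U+00B2 needs 11 bits (000 1011 0010); UTF-8 places the top 5 bits (00010) after the lead prefix 110, giving 11000010 = 0xC2, and the low 6 bits (110010) after the continuation prefix 10, giving 10110010 = 0xB2. A minimal sketch of that bit layout for any code point follows; the name EncodeUtf8 is hypothetical, and it assumes the input is a valid scalar value (at most 0x10FFFF and not a surrogate).

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-8 bytes.
// Assumes cp <= 0x10FFFF and cp is not in U+D800..U+DFFF.
std::string EncodeUtf8(std::uint32_t cp)
{
  std::string out;
  if (cp < 0x80)
  {
    out += static_cast<char>(cp);                          // 0xxxxxxx
  }
  else if (cp < 0x800)
  {
    out += static_cast<char>(0xC0 | (cp >> 6));            // 110xxxxx
    out += static_cast<char>(0x80 | (cp & 0x3F));          // 10xxxxxx
  }
  else if (cp < 0x10000)
  {
    out += static_cast<char>(0xE0 | (cp >> 12));           // 1110xxxx
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
  else
  {
    out += static_cast<char>(0xF0 | (cp >> 18));           // 11110xxx
    out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
  return out;
}
```

For example, EncodeUtf8(0xB2) gives the two bytes C2 B2, and EncodeUtf8(0x20AC) (the euro sign) gives E2 82 AC.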
>
> You might want to read this, which is a helpful classic:
> <http://www.joelonsoftware.com/articles/Unicode.html>
>
> Cheers,
>
>
>
> --
> www.infolytica.com
> 300 Leo Pariseau, Suite 2222, Montreal, QC, Canada, H2X 4B3
> (514) 849-8752 x236, Fax: (514) 849-4239
>
>
> _______________________________________________
> Powered by www.kitware.com
>
> Visit other Kitware open-source projects at http://www.kitware.com/opensource/opensource.html
>
> Please keep messages on-topic and check the VTK FAQ at: http://www.vtk.org/Wiki/VTK_FAQ
>
> Search the list archives at: http://markmail.org/search/?q=vtkusers
>
> Follow this link to subscribe/unsubscribe:http://public.kitware.com/mailman/listinfo/vtkusers