[vtkusers] vtkUnicodeString::from_utf8(): not a valid UTF-8 string.

David Gobbi david.gobbi at gmail.com
Wed Aug 19 09:39:19 EDT 2015


Apologies, I missed a parameter when I did the cut-and-paste:

  #include <cstddef>
  #include <string>

  std::string convertLatin1ToUTF8(const char *text, size_t l)
  {
    // latin1, codepage is identity (byte value == Unicode code point)
    const char *cp = text;
    size_t m = l;
    // compute the size of the UTF-8 string:
    // every byte >= 0x80 needs one extra byte
    for (size_t n = 0; n < l; n++)
      {
      m += static_cast<unsigned char>(*cp++) >> 7;
      }
    cp = text;
    std::string s;
    s.resize(m);
    // encode as UTF-8
    size_t i = 0;
    while (i < m)
      {
      // copy a run of ASCII bytes unchanged
      while (i < m && (*cp & 0x80) == 0)
        {
        s[i++] = *cp++;
        }
      if (i < m)
        {
        // bytes 0x80..0xFF become a two-byte sequence
        int code = static_cast<unsigned char>(*cp++);
        s[i++] = static_cast<char>(0xC0 | (code >> 6));
        s[i++] = static_cast<char>(0x80 | (code & 0x3F));
        }
      }
    return s;
  }
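
To make the encoding rule easier to check by hand, here is a minimal sketch that applies it one byte at a time. It trades the single up-front allocation of the function above for readability. The names latin1ByteToUTF8 and latin1ToUTF8 are hypothetical, chosen only for illustration; they are not VTK functions.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of VTK): encode one Latin-1 byte as UTF-8.
std::string latin1ByteToUTF8(unsigned char code)
{
  std::string s;
  if (code < 0x80)
    {
    // ASCII range: a single byte, copied unchanged
    s.push_back(static_cast<char>(code));
    }
  else
    {
    // U+0080..U+00FF: lead byte 0xC2 or 0xC3, then one continuation byte
    s.push_back(static_cast<char>(0xC0 | (code >> 6)));
    s.push_back(static_cast<char>(0x80 | (code & 0x3F)));
    }
  return s;
}

// Hypothetical wrapper: convert a whole Latin-1 string byte by byte.
std::string latin1ToUTF8(const std::string &text)
{
  std::string s;
  s.reserve(text.size());
  for (std::size_t i = 0; i < text.size(); i++)
    {
    s += latin1ByteToUTF8(static_cast<unsigned char>(text[i]));
    }
  return s;
}
```

For example, the Latin-1 byte 0xE9 (é) comes out as the two bytes 0xC3 0xA9, so the four-byte input "caf\xE9" becomes the five-byte UTF-8 string "caf\xC3\xA9".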

On Wed, Aug 19, 2015 at 7:30 AM, David Gobbi <david.gobbi at gmail.com> wrote:

> Hi Jose,
>
> I have a function that efficiently converts latin1 to utf-8 (it's part of
> my DICOM code):
>
>   std::string convertLatin1ToUTF8(const char *text)
>   {
>     // latin1, codepage is identity
>     const char *cp = text;
>     size_t m = l;
>     // compute the size of the UTF-8 string
>     for (size_t n = 0; n < l; n++)
>       {
>       m += static_cast<unsigned char>(*cp++) >> 7;
>       }
>     cp = text;
>     std::string s;
>     s.resize(m);
>     // encode as UTF-8
>     size_t i = 0;
>     while (i < m)
>       {
>       while (i < m && (*cp & 0x80) == 0)
>         {
>         s[i++] = *cp++;
>         }
>       if (i < m)
>         {
>         int code = static_cast<unsigned char>(*cp++);
>         s[i++] = (0xC0 | (code >> 6));
>         s[i++] = (0x80 | (code & 0x3F));
>         }
>       }
>       return s;
>   }
>
>
> On Wed, Aug 19, 2015 at 7:18 AM, Jose Barreto <jose.de.paula at live.com>
> wrote:
>
>> Thank you David,
>> It works here.
>> I'm attaching a file with two methods in case anyone else needs them:
>> one converts a String^ to char *, and the other converts to UTF-8.
>>
>> maniputação_String_VTK.txt
>> <http://vtk.1045678.n5.nabble.com/file/n5733560/maniputa%C3%A7%C3%A3o_String_VTK.txt>