VTK/Unwrappable Code

The VTK Tcl, Java, and Python wrappers use a custom parser to read the VTK C++ header files. This parser consists of the following pieces:

  • a C++ preprocessor
  • a lex/yacc C++ parser (a GNU bison GLR parser)
  • a set of data structures for describing a C++ API

As of this writing, the above are based on the C++11 grammar and are being updated for C++14 and C++17.

Syntax that the wrappers cannot parse

The parser was written based on the ISO draft standards for C++98, C++11, and C++14. However, there are specific parts of the C++ grammar that were not implemented. These are described below.

Backslash line continuation in odd places

According to the C++ standard, any backslash that occurs at the end of a line (unless it occurs within a raw string) indicates that the following newline should be ignored. The wrapper preprocessor, however, does not allow a backslash-newline to split any token other than a string literal. A backslash-newline between tokens, as in a multi-line macro definition, is fine.

This code will work:

#define mymacro(x) \
  (2*(x))

const char *s = "this is a long\
 string broken in two.";

This code will not work:

class myClassHasAVeryLongNameSo\
   IBrokeItWithABackslash;

const int i = 'A\
  ';
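One way to stay clear of this restriction is to break lines only between tokens and to use adjacent string literals instead of a backslash inside a string. The following is a minimal sketch, not taken from VTK: the names longText, letterA, and checkLongText are hypothetical, and it assumes the wrapper parser accepts ordinary adjacent string literals.

```cpp
#include <cassert>
#include <cstring>

// Keep the identifier on one line; break the declaration between tokens instead.
class myClassHasAVeryLongNameSoIKeptItOnOneLine;

// Adjacent string literals are concatenated by the compiler,
// so no backslash is needed to continue a long string.
const char *longText =
  "this is a long"
  " string broken in two.";

// Keep character literals on a single line.
const int letterA = 'A';

// Hypothetical self-check showing that the text is unchanged.
void checkLongText()
{
  assert(std::strcmp(longText, "this is a long string broken in two.") == 0);
}
```

Because the compiler joins the two literals, the resulting string is the same as the backslash-continued version shown earlier.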

Universal character names in identifiers

In C++11, universal character names \uXXXX and \UXXXXXXXX can be used in place of non-ASCII characters. The wrapper preprocessor allows these only in string literals and character literals, not in identifiers. The wrappers do, however, allow UTF-8 encoding in identifiers (and in strings, characters, and comments).

 // this is fine
 const char16_t *s = u"Hello\u00A0There";
 
 // this will break things
 const char *encyclop\u00C6dia = "Britannica";
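A conservative way to avoid the problem is to keep identifiers ASCII-only and confine universal character names to literals. This is a small sketch under that assumption; the names greeting, aeLigature, encyclopaedia, and checkLiterals are hypothetical and do not come from VTK.

```cpp
#include <cassert>
#include <cstring>

// Universal character names inside string and character literals are accepted.
const char16_t *greeting = u"Hello\u00A0There";
const char16_t aeLigature = u'\u00C6';

// ASCII-only identifier in place of one spelled with \u00C6.
const char *encyclopaedia = "Britannica";

// Hypothetical self-check: the escapes produce the expected code units.
void checkLiterals()
{
  assert(greeting[5] == 0x00A0);   // no-break space after "Hello"
  assert(aeLigature == 0x00C6);    // LATIN CAPITAL LETTER AE
  assert(std::strcmp(encyclopaedia, "Britannica") == 0);
}
```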

Ambiguous member variable definition

C++ has an ambiguous grammar. One of the most common sources of ambiguity is that a name will sometimes be identified as a type, and sometimes as a function or variable name, depending on context.

struct x {
  typedef int z;

  // this kinda looks like a constructor
  x(z);
 
  // so would you believe that this defines a variable y of type z?
  z(y);

  // it does, because it is equivalent to writing this
  // (commented out here, since declaring y twice is an error):
  //   z y;
};

The wrapper's parser does not distinguish type names from other names within its grammar rules, so it cannot disambiguate between the constructor declaration at the top and the funny-looking variable declaration in the middle. It will try to interpret both as constructor declarations.
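The practical fix is to declare members without the redundant parentheses. The sketch below is illustrative only (the struct names Ambiguous and Plain and the function checkMembers are hypothetical); it uses static_assert to show that a C++ compiler gives both spellings the same meaning, so the plain form loses nothing.

```cpp
#include <cassert>
#include <type_traits>

struct Ambiguous {
  typedef int z;
  Ambiguous(z v) : y(v) {}  // a real constructor
  z(y);                     // a data member named y, despite the parentheses
};

struct Plain {
  typedef int z;
  Plain(z v) : y(v) {}
  z y;                      // same member, written without parentheses
};

// Both spellings declare an int data member named y.
static_assert(std::is_same<decltype(Ambiguous::y), int>::value, "y is an int");
static_assert(std::is_same<decltype(Plain::y), int>::value, "y is an int");

// Hypothetical self-check: the two structs behave identically.
void checkMembers()
{
  Ambiguous a(3);
  Plain p(3);
  assert(a.y == p.y);
}
```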

Ambiguous greater-than

The following code will break the wrappers. The breakage occurs when angle brackets appear on the right-hand side of an assignment, unless the assignment takes place within a function body (the parser ignores all function bodies, because they are part of the "implementation" rather than part of the "interface").

// this looks totally natural, and is valid C++ code
const T unity = static_cast<T>(1.0);

// whitespace makes it look a bit less natural
const T unity = static_cast < T > (1.0);

The difficulty is that our parser thinks that the angle brackets might be less-than and greater-than operators, because it doesn't know that T names a type and not a constant. So it conks out after reporting "syntax is ambiguous". It is possible to disambiguate by writing the code as follows, which causes the parser to take a different path:

// this causes the parse to succeed
const T unity = (static_cast<T>(1.0));
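The two escape routes described above, parenthesizing the initializer or moving the cast into a function body, can be sketched as follows. The typedef T and the names half and checkConstants are hypothetical placeholders chosen for this example, not VTK code.

```cpp
#include <cassert>

typedef double T;  // hypothetical stand-in for whatever type T names

// Parenthesized initializer: the form the text above reports as parseable.
const T unity = (static_cast<T>(1.0));

// Inside a function body the cast needs no extra parentheses,
// because the wrapper parser skips function bodies.
inline T half()
{
  const T h = static_cast<T>(0.5);
  return h;
}

// Hypothetical self-check: the extra parentheses do not change the value.
void checkConstants()
{
  assert(unity == 1.0);
  assert(half() == 0.5);
}
```

To a C++ compiler the parenthesized and unparenthesized initializers are equivalent; the parentheses only change which path the wrapper parser takes.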