VTK/Unwrappable Code
Revision as of 00:29, 18 October 2015
The VTK Tcl, Java, and Python wrappers use a custom parser to read the VTK C++ header files. This parser consists of the following pieces:
- a C++ preprocessor
- a lex/yacc C++ parser (a GNU bison GLR parser)
- a set of data structures for describing a C++ API
As of this writing, the above are based on the C++11 grammar and are being updated for C++14 and C++17.
Syntax that the wrappers cannot parse
The parser was written based on the ISO draft standards for C++98, C++11, and C++14. However, there are specific parts of the C++ grammar that were not implemented. These are described below.
Backslash line continuation in odd places
According to the C++ standard, any backslash that occurs at the end of a line (unless it occurs within a raw string) is meant to indicate that the following newline should be ignored. The wrapper preprocessor, however, does not allow a backslash to be used within any token except for a string literal.
This code will work:
    #define mymacro(x) \
      (2*(x))
    const char *s = "this is a long\
    string broken in two.";
This code will not work:
    class myClassHasAVeryLongNameSo\
    IBrokeItWithABackslash;
    const int i = 'A\
    ';
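One way to sidestep the problem is to keep every identifier and character literal on a single line, and to reserve the backslash for the places shown in the working example. The following is only a sketch: the class name and the macro `MY_LETTER_A` are hypothetical, and the assumption that the wrapper preprocessor accepts a continuation between the tokens of a macro definition is inferred from the working `#define` example above, not tested against the wrappers.

```cpp
// Hypothetical rewrite of the failing example: every token stays on one line.
class myClassHasAVeryLongNameSoIKeptItOnOneLine;

// The backslash sits between tokens of a macro definition (assumed to be
// accepted, by analogy with the working #define example), not inside one.
#define MY_LETTER_A \
  'A'

// The character literal itself is no longer split across lines.
const int i = MY_LETTER_A;
```

A compiler treats both spellings identically, so the rewrite changes nothing except what the wrapper preprocessor has to cope with.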
Universal character names in identifiers
In C++11, universal character names \uXXXX and \UXXXXXXXX can be used in place of non-ASCII characters. The wrapper preprocessor allows these in string literals and character literals, but not in identifiers. The wrappers do, however, allow you to use UTF-8 encoding for identifiers (and in strings, characters, and comments).
    // this is fine
    const char16_t *s = u"Hello\u00A0There";
    // this will break things
    const char *encyclop\u00C6dia = "Britannica";
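As a sketch of a wrapper-friendly version, the universal character name can stay in the string literal while the identifier is respelled. The ASCII name `encyclopaedia` is a hypothetical replacement chosen for illustration, and `constexpr` is added here only so that the literal's contents can be checked at compile time. A UTF-8 encoded identifier is the other option described above, but whether your compiler accepts one depends on the compiler and its version.

```cpp
// Universal character name inside a string literal: the wrappers accept this.
// (constexpr is used only so the value can be inspected at compile time.)
constexpr const char16_t *s = u"Hello\u00A0There";

// Hypothetical ASCII respelling of the identifier that broke the wrappers.
const char *encyclopaedia = "Britannica";
```

Here `s[5]` is the single UTF-16 code unit 0x00A0 (a no-break space), so the escape costs nothing in the literal; only its use in an identifier is a problem.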
Ambiguous member variable definition
C++ has an ambiguous grammar. One of the most common sources of ambiguity is that a name will sometimes be identified as a type, and sometimes as a function or variable name, depending on context.
    struct x
    {
      typedef int z;
      // this kinda looks like a constructor
      x(z);
      // so would you believe that this defines a variable y of type z?
      z(y);
      // it does, because it is equivalent to writing this!
      z y;
    };
The wrapper's parser does not distinguish type names from other names within its grammar rules, so it cannot disambiguate between the constructor declaration at the top and the funny-looking variable declaration in the middle. It will try to interpret both as constructor declarations.
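The claim that `z(y);` declares a member variable can be checked with a compiler. The sketch below uses the same names as the example, plus a hypothetical second struct `friendly_x` that spells the member in the conventional way, which is the form the wrappers should be able to read as a variable.

```cpp
#include <type_traits>

struct x
{
  typedef int z;
  x(z v) : y(v) {}  // a real constructor: takes one argument of type z
  z(y);             // not a constructor: declares a member y of type z
};

// Hypothetical wrapper-friendly spelling of the same class.
struct friendly_x
{
  typedef int z;
  friendly_x(z v) : y(v) {}
  z y;              // same member, written without the parentheses
};

// A C++ compiler sees an int member named y in both structs.
static_assert(std::is_same<decltype(x::y), int>::value, "x::y is an int");
static_assert(std::is_same<decltype(friendly_x::y), int>::value,
              "friendly_x::y is an int");
```

Dropping the redundant parentheses does not change the class as far as the compiler is concerned, so it is a safe rewrite for headers that need to be wrapped.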
Ambiguous greater-than
The following code will break the wrappers. The breakage occurs when angle brackets appear in the RHS of an assignment, unless the assignment takes place within a function body (the parser ignores all function bodies, because they are part of the "implementation" rather than part of the "interface").
    // this looks totally natural, and is valid C++ code
    const T unity = static_cast<T>(1.0);
    // whitespace makes it look a bit less natural
    const T unity = static_cast < T > (1.0);
The difficulty is that our parser thinks that the angle brackets might be less-than and greater-than operators, because it doesn't know that T names a type and not a constant. So it conks out after reporting "syntax is ambiguous". It is possible to disambiguate by writing the code as follows, which causes the parser to take a different path:
    // this causes the parse to succeed
    const T unity = (static_cast<T>(1.0));
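To see why the token sequence is ambiguous without type information, compare it with an expression of the same shape in which every name is a value. This is only an illustration: the typedef for `T` and the names `lhs`, `N`, and `chained` are assumptions made up for the example, and compilers may warn about the chained comparison.

```cpp
// Case 1: T names a type, so the angle brackets enclose a type argument.
typedef double T;
const T unity = (static_cast<T>(1.0));  // parenthesized, wrapper-friendly form

// Case 2: hypothetical names that are all values. The same token pattern
//   name < name > (expression)
// is now a chain of relational operators, read as ((lhs < N) > (1.0)).
constexpr int lhs = 3;
constexpr int N = 2;
constexpr bool chained = lhs < N > (1.0);
```

In the second case `lhs < N` is false, and false compared with 1.0 is not greater, so `chained` is false. A parser that does not know whether `T` is a type cannot tell the two cases apart from the tokens alone.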