[CMake] regex pitfalls

Mon Oct 22 10:26:25 EDT 2007

On 10/22/07, Josef Karthauser <joe.karthauser at geomerics.com> wrote:
>
> My first thought was to use a regex, however I couldn't get it to do
> what I expected - alas.  I've switched to a different approach.

CMake regexes have a number of pitfalls.

A big one is that ^ only matches the beginning of the input stream,
and $ only matches the end of the input stream.  They are not line
oriented.  So you end up having to do lots of "\nblahblahblah\r?\n"
dancing.  You also have to watch out for overlapping scan regions:
the \n at the beginning of a line is also the end of the previous
line, so if you matched the previous line, you won't match the next
one.  For this reason I've written a looping match function.
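
To make the anchor and overlap problems concrete, here is a minimal
sketch.  It is not the looping match function mentioned above; it
substitutes a simpler split-into-lines approach.  The sample text and
all variable names are made up for illustration, and the split assumes
the input contains no semicolons or square brackets, which would
confuse CMake's list handling.

```cmake
# Sketch only; run with: cmake -P anchors.cmake
cmake_minimum_required(VERSION 3.10)

# Hypothetical sample input: three "key=value" lines.
set(text "\na=1\nb=2\nc=3\n")

# "^" anchors to the start of the whole input, not to each line,
# so a key on a later line is not found and the result is empty.
string(REGEX MATCH "^b=[0-9]+" anchored "${text}")
message(STATUS "anchored: '${anchored}'")

# Delimiting with "\n" on both sides makes neighbouring matches
# compete for the shared newline: once "\na=1\n" is consumed,
# "b=2" has no leading "\n" left, so it is skipped.
string(REGEX MATCHALL "\n[a-z]=[0-9]+\n" delimited "${text}")
list(LENGTH delimited delimited_count)
message(STATUS "delimited matches: ${delimited_count}")

# Workaround: split on line endings first, then anchor per line.
# Assumes no ';', '[' or ']' in the input.
string(REGEX REPLACE "\r?\n" ";" lines "${text}")
set(found "")
foreach(line IN LISTS lines)
  if(line MATCHES "^([a-z])=([0-9]+)$")
    list(APPEND found "${CMAKE_MATCH_1}")
  endif()
endforeach()
message(STATUS "per-line keys: ${found}")
```

The delimited search finds only the first and third lines, while the
per-line loop sees all three keys, because each list element is its
own input and so "^" and "$" apply to every line.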

Another big one is that only greedy matches are available.  I wrote a
negation constructor to deal with that.
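
As an illustration of the greedy problem, the sketch below uses a
plain negated character class.  This is only a stand-in for the
single-character case, not the negation constructor mentioned above,
which excludes whole strings.  The sample text is hypothetical.

```cmake
# Sketch only; run with: cmake -P greedy.cmake
cmake_minimum_required(VERSION 3.10)

# Hypothetical sample input with two quoted values.
set(text "first=\"one\" second=\"two\"")

# Greedy ".*" runs to the LAST quote, so the capture swallows
# everything between the first and the final quote.
if(text MATCHES "\"(.*)\"")
  message(STATUS "greedy:  ${CMAKE_MATCH_1}")
endif()

# A negated class cannot cross a quote, so the capture stops at
# the first closing quote.
if(text MATCHES "\"([^\"]*)\"")
  message(STATUS "negated: ${CMAKE_MATCH_1}")
endif()
```

A negated class only works when the terminator is a single character;
stopping at a multi-character string takes more machinery.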

There's no support for matching on word boundaries.  I wrote a
split-on-word function to deal with that.  My negation constructors
were insufficient for it.  Negations are good for excluding a given
string of characters, but not for making a decision about word
boundaries.
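
One way a split-on-word approach might look is sketched below.  This
is not the function described above: the definition of a "word" as a
run of letters, digits and underscores is an assumption, and the
sample text and variable names are hypothetical.

```cmake
# Sketch only; run with: cmake -P words.cmake
cmake_minimum_required(VERSION 3.10)

# Hypothetical sample input.
set(text "cat concatenate cat_food tomcat")

# With no word-boundary operator, a bare "cat" also matches inside
# longer words.
string(REGEX MATCHALL "cat" loose "${text}")
list(LENGTH loose loose_count)
message(STATUS "substring matches: ${loose_count}")

# Split into words first (assumed: letters, digits, underscore),
# then compare each word as a whole string.
string(REGEX MATCHALL "[A-Za-z0-9_]+" words "${text}")
set(whole_count 0)
foreach(word IN LISTS words)
  if(word STREQUAL "cat")
    math(EXPR whole_count "${whole_count} + 1")
  endif()
endforeach()
message(STATUS "whole-word matches: ${whole_count}")
```

The substring search counts every occurrence of "cat", while the
whole-word loop counts only the standalone word.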

It's very clear to me that CMake regex processing would benefit
greatly from the PCRE library.  A couple of months ago, I talked about
doing something about that with another fellow.  I'm still not ready
to act upon it though.  I'm getting paid to finish my current project
using what I've got, not to implement new stuff.

Cheers,
Brandon Van Every