[CMake] Regular expressions

Brandon Van Every bvanevery at gmail.com
Fri Nov 30 17:35:23 EST 2007


On Nov 30, 2007 4:52 PM, Pau Garcia i Quiles <pgquiles at elpauer.org> wrote:
> Hello,
>
> I have an initial implementation of Perl-compatible regular
> expressions (PCRE) against current CVS. It does not replace classic
> regular expressions but co-exists with them.

How are you disambiguating them?  I had thought STRING(PCRE MATCH ...)
would be a decent convention.

> I'm trying to make sense of current limitations and know if I should
> keep them or I should get rid of them. For example, it looks like
> STRING(MATCH ... ) stores the first ten matches (and only ten) in
> CMAKE_MATCH_0 .. CMAKE_MATCH_9. Should I keep that or allow infinite
> CMAKE_MATCH_number?

I'd like more matches available, but how are matches > 9 disambiguated
in the command stream?  I looked at the PCRE docs briefly but I forgot
if there's more than 1 way to do this, or if it causes any problems.
There are so many compile options in PCRE.
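
For reference, a minimal sketch of the existing behaviour under discussion,
assuming string(REGEX MATCH) fills in CMAKE_MATCH_0 .. CMAKE_MATCH_9 as the
quoted message describes. The sample text and variable names are
illustrative only.

```cmake
# Sketch: read subexpressions from CMAKE_MATCH_<n> after a single MATCH.
# version_text and version_match are made-up names for this example.
set(version_text "cmake version 2.4.7")
string(REGEX MATCH "([0-9]+)\\.([0-9]+)\\.([0-9]+)"
  version_match "${version_text}")

# CMAKE_MATCH_0 is the whole match; 1..3 are the parenthesized groups.
message(STATUS "whole match: ${CMAKE_MATCH_0}")
message(STATUS "major: ${CMAKE_MATCH_1}")
message(STATUS "minor: ${CMAKE_MATCH_2}")
message(STATUS "patch: ${CMAKE_MATCH_3}")
```

With only one digit after CMAKE_MATCH_, a tenth group has nowhere to go,
which is the limit being asked about.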

I want a function that retains all subexpressions of a MATCH and makes
them available as arguments, rather than forcing me to run a MATCH and
then multiple REPLACEs to extract the subexpressions.  I've
implemented a macro I call match_sub() to give me the interface I
want; see below.  The implementation is the same old piggish
MATCH..REPLACE thing you always have to do, but at least I don't have
to keep typing it.  I use the constant "SKIP_SUB" to skip
subexpressions I'm not interested in.  Useful if I just don't care, or
if the subexpression doesn't exist in some instances.

I want a MATCHALL that returns an array of arguments, not a
semicolon-separated list of elements.  Arbitrary file input has
semicolons in it, making the current MATCHALL completely useless.
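
A small sketch of the semicolon problem, assuming MATCHALL joins its
matches with ";" and does nothing special about semicolons inside a match.
The input text and variable names are made up for illustration.

```cmake
# Sketch: every match here ends in ';', which is also the list separator.
set(source_text "int a; int b;")
string(REGEX MATCHALL "int [a-z];" statements "${source_text}")

# Print the raw result, then walk it as a list.  Once the result is
# treated as a list, the semicolons that belonged to the matches can no
# longer be told apart from the separators between matches.
message(STATUS "raw result: ${statements}")
foreach(statement ${statements})
  message(STATUS "element: [${statement}]")
endforeach(statement)
```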

# Dumping all the subexpression fields into variables after a match is
# a common operation.  SKIP_SUB can be used to skip subexpressions that
# we're not interested in.  This is important because some parenthesized
# subexpressions do not in fact exist and will give out-of-range errors
# if we don't skip them.

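# matchsub_regex and matchsub_input are the NAMES of the variables that
# hold the regex and the input text; matchsub_output names the variable
# that receives the whole match.  Any further arguments name the variables
# that receive subexpressions 1, 2, ... in order, or are SKIP_SUB.
macro(match_sub
  matchsub_regex
  matchsub_output matchsub_input
)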
  string(REGEX MATCH
    "${${matchsub_regex}}"
    ${matchsub_output} "${${matchsub_input}}")
  if(${matchsub_output})
    set(matchsub_subexpr_number 1)
    foreach(matchsub_subexpr ${ARGN})
      if(NOT "${matchsub_subexpr}" STREQUAL "SKIP_SUB")
        string(REGEX REPLACE
          "${${matchsub_regex}}"
          "\\${matchsub_subexpr_number}"
          ${matchsub_subexpr} "${${matchsub_output}}")
      endif(NOT "${matchsub_subexpr}" STREQUAL "SKIP_SUB")
      math(EXPR matchsub_subexpr_number
        "${matchsub_subexpr_number} + 1")
    endforeach(matchsub_subexpr)
  endif(${matchsub_output})
endmacro(match_sub)
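
A usage sketch for the macro, assuming match_sub() above has already been
defined earlier in the same file. The regex, the input text, and the
variable names are made up for illustration. Note that the first and third
arguments are variable names, not values.

```cmake
# Assumes the match_sub() macro above is already defined.
set(version_regex "([0-9]+)\\.([0-9]+)\\.([0-9]+)")
set(version_text "cmake version 2.4.7")

# Keep subexpressions 1 and 3; SKIP_SUB skips subexpression 2.
match_sub(version_regex version_match version_text
  version_major SKIP_SUB version_patch)

message(STATUS "match: ${version_match}")   # 2.4.7
message(STATUS "major: ${version_major}")   # 2
message(STATUS "patch: ${version_patch}")   # 7
```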

I want
STRING(REGEX REPLACE_FIRST ...)
STRING(REGEX REPLACE_LAST ...)
STRING(REPLACE_FIRST ...)
STRING(REPLACE_LAST ...)

MATCH only grabs the 1st match.  It may be dangerous to REPLACE
everywhere in the file.  Sometimes it's what you want, sometimes it's
not.  Note the non-regex versions of REPLACE.  This is needed because
typically you MATCH, process the MATCH somehow, and now you've got
some ORIGINAL and ALTERED text.  It's not safe to use the ORIGINAL
text as a regex, as it'll probably contain regex special characters.
Instead you REPLACE the ORIGINAL text with the ALTERED text, avoiding
regexes at this stage.  And, often you only want to replace the 1st
thing you matched.
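
A rough sketch of what a non-regex REPLACE_FIRST could do, written as a
script-level helper. The name replace_first() and its argument order are
hypothetical, chosen to mirror string(REPLACE); this is not a CMake
built-in. It also assumes a CMake recent enough to provide function(),
string(FIND), and string(SUBSTRING) with a length of -1, all of which are
newer than the release discussed in this thread.

```cmake
# Hypothetical helper: replace only the first literal occurrence of
# `original` in `input`.  Nothing is interpreted as a regex.
function(replace_first original altered output_var input)
  string(FIND "${input}" "${original}" position)
  if(position EQUAL -1)
    # No occurrence: hand the input back unchanged.
    set(${output_var} "${input}" PARENT_SCOPE)
    return()
  endif()
  string(LENGTH "${original}" original_length)
  math(EXPR tail_start "${position} + ${original_length}")
  string(SUBSTRING "${input}" 0 ${position} head)
  string(SUBSTRING "${input}" ${tail_start} -1 tail)
  set(${output_var} "${head}${altered}${tail}" PARENT_SCOPE)
endfunction()

# "a.b" is taken literally, so "axb" is left alone and only the first
# "a.b" is replaced.
replace_first("a.b" "X" result "a.b and axb and a.b")
message(STATUS "${result}")   # X and axb and a.b
```

A REPLACE_LAST along the same lines could pass REVERSE to string(FIND) to
search from the end of the input.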

I want lines to be "properly" matched with anchors, so that ^ and $
match at the start and end of each line of multi-line input.  This is a
compile option in PCRE (PCRE_MULTILINE).
0005380: REGEX ^ and $ do not match on multi-line <input>
http://www.cmake.org/Bug/view.php?id=5380

Here are some bugs that hopefully just aren't present in PCRE.

0005999: REGEX ^ does not anchor against the original string
http://www.cmake.org/Bug/view.php?id=5999

0005537: REGEX MATCH and MATCHALL can be pathologically slow
http://www.cmake.org/Bug/view.php?id=5537


Cheers,
Brandon Van Every

