[cmake-developers] slow regex implementation in RegularExpression

Alexander Neundorf neundorf at kde.org
Wed Nov 16 12:47:27 EST 2011


On Wednesday 16 November 2011, Alexandru Ciobanu wrote:
> Hi,
> 
> I was successful in making CMake work with PCRE. As expected, it was
> straightforward.
> 
> The problem is that PCRE is also slow. So, I tested the same string and
> regex with multiple different libraries in order to assess performance.
> 
> The regular expression in question is:
>       ([^:]+): warning[ \t]*[0-9]+[ \t]*:
> 
> The string is a 6k character string, a typical compiler command line. (See
> my first message for sample code).
> 
> For each library the steps are:
>    - regcomp() the regular expression
>    - regexec() the expression on the string
> 
> Here is how much time it takes to process the string *one* time:
>     current CMake         -- 860ms
>     TRex                  -- 680ms
>     PCRE                  -- 610ms    (with pcre_exec())
>     PCRE                  -- 990ms    (with pcre_dfa_exec())
>     re2                   -- 0.085ms
>     /usr/include/regex.h  -- 0.075ms

I wouldn't have expected this.
 
> As can be seen, re2 and the standard regex.h are orders of magnitude
> faster in executing this particular regular expression.
> 
> The difference between PCRE and re2 is also confirmed by this study:
>     http://swtch.com/~rsc/regexp/regexp3.html
> 
> CONCLUSION:
>    - PCRE is not fast enough
> 
> QUESTION:
>    - is there a reason we shouldn't use the standard regex.h?

Does it exist everywhere, e.g. on Windows with MSVC 6?
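
For reference, the two quoted steps (regcomp() the expression, then
regexec() it on the string) look roughly like this with the POSIX
<regex.h> interface. This is only a minimal sketch, not the test
program from the original message: the function name match_warning and
its buffer parameters are made up for illustration, and it assumes a
platform that actually ships <regex.h>, which is exactly the open
question.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: compiles the warning regex from the thread and
 * runs it once on `text`.
 * Returns 1 on a match, 0 when regexec() reports no match (or fails),
 * and -1 if the pattern does not compile.
 * On a match, the first capture group (the text before ": warning")
 * is copied into `file`, truncated to fit `file_size`.
 */
int match_warning(const char *text, char *file, size_t file_size)
{
    /* "\t" is a literal tab here, so [ \t] matches space or tab. */
    static const char *pattern = "([^:]+): warning[ \t]*[0-9]+[ \t]*:";
    regex_t re;
    regmatch_t m[2];   /* m[0]: whole match, m[1]: capture group 1 */
    int rc;

    /* Step 1: regcomp() the regular expression (extended syntax). */
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    /* Step 2: regexec() the expression on the string. */
    rc = regexec(&re, text, 2, m, 0);
    regfree(&re);
    if (rc != 0)
        return 0;

    if (file != NULL && file_size > 0) {
        size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);
        if (len >= file_size)
            len = file_size - 1;
        memcpy(file, text + m[1].rm_so, len);
        file[len] = '\0';
    }
    return 1;
}
```

Calling match_warning() on a line such as
"foo.c: warning 42: unused variable" should report a match and leave
"foo.c" in the output buffer. Timing the regexec() call on the 6k
string would need a clock around step 2, which is left out here.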

Alex


