[cmake-developers] slow regex implementation in RegularExpression

Alexandru Ciobanu alex at rogue-research.com
Wed Nov 16 12:44:26 EST 2011


Hi,

I was successful in making CMake work with PCRE. As expected, it was straightforward.

The problem is that PCRE is also slow. So, I tested the same string and regex with multiple different libraries in order to assess performance. 

The regular expression in question is:
      ([^:]+): warning[ \t]*[0-9]+[ \t]*:

The test string is about 6,000 characters long, a typical compiler command line (see my first message for sample code).

For each library the steps are:
   - regcomp() the regular expression 
   - regexec() the expression on the string 
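
A minimal sketch of these two steps with the POSIX regex.h API might look like the following. It is not the sample code from my first message: the helper names make_sample_line() and time_one_match() and the synthetic command line are assumptions made for illustration. The timing uses clock(), which reports CPU time and may be too coarse for sub-millisecond runs on some platforms. Note that POSIX bracket expressions do not interpret "\t", so the tab is produced by the C string escape instead.

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Build a hypothetical stand-in for a long compiler command line:
 * repeated "-I/some/include/path " fragments, exactly `len` characters.
 * The caller owns the returned buffer and must free() it. */
static char *make_sample_line(size_t len)
{
    static const char piece[] = "-I/some/include/path ";
    const size_t piece_len = sizeof piece - 1;
    char *buf = malloc(len + 1);
    if (buf == NULL)
        return NULL;
    for (size_t i = 0; i < len; ++i)
        buf[i] = piece[i % piece_len];
    buf[len] = '\0';
    return buf;
}

/* Compile `pattern` as a POSIX extended regex, run it once on `subject`,
 * and store the CPU time spent inside regexec() in *elapsed_ms.
 * Returns 1 on a match, 0 on no match (or a runtime error in regexec),
 * and -1 if the pattern fails to compile. */
static int time_one_match(const char *pattern, const char *subject,
                          double *elapsed_ms)
{
    regex_t re;
    regmatch_t groups[2]; /* whole match plus the one capture group */

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    clock_t start = clock();
    int rc = regexec(&re, subject, 2, groups, 0);
    clock_t stop = clock();

    regfree(&re);
    if (elapsed_ms != NULL)
        *elapsed_ms = 1000.0 * (double)(stop - start) / CLOCKS_PER_SEC;
    return rc == 0 ? 1 : 0;
}
```

Calling time_one_match("([^:]+): warning[ \t]*[0-9]+[ \t]*:", line, &ms) with a line from make_sample_line(6000) exercises the no-match case, since the synthetic line contains no colon. Absolute timings will depend on the machine and the C library.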

Here is how much time it takes to process the string *one* time:
    current CMake         -- 860ms
    TRex                  -- 680ms
    PCRE                  -- 610ms    (with pcre_exec())
    PCRE                  -- 990ms    (with pcre_dfa_exec())
    re2                   -- 0.085ms
    /usr/include/regex.h  -- 0.075ms

As can be seen, re2 and the standard regex.h are several thousand times faster than the other libraries at executing this particular regular expression.

The difference between PCRE and re2 is also confirmed by this study:
    http://swtch.com/~rsc/regexp/regexp3.html

CONCLUSION:
   - PCRE is not fast enough

QUESTION:
   - is there a reason we shouldn't use the standard regex.h?

Sincerely,
Alex Ciobanu



On 2011-11-15, at 10:30 AM, Pau Garcia i Quiles wrote:

> Hi,
> 
> If it's of any help, I used the pcrecpp library by Google (it's part
> of PCRE). With pcrecpp, most operations were only 1-3 lines long. The
> only problem I found is PCRE provided no way to get the previous/next
> match, which CMake needs.
> 
> 
> 
> On Tue, Nov 15, 2011 at 4:25 PM, Alexandru Ciobanu
> <alex at rogue-research.com> wrote:
>> Hi Bill and Pau,
>> 
>> I am currently working on adding PCRE to CMake. Chances are very high that it will work, given the very similar comp()/exec() API calls in both implementations.
>> 
>> I'll let you know about the results soon.
>> 
>> Alex
>> 
>> 
>> On 2011-11-14, at 10:31 PM, Bill Hoffman wrote:
>> 
>>> On 11/14/2011 6:08 PM, Pau Garcia i Quiles wrote:
>>>> Bill,
>>>> 
>>>> I think the current incarnation of regexps in CMake should be kept for
>>>> compatibility reasons.
>>>> 
>>> Yes, of course.
>>> 
>>>> Adding PCRE is not difficult, just time consuming. The implementation
>>>> I'd do would be an additional abstraction layer:
>>>> - For the current BRE implementation, it would be a 1:1 call match
>>>> - For the PCRE implementation, it would keep match status, count,
>>>> next/previous iterators, etc.
>>>> 
>>> So, for this case I would be interested to hear from Alex to see if swapping out the regex will fix the ctest performance issue.  It is a nice isolated place to give PCRE a try.
>>> 
>>> -Bill
>>> --
>>> 
>>> Powered by www.kitware.com
>>> 
>>> Visit other Kitware open-source projects at http://www.kitware.com/opensource/opensource.html
>>> 
>>> Please keep messages on-topic and check the CMake FAQ at: http://www.cmake.org/Wiki/CMake_FAQ
>>> 
>>> Follow this link to subscribe/unsubscribe:
>>> http://public.kitware.com/cgi-bin/mailman/listinfo/cmake-developers
>> 
>> 
> 
> 
> 
> -- 
> Pau Garcia i Quiles
> http://www.elpauer.org
> (Due to my workload, I may need 10 days to answer)
