[cmake-developers] slow regex implementation in RegularExpression

Pau Garcia i Quiles pgquiles at elpauer.org
Mon Nov 14 18:08:36 EST 2011


I think the current incarnation of regexps in CMake should be kept for
compatibility reasons.

Adding PCRE is not difficult, just time-consuming. The implementation I'd
do would add an abstraction layer:
- For the current BRE implementation, each call would map 1:1 onto the
existing one
- For the PCRE implementation, it would keep match status, count,
next/previous iterators, etc.
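
A minimal sketch of what such a layer could look like. Everything here is
an assumption made for illustration: the class names (RegexBackend,
LegacyBackend, IteratingBackend) are hypothetical and exist neither in
CMake nor in PCRE, and std::regex stands in for both engines so the example
compiles on its own. Only forward iteration is shown.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical common interface; these names do not exist in CMake.
class RegexBackend
{
public:
  virtual ~RegexBackend() = default;
  // Returns false if the pattern cannot be compiled.
  virtual bool Compile(const std::string& pattern) = 0;
  // Searches text from its start; returns true if the pattern matched.
  virtual bool Find(const std::string& text) = 0;
  // Text of the current match, or an empty string if there is none.
  virtual std::string Match() const = 0;
};

// Stand-in for the existing engine: every call forwards 1:1 and only the
// last match is remembered.
class LegacyBackend : public RegexBackend
{
public:
  bool Compile(const std::string& pattern) override
  {
    try {
      this->Re.assign(pattern, std::regex::extended);
    } catch (const std::regex_error&) {
      return false;
    }
    return true;
  }
  bool Find(const std::string& text) override
  {
    std::smatch m;
    const bool found = std::regex_search(text, m, this->Re);
    this->Last = found ? m.str(0) : std::string();
    return found;
  }
  std::string Match() const override { return this->Last; }

private:
  std::regex Re;
  std::string Last;
};

// Stand-in for a PCRE-style backend: keeps match status, a running count
// and a cursor so callers can step through successive matches.
class IteratingBackend : public RegexBackend
{
public:
  bool Compile(const std::string& pattern) override
  {
    try {
      this->Re.assign(pattern, std::regex::ECMAScript);
    } catch (const std::regex_error&) {
      return false;
    }
    return true;
  }
  bool Find(const std::string& text) override
  {
    this->Text = text; // keep a copy so the iterator stays valid
    this->It =
      std::sregex_iterator(this->Text.begin(), this->Text.end(), this->Re);
    this->Seen = (this->It == std::sregex_iterator()) ? 0 : 1;
    return this->Seen != 0;
  }
  // Advances to the next match; returns false once the input is exhausted.
  bool Next()
  {
    if (this->It == std::sregex_iterator()) {
      return false;
    }
    ++this->It;
    if (this->It == std::sregex_iterator()) {
      return false;
    }
    ++this->Seen;
    return true;
  }
  std::string Match() const override
  {
    return this->It == std::sregex_iterator() ? std::string()
                                              : this->It->str(0);
  }
  // Number of matches visited so far.
  std::size_t Count() const { return this->Seen; }

private:
  std::regex Re;
  std::string Text;
  std::sregex_iterator It;
  std::size_t Seen = 0;
};
```

Existing callers would keep talking to the legacy backend unchanged, while
new code that needs iteration or counts could opt into the second one.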

On Mon, Nov 14, 2011 at 7:30 PM, Bill Hoffman <bill.hoffman at kitware.com> wrote:

> Sorry for the top post...  However, if the issue with ctest being slow can
> be fixed by using PCRE in CMake, that is good news.  We can just link in
> the library, and replace that small part of CMake internal code that has
> the performance problem.  This should not break backwards compatibility.
> It also gives us a way to slowly bring PCRE into CMake.
> Alex, is there a way you can try PCRE in CMake to see if it fixes the
> problem?
> -Bill
> On 11/14/2011 1:13 PM, Pau Garcia i Quiles wrote:
>> Hi,
>> Check this:
>> A wish a day 11: Perl Compatible Regular Expressions in CMake
>> http://www.elpauer.org/?p=684
>> Unfortunately the student turned out to be a total fraud: he knew
>> nothing about CMake, regular expressions (much less PCRE!), git, and
>> could barely manage with C/C++. After months of explaining *really*
>> basic stuff (such as the difference between a static and a shared
>> library), he silently gave up.
>> I do have an initial implementation and extensive information on how to
>> implement PCRE in CMake. It's just I don't have enough spare time to do
>> that, and at work I cannot justify investing so much time in CMake for
>> free (for now, we don't need advanced regular expressions).
>> On Mon, Nov 14, 2011 at 6:57 PM, Alexandru Ciobanu
>> <alex at rogue-research.com>
>> wrote:
>>    Hi,
>>    Our team is affected by issue 0012381, that causes extremely poor
>>    performance by CTest. Details here:
>>    http://public.kitware.com/Bug/view.php?id=12381
>>    I've created a small test case that demonstrates the problem. Please
>>    find the .cpp file attached.
>>    From what I see, the RegularExpression class uses Henry Spencer's
>>    regex implementation, which is known to be slow in some cases.
>>    On my machine, the attached example runs in 0.8 sec. Just to process
>>    one string!
>>       $ time ./repr
>>           real     0m0.865s
>>           user     0m0.862s
>>           sys      0m0.002s
>>    Grep can process 100k such strings in 0.5 sec (which includes
>>    reading a 570MB file from disk):
>>       $ wc -l big.str.txt
>>          100000 big.str.txt
>>       $ ls -lh big.str.txt
>>           -rw-r--r--  1 alex  staff   572M 14 Nov 12:30 big.str.txt
>>       $ time grep "([^:]+): warning[ \t]*[0-9]+[ \t]*:" big.str.txt
>>           real     0m0.525s
>>           user     0m0.255s
>>           sys      0m0.269s
>>    I see three ways to fix this problem:
>>      A) use a trusted 3rd party regex library, like re2 or pcre
>>      B) find another self-contained regex implementation
>>      C) try to use the standard POSIX regex available in regex.h on
>>    most systems
>>    I tried to find another self-contained regex implementation, that we
>>    could use. I found Tiny REX, but it is as slow, in this case, as
>>    Henry Spencer's implementation.
>>    So what do you think is the best way to proceed with this problem?
>>    sincerely,
>>    Alex Ciobanu

Pau Garcia i Quiles
(Due to my workload, I may need 10 days to answer)