[cmake-developers] slow regex implementation in RegularExpression
Pau Garcia i Quiles
pgquiles at elpauer.org
Mon Nov 14 18:08:36 EST 2011
I think the current incarnation of regexps in CMake should be kept for
Adding PCRE is not difficult, just time consuming. The implementation I'd
do would be an additional abstraction layer:
- For the current BRE implementation, each call would map 1:1 onto the existing one
- For the PCRE implementation, it would keep match status, count,
next/previous iterators, etc.
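
To make the layering idea concrete, here is a minimal sketch of what such an abstraction could look like. Everything in it is an assumption for illustration: the names RegexEngine, SimpleEngine and IteratingEngine are hypothetical and do not exist in CMake, and std::regex is used only as a stand-in for both the current engine and PCRE so that the sketch compiles without external libraries.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical common interface; not part of CMake.
class RegexEngine {
public:
  virtual ~RegexEngine() = default;
  // Search `text`; returns true if there is at least one match.
  virtual bool find(const std::string& text) = 0;
  // Capture group n of the current match (0 = whole match), or "" if none.
  virtual std::string match(std::size_t n) const = 0;
};

// Stand-in for the current engine: find() forwards 1:1 to one search call.
class SimpleEngine : public RegexEngine {
public:
  explicit SimpleEngine(const std::string& pattern) : re_(pattern) {}
  bool find(const std::string& text) override {
    subject_ = text;  // keep a copy so the stored match stays valid
    return std::regex_search(subject_, last_, re_);
  }
  std::string match(std::size_t n) const override {
    return n < last_.size() ? last_[n].str() : std::string();
  }
private:
  std::regex re_;
  std::string subject_;
  std::smatch last_;
};

// Stand-in for a PCRE-style engine: keeps all matches, a count, and a
// cursor that can be moved with next()/previous().
class IteratingEngine : public RegexEngine {
public:
  explicit IteratingEngine(const std::string& pattern) : re_(pattern) {}
  bool find(const std::string& text) override {
    matches_.clear();
    subject_ = text;
    for (std::sregex_iterator it(subject_.begin(), subject_.end(), re_), end;
         it != end; ++it) {
      matches_.push_back(*it);
    }
    pos_ = 0;
    return !matches_.empty();
  }
  std::string match(std::size_t n) const override {
    if (pos_ >= matches_.size() || n >= matches_[pos_].size()) {
      return std::string();
    }
    return matches_[pos_][n].str();
  }
  std::size_t count() const { return matches_.size(); }
  bool next() {
    if (pos_ + 1 < matches_.size()) { ++pos_; return true; }
    return false;
  }
  bool previous() {
    if (pos_ > 0) { --pos_; return true; }
    return false;
  }
private:
  std::regex re_;
  std::string subject_;
  std::vector<std::smatch> matches_;
  std::size_t pos_ = 0;
};
```

Callers that only need the existing behaviour would use the base interface, while code that wants match counts or iteration would ask for the richer engine. A real version would wrap the existing RegularExpression class and libpcre instead of std::regex.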
On Mon, Nov 14, 2011 at 7:30 PM, Bill Hoffman <bill.hoffman at kitware.com> wrote:
> Sorry for the top post... However, if the issue with ctest being slow can
> be fixed by using PCRE in CMake, that is good news. We can just link in
> the library, and replace that small part of CMake internal code that has
> the performance problem. This should not break backwards compatibility.
> It also gives us a way to slowly bring in PCRE into CMake.
> Alex, is there a way you can try PCRE in CMake to see if it fixes the
> On 11/14/2011 1:13 PM, Pau Garcia i Quiles wrote:
>> Check this:
>> A wish a day 11: Perl Compatible Regular Expressions in CMake
>> Unfortunately the student turned out to be a total fraud: he knew
>> nothing about CMake, regular expressions (much less PCRE!), git, and
>> could barely manage with C/C++. After months of explaining *really*
>> basic stuff (such as the difference between a static and a shared
>> library), he silently gave up.
>> I do have an initial implementation and extensive information on how to
>> implement PCRE in CMake. It's just that I don't have enough spare time to
>> do that, and at work I cannot justify investing so much time in CMake for
>> free (for now, we don't need advanced regular expressions).
>> On Mon, Nov 14, 2011 at 6:57 PM, Alexandru Ciobanu
>> <alex at rogue-research.com> wrote:
>> Our team is affected by issue 0012381, which causes extremely poor
>> performance in CTest. Details here:
>> I've created a small test case that demonstrates the problem. Please
>> find the .cpp file attached.
>> From what I see, the RegularExpression class uses Henry Spencer's
>> regex implementation, which is known to be slow in some cases.
>> On my machine, the attached example runs in 0.8 sec. Just to process
>> one string!
>> $ time ./repr
>> real 0m0.865s
>> user 0m0.862s
>> sys 0m0.002s
>> Grep can process 100k such strings in 0.5 sec (which includes
>> reading a 570MB file from disk):
>> $ wc -l big.str.txt
>> 100000 big.str.txt
>> $ ls -lh big.str.txt
>> -rw-r--r-- 1 alex staff 572M 14 Nov 12:30 big.str.txt
>> $ time grep "([^:]+): warning[ \t]*[0-9]+[ \t]*:" big.str.txt
>> real 0m0.525s
>> user 0m0.255s
>> sys 0m0.269s
>> I see three ways to fix this problem:
>> A) use a trusted 3rd party regex library, like re2 or pcre
>> B) find another self-contained regex implementation
>> C) try to use the standard POSIX regex available in regex.h on
>> most systems
>> I tried to find another self-contained regex implementation that we
>> could use. I found Tiny REX, but in this case it is as slow as
>> Henry Spencer's implementation.
>> So what do you think is the best way to proceed with this problem?
>> Alex Ciobanu
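
Option C from the quoted message (the POSIX regex API in regex.h) can be tried out with a small sketch like the one below. It is only an illustration under stated assumptions: the helper names posix_search and time_posix_search are hypothetical, the pattern is the one from the grep command above compiled as a POSIX extended expression, and the input string that triggers the slowdown is not reproduced here because it was only in the attachment.

```cpp
#include <regex.h>  // POSIX regcomp/regexec/regfree

#include <chrono>
#include <string>

// Returns true if `pattern` (POSIX extended syntax) matches anywhere in
// `text`. Returns false if the pattern does not compile.
bool posix_search(const std::string& pattern, const std::string& text) {
  regex_t re;
  if (regcomp(&re, pattern.c_str(), REG_EXTENDED | REG_NOSUB) != 0) {
    return false;
  }
  const int rc = regexec(&re, text.c_str(), 0, nullptr, 0);
  regfree(&re);
  return rc == 0;
}

// Wall-clock seconds spent in a single posix_search call
// (pattern compilation included).
double time_posix_search(const std::string& pattern,
                         const std::string& text) {
  const auto start = std::chrono::steady_clock::now();
  posix_search(pattern, text);
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(stop - start).count();
}
```

Calling time_posix_search with the pattern "([^:]+): warning[ \t]*[0-9]+[ \t]*:" and the problematic string from the test case would give a number to compare against the 0.8 sec reported above. How fast regexec is depends on the platform's libc, so the result is not guaranteed to be better.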
Pau Garcia i Quiles
(Due to my workload, I may need 10 days to answer)