[cmake-developers] slow regex implementation in RegularExpression

Pau Garcia i Quiles pgquiles at elpauer.org
Mon Nov 14 13:13:45 EST 2011


Hi,

Check this:

A wish a day 11: Perl Compatible Regular Expressions in CMake
http://www.elpauer.org/?p=684

Unfortunately the student turned out to be a total fraud: he knew nothing
about CMake, regular expressions (much less PCRE!), or git, and could barely
manage with C/C++. After months of explaining *really* basic stuff (such as
the difference between a static and a shared library), he silently gave up.

I do have an initial implementation and extensive information on how to
implement PCRE in CMake. I just don't have enough spare time to do it,
and at work I cannot justify investing so much time in CMake for free
(for now, we don't need advanced regular expressions).


On Mon, Nov 14, 2011 at 6:57 PM, Alexandru Ciobanu
<alex at rogue-research.com> wrote:

> Hi,
>
> Our team is affected by issue 0012381, that causes extremely poor
> performance by CTest. Details here:
>     http://public.kitware.com/Bug/view.php?id=12381
>
> I've created a small test case that demonstrates the problem. Please find
> the .cpp file attached.
>
> From what I see, the RegularExpression class uses Henry Spencer's regex
> implementation, which is known to be slow in some cases.
>
> On my machine, the attached example runs in 0.8 sec. Just to process one
> string!
>   $ time ./repr
>       real     0m0.865s
>       user     0m0.862s
>       sys      0m0.002s
>
> Grep can process 100k such strings in 0.5 sec (which includes reading a
> 570MB file from disk):
>   $ wc -l big.str.txt
>      100000 big.str.txt
>   $ ls -lh big.str.txt
>       -rw-r--r--  1 alex  staff   572M 14 Nov 12:30 big.str.txt
>   $ time grep "([^:]+): warning[ \t]*[0-9]+[ \t]*:" big.str.txt
>       real     0m0.525s
>       user     0m0.255s
>       sys      0m0.269s
>
> I see three ways to fix this problem:
>  A) use a trusted 3rd party regex library, like re2 or pcre
>  B) find another self-contained regex implementation
>  C) try to use the standard POSIX regex available in regex.h on most
> systems
>
> I tried to find another self-contained regex implementation that we could
> use. I found Tiny REX, but in this case it is as slow as Henry Spencer's
> implementation.
>
> So what do you think is the best way to proceed with this problem?
>
> sincerely,
> Alex Ciobanu

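For anyone who wants to try option C from the list above (the POSIX regex
from regex.h), here is a minimal sketch of how the warning pattern could be
timed against a single string. It is not the attached test case and not
CMake code: the names SearchResult, timed_posix_search and kWarningPattern
are made up for illustration, and it assumes a system that ships regex.h.

```cpp
#include <regex.h>   // POSIX regcomp / regexec / regfree (assumed available)

#include <chrono>
#include <string>

// Result of one timed search (hypothetical type, for illustration only).
struct SearchResult {
    bool compiled;   // false if regcomp rejected the pattern
    bool matched;    // true if the pattern matched somewhere in the text
    double millis;   // wall-clock time spent inside regexec only
};

// Hypothetical helper: compile `pattern` as a POSIX extended regex,
// search `text` once, and report how long the search took.
SearchResult timed_posix_search(const std::string& pattern,
                                const std::string& text) {
    SearchResult r{false, false, 0.0};
    regex_t re;
    // REG_NOSUB: we only need a yes/no answer, not the submatch offsets.
    if (regcomp(&re, pattern.c_str(), REG_EXTENDED | REG_NOSUB) != 0) {
        return r;  // pattern did not compile
    }
    r.compiled = true;

    const auto start = std::chrono::steady_clock::now();
    r.matched = (regexec(&re, text.c_str(), 0, nullptr, 0) == 0);
    const auto stop = std::chrono::steady_clock::now();

    r.millis = std::chrono::duration<double, std::milli>(stop - start).count();
    regfree(&re);
    return r;
}

// The pattern from the grep command above. Note that "\t" in a C++ string
// literal is a real tab character by the time regcomp sees it.
const char* const kWarningPattern = "([^:]+): warning[ \t]*[0-9]+[ \t]*:";
```

Calling timed_posix_search(kWarningPattern, line) on one of the problem
strings and printing the millis field would give a number to compare with
the 0.8 sec figure. How fast it is depends entirely on the regex
implementation in the platform's C library, so results may differ between
systems.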


-- 
Pau Garcia i Quiles
http://www.elpauer.org
(Due to my workload, I may need 10 days to answer)