[cmake-developers] slow regex implementation in RegularExpression

Alexandru Ciobanu alex at rogue-research.com
Mon Nov 14 12:57:57 EST 2011


Hi,

Our team is affected by issue 0012381, which causes extremely poor CTest performance. Details here:
     http://public.kitware.com/Bug/view.php?id=12381

I've created a small test case that demonstrates the problem. Please find the .cpp file attached.

From what I can see, the RegularExpression class uses Henry Spencer's regex implementation, a backtracking matcher that is known to be slow in some cases.

On my machine, the attached example takes 0.8 sec to process just one string:
   $ time ./repr
       real	0m0.865s
       user	0m0.862s
       sys	0m0.002s

Grep can process 100k such strings in 0.5 sec (which includes reading a 570MB file from disk):
   $ wc -l big.str.txt 
      100000 big.str.txt
   $ ls -lh big.str.txt 
       -rw-r--r--  1 alex  staff   572M 14 Nov 12:30 big.str.txt
   $ time grep "([^:]+): warning[ \t]*[0-9]+[ \t]*:" big.str.txt
       real	0m0.525s
       user	0m0.255s
       sys	0m0.269s

I see three ways to fix this problem:
  A) use a trusted 3rd party regex library, like re2 or pcre
  B) find another self-contained regex implementation 
  C) try to use the standard POSIX regex available in regex.h on most systems

I tried to find another self-contained regex implementation that we could use. I found Tiny REX, but in this case it is as slow as Henry Spencer's implementation.

So what do you think is the best way to proceed with this problem?

sincerely,
Alex Ciobanu 



-------------- next part --------------
A non-text attachment was scrubbed...
Name: repr.cpp
Type: application/octet-stream
Size: 6460 bytes
Desc: not available
URL: <http://public.kitware.com/pipermail/cmake-developers/attachments/20111114/958ec299/attachment-0004.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Makefile
Type: application/octet-stream
Size: 153 bytes
Desc: not available
URL: <http://public.kitware.com/pipermail/cmake-developers/attachments/20111114/958ec299/attachment-0005.obj>

