[cmake-developers] slow regex implementation in RegularExpression

Alexandru Ciobanu alex at rogue-research.com
Wed Nov 16 14:12:39 EST 2011


Hi Brad,

[1]

> On 11/16/2011 12:44 PM, Alexandru Ciobanu wrote:
>> For each library the steps are:
>> - regcomp() the regular expression
>> - regexec() the expression on the string
> 
> Can you time each of these steps separately for each library?  I would not
> be surprised if the compilation time is the bottleneck.  The evaluation and
> matching of a given string just followed a DFA which should be pretty fast.
> If it turns out that compilation is the bottleneck then we should refactor
> things to make sure CTest compiles each regex at most once so we can re-use
> the same DFA every time.


This is how I run the tests (pseudocode):
   regcomp()
   repeat 1000 times:
       regexec()

The times I reported are the total run times divided by 1000.

For the slow ones (TRex, PCRE, CMake regexp) I repeat only 10 times; otherwise I wait too long. Since regcomp() is called only once, outside the loop, the time goes into the repeated regexec() calls, so it seems that regcomp() is not the problem in this case.

[2]
I have just tested another library: TRE.

It performs well. To put it in context (average time per iteration, measured as described above):
    current CMake            --  860 ms
    TRex                     --  680 ms
    PCRE (pcre_exec())       --  610 ms
    PCRE (pcre_dfa_exec())   --  990 ms
    re2                      --  0.085 ms
    /usr/include/regex.h     --  0.075 ms
    TRE                      --  0.3 ms    <<< NEW

Advantages of TRE:
  - API very similar to the standard regex.h (i.e. easy to integrate with CMake)
  - supports wide characters
  - compiles on many platforms: Windows, AIX, HP-UX, you name it.

What do you think about TRE?

Sincerely,
Alex Ciobanu

-------------- next part --------------
A non-text attachment was scrubbed...
Name: tre.test.c
Type: application/octet-stream
Size: 13340 bytes
Desc: not available
URL: <http://public.kitware.com/pipermail/cmake-developers/attachments/20111116/3d0912de/attachment-0002.obj>