[CMake] [PATCH] major performance improvement for the C dependency scanner

Alexander Neundorf a.neundorf-work at gmx.net
Wed Nov 30 13:24:12 EST 2005


Hi, 
 
the attached patch reduces the time a CMake-generated Makefile needs 
before it actually starts compiling anything on my box from 23 seconds 
down to 7 seconds. That is still much too long, but already a lot better. 
The box is a PIII/450 MHz, and the depend.make file is about 8500 lines long. 
 
The patch does the following: cmDependsC::Scan() scans a C file line by 
line for included files. If a header is included by multiple files, it is 
scanned again for each of them. The patch introduces a cache that stores 
all include lines found per file. 
When a file is about to be scanned, the cache is checked first, and if it 
already contains the file, the file isn't scanned again. 
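
To illustrate the idea (this is a minimal sketch, not the code from the 
attached patch): the names IncludeCache, ScanIncludes and the loader 
callback are hypothetical, the scanner uses plain string searches instead 
of cmake's regexp, and it is written in modern C++ for brevity. 

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Simplified scanner: collects the names from #include lines in `text`.
// It ignores comments and conditionals; the real scanner is more careful.
std::vector<std::string> ScanIncludes(const std::string& text) {
  std::vector<std::string> result;
  std::istringstream in(text);
  std::string line;
  while (std::getline(in, line)) {
    std::size_t pos = line.find_first_not_of(" \t");
    if (pos == std::string::npos || line[pos] != '#') continue;
    pos = line.find_first_not_of(" \t", pos + 1);
    if (pos == std::string::npos || line.compare(pos, 7, "include") != 0) continue;
    std::size_t open = line.find_first_of("\"<", pos + 7);
    if (open == std::string::npos) continue;
    const char closer = (line[open] == '<') ? '>' : '"';
    std::size_t close = line.find(closer, open + 1);
    if (close == std::string::npos) continue;
    result.push_back(line.substr(open + 1, close - open - 1));
  }
  return result;
}

// Hypothetical cache: each file is scanned at most once, later requests
// are answered from the map.
class IncludeCache {
public:
  typedef std::function<std::string(const std::string&)> Loader;

  // `loader` returns the contents of a file given its path.
  explicit IncludeCache(Loader loader) : loader_(std::move(loader)) {
    assert(loader_ != nullptr);
  }

  const std::vector<std::string>& Includes(const std::string& path) {
    auto it = cache_.find(path);
    if (it == cache_.end()) {
      ++scans_;  // cache miss: scan the file and remember the result
      it = cache_.emplace(path, ScanIncludes(loader_(path))).first;
    }
    return it->second;
  }

  int scans() const { return scans_; }

private:
  Loader loader_;
  std::map<std::string, std::vector<std::string>> cache_;
  int scans_ = 0;
};
```

Note that the map key here is the path exactly as it was passed in, so 
the same header reached through two different spellings of its path 
would be treated as two different files. 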
 
The patch has one issue: my depend.make went down from 8500 lines to 6500 
lines. I'm not sure where this comes from. I guess it is related to 
the header search path. I noticed that some of the files listed in 
depend.make have absolute paths, some have relative paths, and some 
paths also contain "../". Maybe this is somehow related to the 2000 fewer 
lines in depend.make. 
 
930 files were scanned, and several thousand were taken from the cache. 
 
Further ideas on how to make it faster: 
 
* maybe the line-by-line reading is not optimal 
 
The complete file could first be read into memory and then parsed from 
there. I'm not sure how much that would gain. 
 
* in most C files the include lines are at the top of the file 
 
It would be nice if this could be exploited somehow. Maybe, once the 
complete file is in memory and before actually parsing it line by line, 
go through it once and simply count the '#' characters it contains. Then 
parse it line by line, counting the '#' characters again. Once all of 
them have been found, stop processing the file. I would hope that the 
benefit of stopping the (expensive regexp) line-by-line parsing early is 
bigger than the cost of a (cheap) single-byte comparison over the whole 
file. 
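
A rough sketch of these two ideas together, under my own assumptions: the 
names ReadWholeFile, ScanResult and ScanUntilLastHash are made up, and the 
place where the expensive regexp would run is only marked by a comment. 
I have not measured whether this is actually faster. 

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Reads the complete file into memory in one go.
std::string ReadWholeFile(const std::string& path) {
  std::ifstream in(path.c_str(), std::ios::binary);
  std::ostringstream contents;
  contents << in.rdbuf();
  return contents.str();
}

struct ScanResult {
  std::vector<std::string> hashLines;  // lines whose first non-blank char is '#'
  std::size_t linesVisited;            // how many lines were looked at
};

// Counts all '#' in the buffer first (cheap), then walks the lines and
// stops as soon as every '#' has been seen.
ScanResult ScanUntilLastHash(const std::string& buffer) {
  ScanResult result;
  result.linesVisited = 0;
  const std::size_t total = static_cast<std::size_t>(
      std::count(buffer.begin(), buffer.end(), '#'));
  std::size_t seen = 0;
  std::size_t begin = 0;
  while (seen < total && begin < buffer.size()) {
    std::size_t end = buffer.find('\n', begin);
    if (end == std::string::npos) end = buffer.size();
    const std::string line = buffer.substr(begin, end - begin);
    ++result.linesVisited;
    seen += static_cast<std::size_t>(std::count(line.begin(), line.end(), '#'));
    assert(seen <= total);
    const std::size_t first = line.find_first_not_of(" \t");
    if (first != std::string::npos && line[first] == '#') {
      // This is where the expensive regexp match would be applied.
      result.hashLines.push_back(line);
    }
    begin = end + 1;
  }
  return result;
}
```

For a file whose only '#' characters are in the include lines at the top, 
the loop ends right after the last include. A single '#' further down 
(for example a closing #endif) makes it walk the whole file again, so the 
gain depends a lot on the code base. 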
 
* cache the contents of the new m_fileCache in a file on disk 
 
When a cmDependsC object is created, fill m_fileCache with the contents 
of the saved file. Then check for each file whether it has changed since 
the cache was written to disk. If that's the case, remove the entry from 
the cache. I think this could be a big speedup, but it is slightly beyond 
my cmake-hacking skills. 
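
Roughly what I have in mind, as a sketch only: CacheEntry, SaveCache, 
LoadCache and the text format are invented for this example, and the 
currentMtime callback stands in for whatever cmake would use to get a 
file's modification time. Instead of one timestamp for the whole cache 
file, each entry stores the modification time the file had when it was 
scanned. 

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <istream>
#include <limits>
#include <map>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

struct CacheEntry {
  long long mtime;                    // modification time when scanned
  std::vector<std::string> includes;  // include names found in the file
};
typedef std::map<std::string, CacheEntry> FileCache;

// Format per entry: path line, "mtime count" line, then one include per line.
void SaveCache(const FileCache& cache, std::ostream& out) {
  for (const auto& kv : cache) {
    assert(kv.first.find('\n') == std::string::npos);  // format limitation
    out << kv.first << '\n'
        << kv.second.mtime << ' ' << kv.second.includes.size() << '\n';
    for (const auto& inc : kv.second.includes) out << inc << '\n';
  }
}

// Loads the cache and drops every entry whose file has a different
// modification time now than when it was scanned.
FileCache LoadCache(std::istream& in,
                    const std::function<long long(const std::string&)>& currentMtime) {
  FileCache cache;
  std::string path;
  while (std::getline(in, path)) {
    CacheEntry entry;
    std::size_t count = 0;
    if (!(in >> entry.mtime >> count)) break;  // malformed or truncated
    in.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
    for (std::size_t i = 0; i < count; ++i) {
      std::string inc;
      if (!std::getline(in, inc)) return cache;  // truncated entry: skip it
      entry.includes.push_back(inc);
    }
    if (currentMtime(path) == entry.mtime) cache[path] = entry;
  }
  return cache;
}
```

For trying it out, a std::stringstream can stand in for the file on disk. 
Files that are dropped from the cache are then simply scanned again on 
the next run. 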
 
* make the m_fileCache more global 
 
In the current patch the m_fileCache is created and deleted for every 
cmDependsC object. I don't know under which circumstances such an object 
is created. In my project three of them were created (maybe one for C 
files, one for C++ files, and another one?). 
If the cache were shared by all of them, it could be a significant 
gain. Maybe it should be cleared for every new target. I don't know where 
in the code this would have to be done. 
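
Just to show the shape of it (a sketch with made-up names, not how 
cmDependsC is actually structured): several scanner objects hold the same 
cache through a shared pointer, and whoever owns the cache clears it when 
a new target starts. 

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

typedef std::map<std::string, std::vector<std::string>> SharedFileCache;

// Hypothetical stand-in for a scanner object; all instances created with
// the same pointer see each other's results.
class Scanner {
public:
  explicit Scanner(std::shared_ptr<SharedFileCache> cache)
      : cache_(std::move(cache)) {
    assert(cache_ != nullptr);
  }

  void Remember(const std::string& path,
                const std::vector<std::string>& includes) {
    (*cache_)[path] = includes;
  }

  bool Knows(const std::string& path) const {
    return cache_->count(path) != 0;
  }

private:
  std::shared_ptr<SharedFileCache> cache_;
};
```

A header scanned by the C scanner would then already be known to the C++ 
scanner, and calling clear() on the shared map resets all of them at 
once. 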
 
 
What do you think? 
 
Bye 
Alex 
 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: cmDependsC.patch
Type: text/x-diff
Size: 3756 bytes
Desc: not available
Url : http://public.kitware.com/pipermail/cmake/attachments/20051130/72734051/cmDependsC.bin