[CMake] Diff output from CMake?

Clifford Yapp cliffyapp at gmail.com
Tue Sep 13 15:52:25 EDT 2011


On Tue, Sep 13, 2011 at 1:58 PM, David Cole <david.cole at kitware.com> wrote:

> On Tue, Sep 13, 2011 at 1:39 PM, Alexander Neundorf
> <a.neundorf-work at gmx.net> wrote:
> > On Tuesday, September 13, 2011 05:07:00 AM Clifford Yapp wrote:
> >> I am trying to compare two large lists of file paths (about 14,000 lines
> >> each) to identify which entries in each list are missing from the other,
> >> and while I can get CMake to do it I must be doing it the wrong way
> >> because the results are hideously slow.
> >>
> >> I currently generate two files with the paths and then read them in as
> >> lists, using LIST() commands to perform STREQUAL tests.  I was hoping to
> >
> > How do you do that ?
> > Do you iterate over one list using foreach() and then list(FIND) to check
> > whether it exists in the other list ?
>

I tried a couple of ways, most of them variations on that theme (BUILD_FILES
and SVN_FILES are two manifest lists, and I need the items from each list that
are not in the other list):

FOREACH(ITEM ${BUILD_FILES})
     LIST(FIND SVN_FILES "${ITEM}" POS)
     IF(NOT POS STREQUAL "-1")
        LIST(REMOVE_ITEM SVN_FILES ${ITEM})
        LIST(REMOVE_ITEM BUILD_FILES ${ITEM})
     ENDIF()
ENDFOREACH()

In essence, the idea is BUILD_FILES will end up holding items unique to
BUILD_FILES and SVN_FILES will end up holding items unique to SVN_FILES,
which are the two pieces of information I'm after.
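
For comparison, the same two results could probably be had without the
per-item loop by handing a whole list to LIST(REMOVE_ITEM) in one call, so
each list is parsed only once. This is a rough sketch with stand-in data, not
something I have timed on the real manifests; ONLY_IN_BUILD and ONLY_IN_SVN
are hypothetical names, and both input lists are assumed to be non-empty:

```cmake
# Stand-in data; the real lists come from the two manifest files.
SET(BUILD_FILES src/a.c src/b.c src/only_build.c)
SET(SVN_FILES src/a.c src/b.c src/only_svn.c)

# Work on copies so the original lists stay intact.
SET(ONLY_IN_BUILD ${BUILD_FILES})
SET(ONLY_IN_SVN ${SVN_FILES})

# Remove every SVN entry from the build copy, and vice versa.
# Assumption: both lists are non-empty (REMOVE_ITEM needs at least one value).
LIST(REMOVE_ITEM ONLY_IN_BUILD ${SVN_FILES})
LIST(REMOVE_ITEM ONLY_IN_SVN ${BUILD_FILES})

MESSAGE("only in build: ${ONLY_IN_BUILD}")
MESSAGE("only in svn:   ${ONLY_IN_SVN}")
```

Whether this is fast enough at 14,000 entries likely depends on the CMake
version, since the removal itself may still be quadratic internally.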


> > Internally, every cmake variable is stored as a plain std::string.
> > When using a list() command, this string is converted to a
> > std::vector<std::string>, and then cmake operates on this vector.
> > So if you do this 14000 times, each time 14000 std::strings are created,
> > which makes this O(n^2) I think.
> >
> > Maybe something like this works ?
> >
> > set(uniqueItems ${list1} ${list2})
> > list(REMOVE_DUPLICATES uniqueItems )
>

I thought about REMOVE_DUPLICATES, but if I understand correctly wouldn't
that give me the union of the two lists?  I need the opposite: every entry
that is present in only one of the two lists.
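
A tiny illustration of the difference, using made-up three-item lists in place
of the real manifests:

```cmake
# Made-up lists: b and c are shared, a and d are each in one list only.
SET(list1 a b c)
SET(list2 b c d)

SET(uniqueItems ${list1} ${list2})
LIST(REMOVE_DUPLICATES uniqueItems)

# Prints the union (a;b;c;d), not the entries that differ.
MESSAGE("after REMOVE_DUPLICATES: ${uniqueItems}")
```

What I'm after here would be "a" (only in list1) and "d" (only in list2).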


> I realize this same technique will not be very useful with files
> 14,000 lines long... but thought I'd mention it so you could look at
> it and perhaps draw inspiration from it.
>

Basically I'm currently trying a variation on that, except I'm trying to get
a cross-platform comparison tool working that I can guarantee will be there.
I'm trying comm at the moment, but I'm not sure comm's notion of what
constitutes a sorted file and CMake's LIST(SORT) are compatible when it comes
to things like .file and ~file.


> If you could pass along the code that you're using, we might be able
> to suggest a better way to achieve the same thing within the CMake
> language if that's absolutely necessary.
>

Currently (using comm) I'm doing this (replacing /. with /tmpdot in an
attempt to let comm handle the sorted output):

STRING(REGEX REPLACE "/\\." "/tmpdot" BUILD_FILES "${BUILD_FILES}")
STRING(REGEX REPLACE "/\\." "/tmpdot" SVN_FILES "${SVN_FILES}")
LIST(SORT BUILD_FILES)
LIST(SORT SVN_FILES)
STRING(REGEX REPLACE ";" "\n" BUILD_FILES "${BUILD_FILES}")
STRING(REGEX REPLACE ";" "\n" SVN_FILES "${SVN_FILES}")
FILE(WRITE @CMAKE_BINARY_DIR@/build_files_list.txt ${BUILD_FILES})
FILE(WRITE @CMAKE_BINARY_DIR@/svn_files_list.txt ${SVN_FILES})
STRING(REGEX REPLACE "\n" ";" BUILD_FILES "${BUILD_FILES}")
STRING(REGEX REPLACE "\n" ";" SVN_FILES "${SVN_FILES}")

EXECUTE_PROCESS(COMMAND @CMAKE_BINARY_DIR@/@BIN_DIR@/comm -3
@CMAKE_BINARY_DIR@/build_files_list.txt @CMAKE_BINARY_DIR@/svn_files_list.txt
OUTPUT_VARIABLE COMM_RAWOUT)
STRING(REGEX REPLACE "\n" ";" COMM_OUT "${COMM_RAWOUT}")
STRING(REGEX REPLACE "/tmpdot" "/\\\\." COMM_OUT "${COMM_OUT}")
STRING(REGEX REPLACE "\\\\" "" COMM_OUT "${COMM_OUT}")

FOREACH(ITEM ${COMM_OUT})
   LIST(FIND BUILD_FILES ${ITEM} INBUILD)
   LIST(FIND SVN_FILES ${ITEM} INSVN)
   IF(INBUILD STREQUAL "-1" AND NOT INSVN STREQUAL "-1")
      LIST(APPEND SVN_FILES_NOT_IN_BUILD ${ITEM})
      LIST(REMOVE_ITEM COMM_OUT ${ITEM})
   ENDIF(INBUILD STREQUAL "-1" AND NOT INSVN STREQUAL "-1")
   IF(INSVN STREQUAL "-1" AND NOT INBUILD STREQUAL "-1")
      LIST(APPEND BUILD_FILES_NOT_IN_SVN ${ITEM})
      LIST(REMOVE_ITEM COMM_OUT ${ITEM})
   ENDIF(INSVN STREQUAL "-1" AND NOT INBUILD STREQUAL "-1")
ENDFOREACH(ITEM ${COMM_OUT})
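
A possible simplification of the above, offered as a sketch rather than
something I have verified on all platforms. It rests on two assumptions: that
running comm with LC_ALL=C makes it expect plain byte order, which as far as I
understand is also what LIST(SORT) produces (so the /tmpdot substitution may
not be needed), and that comm -3 prefixes lines unique to the second file with
a single tab, so the output can be split without searching both lists again.
The stand-in lists and the use of FIND_PROGRAM are illustrative only:

```cmake
# Tiny stand-in lists; the real ones come from the two manifests.
SET(BUILD_FILES "src/a.c;src/.hidden;src/~scratch")
SET(SVN_FILES "src/b.c;src/a.c;src/~scratch")
LIST(SORT BUILD_FILES)
LIST(SORT SVN_FILES)

# Write one path per line.
STRING(REPLACE ";" "\n" BUILD_TEXT "${BUILD_FILES}")
STRING(REPLACE ";" "\n" SVN_TEXT "${SVN_FILES}")
FILE(WRITE "${CMAKE_CURRENT_BINARY_DIR}/build_files_list.txt" "${BUILD_TEXT}\n")
FILE(WRITE "${CMAKE_CURRENT_BINARY_DIR}/svn_files_list.txt" "${SVN_TEXT}\n")

# Assumption: in the C locale comm compares bytes, matching LIST(SORT).
# Child processes started by EXECUTE_PROCESS inherit this setting.
SET(ENV{LC_ALL} C)
FIND_PROGRAM(COMM_EXEC comm)
IF(COMM_EXEC)
  EXECUTE_PROCESS(COMMAND ${COMM_EXEC} -3 build_files_list.txt svn_files_list.txt
    WORKING_DIRECTORY "${CMAKE_CURRENT_BINARY_DIR}"
    OUTPUT_VARIABLE COMM_RAWOUT)
  STRING(REPLACE "\n" ";" COMM_OUT "${COMM_RAWOUT}")
  FOREACH(LINE ${COMM_OUT})
    IF(LINE MATCHES "^\t")
      # Tab-prefixed line: present only in the second file (SVN).
      STRING(SUBSTRING "${LINE}" 1 -1 PATH_ONLY)
      LIST(APPEND SVN_FILES_NOT_IN_BUILD "${PATH_ONLY}")
    ELSE()
      # Unprefixed line: present only in the first file (build).
      LIST(APPEND BUILD_FILES_NOT_IN_SVN "${LINE}")
    ENDIF()
  ENDFOREACH()
  MESSAGE("in build, not in svn: ${BUILD_FILES_NOT_IN_SVN}")
  MESSAGE("in svn, not in build: ${SVN_FILES_NOT_IN_BUILD}")
ENDIF()
```

This assumes no path begins with a tab or contains a semicolon.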

I was hoping that CMake's diff ability meant there was a way to get actual
diff-style output from a cmake -E command that could be parsed (if CMake had
pulled in OpenBSD's diff to implement its diff abilities, for example).  If
not, it looks like performance considerations will require ensuring some sort
of cross-platform tool is available.  Would it perhaps make sense to have a
cmake -E diff, the same way there is a cmake -E tar?

The broader context is implementing a "make distcheck" rule for BRL-CAD
along the lines of the one previously implemented in autotools - the idea is
to have subversion tell us what files are in the repository (svn info), have
the build system report what files it knows about (via some custom CMake
function/macro logic that we have working to record that) and crank that
information back into user reporting and cpack (for example, a file known to
subversion but not addressed in the build logic is a FATAL_ERROR, and a file
unknown to both subversion and the build logic will be ignored when running
CPack).  This ability has proven very useful over the years with our
autotools build for making tarball distributions (there are more steps to
distcheck but this looks to be the last really tricky one to do in CMake).

Cheers, and thanks everybody for the help!

CY