[CMake] Experiments in CMake support for Clang (header & standard) modules

David Blaikie dblaikie at gmail.com
Sun May 6 19:49:51 EDT 2018


On Mon, Apr 30, 2018 at 3:30 PM Stephen Kelly <steveire at gmail.com> wrote:

> On 04/20/2018 01:39 AM, David Blaikie wrote:
>
> Hi there,
>
> I'm experimenting with creating examples (& potential changes to CMake
> itself, if needed/useful) of building clang modules (currently using the
> semi-backwards compatible "header modules", with the intent of also/moving
> towards supporting pre-standard C++ modules in development in Clang).
>
>
> Great! Thanks for reaching out. Sorry it has taken me a while to respond.
>

No worries - thanks for getting back to me!


> Have you had other response off-list?
>

Nah - chatted a little with coworkers (fellow Clang/LLVM developers -
mostly Richard Smith & Chandler Carruth) off-list, but nothing much more.

> The basic commands required are:
>
>   clang++ -fmodules -xc++ -Xclang -emit-module -Xclang -fmodules-codegen \
>     -fmodule-name=foo foo.modulemap -o foo.pcm
>   clang++ -fmodules -c -fmodule-file=foo.pcm use.cpp
>   clang++ -c foo.pcm
>   clang++ foo.o use.o -o a.out
>
>
> Ok. Fundamentally, I am suspicious of having to have a
> -fmodule-file=foo.pcm for every 'import foo' in each cpp file. I shouldn't
> have to manually add that each time I add a new import to my cpp file. Even
> if it can be automated (eg by CMake), I shouldn't have to have my
> buildsystem be regenerated each time I add an import to my cpp file either.
>
> That's something I mentioned in the google groups post I made which you
> linked to. How will that work when using Qt or any other library?
>

- My understanding/feeling is that this would be similar to how a user has
to change their link command when they pick up a new dependency.

Nope, scratch that ^. I had thought that was the case, but talking more with
Richard Smith it seems there's an expectation that modules will be
somewhere between header and library granularity (obviously some small
libraries today have one or only a few headers, some (like Qt) have many -
maybe those on the Qt end might have slightly fewer modules than they have
headers - but still several modules to one library most likely, by the
sounds of it)


Now, admittedly, external dependencies are a little more complicated than
internal (within a single project consisting of multiple libraries) - which
is why I'd like to focus a bit on the simpler internal case first.


> Today, a beginner can find a random C++ book, type in a code example from
> chapter one and put `g++ -I/opt/book_examples prog1.cpp` into a terminal
> and get something compiling and running. With modules, they'll potentially
> have to pass a whole list of module files too.
>

Yeah, there's some talk of supporting a mode that doesn't explicitly
build/use modules in the filesystem, but only in memory for the purpose of
preserving the isolation semantics of modules. This would be used in simple
direct-compilation cases like this. Such a library might need a
configuration file or similar the compiler can parse to discover the
parameters (warning flags, define flags, whatever else) needed to build the
BMI.


> Lots of people manually maintain Makefile-based buildsystems today, and
> most other companies I've been inside of have their own custom tool or
> bunch of python scripts, or both. Manually changing such buildsystems to
> add -fmodule-file or -fmodule-map-file each time an import is added is a
> significant barrier.
>
> Will my project have to compile the modules files for all of my
> dependencies?
>

Yes - and that's where the external dependencies get complicated.


> Even more complication for my buildsystem. Do I have to wait for my
> dependencies to modularize bottom-up before I can benefit from modules?
>

There are some ideas about how to handle that ('legacy headers'/modules in
Richard's work/proposed amendment to the TS), but I'm trying to focus on a
few of the simpler cases first.


> If my dependency does add 'module foo' to their header files, or whatever
> the current syntax is, can I continue to #include <foo> or is that a source
> incompatible change?
>

I believe it'd be a source incompatible change. You could continue to
provide a header for your module separately. They'd likely have different
extensions anyway.


> I raised some of these issues a few years ago regarding the clang
> implementation with files named exactly module.modulemap:
>
>
> http://clang-developers.42468.n3.nabble.com/How-do-I-try-out-C-modules-with-clang-td4041946.html
>
>
> http://clang-developers.42468.n3.nabble.com/How-do-I-try-out-C-modules-with-clang-td4041946i20.html
>
> Interestingly, GCC is taking a directory-centric approach in the driver
> (-fmodule-path=<dir>) as opposed to the 'add a file to your compile line
> for each import' that Clang and MSVC are taking:
>
>  http://gcc.gnu.org/wiki/cxx-modules
>
> Why is Clang not doing a directory-centric driver-interface? It seems to
> obviously solve problems. I wonder if modules can be a success without
> coordination between major compiler and buildsystem developers. That's why
> I made the git repo - to help work on something more concrete to see how
> things scale.
>

'We' (myself & other Clang developers) are/will be talking to GCC folks to
try to get consistency here, in one direction or another (maybe some third
direction different from both Clang's and GCC's). As you noted in a follow-up,
there is a directory-based flag in Clang now, added by Boris as he's been
working through adding modules support to build2.


> Having just read all of my old posts again, I still worry things like this
> will hinder modules 'too much' to be successful. The more (small) barriers
> exist, the less chance of success. If modules aren't successful, then
> they'll become a poisoned chalice and no one will be able to work on fixing
> them. That's actually exactly what I expect to happen, but I also still
> hope I'm just missing something :). I really want to see a committee
> document from the people working on modules which actually explores the
> problems and barriers to adoption and concludes with 'none of those things
> matter'. I think it's fixable, but I haven't seen anyone interested enough
> to fix the problems (or even to find out what they are).
>

Indeed - hence my desire to talk through these things, get some practical
experience, document them to the committee in perhaps a less-ranty, more
concrete form along with pros/cons/unknowns/etc to hopefully find some
consistency, maybe write up a document of "this is how we expect build
systems to integrate with this C++ feature", etc.


>
> Anyway, you are not here for my rants.
>
>
> My current very simplistic prototype, to build a module file, its
> respective module object file, and include those in the library/link for
> anything that depends on this library:
>
>   add_custom_command(
>           COMMAND ${CMAKE_CXX_COMPILER} ${CMAKE_CXX_FLAGS} -xc++ -c
>                   -Xclang -emit-module -fmodules -fmodule-name=Hello
>                   ${CMAKE_CURRENT_SOURCE_DIR}/module.modulemap
>                   -o ${CMAKE_CURRENT_BINARY_DIR}/hello_module.pcm
>                   -Xclang -fmodules-codegen
>           DEPENDS module.modulemap hello.h
>
>
> Why does this command depend on hello.h?
>

Because it builds the binary module interface (hello_module.pcm) that is a
serialized form of the compiler's internal representation of the contents
of module.modulemap which refers to hello.h (the modulemap lists the header
files that are part of the module). This is all using Clang's current
backwards semi-compatible "header modules" stuff. In a "real" modules
system, ideally there wouldn't be any modulemap. Just a .cppm file, and any
files it depends on (discovered through the build system scanning the
module imports, or a compiler-driven .d file style thing).

Perhaps it'd be better for me to demonstrate something closer to the actual
modules reality, rather than this retro header modules stuff that clang
supports.


> If that is changed and module.modulemap is not, what will happen?
>

If hello.h is changed and module.modulemap is not changed? The
hello_module.pcm does need to be rebuilt.

Ideally all of this would be implicit (maybe with some flag/configuration,
or detected based on new file extensions for C++ interface definitions) in
the add_library - taking, let's imagine, the .ccm (let's say, for
argument's sake*) file listed in the add_library's inputs and using it to
build a .pcm (BMI), and then building that .pcm as an object file along with
all the normal .cc files.


* alternatively, maybe they'll all just be .cc files & a build system would
be scanning the .cc files to figure out dependencies & could notice that
one of them is the blessed module interface definition based on the first
line in the file.


>
>
>           OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/hello_module.pcm
>           COMMENT "Generating hello_module.pcm"
>   )
>   add_library (Hello hello.cxx
>       ${CMAKE_CURRENT_BINARY_DIR}/hello_module.pcm)
>   target_include_directories(Hello PUBLIC ${CMAKE_CURRENT_SOURCE_DIR})
>   target_compile_options(Hello PUBLIC -fmodules -Xclang
>       -fmodule-file=${CMAKE_CURRENT_BINARY_DIR}/hello_module.pcm)
>
> (this is based on the example in the CMake docs using Hello/Demo)
>
>
> Good that you got something working.
>
>
> This also required one modification to CMake itself to classify a pcm file
> as a C++ file that needs to be compiled (to execute the 3rd line in the
> basic command list shown above).
>
>
> An alternative to patching CMake might be
>
>  set_source_files_properties(${CMAKE_CURRENT_BINARY_DIR}/hello_module.pcm
>    PROPERTIES LANGUAGE CXX)
>
> though hopefully that is also temporary.
>

Ah, thanks! I'll give that a go.

So I suppose the more advanced question: is there a way I can extend
handling of existing CXX files (and/or define a new kind of file, say,
CXXM?) specified in an add_library? For example, if I want to check whether
a .cc file is a module, discover its module dependencies, add new rules
about how to build those, etc., is that doable within my CMake project, or
would it require changes to CMake itself? (I'm happy to poke around at what
those changes might look like.)


>
>
> But this isn't ideal - I don't /think/ I've got the dependencies quite
> right & things might not be rebuilding at the right times.
> Also it involves hardcoding a bunch of things like the pcm file names,
> header files, etc.
>
>
> Indeed. I think part of that comes from the way modules have been
> designed. The TS has similar issues.
>

Sure - but I'd still be curious to understand how I might go about
modifying the build system to handle this. If there are obvious things I
have gotten wrong about the dependencies, etc, that would cause this not to
rebuild on modifications to any of the source/header files - I'd love any
tips you've got.

& if there are good paths forward for ways to prototype changes to the
build system to handle, say, specifying a switch/setting a property/turning
on a feature that I could implement that would collect all the .ccm files
in an add_library rule and use them to make a .pcm file - I'd be happy to
try prototyping that.

> Ideally, at least for a simplistic build, I wouldn't mind generating a
> modulemap from all the .h files (& have those headers listed in the
> add_library command - perhaps splitting public and private headers in some
> way, only including the public headers in the module file, likely).
> Eventually for the standards-proposal version, it's expected that there
> won't be any modulemap file, but maybe all headers are included in the
> module compilation (just pass the headers directly to the compiler).
>
>
> In a design based on passing directories instead of files, would those
> directories be redundant with the include directories?
>

I'm not sure I understand the question, but if I do, I think the answer
would be: no, they wouldn't be redundant. The system will not have
precompiled modules available to use - because binary module definitions
are compiler (& compiler version, and to some degree, compiler flags (eg:
are you building this for x86 32 bit or 64 bit?)) dependent.
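
One way to picture this: a BMI is only reusable under the same compiler,
compiler version and semantics-affecting flags, so a build would keep (or
build) one BMI per such key rather than rely on one shipped with the library.
A hypothetical C++ sketch; the struct, the function and the choice of which
flags go into the key are assumptions (the caller is assumed to pass only
BMI-affecting flags, in a canonical order).

```cpp
#include <string>
#include <vector>

// Hypothetical: the parts of a configuration that a prebuilt BMI depends on.
struct BmiConfig {
    std::string compiler;            // e.g. "clang"
    std::string compiler_version;    // e.g. "6.0.0"
    std::vector<std::string> flags;  // BMI-affecting flags only, e.g. "-m32"
};

// Two configurations may share a BMI only if their keys are equal.
std::string bmi_cache_key(const BmiConfig& cfg) {
    std::string key = cfg.compiler + "-" + cfg.compiler_version;
    for (const std::string& flag : cfg.flags) key += "_" + flag;
    return key;
}
```

A 32-bit and a 64-bit build of the same module get different keys, which is
the x86 case above.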


> One of the problems modules adoption will hit is that all the compilers
> are designing fundamentally different command line interfaces for them.
>

*nod* We'll be working amongst GCC and Clang at least to try to converge on
something common.


> Buildsystems will have to be rewritten to take advantage of modules, and
> they will be annoying to use and adopt.
>

I'm hoping to minimize any rewriting - but also potentially provide recipes
for CMake (& other users) as well as patches to CMake itself to help
facilitate this.

> This also doesn't start to approach the issue of how to build modules for
> external libraries - which I'm happy to discuss/prototype too, though
> interested in working to streamline the inter-library but intra-project
> (not inter-project) case first.
>
>
> Yes, there are many aspects to consider.
>
> Are you interested in design of a CMake abstraction for this stuff? I have
> thoughts on that, but I don't know if your level of interest stretches that
> far.
>

Not sure how much work it'd be - at the moment my immediate interest is to
show as much real-world/can-actually-run prototype with cmake as possible,
either with or without changes to cmake itself (or a combination of minimal
cmake changes plus project-specific recipes of how to write a user's cmake
files to work with this stuff) or also showing non-working/hypothetical
prototypes of what ideal user cmake files would look like with
reasonable/viable (but not yet implemented) cmake support.

> Stephen - I saw you were asking some questions about this here (
> https://groups.google.com/a/isocpp.org/forum/#!topic/modules/sDIYoU8Uljw &
> https://github.com/steveire/ModulesExperiments - didn't really understand
> how this example applied/worked, though - I guess maybe it's a prototype
> syntax proposal?)
>
>
> It is a set of pre-modules libraries, some of which depend on one another
> and with some transitive dependencies in the headers.
>
> I made it to be 'a few steps above trivial' in the hope that someone would
> show me how to port it to modules-ts (even if the result does not build).
> So far, no one has.
>

Ah, OK.


> Can you help? It would really help my understanding of where things
> currently stand with modules.
>

I can certainly have a go, for sure.


> For example, is there only one way to port the contents of the cpp files?
>

Much like header grouping - how granular headers are (how many headers you
have for a given library) is up to the developer to some degree (certain
things can't be split up), similarly with modules - given a set of C++
definitions, it's not 100% constrained how those definitions are exposed as
modules - the developer has some freedom over how the declarations of those
entities are grouped into modules.


> After that, is there one importable module per class or one per shared
> library (which I think would make more sense for Qt)?
>

Apparently (this was a surprise to me - since I'd been thinking about this
based on the Clang header modules (backwards compatibility stuff, not the
standardized/new language feature modules)) the thinking is probably
somewhere between one-per-class and one-per-shared-library. But for me, in
terms of how a build file would interact with this, more than
one-per-shared-library is probably the critical tipping point. If it was
just one per shared library, then I'd feel like the dependency/flag
management would be relatively simple. You have to add a flag to the linker
commandline to link in a library, so you have to add a flag to the compile
step to reference a module, great. But, no, bit more complicated than that
given the finer granularity that's expected here.


> And is transitive dependency expressed in the header files after porting?
> I think that last one is dealt with by the 'export import' syntax
>

If you mean one module exposing things from modules it depends on - yes,
you can 'export import'. (But by default your imports are only accessible to
the implementation of the module.)


> The git repo is an attempt to make the discussion concrete because it
> would show how multiple classes and multiple libraries with dependencies
> could interact in a modules world. I'm interested in what it would look
> like ported to modules-ts, because as far as I know, clang-modules and
> module maps would not need porting of the cpp files at all.
>

Right, Clang header modules are a backwards-compatibility feature. They do
require a constrained subset of C++ to be effective (ie: basically your
headers need to be what we think of as ideal/canonical headers -
reincludable, independent, complete, etc). So if you've got good/isolated
headers, you can port them to Clang's header modules by adding the module
maps & potentially not doing anything else. But if you rely on not changing
your build system, that presents some problems if you want to scale (more
cores) or distribute your build, because the build system doesn't know about
these dependencies. Say you have two .cc files that both include foo.h then
bar.h: the build system runs two compiles, both compiles try to implicitly
build the foo.h module, one blocks waiting for the other to complete, then
they continue and block again waiting for the bar.h module to be built. If
the build system knew about these dependencies (what Google uses - what we
call "explicit (header) modules"), then it could build the foo.h module and
the bar.h module in parallel, then build the two .cc files in parallel.
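
With explicit modules the dependency graph is known up front, so independent
module builds can be scheduled together. A minimal C++17 sketch of that
scheduling as a level-by-level topological sort (Kahn's algorithm); the step
names a.o, b.o, foo.pcm and bar.pcm are hypothetical stand-ins for the two .cc
compiles and the two header modules in the example.

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Group build steps into "waves": every step in a wave depends only on steps
// in earlier waves, so each wave can run fully in parallel. `deps` maps a
// step to the steps it needs. Steps caught in a cycle are never emitted.
std::vector<std::vector<std::string>> build_waves(
    const std::map<std::string, std::set<std::string>>& deps) {
    std::map<std::string, int> pending;                     // unbuilt deps per step
    std::map<std::string, std::vector<std::string>> users;  // reverse edges
    for (const auto& [step, needs] : deps) {
        pending[step] += static_cast<int>(needs.size());
        for (const auto& need : needs) {
            pending.try_emplace(need, 0);  // leaf steps (e.g. BMIs) start ready
            users[need].push_back(step);
        }
    }
    std::vector<std::vector<std::string>> waves;
    std::vector<std::string> ready;
    for (const auto& [step, count] : pending) {
        if (count == 0) ready.push_back(step);
    }
    while (!ready.empty()) {
        waves.push_back(ready);
        std::vector<std::string> next;
        for (const auto& done : ready) {
            for (const auto& user : users[done]) {
                if (--pending[user] == 0) next.push_back(user);
            }
        }
        ready = std::move(next);
    }
    return waves;
}
```

For deps {a.o: {foo.pcm, bar.pcm}, b.o: {foo.pcm, bar.pcm}} this yields two
waves, {bar.pcm, foo.pcm} and then {a.o, b.o}: both module builds run
together, then both compiles, instead of the compiles serializing on implicit
module builds.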


>
>
> Basically: What do folks think about supporting these sort of features in
> CMake C++ Builds? Any pointers on how I might best implement this with or
> without changes to CMake?
>
>
> I think some design is needed up front. I expect CMake would want to have
> a first-class (on equal footing with include directories or compile
> definitions and with particular handling) concept for modules, extending
> the install(TARGET) command to install module binary files etc.
>

Module binary files wouldn't be installed in the sense of being part of the
shipped package of a library - because module binary files are
compiler/flag/etc specific.


> To do that kind of design, I at least would need to be able to experiment
> or conceptualize examples which are not totally trivial, such as the
> starting point in my repo.
>
> On the CMake side, I think something like this should be the target (note
> that no compiler command line interface works like this today, which I
> think is a barrier to adoption):
>
>  add_library(foo foo.cpp)
>
>  # Target property to enable modules for the target
>  set_property(TARGET foo PROPERTY USE_CXX_MODULES ON)
>
>  # Note: Use target_include_directories to specify module search
>  # paths (how GCC and MSVC work)
>  # Also note: compilers should use the -I paths as a module path search
>  # list so that CMake does not have to pass the same list as both -I and
>  # as -fmodule-path= or similar entries.
>  # Also note: This is source compatible with the cmake code that already
>  # exists!
>  # The existence of /opt/bar/bing.<ext> makes 'import bing;' work.
>  target_include_directories(foo PRIVATE /opt/bar)
>
>  # Possibly need a new command to specify headers (there is
>  # other motivation for this in CMake, so use a generic name without
>  # 'modules' in it)
>  # Because foo has USE_CXX_MODULES ON, foo.h is processed as a module
>  # and a binary representation is created for import. Other properties can
>  # be set on foo.h with set_source_files_properties() to pass other
>  # command line options when generating the module.
>  target_headers(foo PRIVATE foo.h)
>
> Also - in the right design, CMake does not have to regenerate
> -fmodule-file or whatever into the compile line any time the user adds
> 'import something;', which is the case with clang now afaik. Please correct
> me if that is not correct.
>
> I know some people at Kitware have been thinking about modules though, so
> I'd be interested in any other insights from there. Brad, can you comment?
>
> Here's some other reading material for anyone else following along:
>
> https://izzys.casa/posts/millennials-are-killing-the-modules-ts.html
> https://build2.org/article/cxx-modules-misconceptions.xhtml
>

Thanks for the links & questions/ideas!

- Dave


>
>
>
> Thanks,
>
> Stephen.
>
>