[CMake] About the FindCUDA.cmake module and Separate Compilation

Thu Jul 17 10:07:03 EDT 2014

Dear all,

I use the CMake build system and its various modules, which have
been very helpful in the past.

For about a year now, however, I have had specific issues with the
FindCUDA.cmake module, especially with the Separate Compilation
feature, which has never worked for me. Previously I bypassed the
problem by moving the affected code into a single file, but today I
am stuck and I have to get this feature working.

What are my files?
I have:
===================
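a.cu:
// Definition of the buffer shared by both kernels.
__constant__ float Buffer[1024];
__global__ void kernelA( float a )
{
     Buffer[0] = a;
}
===================
b.cu.h:
// Declaration only; the definition lives in a.cu.
extern __constant__ float Buffer[1024];
===================
b.cu:
#include "b.cu.h"
__global__ void kernelB( float b )
{
     Buffer[0] += b;
}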
===================

With this configuration I obviously have to link b.cu with a.cu so
that both kernels refer to the same memory area. This seems to be
possible only with a separate compilation build: defining the buffer
in both files would cause a redefinition error, and an extern
declaration only resolves if the compiler is told to generate
relocatable device code. I don't really understand why the latter is
needed.

Which specific features of the CMake module am I using?
Here are the main calls I am using:

=======================================
list(APPEND CUDA_NVCC_FLAGS " -gencode arch=compute_30,code=sm_30 -rdc=true ")
set( CUDA_SEPARABLE_COMPILATION )
cuda_add_executable(${OUTPUT_NAME} ${sources} ${headers})
======================================
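
For comparison, here is a minimal sketch of how I understand the
FindCUDA documentation, which describes CUDA_SEPARABLE_COMPILATION as
an option that defaults to OFF. The sketch rests on several
assumptions: that set() called without a value leaves the variable
unset rather than enabling it, that the module passes the
relocatable-device-code flag to nvcc by itself once the option is ON
(so I dropped my manual -rdc=true), and that the installed CMake
ships a FindCUDA module supporting this option. The project and
target name "separable_example" and the explicit file list are
placeholders, not my real project.

=======================================
cmake_minimum_required(VERSION 2.8)
project(separable_example)   # placeholder project name

find_package(CUDA REQUIRED)

# Pass each nvcc argument as its own list element instead of one
# quoted string with embedded spaces.
list(APPEND CUDA_NVCC_FLAGS -gencode arch=compute_30,code=sm_30)

# Give the option an explicit value before any CUDA target is added.
# Assumption: FindCUDA then handles relocatable device code and the
# device link step for targets created by cuda_add_executable.
set(CUDA_SEPARABLE_COMPILATION ON)

# Placeholder file list standing in for ${sources} and ${headers}.
cuda_add_executable(separable_example a.cu b.cu)
=======================================

I have not confirmed that this variant behaves differently from the
calls shown above.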

The error I get:
If I drop the "-rdc=true" nvcc option for relocatable code, the
code compiles and links fine, but at runtime it does not work as
expected, i.e. the value inside the buffer is not shared between
kernels A and B.

If all the options listed above are set, the code compiles fine, but
at the link step I get tons of link errors that look like:
 undefined reference to `__cudaRegisterLinkedBinary[...]'

The problem doesn't seem that hard to solve, as separate compilation
is explained extensively in the CUDA documentation:
http://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#using-separate-compilation-in-cuda
But I still have problems getting separate compilation to work
with FindCUDA.cmake.

Thank you in advance for any help.