[CMake] CMake CUDA 3.8+/9 support as a first class language without Visual Studio Support... err what?

Sun Aug 6 14:37:43 EDT 2017

Upon:

wiping the Dell 7559 (yes, the weirdness has gotten this bad) and reinstalling from the Dell factory image
upgrading the system to the latest Win 10 (no longer in developer mode)
running Dell Update to get the latest drivers and other goobly bits
removing all virus scanners to keep them from possibly interfering
installing Visual Studio 2013 Community
installing CUDA 7.5 and the packaged driver
imaging with Clonezilla for posterity's sake in case... uhhh... when Windows acts up again.

These statements:

> Regarding 960M and CUDA  7.5/7.5, 8.0/7.5, and 7.7/9.0
>
> Answer is:
>
> 960M was likely released post CUDA  7.5 driver and possibly post 8.0.
> Seems that architecture differences do not allow old drivers to work on
> newer arch cards.   Once 9.0 driver was released... 7.5 run time worked
> with 9.0 driver, but for some reason not 8.0.  Seems CUDA and Nvidia
> Runtime/Drivers have a dirty little secret much like Java and the runtimes.
>
> At this point I cannot get CMake 3.2 or 3.9 to work with CUDA 7.5/9.0, VS
> 13, on Win10Pro/Enterprise.  And from the state of doc it seems not worth
> my effort to even try anymore.
>

Are NOT correct.  The 7559 does work with the CUDA 7.5 driver and runtime (all
that is required is to wipe Windows and start from scratch... no surprise there):

C:\ProgramData\NVIDIA Corporation\CUDA Samples\v7.5\1_Utilities\deviceQuery\../../bin/win64/Debug/deviceQuery.exe Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 960M"
  CUDA Driver Version / Runtime Version          7.5 / 7.5
  CUDA Capability Major/Minor version number:    5.0
  Total amount of global memory:                 4096 MBytes (4294967296 bytes)
  ( 5) Multiprocessors, (128) CUDA Cores/MP:     640 CUDA Cores
  GPU Max Clock rate:                            1176 MHz (1.18 GHz)
  Memory Clock rate:                             2505 Mhz
  Memory Bus Width:                              128-bit
  L2 Cache Size:                                 2097152 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  CUDA Device Driver Mode (TCC or WDDM):         WDDM (Windows Display Driver Model)
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 2 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 7.5, CUDA Runtime Version = 7.5, NumDevs = 1, Device0 = GeForce GTX 960M
Result = PASS

I cannot, however, get a simple CUDA app generated with CMake 3.2 or 3.9 to
work.  It compiles and starts, but it fails as soon as it tries to allocate
memory on the device.  Also, previously CTest and some executables would just
hang when run until I reinstalled Visual Studio 2013.  This is what finally
pushed me to wipe the machine.

I cannot at this point explain what happened.  I also cannot explain why,
after a clean wipe, I still cannot get an ultra-simple CUDA app built via
CMake 3.2 or 3.9 to work on Windows 10, even though the SDK apps compile AND
RUN now in the 7.5/7.5 combo.

Is there anyone out there who is able to get CMake to create a runnable
CUDA executable that allocates memory on the device on the latest Win10?
Again, the SDK apps compile and run, but the CMake-generated app does not:
it compiles and starts, but is unable to communicate with / create memory on
the device.

My app... as simple as I can make it is:

#include "openglbasic.h"

#include <cuda_runtime.h>
#include <cuda_gl_interop.h>

#include <helper_cuda.h>    // includes cuda.h and cuda_runtime_api.h
#include <helper_functions.h>
#include <helper_cuda_gl.h>

#ifdef USE_CUDA_OPTIMAL_DEVICE
#include <dsacudautil/cuda_device_properties.h>
#include <dsacudautil/optimal_gpu.h>
#endif

#include <memory>
#include <algorithm>

int current = 0;
int UniqueNumber() { return ++current; }

int add(const float* A, const float* B, float* C);

int main(int argc, char* argv[]){

    int curr_cuda_device_id = 0;

    int cuda_device_id = 0;

    cuda_device_id = findCudaDevice(argc, (const char **)argv);

    // Need to initialize CUDA and OpenGL interop here.
    checkCudaErrors(cudaGetDevice(&curr_cuda_device_id));
    printf("Current device is [%d]\n", curr_cuda_device_id);
    checkCudaErrors(cudaSetDevice(cuda_device_id));
    checkCudaErrors(cudaGLSetGLDevice(cuda_device_id));
    // Note: curr_cuda_device_id is not re-queried, so this repeats the value above.
    printf("Current device is [%d]\n", curr_cuda_device_id);

    int x_dim = 256;
    int y_dim = 256;
    int z_dim = 196;
    float * d_volume;
    int size = x_dim * y_dim * z_dim * sizeof(float);
    //size = 5;
    checkCudaErrors(cudaMalloc((void **)&d_volume, size));

    return 0;

}

Exits at:

  checkCudaErrors(cudaMalloc((void **)&d_volume, size));

In: check():

c:\ProgramData\NVIDIA Corporation\CUDA Samples\v7.5\common\inc\helper_cuda.h

template< typename T >
void check(T result, char const *const func, const char *const file, int const line)
{
    if (result)
    {
        fprintf(stderr, "CUDA error at %s:%d code=%d(%s) \"%s\" \n",
                file, line, static_cast<unsigned int>(result), _cudaGetErrorEnum(result), func);
        DEVICE_RESET
        // Make sure we call CUDA Device Reset before exiting
        exit(EXIT_FAILURE);
    }
}

With terminal output of:

GPU Device 0: "GeForce GTX 960M" with compute capability 5.0

Current device is [0]
Current device is [0]
CUDA error at C:\projects\cmake\cmake_test\v3.2\cuda_basic\src\cuda_basic_test.cpp:67 code=46(cudaErrorDevicesUnavailable) "cudaMalloc((void **)&d_volume, size)"
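
For anyone reading that message without the samples installed: as far as I
can tell, checkCudaErrors is a macro that hands check() the call's return
value plus the stringified expression, __FILE__, and __LINE__, which is where
the file:line, the code, and the quoted call in the output come from.  Here is
a minimal sketch of that pattern that builds without the CUDA toolkit.
Everything in it (FakeStatus, fake_status_name, describe, DESCRIBE_CALL,
fake_malloc) is a made-up stand-in for illustration, not NVIDIA's code, and it
returns the message instead of calling exit():

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical stand-in for cudaError_t so the sketch builds without CUDA.
enum FakeStatus { kOk = 0, kDevicesUnavailable = 46 };

// Hypothetical name lookup, playing the role of _cudaGetErrorEnum().
inline const char* fake_status_name(FakeStatus s) {
    switch (s) {
        case kOk:                 return "ok";
        case kDevicesUnavailable: return "devicesUnavailable";
    }
    return "<unknown>";
}

// Returns an empty string on success; otherwise a report shaped like the
// helper's fprintf output (file:line, numeric code, name, quoted call).
inline std::string describe(FakeStatus result, const char* expr,
                            const char* file, int line) {
    if (result == kOk) {
        return std::string();
    }
    std::ostringstream out;
    out << "error at " << file << ":" << line
        << " code=" << static_cast<unsigned int>(result)
        << "(" << fake_status_name(result) << ") \"" << expr << "\"";
    return out.str();
}

// #call turns the expression into the string that appears in the report.
#define DESCRIBE_CALL(call) describe((call), #call, __FILE__, __LINE__)

// Hypothetical call that fails the same way the cudaMalloc above did.
inline FakeStatus fake_malloc(bool device_available) {
    return device_available ? kOk : kDevicesUnavailable;
}
```

DESCRIBE_CALL(fake_malloc(false)) yields a string containing
code=46(devicesUnavailable) and the quoted call text, while
DESCRIBE_CALL(fake_malloc(true)) yields an empty string.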

Is there something I am missing here?  The above should work, right?
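
One thing that can be checked on paper is the request size.  A quick
compile-time sketch of the arithmetic, assuming a 4-byte float (volume_bytes
and kDeviceGlobalBytes are names made up for this check, not part of the app):

```cpp
#include <cassert>

// Hypothetical helper: bytes needed for a dense x*y*z volume of floats.
constexpr unsigned long long volume_bytes(unsigned long long x,
                                          unsigned long long y,
                                          unsigned long long z) {
    return x * y * z * sizeof(float);
}

// Total global memory as reported by deviceQuery for the GTX 960M.
constexpr unsigned long long kDeviceGlobalBytes = 4294967296ULL;

static_assert(sizeof(float) == 4, "sketch assumes a 4-byte float");

// 256 * 256 * 196 floats = 51380224 bytes, i.e. exactly 49 MiB.
static_assert(volume_bytes(256, 256, 196) == 51380224ULL, "volume size");
static_assert(volume_bytes(256, 256, 196) / (1024ULL * 1024ULL) == 49ULL,
              "49 MiB");

// The request is far below the card's total memory, and it also fits in a
// 32-bit int, so the `int size` in the app does not overflow.
static_assert(volume_bytes(256, 256, 196) < kDeviceGlobalBytes, "fits");
static_assert(volume_bytes(256, 256, 196) < 2147483647ULL, "fits in int");
```

So the allocation is about 49 MiB against 4096 MB on the card, which makes
the size alone an unlikely explanation for the failure.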

I am also told that the CMake tests are successful on Win10.  Do these tests
actually try to communicate with the device / create memory, or are they
written to just test compilation?  Cuz that would certainly pass on my
machine, but not actually work!

As stated by Robert Maynard:
"Unfortunately you are going to need to provide more information to help
track down the issue. We currently have machines that verify
2015/8.0 and 2013/7.5 properly work (
https://open.cdash.org/index.php?project=CMake&filtercount=1&showfilters=1&field1=buildname&compare1=63&value1=CUDA
)"

at
http://cmake.3232098.n2.nabble.com/Visual-Studio-with-CUDA-does-not-work-in-3-9-td7595673.html

Hmm curious the title there:  "Visual-Studio-with-CUDA-does-not-work-in-3-9"

How do I go about running that test on my machine?

I started here:

https://cmake.org/testing/

then on to:

https://gitlab.kitware.com/cmake/cmake/blob/master/Help/dev/testing.rst

that routed me to:

https://gitlab.kitware.com/cmake/dashboard-scripts

Where I read

https://gitlab.kitware.com/cmake/dashboard-scripts/blob/master/cmake_common.cmake

.... yeah ugh I know the internet is all about hyper links, but hyper use
of hyper links?

But anyway, I do my own version of the steps in
https://gitlab.kitware.com/cmake/cmake/blob/master/Help/dev/testing.rst
(relocated to C:\projects\cmake_dev\Dashboards\CMakeScripts):

$ mkdir -p ~/Dashboards
$ cd ~/Dashboards
$ git clone https://gitlab.kitware.com/cmake/dashboard-scripts.git CMakeScripts
$ cd CMakeScripts

So I create:

C:\projects\cmake_dev\Dashboards\CMakeScripts\bjd_dashboard.cmake

which reads (short version, with most comments stripped):

   # Client maintainer: me at mydomain.net
   set(CTEST_SITE "bitbucket")
   set(CTEST_BUILD_NAME "Win10-Visual Studio 2013")
   set(CTEST_BUILD_CONFIGURATION Debug)
   set(CTEST_CMAKE_GENERATOR "Visual Studio 12 2013 Win64")

   #set( dashboard_model Experimental)
   set( dashboard_model Nightly)

   include(${CTEST_SCRIPT_DIRECTORY}/cmake_common.cmake)

Question: how do I configure this to run just the 3.9 and CUDA tests... and
preferably just the CUDA tests?

Also, during this I asked the CMake folks to create an official kitware/ctest
Docker image so I could just install it on my NAS... but it looks like I have
to Ansible it into existence.  spectacularrrrg!