[CMake] CMake CUDA 3.8+/9 support as a first class language with out Visual Studio Support... err what?

Mon Aug 7 12:55:45 EDT 2017

It is great to hear that you have been able to resolve the compilation
issues with CMake, though it is very weird and worrisome that only
reformatting your machine would resolve the issue.

1) The error code you are getting (46, cudaErrorDevicesUnavailable)
generally means that another CUDA program is running and is holding an
exclusive lock on the GPU, or that the GPU is configured to run only
OpenGL or only CUDA, and not both at the same time. My instinct is to
run 'nvidia-smi -q' from 'c:\Program Files\NVIDIA Corporation\NVSMI'
and see if anything is using the GPU, but I don't remember whether that
information is offered only for Quadro/Tesla cards.

2) To set up and run the CMake CUDA tests you are missing two lines:

set(ENV{PATH} "$ENV{PATH};C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v7.5/bin")
set(dashboard_cache "CMake_TEST_CUDA:BOOL=ON")

These lines should be placed before the
`include(${CTEST_SCRIPT_DIRECTORY}/cmake_common.cmake)` line. For reference
see: https://open.cdash.org/viewNotes.php?buildid=5008534
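
Put together, your script might look roughly like the sketch below. It
is just the script from your message with the two lines above added
ahead of the include; I have not run this exact file, and the CUDA path
assumes the toolkit is in its default 7.5 install location.

```cmake
# Client maintainer: me at mydomain.net
set(CTEST_SITE "bitbucket")
set(CTEST_BUILD_NAME "Win10-Visual Studio 2013")
set(CTEST_BUILD_CONFIGURATION Debug)
set(CTEST_CMAKE_GENERATOR "Visual Studio 12 2013 Win64")

set(dashboard_model Nightly)

# Put the CUDA toolkit's bin directory on PATH for the build and tests
# (assumes the default CUDA 7.5 install location).
set(ENV{PATH} "$ENV{PATH};C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v7.5/bin")

# Extra cache entry for the CMake build under test; enables the CUDA tests.
set(dashboard_cache "CMake_TEST_CUDA:BOOL=ON")

# Must come last: cmake_common.cmake reads the variables set above.
include(${CTEST_SCRIPT_DIRECTORY}/cmake_common.cmake)
```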

Now to run the dashboard you would do something like the following.
This will build CMake, build and run all the CMake tests, and upload
the results:

ctest -S C:\projects\cmake_dev\Dashboards\CMakeScripts\bjd_dashboard.cmake -VV

On Sun, Aug 6, 2017 at 2:37 PM, Brian Davis <bitminer at gmail.com> wrote:
>
> Upon:
>
> wiping Dell 7559 (yes the weirdness has gotten this bad), reinstalling from
> Dell Factory image
> upgrading system to Latest Win 10 (now not in developer mode anymore)
> Dell update to get latest drivers and other goobly bits
> Removing all Virus Scanners to keep this from possibly interfering
> Installing Visual Studio 2013 Community
> Installing CUDA 7.5 and packaged driver
> Imaging with Clonezilla for posterity's sake in case... uhhh... when Windows
> acts up again.
>
> These statements:
>
>>
>> Regarding 960M and CUDA  7.5/7.5, 8.0/7.5, and 7.7/9.0
>>
>> Answer is:
>>
>> 960M was likely released post CUDA  7.5 driver and possibly post 8.0.
>> Seems that architecture differences do not allow old drivers to work on
>> newer arch cards.   Once 9.0 driver was released... 7.5 run time worked with
>> 9.0 driver, but for some reason not 8.0.  Seems CUDA and Nvidia
>> Runtime/Drivers have a dirty little secret much like Java and the runtimes.
>>
>> At this point I cannot get CMake 3.2 or 3.9 to work with CUDA 7.5/9.0, VS
>> 13, on Win10Pro/Enterprise.  And from the state of doc it seems not worth my
>> effort to even try anymore.
>
>
> Are NOT correct.  The 7559 does work with CUDA 7.5 and runtime (all that is
> required is to wipe Windows and start from scratch... no surprise there):
>
> C:\ProgramData\NVIDIA Corporation\CUDA Samples\v7.5\1_Utilities\deviceQuery\../../bin/win64/Debug/deviceQuery.exe
> Starting...
>
>
>  CUDA Device Query (Runtime API) version (CUDART static linking)
>
> Detected 1 CUDA Capable device(s)
>
> Device 0: "GeForce GTX 960M"
>   CUDA Driver Version / Runtime Version          7.5 / 7.5
>   CUDA Capability Major/Minor version number:    5.0
>   Total amount of global memory:                 4096 MBytes (4294967296
> bytes)
>   ( 5) Multiprocessors, (128) CUDA Cores/MP:     640 CUDA Cores
>   GPU Max Clock rate:                            1176 MHz (1.18 GHz)
>   Memory Clock rate:                             2505 Mhz
>   Memory Bus Width:                              128-bit
>   L2 Cache Size:                                 2097152 bytes
>   Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536,
> 65536), 3D=(4096, 4096, 4096)
>   Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
>   Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048
> layers
>   Total amount of constant memory:               65536 bytes
>   Total amount of shared memory per block:       49152 bytes
>   Total number of registers available per block: 65536
>   Warp size:                                     32
>   Maximum number of threads per multiprocessor:  2048
>   Maximum number of threads per block:           1024
>   Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
>   Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
>   Maximum memory pitch:                          2147483647 bytes
>   Texture alignment:                             512 bytes
>   Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
>   Run time limit on kernels:                     Yes
>   Integrated GPU sharing Host Memory:            No
>   Support host page-locked memory mapping:       Yes
>   Alignment requirement for Surfaces:            Yes
>   Device has ECC support:                        Disabled
>   CUDA Device Driver Mode (TCC or WDDM):         WDDM (Windows Display
> Driver Model)
>   Device supports Unified Addressing (UVA):      Yes
>   Device PCI Domain ID / Bus ID / location ID:   0 / 2 / 0
>   Compute Mode:
>      < Default (multiple host threads can use ::cudaSetDevice() with device
> simultaneously) >
>
> deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 7.5, CUDA Runtime
> Version = 7.5, NumDevs = 1, Device0 = GeForce GTX 960M
> Result = PASS
>
>
> I cannot however get a simple CUDA app to run that was generated with CMake
> 3.2 or 3.9.  I can get it to compile, but I can't get it to run.  Also
> previously CTest and some executables would just hang when run until I
> reinstalled Visual Studio 2013.  This is what finally pushed me to wipe the
> machine.
>
> I cannot at this point explain what happened.  I also cannot explain why
> after a clean wipe I still cannot get an ultra simple CUDA app compiled by
> CMake 3.2 or 3.9 to work on Windows 10 even though the SDK apps compile AND
> RUN now in 7.5/7.5 combo.
>
> Is there anyone out there that is able to get CMake to create a runnable
> CUDA executable that creates memory on the device to run on Win10 latest.
> Again SDK Apps compile and run but not CMake generated app.. App will
> compile and run, but is unable to communicate/create memory on the device.
>
> My app... as simple as I can make it is:
>
> #include "openglbasic.h"
>
> #include <cuda_runtime.h>
> #include <cuda_gl_interop.h>
>
> #include <helper_cuda.h>    // includes cuda.h and cuda_runtime_api.h
> #include <helper_functions.h>
> #include <helper_cuda_gl.h>
>
> #ifdef USE_CUDA_OPTIMAL_DEVICE
> #include <dsacudautil/cuda_device_properties.h>
> #include <dsacudautil/optimal_gpu.h>
> #endif
>
>
> #include <memory>
> #include <algorithm>
>
> int current = 0;
> int UniqueNumber() { return ++current; }
>
> int add(const float* A, const float* B, float* C);
>
> int main(int argc, char* argv[]){
>
>     int curr_cuda_device_id = 0;
>
>     int cuda_device_id = 0;
>
>
>     cuda_device_id = findCudaDevice(argc, (const char **)argv);
>
>
>     // Need to initialize CUDA and OpenGL interop here.
>     checkCudaErrors(cudaGetDevice(&curr_cuda_device_id));
>     printf("Current device is [%d]\n", curr_cuda_device_id);
>     checkCudaErrors(cudaSetDevice(cuda_device_id));
>     checkCudaErrors(cudaGLSetGLDevice(cuda_device_id));
>     printf("Current device is [%d]\n", curr_cuda_device_id);
>
>
>     int x_dim = 256;
>     int y_dim = 256;
>     int z_dim = 196;
>     float * d_volume;
>     int size = x_dim * y_dim * z_dim * sizeof(float);
>     //size = 5;
>     checkCudaErrors(cudaMalloc((void **)&d_volume, size));
>
>     return 0;
>
> }
>
> Exits at:
>
>   checkCudaErrors(cudaMalloc((void **)&d_volume, size));
>
>
> In: check():
>
> c:\ProgramData\NVIDIA Corporation\CUDA Samples\v7.5\common\inc\helper_cuda.h
>
> template< typename T >
> void check(T result, char const *const func, const char *const file, int const line)
> {
>     if (result)
>     {
>         fprintf(stderr, "CUDA error at %s:%d code=%d(%s) \"%s\" \n",
>                 file, line, static_cast<unsigned int>(result), _cudaGetErrorEnum(result), func);
>         DEVICE_RESET
>         // Make sure we call CUDA Device Reset before exiting
>         exit(EXIT_FAILURE);
>     }
> }
>
>
> With terminal output of:
>
> GPU Device 0: "GeForce GTX 960M" with compute capability 5.0
>
> Current device is [0]
> Current device is [0]
> CUDA error at
> C:\projects\cmake\cmake_test\v3.2\cuda_basic\src\cuda_basic_test.cpp:67
> code=46(cudaErrorDevicesUnavailable) "cudaMalloc((void **)&d_volume, size)"
>
>
> Is there something I am missing here?  The above should work, right?
>
> I am also told that CMake tests are successful on Win10.  Do these tests
> actually try and communicate with the device / create memory or are they
> written to just test compilation?  Cuz that would certainly pass on my
> machine, but not actually work!
>
> As Stated by Robert Maynard
> "Unfortunately you are going to need to provide more information to help
> track down the issue. We currently have machines that verify
> 2015/8.0 and 2013/7.5 properly work (
> https://open.cdash.org/index.php?project=CMake&filtercount=1&showfilters=1&field1=buildname&compare1=63&value1=CUDA"
>
> at
> http://cmake.3232098.n2.nabble.com/Visual-Studio-with-CUDA-does-not-work-in-3-9-td7595673.html
>
> Hmm curious the title there:  "Visual-Studio-with-CUDA-does-not-work-in-3-9"
>
> How do I go about running that test on my machine?
>
> I started here:
>
> https://cmake.org/testing/
>
> then on to:
>
> https://gitlab.kitware.com/cmake/cmake/blob/master/Help/dev/testing.rst
>
> that routed me to:
>
> https://gitlab.kitware.com/cmake/dashboard-scripts
>
> Where I read
>
> https://gitlab.kitware.com/cmake/dashboard-scripts/blob/master/cmake_common.cmake
>
> .... yeah ugh I know the internet is all about hyper links, but hyper use of
> hyper links?
>
> but anyway I do my own version of
> https://gitlab.kitware.com/cmake/cmake/blob/master/Help/dev/testing.rst  (to
> relocate C:\projects\cmake_dev\Dashboards\CMakeScripts)
>
> $ mkdir -p ~/Dashboards
> $ cd ~/Dashboards
> $ git clone https://gitlab.kitware.com/cmake/dashboard-scripts.git CMakeScripts
> $ cd CMakeScripts
>
> So I create:
>
>
> C:\projects\cmake_dev\Dashboards\CMakeScripts\bjd_dashboard.cmake
>
> which reads (short version without comments)
>
>    # Client maintainer: me at mydomain.net
>    set(CTEST_SITE "bitbucket")
>    set(CTEST_BUILD_NAME "Win10-Visual Studio 2013")
>    set(CTEST_BUILD_CONFIGURATION Debug)
>    set(CTEST_CMAKE_GENERATOR "Visual Studio 12 2013 Win64")
>
>    #set( dashboard_model Experimental)
>    set( dashboard_model Nightly)
>
>    include(${CTEST_SCRIPT_DIRECTORY}/cmake_common.cmake)
>
>
> Question how do I configure this to run just 3.9 and CUDA tests? ... and
> preferably just the CUDA tests.
>
> Also during this I asked the CMake folk to create a kitware/ctest official
> docker image so I could just install it on my NAS.. but looks like I have to
> ansible it into existence.  spectacularrrrg!