[CMake] CMake CUDA 3.8+/9 support as a first class language without Visual Studio Support... err what?
Brian J. Davis
bitminer at gmail.com
Sun Jul 30 20:15:20 EDT 2017
Saga novella continues:
>> Next I am going to remove all NVIDIA drivers and try a reinstall of
>> CUDA 7.5 to see if I can get deviceQuery to report 7.5/7.5.
Removed the NVIDIA 352.65 driver via Add/Remove Programs.
Device Manager -> NVIDIA GeForce GTX 960M -> General reports "device has been disabled".
Device Query:
C:\ProgramData\NVIDIA Corporation\CUDA Samples\v7.5\bin\win64\Debug>rem start "Device Query" deviceQuery.exe
C:\ProgramData\NVIDIA Corporation\CUDA Samples\v7.5\bin\win64\Debug>deviceQuery.exe
deviceQuery.exe Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
cudaGetDeviceCount returned 35
-> CUDA driver version is insufficient for CUDA runtime version
Result = FAIL
OK, so great: no driver installed!
Reinstall of CUDA 7.5.18
Run of DeviceQuery:
C:\ProgramData\NVIDIA Corporation\CUDA Samples\v7.5\bin\win64\Debug>rem start "Device Query" deviceQuery.exe
C:\ProgramData\NVIDIA Corporation\CUDA Samples\v7.5\bin\win64\Debug>deviceQuery.exe
deviceQuery.exe Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce GTX 960M"
CUDA Driver Version / Runtime Version 7.5 / 7.5
CUDA Capability Major/Minor version number: 5.0
Total amount of global memory: 4096 MBytes (4294967296 bytes)
( 5) Multiprocessors, (128) CUDA Cores/MP: 640 CUDA Cores
GPU Max Clock rate: 1176 MHz (1.18 GHz)
Memory Clock rate: 2505 Mhz
Memory Bus Width: 128-bit
L2 Cache Size: 2097152 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
CUDA Device Driver Mode (TCC or WDDM): WDDM (Windows Display Driver Model)
Device supports Unified Addressing (UVA): Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 2 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 7.5, CUDA Runtime Version = 7.5, NumDevs = 1, Device0 = GeForce GTX 960M
Result = PASS
OK, a return to sanity with 7.5/7.5.
And a return to insanity, as NBody still does not work:
Run "nbody -benchmark [-numbodies=<numBodies>]" to measure performance.
-fullscreen (run n-body simulation in fullscreen mode)
-fp64 (use double precision floating point values for simulation)
-hostmem (stores simulation data in host memory)
-benchmark (run benchmark to measure performance)
-numbodies=<N> (number of bodies (>= 1) to run in simulation)
-device=<d> (where d=0,1,2.... for the CUDA device to use)
-numdevices=<i> (where i=(number of CUDA devices > 0) to use for simulation)
-compare (compares simulation results running once on the default GPU and once on the CPU)
-cpu (run n-body simulation on the CPU)
-tipsy=<file.bin> (load a tipsy model file for simulation)
NOTE: The CUDA Samples are not meant for performance measurements.
Results may vary when GPU Boost is enabled.
> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
> Compute 5.0 CUDA device: [GeForce GTX 960M]
CUDA error at c:\programdata\nvidia corporation\cuda samples\v7.5\5_simulations\nbody\bodysystemcuda_impl.h:160 code=46(cudaErrorDevicesUnavailable) "cudaEventCreate(&m_deviceData[0].event)"
At this point there is clearly some very odd behavior with CUDA 7.5 and
the GeForce 960M. CMake can still build a project, but the result will
not run or allocate memory with cudaMalloc etc.
The installed driver at this point is 353.90.
GeForce Experience reports the 381.65 driver,
but I have downloaded:
384.94-notebook-win10-64bit-international-whql.exe
So I try that, and the installed driver is now 384.94.
CUDA 7.5 works with the new driver, but seemingly not with the driver
shipped with 7.5 or 8.0. NBody runs.
CMake 3.9 still fails to build a runnable project:
GPU Device 0: "GeForce GTX 960M" with compute capability 5.0
Current device is [0]
Current device is [0]
CUDA error at C:\projects\cmake\cmaketesting\v3.9\cuda_basic\src\cuda_basic_test.cpp:66 code=46(cudaErrorDevicesUnavailable) "cudaMalloc((void **)&dev_mem_ptr, size)"
DeviceQuery now reports:
C:\ProgramData\NVIDIA Corporation\CUDA Samples\v7.5\bin\win64\Debug>deviceQuery.exe
deviceQuery.exe Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce GTX 960M"
CUDA Driver Version / Runtime Version 9.0 / 7.5
CUDA Capability Major/Minor version number: 5.0
Total amount of global memory: 4096 MBytes (4294967296 bytes)
( 5) Multiprocessors, (128) CUDA Cores/MP: 640 CUDA Cores
GPU Max Clock rate: 1176 MHz (1.18 GHz)
Memory Clock rate: 2505 Mhz
Memory Bus Width: 128-bit
L2 Cache Size: 2097152 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
CUDA Device Driver Mode (TCC or WDDM): WDDM (Windows Display Driver Model)
Device supports Unified Addressing (UVA): Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 2 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.0, CUDA Runtime Version = 7.5, NumDevs = 1, Device0 = GeForce GTX 960M
Result = PASS
Which is seemingly how CUDA 9 driver support got installed.
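The driver/runtime pairs reported by deviceQuery in this message (none / 7.5, 7.5 / 7.5, 9.0 / 7.5) can be read as a simple compatibility check: the CUDA runtime generally needs a driver that reports a version at least as new as its own. Below is a minimal C++ sketch of that check. It assumes the integer encoding returned by cudaDriverGetVersion and cudaRuntimeGetVersion (1000 * major + 10 * minor, with 0 meaning no driver is installed). The type and function names are hypothetical, and the version numbers are hard-coded from the logs rather than queried, so the sketch compiles without the CUDA toolkit.

```cpp
#include <cassert>

// Hypothetical helper type: a decoded CUDA version such as 7.5 or 9.0.
struct CudaVersion {
    int major;
    int minor;
};

// Decode the integer form (assumed: 1000 * major + 10 * minor),
// for example 7050 -> 7.5 and 9000 -> 9.0.
constexpr CudaVersion decode_cuda_version(int encoded) {
    return CudaVersion{encoded / 1000, (encoded % 1000) / 10};
}

// True when a driver is present and is at least as new as the runtime.
// A driver value of 0 is treated as "no driver installed".
constexpr bool driver_supports_runtime(int driver_encoded, int runtime_encoded) {
    return driver_encoded != 0 && driver_encoded >= runtime_encoded;
}

// Replays the three driver/runtime situations seen in this message.
inline void check_logged_versions() {
    assert(!driver_supports_runtime(0, 7050));    // driver removed: the code 35 case
    assert(driver_supports_runtime(7050, 7050));  // deviceQuery reports 7.5 / 7.5
    assert(driver_supports_runtime(9000, 7050));  // deviceQuery reports 9.0 / 7.5
}
```

This only covers the "driver version is insufficient" failure (code 35). The code=46 (cudaErrorDevicesUnavailable) failures in this message occur even when this check passes, so it does not explain them.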
Tried CMake 3.2 with CUDA 7.5:
GPU Device 0: "GeForce GTX 960M" with compute capability 5.0
Current device is [0]
Current device is [0]
CUDA error at C:\projects\cmake\cmaketesting\v3.2\cuda_basic\src\cuda_basic_test.cpp:67 code=46(cudaErrorDevicesUnavailable) "cudaMalloc((void **)&d_volume, size)"
and sigh!
There is some bizarre behavior going on here.
So, CMake/Kitware: I can get CUDA 7.5 to run the samples with the 384.94
driver and CUDA 8.0 uninstalled, but I cannot get CMake 3.2 using FindCUDA
or CMake 3.9 using project calls to build a simple CUDA app that allocates
memory on the device. What gives?
I have been using CMake since 2.8 and CUDA since 1.3 on C1060s and
mobile Quadros and have never experienced this.
Clearly NVIDIA is to blame for 7.5 and 8.0 fighting like cats in a bag,
and for 7.5 not working with its own driver and only working with the 9.0
driver, but I cannot get either CMake 3.2 or 3.9 to generate a project I
can run... this is really strange... it's always just worked. If I could
compile and run a CUDA SDK app, then I knew CMake would work too, and it
always has. What could possibly be going on here?