[ITK] [ITK-users] GPUDiscreteGaussian not working
Jose Ignacio Prieto
joseignacio.prieto at gmail.com
Wed Apr 16 09:24:59 EDT 2014
Hi Jim,
I ran into a different problem when using float: the results came out as NaN.
That's why I switched to short.
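
One possible explanation for the all-zero output with short, offered as a guess rather than a confirmed diagnosis: the output quoted below prints "#define OPTYPE short", which suggests the Gaussian kernel coefficients themselves are held as short on the GPU. A normalized Gaussian kernel consists of fractions below 1, so every coefficient would truncate to 0. The sketch below illustrates that effect on the CPU only. MakeGaussianKernel, CastKernel and FilterConstantSignal are hypothetical helpers, not ITK functions, and the kernel is a plain sampled Gaussian rather than ITK's GaussianOperator.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper (not part of ITK): a sampled, normalized 1-D Gaussian
// kernel. ITK's GaussianOperator computes its coefficients differently, but
// its weights are likewise fractions that sum to about 1.
std::vector<double> MakeGaussianKernel(double variance, int radius)
{
  assert(variance > 0.0 && radius >= 0);
  std::vector<double> kernel(2 * radius + 1);
  double sum = 0.0;
  for (int i = -radius; i <= radius; ++i)
  {
    const double w = std::exp(-(i * i) / (2.0 * variance));
    kernel[i + radius] = w;
    sum += w;
  }
  for (double & w : kernel)
  {
    w /= sum; // normalize so the weights sum to 1
  }
  return kernel;
}

// Hypothetical helper: store the coefficients in value type T, as a kernel
// built with OPTYPE = T would hold them.
template <typename T>
std::vector<T> CastKernel(const std::vector<double> & kernel)
{
  std::vector<T> out;
  out.reserve(kernel.size());
  for (double w : kernel)
  {
    out.push_back(static_cast<T>(w));
  }
  return out;
}

// Weighted sum over a constant-valued neighborhood, accumulated in T.
template <typename T>
T FilterConstantSignal(const std::vector<T> & kernel, T value)
{
  T acc = T(0);
  for (T w : kernel)
  {
    acc = static_cast<T>(acc + w * value);
  }
  return acc;
}
```

With variance 4 and radius 5, every coefficient cast to short is 0, so a constant signal of 1000 filters to 0, while the float kernel returns roughly 1000. If this is what happens on the GPU, float pixels (or a float operator type) would be needed, and the NaN seen with float would be a separate problem.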
The card has 4 GB of RAM.
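
As a rough check on whether memory is the limit, here is a back-of-the-envelope estimate for the 320x320x231 volume mentioned in the original message. ImageBytes and PipelineBytes are hypothetical helpers, and the number of full-size buffers the GPU pipeline keeps alive at once is an assumption chosen for illustration, not something taken from the ITK source.

```cpp
#include <cassert>
#include <cstdint>

// Bytes for one dense 3-D buffer of the given size and pixel width.
constexpr std::uint64_t ImageBytes(std::uint64_t nx, std::uint64_t ny,
                                   std::uint64_t nz, std::uint64_t bytesPerPixel)
{
  return nx * ny * nz * bytesPerPixel;
}

// Assumption for illustration: the pipeline holds bufferCount full-size
// buffers at the same time (input, output, intermediates).
inline std::uint64_t PipelineBytes(std::uint64_t imageBytes, std::uint64_t bufferCount)
{
  assert(bufferCount > 0);
  return imageBytes * bufferCount;
}

// 320 * 320 * 231 = 23,654,400 voxels
static_assert(ImageBytes(320, 320, 231, 2) == 47308800ULL, "short volume");
static_assert(ImageBytes(320, 320, 231, 4) == 94617600ULL, "float volume");
```

Even with four float buffers alive at once, the total is 378,470,400 bytes, well under 4 GB. Note that OpenCL also limits the size of a single allocation (CL_DEVICE_MAX_MEM_ALLOC_SIZE), which can be much smaller than the total device memory, so that value may be worth querying as well.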
On Tue, Apr 15, 2014 at 7:40 PM, Jim Miller <millerjv at gmail.com> wrote:
> Does the test for GPUDiscreteGaussian run on your platform?
>
> The test uses a pixel type of float. Your code does not. You might try
> float.
>
> The Gaussian filter will require much more GPU memory than the mean
> filter. How much memory does your GPU have?
>
> Jim
>
> On Apr 15, 2014, at 11:18 AM, Jose Ignacio Prieto <
> joseignacio.prieto at gmail.com> wrote:
>
> Hi all, I am having trouble using GPUDiscreteGaussian. It works for me on
> the CPU, but the GPU version gives an output of 0. I tried running the test
> code, but it did not help. The GPUMean filter does run. My card is an AMD
> W7000, and I am using OpenCL 1.2 with ITK 4.6.
>
> Here is the code and the output. The images are VTK files of 320x320x231
> voxels, pixel type ushort.
>
> /*=========================================================================
>  *
>  *  Copyright Insight Software Consortium
>  *
>  *  Licensed under the Apache License, Version 2.0 (the "License");
>  *  you may not use this file except in compliance with the License.
>  *  You may obtain a copy of the License at
>  *
>  *         http://www.apache.org/licenses/LICENSE-2.0.txt
>  *
>  *  Unless required by applicable law or agreed to in writing, software
>  *  distributed under the License is distributed on an "AS IS" BASIS,
>  *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>  *  See the License for the specific language governing permissions and
>  *  limitations under the License.
>  *
>  *=========================================================================*/
>
>
> #include <cmath>
> #include <cstdlib>
> #include <iostream>
> #include <string>
>
> #include "itkImageFileReader.h"
> #include "itkImageFileWriter.h"
> #include "itkImageRegionIterator.h" // used by the RMS error check
>
> #include "itkGPUImage.h"
> #include "itkGPUKernelManager.h"
> #include "itkGPUContextManager.h"
> #include "itkGPUImageToImageFilter.h"
> #include "itkGPUNeighborhoodOperatorImageFilter.h"
>
> #include "itkTimeProbe.h"
> #include "itkGaussianOperator.h"
>
> #include "itkDiscreteGaussianImageFilter.h"
> #include "itkGPUDiscreteGaussianImageFilter.h"
> #include "itkMeanImageFilter.h"
> #include "itkGPUMeanImageFilter.h"
>
>
> // typedef float InputPixelType;
> // typedef float OutputPixelType;
> typedef short InputPixelType;
> typedef short OutputPixelType;
>
> typedef itk::GPUImage< InputPixelType, 3 > InputImageType;
> typedef itk::GPUImage< OutputPixelType, 3 > OutputImageType;
>
> typedef itk::ImageFileReader< InputImageType > ReaderType;
> typedef itk::ImageFileWriter< OutputImageType > WriterType;
>
>
>
>
> int main(int argc, char *argv[])
> {
>   if(!itk::IsGPUAvailable())
>   {
>     std::cerr << "OpenCL-enabled GPU is not present." << std::endl;
>     return EXIT_FAILURE;
>   }
>
>   if( argc < 3 )
>   {
>     std::cerr << "Error: missing arguments" << std::endl;
>     std::cerr << "inputfile outputfile [num_dimensions]" << std::endl;
>     return EXIT_FAILURE;
>   }
>
>   std::string inFile( argv[1] );
>   std::string outFile( argv[2] );
>
>   unsigned int dim = 3;
>   ReaderType::Pointer reader;
>   WriterType::Pointer writer;
>   reader = ReaderType::New();
>   writer = WriterType::New();
>
>   reader->SetFileName( inFile );
>   writer->SetFileName( outFile );
>
>   float variance = 4.0;
>
>   // test 1~8 threads for CPU
>   int nThreads = 8;
>
>
>   typedef itk::DiscreteGaussianImageFilter< InputImageType, OutputImageType> CPUFilterType;
>   CPUFilterType::Pointer CPUFilter = CPUFilterType::New();
>   itk::TimeProbe cputimer;
>   cputimer.Start();
>   CPUFilter->SetNumberOfThreads( nThreads );
>   CPUFilter->SetInput( reader->GetOutput() );
>   CPUFilter->SetMaximumKernelWidth(10);
>   CPUFilter->SetUseImageSpacingOff();
>   CPUFilter->SetVariance( variance );
>   CPUFilter->Update();
>   cputimer.Stop();
>
>   // typedef itk::MeanImageFilter< InputImageType, OutputImageType> CPUFilterType;
>   // CPUFilterType::Pointer CPUFilter = CPUFilterType::New();
>   // itk::TimeProbe cputimer;
>   // cputimer.Start();
>   // CPUFilter->SetNumberOfThreads( nThreads );
>   // CPUFilter->SetInput( reader->GetOutput() );
>   //// CPUFilter->SetMaximumKernelWidth(10);
>   //// CPUFilter->SetUseImageSpacingOff();
>   // CPUFilter->SetRadius( variance );
>   // CPUFilter->Update();
>   // cputimer.Stop();
>
>   std::cout << "CPU Gaussian Filter took " << cputimer.GetMean() << " seconds with "
>             << CPUFilter->GetNumberOfThreads() << " threads.\n" << std::endl;
>
>
>   // -------
>
>   typedef itk::GPUDiscreteGaussianImageFilter< InputImageType, OutputImageType> GPUFilterType;
>   GPUFilterType::Pointer GPUFilter = GPUFilterType::New();
>   itk::TimeProbe gputimer;
>   gputimer.Start();
>   GPUFilter->SetInput( reader->GetOutput() );
>   GPUFilter->SetVariance( variance );
>   GPUFilter->SetMaximumKernelWidth(10);
>   GPUFilter->SetUseImageSpacingOff();
>   // GPUFilter->DebugOn();
>   // GPUFilter->GPUEnabledOff();
>   GPUFilter->Print(std::cout);
>   GPUFilter->Update();
>   GPUFilter->GetOutput()->UpdateBuffers(); // synchronization point (GPU->CPU memcpy)
>   gputimer.Stop();
>   std::cout << "GPU Gaussian Filter took " << gputimer.GetMean() << " seconds.\n" << std::endl;
>
>   // typedef itk::GPUMeanImageFilter< InputImageType, OutputImageType> GPUFilterType;
>   // GPUFilterType::Pointer GPUFilter = GPUFilterType::New();
>   // itk::TimeProbe gputimer;
>   // gputimer.Start();
>   // GPUFilter->SetInput( reader->GetOutput() );
>   //// GPUFilter->SetVariance( variance );
>   //// GPUFilter->SetMaximumKernelWidth(10);
>   //// GPUFilter->SetUseImageSpacingOff();
>   //// GPUFilter->DebugOn();
>   //// GPUFilter->Print(std::cout);
>   // GPUFilter->SetRadius( variance );
>   // GPUFilter->Update();
>   // GPUFilter->GetOutput()->UpdateBuffers(); // synchronization point (GPU->CPU memcpy)
>   // gputimer.Stop();
>   // std::cout << "GPU Gaussian Filter took " << gputimer.GetMean() << " seconds.\n" << std::endl;
>
>
>   // ---------------
>   // RMS Error check
>   // ---------------
>
>   double diff = 0;
>   unsigned int nPix = 0;
>   itk::ImageRegionIterator<OutputImageType> cit(CPUFilter->GetOutput(), CPUFilter->GetOutput()->GetLargestPossibleRegion());
>   itk::ImageRegionIterator<OutputImageType> git(GPUFilter->GetOutput(), GPUFilter->GetOutput()->GetLargestPossibleRegion());
>
>   for(cit.GoToBegin(), git.GoToBegin(); !cit.IsAtEnd(); ++cit, ++git)
>   {
>     double err = (double)(cit.Get()) - (double)(git.Get());
>     // if(err > 0.1 || (double)cit.Get() < 0.1) std::cout << "CPU : " << (double)(cit.Get()) << ", GPU : " << (double)(git.Get()) << std::endl;
>     diff += err*err;
>     nPix++;
>   }
>
>   writer->SetInput( GPUFilter->GetOutput() );
>   // writer->SetInput( CPUFilter->GetOutput() );
>   writer->Update();
>
>
>   if (nPix > 0)
>   {
>     double RMSError = sqrt( diff / (double)nPix );
>     std::cout << "RMS Error : " << RMSError << std::endl;
>     // the CPU filter operator has type double
>     // but the double precision is not well-supported on most GPUs
>     // and by most drivers at this time. Therefore, the GPU filter
>     // operator has type float
>     // relax the RMS threshold here to allow for errors due to
>     // differences in precision
>     // NOTE:
>     // a threshold of 1.2e-5 worked on linux and Mac, but not Windows
>     // why?
>     double RMSThreshold = 1.7e-5;
>     if (vnl_math_isnan(RMSError))
>     {
>       std::cout << "RMS Error is NaN! nPix: " << nPix << std::endl;
>       return EXIT_FAILURE;
>     }
>     if (RMSError > RMSThreshold)
>     {
>       std::cout << "RMS Error exceeds threshold (" << RMSThreshold << ")" << std::endl;
>       return EXIT_FAILURE;
>     }
>   }
>   else
>   {
>     std::cout << "No pixels in output!" << std::endl;
>     return EXIT_FAILURE;
>   }
>
>   return EXIT_SUCCESS;
> }
>
>
>
> OUTPUT
>
> Starting C:\DocsMaracuya\Build\Ejemplos\Gpu\GPUTest.exe...
> Platform : AMD Accelerated Parallel Processing
> Platform : AMD Accelerated Parallel Processing
> Pitcairn
> Maximum Work Item Sizes : { 256, 256, 256 }
> Maximum Work Group Size : 256
> Alignment in bits of the base address : 2048
> Smallest alignment in bytes for any data type : 128
> cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics
> cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics
> cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics
> cl_khr_int64_extended_atomics cl_khr_3d_image_writes
> cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32
> cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops
> cl_amd_media_ops2 cl_amd_popcnt cl_khr_d3d10_sharing
> cl_amd_bus_addressable_memory cl_amd_c1x_atomics
>
> CPU Gaussian Filter took 1.70355 seconds with 8 threads.
>
>
> Defines: #define DIM_3
> #define INTYPE short
> #define OUTTYPE short
> #define OPTYPE short
>
> Defines: #define DIM_3
> #define INTYPE short
> #define OUTTYPE short
> #define OPTYPE short
>
> Defines: #define DIM_3
> #define INTYPE short
> #define OUTTYPE short
> #define OPTYPE short
>
>
> GPUDiscreteGaussianImageFilter (0000000002205DF0)
> RTTI typeinfo: class itk::GPUDiscreteGaussianImageFilter<class itk::GPUImage<short,3>,class itk::GPUImage<short,3> >
> Reference Count: 1
> Modified Time: 560
> Debug: Off
> Object Name:
> Observers:
> none
> Inputs:
> Primary: (000000000216E560) *
> Indexed Inputs:
> 0: Primary (000000000216E560)
> Required Input Names: Primary
> NumberOfRequiredInputs: 1
> Outputs:
> Primary: (000000000218A070)
> Indexed Outputs:
> 0: Primary (000000000218A070)
> NumberOfRequiredOutputs: 1
> Number Of Threads: 8
> ReleaseDataFlag: Off
> ReleaseDataBeforeUpdateFlag: Off
> AbortGenerateData: Off
> Progress: 0
> Multithreader:
> RTTI typeinfo: class itk::MultiThreader
> Reference Count: 1
> Modified Time: 499
> Debug: Off
> Object Name:
> Observers:
> none
> Thread Count: 8
> Global Maximum Number Of Threads: 128
> Global Default Number Of Threads: 8
> CoordinateTolerance: 1e-006
> DirectionTolerance: 1e-006
> Variance: [4, 4, 4]
> MaximumError: [0.01, 0.01, 0.01]
> MaximumKernelWidth: 10
> FilterDimensionality: 3
> UseImageSpacing: 0
> InternalNumberOfStreamDivisions: 9
> GPU: Enabled
>
> GPU Gaussian Filter took 0.111351 seconds.
>
> RMS Error : 26.4279
> RMS Error exceeds threshold (1.7e-005)
>
> C:\DocsMaracuya\Build\Ejemplos\Gpu\GPUTest.exe exited with code 1
>
>
> --
> José Ignacio Prieto
> celular(nuevo): 94348182
>
> _____________________________________
> Powered by www.kitware.com
>
> Visit other Kitware open-source projects at
> http://www.kitware.com/opensource/opensource.html
>
> Kitware offers ITK Training Courses, for more information visit:
> http://www.kitware.com/products/protraining.php
>
> Please keep messages on-topic and check the ITK FAQ at:
> http://www.itk.org/Wiki/ITK_FAQ
>
> Follow this link to subscribe/unsubscribe:
> http://www.itk.org/mailman/listinfo/insight-users
>
>
--
José Ignacio Prieto
celular(nuevo): 94348182