[ITK] [ITK-users] GPUDiscreteGaussian not working
Jim Miller
millerjv at gmail.com
Tue Apr 22 06:43:11 EDT 2014
Denis,
I am not following your recommendations for Jose.
Are you stating that sometimes ITK does not copy the result of the GPU filter back into the CPU memory?
A user should not have to use e methods you are directing Jose towards.
Jim
> On Apr 22, 2014, at 5:28 AM, Denis Shamonin <dshamoni at gmail.com> wrote:
>
> Hi Jose,
>
>
>
> The synchronization from CPU to GPU image and back, may or may not be triggered by the default in the current ITK implementation.
>
> You have to make an extra effort to control it. Basically, you have to make sure that GPU input image is allocated and copied from CPU to GPU,
>
> execute the filter which only use input and output images, copy GPU output image back to CPU.
>
>
>
> Ideally, you should not control it and that has to be managed by the ITK GPU pipeline itself (hided from the user), but this is not a case right now.
>
>
>
> There are few problems. First, what you get after calling reader->GetOutput() is normal ITK image that you passing to the GPU filter,
>
> while expecting GPU image at this moment. What you want is that GPU image is ready when you call reader->GetOutput() (created, allocated and copied to GPU).
>
> The second problem may happen right after calling the GPU filter, the memory for output are not copied back to CPU output image.
>
>
>
> What you should do is following:
>
> 1 Create GPU input image.
>
>
>
> 1.1 Register the GPUImageFactory before calling GPUReader = ReaderType::New();
>
> itk::ObjectFactoryBase::RegisterFactory( itk::GPUImageFactory::New() );
>
>
>
> At this moment ALL ITK images which are created will be itk::GPUImage's with memory allocated on GPU.
>
> Use it with care, you may end up with not intended copying to GPU when you modify this images.
>
> The reader once created will also have GPU image inside and that what you need.
>
> You may unregister factory right after you have used it.
>
>
>
> 1.2 Alternative way if registering factory is not possible for you application.
>
> But you still want to use GPU filter in the middle of your application you may consider following:
>
> gpuInputImage = GPUInputImageType::New();
>
> gpuInputImage->GraftITKImage( itkimage ); // normal itk image here
>
> gpuInputImage->AllocateGPU(); // allocate only on GPU
>
> gpuInputImage->GetGPUDataManager()->SetCPUBufferLock( true ); // we don't want to change it CPU input
>
> gpuInputImage->GetGPUDataManager()->SetGPUDirtyFlag( true ); // set gpu dirty flag
>
> gpuInputImage->GetGPUDataManager()->UpdateGPUBuffer(); // copy cpu -> gpu
>
>
>
> 2. Construct you filter, set input from the reader (or gpuInput image), call your filter.
>
> At the moment of construction GPU filter will create GPU output image for you.
>
>
>
> 3. Call extra synchronization step after GPUFilter->Update(); (listed below)
>
> itk::GPUExplicitSync< FilterType, OutputImageType >( GPUFilter, false );
>
>
>
> 4. Write results
>
>
>
> I hope that helps a bit. Making or using GPU filters with ITK is a bit of the challenge right now.
>
> Specially to get it working across all GPU cards available.
>
>
>
> You could check the correct execution for itkGPUShrinkImageFilter as example in The Insight Journal paper:
>
> http://www.insight-journal.org/browse/publication/884
>
>
>
> Regards,
>
> -Denis Shamonin
>
> Division of Image Processing (LKEB)
>
> Department of Radiology
>
> Leiden University Medical Center
>
> PO Box 9600, 2300 RC Leiden, The Netherlands
>
>
>
> //------------------------------------------------------------------------------
> // GPU explicit synchronization helper function
> template< class ImageToImageFilterType, class OutputImageType >
> void
> GPUExplicitSync( typename ImageToImageFilterType::Pointer & filter,
> const bool filterUpdate = true,
> const bool releaseGPUMemory = false )
> {
> if( filter.IsNotNull() )
> {
> if( filterUpdate )
> {
> filter->Update();
> }
>
> typedef typename OutputImageType::PixelType OutputImagePixelType;
> typedef GPUImage< OutputImagePixelType, OutputImageType::ImageDimension > GPUOutputImageType;
> GPUOutputImageType * GPUOutput = dynamic_cast< GPUOutputImageType * >( filter->GetOutput() );
> if( GPUOutput )
> {
> GPUOutput->UpdateBuffers();
> }
>
> if( releaseGPUMemory )
> {
> GPUOutput->GetGPUDataManager()->Initialize();
> }
> }
> else
> {
> itkGenericExceptionMacro( << "The filter pointer is null." );
> }
> }
>
>
>
>> On Wed, Apr 16, 2014 at 3:24 PM, Jose Ignacio Prieto <joseignacio.prieto at gmail.com> wrote:
>> Hi Jim,
>>
>> It had a different problem when using float. It would show a NAN on the results. That's why I changed to short.
>> The card has 4GB ram.
>>
>>
>>> On Tue, Apr 15, 2014 at 7:40 PM, Jim Miller <millerjv at gmail.com> wrote:
>>> Does the test for GPUDiscreteGaussian run on your platform?
>>>
>>> The test uses a pixel type of float. Your code does not. You might try float.
>>>
>>> The Gaussian filter will require much more GPU memory than the mean filter. How much memory does your GPU have?
>>>
>>> Jim
>>>
>>>> On Apr 15, 2014, at 11:18 AM, Jose Ignacio Prieto <joseignacio.prieto at gmail.com> wrote:
>>>>
>>>> Hi all, I am having trouble using GPUdiscretegaussian. It works for me on CPU but GPU version gives output 0. I tried running the test code but no help. I do run GPUMean filter. My card is AMDw7000 and using opencl 1.2, itk 4.6
>>>>
>>>> Here is the code and the output. The images are vtk files of 320x320x231, ushort.
>>>>
>>>> /*=========================================================================
>>>> *
>>>> * Copyright Insight Software Consortium
>>>> *
>>>> * Licensed under the Apache License, Version 2.0 (the "License");
>>>> * you may not use this file except in compliance with the License.
>>>> * You may obtain a copy of the License at
>>>> *
>>>> * http://www.apache.org/licenses/LICENSE-2.0.txt
>>>> *
>>>> * Unless required by applicable law or agreed to in writing, software
>>>> * distributed under the License is distributed on an "AS IS" BASIS,
>>>> * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>>>> * See the License for the specific language governing permissions and
>>>> * limitations under the License.
>>>> *
>>>> *=========================================================================*/
>>>>
>>>> #include "itkImageFileReader.h"
>>>> #include "itkImageFileWriter.h"
>>>>
>>>> #include "itkGPUImage.h"
>>>> #include "itkGPUKernelManager.h"
>>>> #include "itkGPUContextManager.h"
>>>> #include "itkGPUImageToImageFilter.h"
>>>> #include "itkGPUNeighborhoodOperatorImageFilter.h"
>>>>
>>>> #include "itkTimeProbe.h"
>>>> #include "itkGaussianOperator.h"
>>>>
>>>> #include "itkDiscreteGaussianImageFilter.h"
>>>> #include "itkGPUDiscreteGaussianImageFilter.h"
>>>> #include "itkMeanImageFilter.h"
>>>> #include "itkGPUMeanImageFilter.h"
>>>>
>>>> // typedef float InputPixelType;
>>>> // typedef float OutputPixelType;
>>>> typedef short InputPixelType;
>>>> typedef short OutputPixelType;
>>>>
>>>> typedef itk::GPUImage< InputPixelType, 3 > InputImageType;
>>>> typedef itk::GPUImage< OutputPixelType, 3 > OutputImageType;
>>>>
>>>>
>>>>
>>>> typedef itk::ImageFileReader< InputImageType > ReaderType;
>>>> typedef itk::ImageFileWriter< OutputImageType > WriterType;
>>>>
>>>>
>>>>
>>>> int main(int argc, char *argv[])
>>>> {
>>>> if(!itk::IsGPUAvailable())
>>>> {
>>>> std::cerr << "OpenCL-enabled GPU is not present." << std::endl;
>>>> return EXIT_FAILURE;
>>>> }
>>>>
>>>> if( argc < 3 )
>>>> {
>>>> std::cerr << "Error: missing arguments" << std::endl;
>>>> std::cerr << "inputfile outputfile [num_dimensions]" << std::endl;
>>>> return EXIT_FAILURE;
>>>> }
>>>>
>>>> std::string inFile( argv[1] );
>>>> std::string outFile( argv[2] );
>>>>
>>>> unsigned int dim = 3;
>>>> ReaderType::Pointer reader;
>>>> WriterType::Pointer writer;
>>>> reader = ReaderType::New();
>>>> writer = WriterType::New();
>>>>
>>>> reader->SetFileName( inFile );
>>>> writer->SetFileName( outFile );
>>>>
>>>> float variance = 4.0;
>>>>
>>>> // test 1~8 threads for CPU
>>>> int nThreads = 8;
>>>>
>>>> typedef itk::DiscreteGaussianImageFilter< InputImageType, OutputImageType> CPUFilterType;
>>>> CPUFilterType::Pointer CPUFilter = CPUFilterType::New();
>>>> itk::TimeProbe cputimer;
>>>> cputimer.Start();
>>>> CPUFilter->SetNumberOfThreads( nThreads );
>>>> CPUFilter->SetInput( reader->GetOutput() );
>>>> CPUFilter->SetMaximumKernelWidth(10);
>>>> CPUFilter->SetUseImageSpacingOff();
>>>> CPUFilter->SetVariance( variance );
>>>> CPUFilter->Update();
>>>> cputimer.Stop();
>>>>
>>>> // typedef itk::MeanImageFilter< InputImageType, OutputImageType> CPUFilterType;
>>>> // CPUFilterType::Pointer CPUFilter = CPUFilterType::New();
>>>> // itk::TimeProbe cputimer;
>>>> // cputimer.Start();
>>>> // CPUFilter->SetNumberOfThreads( nThreads );
>>>> // CPUFilter->SetInput( reader->GetOutput() );
>>>> //// CPUFilter->SetMaximumKernelWidth(10);
>>>> //// CPUFilter->SetUseImageSpacingOff();
>>>> // CPUFilter->SetRadius( variance );
>>>> // CPUFilter->Update();
>>>> // cputimer.Stop();
>>>>
>>>> std::cout << "CPU Gaussian Filter took " << cputimer.GetMean() << " seconds with "
>>>> << CPUFilter->GetNumberOfThreads() << " threads.\n" << std::endl;
>>>>
>>>> // -------
>>>>
>>>> typedef itk::GPUDiscreteGaussianImageFilter< InputImageType, OutputImageType> GPUFilterType;
>>>> GPUFilterType::Pointer GPUFilter = GPUFilterType::New();
>>>> itk::TimeProbe gputimer;
>>>> gputimer.Start();
>>>> GPUFilter->SetInput( reader->GetOutput() );
>>>> GPUFilter->SetVariance( variance );
>>>> GPUFilter->SetMaximumKernelWidth(10);
>>>> GPUFilter->SetUseImageSpacingOff();
>>>> // GPUFilter->DebugOn();
>>>> // GPUFilter->GPUEnabledOff();
>>>> GPUFilter->Print(std::cout);
>>>> GPUFilter->Update();
>>>> GPUFilter->GetOutput()->UpdateBuffers(); // synchronization point (GPU->CPU memcpy)
>>>> gputimer.Stop();
>>>> std::cout << "GPU Gaussian Filter took " << gputimer.GetMean() << " seconds.\n" << std::endl;
>>>>
>>>> // typedef itk::GPUMeanImageFilter< InputImageType, OutputImageType> GPUFilterType;
>>>> // GPUFilterType::Pointer GPUFilter = GPUFilterType::New();
>>>> // itk::TimeProbe gputimer;
>>>> // gputimer.Start();
>>>> // GPUFilter->SetInput( reader->GetOutput() );
>>>> //// GPUFilter->SetVariance( variance );
>>>> //// GPUFilter->SetMaximumKernelWidth(10);
>>>> //// GPUFilter->SetUseImageSpacingOff();
>>>> //// GPUFilter->DebugOn();
>>>> //// GPUFilter->Print(std::cout);
>>>> // GPUFilter->SetRadius( variance );
>>>> // GPUFilter->Update();
>>>> // GPUFilter->GetOutput()->UpdateBuffers(); // synchronization point (GPU->CPU memcpy)
>>>> // gputimer.Stop();
>>>> // std::cout << "GPU Gaussian Filter took " << gputimer.GetMean() << " seconds.\n" << std::endl;
>>>>
>>>> // ---------------
>>>> // RMS Error check
>>>> // ---------------
>>>>
>>>> double diff = 0;
>>>> unsigned int nPix = 0;
>>>> itk::ImageRegionIterator<OutputImageType> cit(CPUFilter->GetOutput(), CPUFilter->GetOutput()->GetLargestPossibleRegion());
>>>> itk::ImageRegionIterator<OutputImageType> git(GPUFilter->GetOutput(), GPUFilter->GetOutput()->GetLargestPossibleRegion());
>>>>
>>>> for(cit.GoToBegin(), git.GoToBegin(); !cit.IsAtEnd(); ++cit, ++git)
>>>> {
>>>> double err = (double)(cit.Get()) - (double)(git.Get());
>>>> // if(err > 0.1 || (double)cit.Get() < 0.1) std::cout << "CPU : " << (double)(cit.Get()) << ", GPU : " << (double)(git.Get()) << std::endl;
>>>> diff += err*err;
>>>> nPix++;
>>>> }
>>>>
>>>> writer->SetInput( GPUFilter->GetOutput() );
>>>> // writer->SetInput( CPUFilter->GetOutput() );
>>>> writer->Update();
>>>>
>>>> if (nPix > 0)
>>>> {
>>>> double RMSError = sqrt( diff / (double)nPix );
>>>> std::cout << "RMS Error : " << RMSError << std::endl;
>>>> // the CPU filter operator has type double
>>>> // but the double precision is not well-supported on most GPUs
>>>> // and by most drivers at this time. Therefore, the GPU filter
>>>> // operator has type float
>>>> // relax the RMS threshold here to allow for errors due to
>>>> // differences in precision
>>>> // NOTE:
>>>> // a threshold of 1.2e-5 worked on linux and Mac, but not Windows
>>>> // why?
>>>> double RMSThreshold = 1.7e-5;
>>>> if (vnl_math_isnan(RMSError))
>>>> {
>>>> std::cout << "RMS Error is NaN! nPix: " << nPix << std::endl;
>>>> return EXIT_FAILURE;
>>>> }
>>>> if (RMSError > RMSThreshold)
>>>> {
>>>> std::cout << "RMS Error exceeds threshold (" << RMSThreshold << ")" << std::endl;
>>>> return EXIT_FAILURE;
>>>> }
>>>> }
>>>> else
>>>> {
>>>> std::cout << "No pixels in output!" << std::endl;
>>>> return EXIT_FAILURE;
>>>> }
>>>>
>>>> }
>>>>
>>>>
>>>> OUTPUT
>>>>
>>>>
>>>> Starting C:\DocsMaracuya\Build\Ejemplos\Gpu\GPUTest.exe...
>>>> Platform : AMD Accelerated Parallel Processing
>>>> Platform : AMD Accelerated Parallel Processing
>>>> Pitcairn
>>>> Maximum Work Item Sizes : { 256, 256, 256 }
>>>> Maximum Work Group Size : 256
>>>> Alignment in bits of the base address : 2048
>>>> Smallest alignment in bytes for any data type : 128
>>>> cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_d3d10_sharing cl_amd_bus_addressable_memory cl_amd_c1x_atomics
>>>> CPU Gaussian Filter took 1.70355 seconds with 8 threads.
>>>>
>>>> Defines: #define DIM_3
>>>> #define INTYPE short
>>>> #define OUTTYPE short
>>>> #define OPTYPE short
>>>>
>>>> Defines: #define DIM_3
>>>> #define INTYPE short
>>>> #define OUTTYPE short
>>>> #define OPTYPE short
>>>>
>>>> Defines: #define DIM_3
>>>> #define INTYPE short
>>>> #define OUTTYPE short
>>>> #define OPTYPE short
>>>>
>>>> GPUDiscreteGaussianImageFilter (0000000002205DF0)
>>>> RTTI typeinfo: class itk::GPUDiscreteGaussianImageFilter<class itk::GPUImage<short,3>,class itk::GPUImage<short,3> >
>>>> Reference Count: 1
>>>> Modified Time: 560
>>>> Debug: Off
>>>> Object Name:
>>>> Observers:
>>>> none
>>>> Inputs:
>>>> Primary: (000000000216E560) *
>>>> Indexed Inputs:
>>>> 0: Primary (000000000216E560)
>>>> Required Input Names: Primary
>>>> NumberOfRequiredInputs: 1
>>>> Outputs:
>>>> Primary: (000000000218A070)
>>>> Indexed Outputs:
>>>> 0: Primary (000000000218A070)
>>>> NumberOfRequiredOutputs: 1
>>>> Number Of Threads: 8
>>>> ReleaseDataFlag: Off
>>>> ReleaseDataBeforeUpdateFlag: Off
>>>> AbortGenerateData: Off
>>>> Progress: 0
>>>> Multithreader:
>>>> RTTI typeinfo: class itk::MultiThreader
>>>> Reference Count: 1
>>>> Modified Time: 499
>>>> Debug: Off
>>>> Object Name:
>>>> Observers:
>>>> none
>>>> Thread Count: 8
>>>> Global Maximum Number Of Threads: 128
>>>> Global Default Number Of Threads: 8
>>>> CoordinateTolerance: 1e-006
>>>> DirectionTolerance: 1e-006
>>>> Variance: [4, 4, 4]
>>>> MaximumError: [0.01, 0.01, 0.01]
>>>> MaximumKernelWidth: 10
>>>> FilterDimensionality: 3
>>>> UseImageSpacing: 0
>>>> InternalNumberOfStreamDivisions: 9
>>>> GPU: Enabled
>>>> GPU Gaussian Filter took 0.111351 seconds.
>>>>
>>>> RMS Error : 26.4279
>>>> RMS Error exceeds threshold (1.7e-005)
>>>> C:\DocsMaracuya\Build\Ejemplos\Gpu\GPUTest.exe exited with code 1
>>>>
>>>>
>>>> --
>>>> José Ignacio Prieto
>>>> celular(nuevo): 94348182
>>>> _____________________________________
>>>> Powered by www.kitware.com
>>>>
>>>> Visit other Kitware open-source projects at
>>>> http://www.kitware.com/opensource/opensource.html
>>>>
>>>> Kitware offers ITK Training Courses, for more information visit:
>>>> http://www.kitware.com/products/protraining.php
>>>>
>>>> Please keep messages on-topic and check the ITK FAQ at:
>>>> http://www.itk.org/Wiki/ITK_FAQ
>>>>
>>>> Follow this link to subscribe/unsubscribe:
>>>> http://www.itk.org/mailman/listinfo/insight-users
>>
>>
>>
>> --
>> José Ignacio Prieto
>> celular(nuevo): 94348182
>>
>> _____________________________________
>> Powered by www.kitware.com
>>
>> Visit other Kitware open-source projects at
>> http://www.kitware.com/opensource/opensource.html
>>
>> Kitware offers ITK Training Courses, for more information visit:
>> http://www.kitware.com/products/protraining.php
>>
>> Please keep messages on-topic and check the ITK FAQ at:
>> http://www.itk.org/Wiki/ITK_FAQ
>>
>> Follow this link to subscribe/unsubscribe:
>> http://www.itk.org/mailman/listinfo/insight-users
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://public.kitware.com/pipermail/community/attachments/20140422/8d28a13f/attachment-0002.html>
-------------- next part --------------
_____________________________________
Powered by www.kitware.com
Visit other Kitware open-source projects at
http://www.kitware.com/opensource/opensource.html
Kitware offers ITK Training Courses, for more information visit:
http://www.kitware.com/products/protraining.php
Please keep messages on-topic and check the ITK FAQ at:
http://www.itk.org/Wiki/ITK_FAQ
Follow this link to subscribe/unsubscribe:
http://www.itk.org/mailman/listinfo/insight-users
More information about the Community
mailing list