VTK/ImageBenchmark
In VTK 7.1, a benchmarking utility called ImageBenchmark was added to the VTK examples. This utility makes it easy to benchmark certain VTK image filters, to assist developers who wish to optimize the performance of those filters. The source code is in VTK/Examples/ImageProcessing/Cxx/. The utility is written in C++ so that it is accessible to as wide a range of developers as possible, and also to avoid any concern that language wrapping might invalidate the results (though it most certainly would not).
Options
There are several command-line options that can be used to control how the benchmarking is performed.
- --runs N: The number of runs to perform (6 is recommended for accurate results).
- --threads N: Request a specific number of threads.
- --threads N-M: Repeat the benchmark for each thread count from N through M.
- --filter <filter_and_options>: The filter to benchmark (see the second column of the sample output in the next section).
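The --threads argument accepts a single count, a range, or (according to the full option list) a comma-separated list. The following is a hypothetical C++ helper, not taken from the ImageBenchmark source, that sketches how such a specification could be expanded into the list of thread counts to benchmark. The name ParseThreadCounts is an assumption made for illustration.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not ImageBenchmark's own code): expands a --threads
// argument of the form "N", "N-M", or "N,M,O" into a list of thread counts.
std::vector<int> ParseThreadCounts(const std::string& spec)
{
  std::vector<int> counts;
  const auto dash = spec.find('-');
  if (dash != std::string::npos)
  {
    // Range form "N-M": every thread count from N through M inclusive.
    const int first = std::stoi(spec.substr(0, dash));
    const int last = std::stoi(spec.substr(dash + 1));
    if (first < 1 || last < first)
    {
      throw std::invalid_argument("bad thread range: " + spec);
    }
    for (int n = first; n <= last; ++n)
    {
      counts.push_back(n);
    }
    return counts;
  }
  // List form "N,M,O"; a single "N" is treated as a one-element list.
  std::istringstream stream(spec);
  std::string item;
  while (std::getline(stream, item, ','))
  {
    const int n = std::stoi(item);
    if (n < 1)
    {
      throw std::invalid_argument("bad thread count: " + spec);
    }
    counts.push_back(n);
  }
  return counts;
}
```

For example, "1-12" expands to the twelve thread counts 1 through 12, while "2,4,8" yields only those three counts.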
Here is the full list of options:
    Usage: ImageBenchmark [options]

    Options:
      --runs N                      The number of runs to perform
      --threads N (or N-M or N,M,O) Request a certain number of threads
      --split-mode slab|beam|block  Use the specified splitting mode
      --enable-smp on|off           Use vtkSMPTools vs. vtkMultiThreader
      --clear-cache MBytes          Attempt to clear CPU cache between runs
      --bytes-per-piece N           Ask for N bytes per piece [65536]
      --min-piece-size XxYxZ        Minimum dimensions per piece [16x1x1]
      --size XxYxZ                  The image size [256x256x256]
      --type uchar|short|float      The data type for the input [short]
      --source <imagegsource>       Set the data source [gaussian]
      --filter <filter>[:options]   Set the filter to benchmark [median]
      --output filename.png         Output middle slice as a png file.
      --units mvps|mvptps|seconds   The output units (see below for details).
      --header                      Print a header line before the results.
      --verbose                     Print verbose output to stdout.
      --version                     Print the VTK version and exit.
      --help                        Print this message.

    This program prints benchmark results to stdout in csv format.
    The default units are megavoxels per second, but the --units option
    can specify units of seconds, megavoxels per second (mvps), or
    megavoxels per thread per second (mvptps).
    If more than three runs are done (by use of --runs), then the mean
    and standard deviation over all of the runs except the first will be
    printed (use --header to get the column headings).

    Sources: these are how the initial data set is produced.
      gaussian        A centered 3D gaussian.
      noise           Pseudo-random noise.
      grid            A grid, for checking rotations.
      mandelbrot      The mandelbrot set.

    Filters: these are the algorithms that can be benchmarked.
      median:kernelsize=3     Test vtkImageMedian3D.
      reslice:kernel=nearest  Test vtkImageReslice (see below).
      resize:kernelsize=1     Test vtkImageResize.
      convolve:kernelsize=3   Test vtkImageConvolve.
      separable:kernelsize=3  Test vtkImageSeparableConvolution.
      gaussian:kernelsize=3   Test vtkImageGaussianSmooth.
      bspline:degree=3        Test vtkImageBSplineCoefficients.
      fft                     Test vtkImageFFT.
      histogram:stencil       Test vtkImageHistogram.
      colormap:components=3   Test vtkImageMapToColors.

    The reslice filter takes the following options:
      stencil                 Spherical stencil (ignore voxels outside).
      kernel=nearest|linear|cubic|sinc|bspline  The interpolator to use.
      kernelsize=4            The kernelsize (sinc, bspline only).
      rotation=0/0/0/0        Rotation angle (degrees) and axis.

    The colormap filter takes the following options:
      components=3            Output components (3=RGB, 4=RGBA).
      greyscale               Rescale but do not apply a vtkLookupTable.
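Filter arguments follow the pattern <filter>[:options], where the options are colon-separated and are either key=value pairs (such as kernel=sinc) or bare flags (such as stencil or greyscale). The following is a minimal sketch of splitting such a string, assuming this reading of the syntax; FilterSpec and ParseFilterSpec are hypothetical names and do not describe how ImageBenchmark itself parses its arguments.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical representation of a --filter argument such as
// "reslice:kernel=sinc:rotation=60/0/1/1:stencil".
struct FilterSpec
{
  std::string name;                           // e.g. "reslice"
  std::map<std::string, std::string> options; // bare flags map to ""
};

FilterSpec ParseFilterSpec(const std::string& text)
{
  FilterSpec spec;
  std::istringstream stream(text);
  // The first colon-separated field is the filter name.
  std::getline(stream, spec.name, ':');
  // Each remaining field is either "key=value" or a bare flag.
  std::string field;
  while (std::getline(stream, field, ':'))
  {
    const auto eq = field.find('=');
    if (eq == std::string::npos)
    {
      spec.options[field] = "";
    }
    else
    {
      spec.options[field.substr(0, eq)] = field.substr(eq + 1);
    }
  }
  return spec;
}
```

With this approach, "reslice:kernel=sinc:rotation=60/0/1/1:stencil" yields the name reslice, two key=value options, and the flag stencil, while "fft" yields a name with no options.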
Sample Output
When run with no arguments, ImageBenchmark gives results for a variety of filters, as shown below. Results are reported in megavoxels per second (mvps), where a higher number indicates faster execution. The output is in CSV (comma-separated values) format, which most plotting packages can read.
    191.353,colormap:components=3
    180.957,colormap:components=4
    824.309,colormap:components=1:greyscale
    799.908,colormap:components=2:greyscale
    562.842,colormap:components=3:greyscale
    452.914,colormap:components=4:greyscale
    417.646,resize:kernelsize=1
    151.029,resize:kernelsize=2
    82.5898,resize:kernelsize=4
    47.8107,resize:kernelsize=6
    958.911,reslice:kernel=nearest:rotation=0/0/0/1
    550.164,reslice:kernel=nearest:rotation=90/0/0/1
    29.5369,reslice:kernel=nearest:rotation=90/0/1/0
    101.706,reslice:kernel=nearest:rotation=45/0/0/1
    39.1603,reslice:kernel=nearest:rotation=60/0/1/1
    25.3316,reslice:kernel=linear:rotation=60/0/1/1
    11.5771,reslice:kernel=cubic:rotation=60/0/1/1
    8.26904,reslice:kernel=bspline:rotation=60/0/1/1
    3.64946,reslice:kernel=sinc:rotation=60/0/1/1
    5.42244,reslice:kernel=sinc:rotation=60/0/1/1:stencil
    70.2169,gaussian:kernelsize=3
    14.3122,convolve:kernelsize=3
    7.81935,separable:kernelsize=3
    111.109,resize:kernelsize=3
    9.00278,median:kernelsize=3
    1239.28,histogram
    2086.42,histogram:stencil
    11.6615,bspline:degree=3
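Each row of the default output has the form <mvps>,<filter>. The sketch below parses one such row and converts megavoxels per second into seconds per run for a given image size. It assumes that one megavoxel is 10^6 voxels and that these runs used the default 256x256x256 image; BenchmarkRow, ParseRow, and SecondsPerRun are hypothetical names.

```cpp
#include <stdexcept>
#include <string>

// One row of the default output: "<mvps>,<filter spec>".
struct BenchmarkRow
{
  double mvps;        // megavoxels per second (higher is faster)
  std::string filter; // e.g. "median:kernelsize=3"
};

// Splits a row at its first comma; the filter spec itself has no commas.
BenchmarkRow ParseRow(const std::string& line)
{
  const auto comma = line.find(',');
  if (comma == std::string::npos)
  {
    throw std::invalid_argument("expected <mvps>,<filter>: " + line);
  }
  return BenchmarkRow{std::stod(line.substr(0, comma)), line.substr(comma + 1)};
}

// Seconds per run for an nx*ny*nz image.
// Assumption: 1 megavoxel = 1e6 voxels.
double SecondsPerRun(double mvps, long long nx, long long ny, long long nz)
{
  const double megavoxels = static_cast<double>(nx * ny * nz) / 1.0e6;
  return megavoxels / mvps;
}
```

Under those assumptions, the median:kernelsize=3 row (9.00278 mvps) corresponds to roughly 1.86 seconds per run, since a 256x256x256 image holds about 16.78 megavoxels.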
When given a specific filter to benchmark, only results for that filter will be printed. Note that if --runs is greater than 2, the mean and standard deviation over all runs except the first will be computed (the first run is ignored because it is considered to be less reliable than the rest).
    ImageBenchmark --filter median --runs 6 --header
    R0,R1,R2,R3,R4,R5,Average,StdDev
    8.77757,8.26124,8.04351,9.01285,9.078,8.90538,8.6602,0.473952
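The Average and StdDev values above can be reproduced by discarding R0 and taking the mean and the sample standard deviation (denominator n - 1) of R1 through R5. The following is a minimal C++ sketch of that calculation. SummarizeRuns is a hypothetical name, and the n - 1 denominator is inferred from these numbers rather than taken from the ImageBenchmark source.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

struct RunStats
{
  double mean = 0.0;
  double stddev = 0.0;
};

// Mean and sample standard deviation (n - 1 denominator) of every run
// except the first, which is discarded as less reliable.
RunStats SummarizeRuns(const std::vector<double>& runs)
{
  if (runs.size() < 3)
  {
    throw std::invalid_argument("need at least 3 runs");
  }
  const std::size_t n = runs.size() - 1; // number of runs kept (R1 onward)
  double sum = 0.0;
  for (std::size_t i = 1; i < runs.size(); ++i)
  {
    sum += runs[i];
  }
  RunStats stats;
  stats.mean = sum / n;
  double squares = 0.0;
  for (std::size_t i = 1; i < runs.size(); ++i)
  {
    const double d = runs[i] - stats.mean;
    squares += d * d;
  }
  stats.stddev = std::sqrt(squares / (n - 1));
  return stats;
}
```

Given the six values shown, this gives a mean of about 8.6602 and a standard deviation of about 0.47395, matching the printed row.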
When given a range of thread counts, results will be given for each thread count:
    ImageBenchmark --filter median --runs 6 --threads 1-12 --split-mode block --size 256x256x256
    Threads,R0,R1,R2,R3,R4,R5,Average,StdDev
    1,9.16398,9.24523,9.20793,9.23568,9.2334,9.31117,9.24668,0.0385995
    2,17.2775,17.525,17.5154,17.3023,17.339,17.4302,17.4224,0.100752
    3,24.709,24.9179,24.8035,24.9186,24.9367,23.4266,24.6007,0.658473
    4,32.6784,32.9238,32.825,32.8771,32.9512,33.0941,32.9342,0.101429
    5,32.6383,32.8922,33.0193,32.9771,30.1123,33.0321,32.4066,1.2837
    6,45.0957,45.6274,45.5593,45.5477,45.6546,45.5925,45.5963,0.0450496
    7,44.9937,45.5494,45.5663,45.0725,44.7736,45.2493,45.2422,0.334605
    8,67.1954,67.4694,67.4336,67.7359,67.3987,67.7304,67.5536,0.165785
    9,67.2081,67.6509,67.7047,67.4066,67.6089,67.4696,67.5681,0.125472
    10,67.1577,67.7227,67.8311,67.7851,67.8229,67.8133,67.795,0.0440062
    11,67.2433,67.735,67.4784,67.7421,67.7862,66.6508,67.4785,0.478284
    12,56.548,90.3603,81.876,88.6338,83.4625,82.8865,85.4438,3.79284
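One way to read the per-thread results is to divide each row's Average by the single-thread Average (the speedup) and then by the thread count (the parallel efficiency). ImageBenchmark does not print these figures; this is an added interpretation. The sketch assumes that the rows start at 1 thread and increase by one, as in the output above. ComputeScaling is a hypothetical name.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

struct Scaling
{
  int threads;       // thread count for this row
  double speedup;    // Average(threads) / Average(1 thread)
  double efficiency; // speedup / threads
};

// averages[i] is the Average column (mvps) for i + 1 threads (assumed layout).
std::vector<Scaling> ComputeScaling(const std::vector<double>& averages)
{
  if (averages.empty() || averages[0] <= 0.0)
  {
    throw std::invalid_argument("need a positive single-thread average");
  }
  std::vector<Scaling> result;
  for (std::size_t i = 0; i < averages.size(); ++i)
  {
    const int threads = static_cast<int>(i + 1);
    const double speedup = averages[i] / averages[0];
    result.push_back(Scaling{threads, speedup, speedup / threads});
  }
  return result;
}
```

Applied to the averages above, 2 threads give a speedup of about 1.88 and 12 threads about 9.24, which is an efficiency of roughly 0.77. The averages for 5, 7, and 9 to 11 threads stay close to those for 4, 6, and 8 threads.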
ImageBenchmarkDriver
There is another benchmarking example called ImageBenchmarkDriver, which calls ImageBenchmark repeatedly with various parameters. You can use it to find the best combination of DesiredBytesPerPiece and SplitMode for a particular filter. It accepts the same arguments as ImageBenchmark. It also accepts a --prefix option, which sets the path prefix for its output files, because it writes several of them:
    Usage: ImageBenchmarkDriver --prefix <path/prefix> ...
    Options:
      --prefix <path/prefix>  Prefix for output filenames.
    Any options from ImageBenchmark can also be used.
For example, it can be used like this:
    ImageBenchmarkDriver --filter median --runs 6 --prefix mybenchmarks/
    1 of 48: mybenchmarks/SMP_Slab_1KiB_4096x4096.csv
    2 of 48: mybenchmarks/SMP_Beam_1KiB_4096x4096.csv
    3 of 48: mybenchmarks/SMP_Block_1KiB_4096x4096.csv
    4 of 48: mybenchmarks/SMP_Slab_4KiB_4096x4096.csv
If --prefix is absent, results are written to the current directory:
    ImageBenchmarkDriver --filter median --runs 6
    1 of 48: SMP_Slab_1KiB_4096x4096.csv
    2 of 48: SMP_Beam_1KiB_4096x4096.csv
    3 of 48: SMP_Block_1KiB_4096x4096.csv
    4 of 48: SMP_Slab_4KiB_4096x4096.csv
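Each file that ImageBenchmarkDriver writes corresponds to one configuration, and that configuration is encoded in the file name. The sketch below assumes that the second and third underscore-separated fields of a name such as SMP_Slab_1KiB_4096x4096.csv are the split mode and the bytes-per-piece setting. It leaves the other fields uninterpreted. It also assumes the caller has already reduced each file to a single mvps score, such as the Average for the thread count of interest. DriverResult, DriverConfig, DecodeFilename, and FindFastest are hypothetical names.

```cpp
#include <algorithm>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// One driver output file plus a score the caller extracted from it
// (assumed: megavoxels per second, higher is better).
struct DriverResult
{
  std::string filename; // e.g. "mybenchmarks/SMP_Slab_1KiB_4096x4096.csv"
  double mvps;
};

// Assumed meaning of two of the underscore-separated fields in the name.
struct DriverConfig
{
  std::string splitMode;     // field 1, e.g. "Slab", "Beam", "Block"
  std::string bytesPerPiece; // field 2, e.g. "1KiB"
};

DriverConfig DecodeFilename(const std::string& path)
{
  // Strip any directory prefix (npos + 1 wraps to 0 when there is no '/').
  std::string base = path.substr(path.find_last_of('/') + 1);
  // Strip the extension, e.g. ".csv".
  const auto dot = base.rfind('.');
  if (dot != std::string::npos)
  {
    base.erase(dot);
  }
  std::vector<std::string> fields;
  std::istringstream stream(base);
  std::string field;
  while (std::getline(stream, field, '_'))
  {
    fields.push_back(field);
  }
  if (fields.size() < 3)
  {
    throw std::invalid_argument("unexpected file name: " + path);
  }
  return DriverConfig{fields[1], fields[2]};
}

// Returns the result with the highest score.
const DriverResult& FindFastest(const std::vector<DriverResult>& results)
{
  if (results.empty())
  {
    throw std::invalid_argument("no results");
  }
  return *std::max_element(
    results.begin(), results.end(),
    [](const DriverResult& a, const DriverResult& b) { return a.mvps < b.mvps; });
}
```

Passing the winning file name to DecodeFilename then reports the split mode and piece size that performed best for the chosen filter.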