[vtkusers] Large performance difference between vtkResampleWithDataSet in VTK 8.1.0 and Resample with Dataset filter in Paraview 5.5

Evan Kao tossin at gmail.com
Tue May 15 18:28:43 EDT 2018


Hi Sujin,

I thought I had put in the correct source/input, but I suppose I should
have checked more closely.  Still, it didn't make that much of a difference
(634s or 10.5 min).  I'll continue checking if this problem persists for me
on other platforms.
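
A minimal sketch of the corrected setup, assuming both datasets are already loaded: the unstructured "mesh" is the input and the structured "image" is the source. The helper name resample_onto_mesh is hypothetical and not part of the attached script; the timing uses default_timer in the same way as that script.

```python
from timeit import default_timer


def resample_onto_mesh(mesh, image):
    """Sample the point data of `image` onto the geometry of `mesh`.

    Hypothetical helper: `mesh` is the unstructured grid, `image` is the
    structured dataset. Returns (output dataset, elapsed seconds).
    """
    import vtk  # imported lazily so the helper can be defined without VTK

    resample = vtk.vtkResampleWithDataSet()
    resample.SetInputData(mesh)    # geometry that receives the values
    resample.SetSourceData(image)  # dataset the values are sampled from

    start = default_timer()
    resample.Update()              # only the filter execution is timed
    elapsed = default_timer() - start

    return resample.GetOutput(), elapsed
```

With this order the output has the type of the input (an unstructured grid), so it can be passed to vtkXMLUnstructuredGridWriter.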

I also tried using vtkResampleWithDataSet inside a Programmable Filter in
ParaView, and it performed quickly (0.98s).

Is it possible to see the build flags for your version of VTK, or the ones
that were used for the ParaView binaries?  Were you testing on VTK 8.1 or
the latest version?

Thanks,
Evan Kao

On Tue, May 15, 2018 at 2:36 PM, Sujin Philip <sujin.philip at kitware.com>
wrote:

> Hi Evan,
>
> Thanks for sharing the data. I tried your script on my Linux desktop and
> the performance I see on both VTK and ParaView is similar and <1 second.
> This is even without threading enabled. I haven't tried this on Windows yet.
>
> BTW, there is an error in the script you shared. The inputs to the
> resample filter are in the wrong order and the
> "vtkXMLUnstructuredGridWriter" throws an error saying that the data passed
> to it is not an unstructured grid. I assume you want to resample the data
> values from the structured grid on to the geometry provided by the
> unstructured grid. The result will be an unstructured grid, which can be
> written by the "vtkXMLUnstructuredGridWriter". For this, the input should be
> the "mesh" data and the source should be "image".
>
> So, currently I don't have a good explanation for what is causing the
> performance degradation for you. It might be some issues with the builds,
> or the Windows setup. You can maybe try building VTK yourself (make sure to
> build in "Release" mode), or try another machine and see if the problem
> persists. I will also try to reproduce this on a Windows machine.
>
> Thanks
> Sujin
>
>
> On Tue, May 15, 2018 at 4:47 PM, Evan Kao <tossin at gmail.com> wrote:
>
>> Hello Sujin,
>>
>> Using the TimerLog, I got the following time from ParaView:
>>
>> Execute vtkResampleWithDataSet id: 10345, 0.42 seconds
>>
>> As for the timeit module, you can see how I use it in the attached Python
>> script.  I only use timeit's default_timer function to grab the time before
>> and after completion of the vtkResampleWithDataSet method and take the
>> difference as the time elapsed.  Regardless, qualitatively ParaView is
>> near-instant while VTK takes a while.
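
The timing approach described above can be sketched as follows. This is not the attached script; time_call is a hypothetical helper name, and the built-in sum stands in for the Update() call so the example runs without VTK.

```python
from timeit import default_timer


def time_call(func, *args, **kwargs):
    """Run func once and return (result, elapsed wall-clock seconds)."""
    start = default_timer()
    result = func(*args, **kwargs)
    elapsed = default_timer() - start
    return result, elapsed


# Stand-in workload; in the real script this would be the filter's Update().
result, seconds = time_call(sum, range(1_000_000))
print(f"call took {seconds:.4f} s")
```

Because the call runs exactly once, the elapsed value is a single-run time and not a total over repeated runs.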
>>
>> Google drive links to the datasets themselves are here (hopefully this
>> doesn't trigger any mailing list filters): Unstructured Grid (35MB)
>> <https://drive.google.com/open?id=1jvjiDlMJEJihB8OQneOeBzJXFiZKKYsR> | Structured
>> Grid (70MB)
>> <https://drive.google.com/open?id=1RYz4eORPWWf23n6G5am-9_F44zxEOHMl>
>>
>> If I get a chance, I'll take a look at using smaller data sets.
>>
>> - Evan
>>
>>
>> On Tue, May 15, 2018 at 12:32 PM, Sujin Philip <sujin.philip at kitware.com>
>> wrote:
>>
>>> Hi Evan,
>>>
>>> I tried testing this on my end and I am seeing expected performance from
>>> VTK and ParaView. But the performance is dependent on the datasets used. Is
>>> it possible for you to share your datasets and scripts with us? Could you
>>> try this with smaller versions of your datasets and see if you are able to
>>> reproduce this?
>>>
>>> I am not familiar with the timeit module in Python. From the
>>> documentation it looks like it runs the code multiple times by default and
>>> prints the total time. Can you confirm if you have taken this into
>>> consideration in your script?
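
To illustrate the concern: timeit.timeit() runs the statement `number` times (one million by default) and returns the total time for all runs, not the time per run. A small sketch, with a hypothetical work() function standing in for the real workload:

```python
import timeit

calls = []


def work():
    """Hypothetical stand-in workload that records each invocation."""
    calls.append(1)


runs = 5
total = timeit.timeit(work, number=runs)  # total seconds for all 5 runs
per_run = total / runs                    # average seconds per run

print(f"{len(calls)} runs, total {total:.6f} s, per run {per_run:.6f} s")
```

Passing number=1, or using timeit.default_timer directly around a single call, avoids the repetition.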
>>>
>>> A simple way to time operations in ParaView is to refer to the "Timer
>>> Log" under the "Tools" menu. You should see a line like:
>>>
>>> Execute vtkResampleWithDataSet id: 6788, 2.70556 seconds
>>>
>>>
>>> Thanks
>>> Sujin
>>>
>>>
>>> On Tue, May 15, 2018 at 1:05 PM, Evan Kao <tossin at gmail.com> wrote:
>>>
>>>> Hi Shawn and Sujin,
>>>>
>>>> Thanks for the quick responses.  The CPU on the computer I'm using is
>>>> an i7-6700
>>>> <https://ark.intel.com/products/88196/Intel-Core-i7-6700-Processor-8M-Cache-up-to-4_00-GHz>
>>>> with 4 cores, 8 threads, and 3.4 GHz frequency.
>>>>
>>>> Multi-threading may be a factor, but it's hard to tell because
>>>> resampling in ParaView is so quick.  ParaView is capable of using 100% of
>>>> the CPU, while VTK (in Python) will max out at 12-13%.  However, for these
>>>> particular datasets, resampling doesn't appear to stress ParaView that much
>>>> (11-16% when observing the Windows Task Manager, and some of that may be
>>>> because of the rendering).  However, I was under the impression that at
>>>> best multi-threading could only reduce the time by a factor of N threads
>>>> (i.e. 8x), while the speed difference here is almost 1000x.  I measured the times
>>>> for ParaView 5.5, VTK 8.1 (compiled elsewhere), and VTK 7.1 (compiled by
>>>> our group):
>>>>
>>>>    1. ParaView 5.5 - 1.1s, using a stopwatch, multiple trials. Timing
>>>>    started the moment I clicked "Apply".
>>>>    2. VTK 8.1 - 922.47s, timed using Python's timeit module, measuring
>>>>    only the vtkResampleWithDataSet.Update() method.
>>>>    3. VTK 7.1 - 950.47s, timed the same way as above.
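
A quick check of the arithmetic behind these numbers, using only the timings and the thread count quoted above (the variable names are illustrative):

```python
# Timings reported in the list above, in seconds.
paraview_s = 1.1
vtk81_s = 922.47
vtk71_s = 950.47
threads = 8  # logical threads on the i7-6700

speedup_81 = vtk81_s / paraview_s  # about 839x
speedup_71 = vtk71_s / paraview_s  # about 864x

# Best case if the VTK 8.1 run scaled perfectly across all 8 threads.
best_case_threaded_s = vtk81_s / threads  # about 115 s

# Share of total CPU that one fully busy thread represents.
one_thread_share_pct = 100 / threads  # 12.5%

print(f"{speedup_81:.0f}x, {speedup_71:.0f}x, "
      f"{best_case_threaded_s:.1f} s, {one_thread_share_pct}%")
```

The 12-13% CPU usage seen for VTK in Python matches one busy thread out of eight, and even perfect 8x scaling would still leave the run at roughly 115 s, far from 1.1 s.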
>>>>
>>>> I'm aware of the difference in labeling between VTK and ParaView for
>>>> Source and Input (which confuses me all the time).  I can verify the
>>>> correct data sets were assigned by saving the output (which should be an
>>>> unstructured grid) and viewing it in ParaView - it looks identical to the
>>>> resampled data generated in ParaView (although it overwrites the point
>>>> scalars array and adds some ghost information that needs to be removed).
>>>>
>>>> Thanks,
>>>> Evan
>>>>
>>>> On Tue, May 15, 2018 at 7:38 AM, Sujin Philip <sujin.philip at kitware.com
>>>> > wrote:
>>>>
>>>>> Hi Evan,
>>>>>
>>>>> As Shawn mentioned it could be due to lack of multi-threading. Could
>>>>> you provide us the configuration of the system you are using? Like the
>>>>> number of cores/threads and the CPU frequency? Also please share the actual
>>>>> time that ParaView and VTK are taking. Is it possible for you to try out a
>>>>> slightly older VTK version and see if the performance difference is still
>>>>> there?
>>>>>
>>>>> Which dataset are you setting as input and which as source? The names
>>>>> are unfortunately opposite between VTK and ParaView due to legacy
>>>>> reasons. Probing with the unstructured grid as the source is much slower
>>>>> than probing with the structured grid as the source. So please confirm that
>>>>> the VTK pipeline is set up properly.
>>>>>
>>>>> Please let me know if none of these seem to be the cause of your problem
>>>>> and I will dig deeper.
>>>>>
>>>>> Thanks
>>>>> Sujin
>>>>>
>>>>>
>>>>>
>>>>> On Tue, May 15, 2018 at 9:52 AM, Shawn Waldon <
>>>>> shawn.waldon at kitware.com> wrote:
>>>>>
>>>>>> Hi Evan,
>>>>>>
>>>>>> I suspect the difference is that the ParaView binaries were compiled
>>>>>> with TBB multithreading support and the Anaconda VTK was not.
>>>>>> vtkResampleWithDataSet is set up to use TBB multithreading if available.
>>>>>> Check the utilization of the cores on your computer when running each and
>>>>>> you will see ParaView using all available cores and Anaconda's VTK probably
>>>>>> only using one.  It is also possible the cell locator change improved
>>>>>> things further but I'm not familiar with that.
>>>>>>
>>>>>> HTH,
>>>>>>
>>>>>> Shawn
>>>>>>
>>>>>> On Mon, May 14, 2018 at 7:54 PM, Evan Kao <tossin at gmail.com> wrote:
>>>>>>
>>>>>>> Hello all,
>>>>>>>
>>>>>>> I am trying to resample structured grid data (~1.4M points, 1.3M
>>>>>>> cells) with an unstructured grid (~320K points, 480K cells).  In Paraview
>>>>>>> 5.5, this resampling is nearly instant with the Resample With Dataset
>>>>>>> filter.  Yet in a Python script using vtkResampleWithDataSet from VTK
>>>>>>> 8.1.0, the same operation takes about 15 minutes (>2 orders of magnitude
>>>>>>> difference in speed).  As far as I can tell from the VTK repository on
>>>>>>> Gitlab, the only difference between the Paraview/release version and the
>>>>>>> 8.1.0 or 8.1.1 tagged releases is a switch in the cell locator.  Is this
>>>>>>> enough to explain the difference in the performance?  If not, could someone
>>>>>>> enlighten me as to what the possible factors are here?
>>>>>>>
>>>>>>> Also, if it matters, this is all on a Windows 7 64-bit machine.
>>>>>>> Paraview is installed from binaries, while VTK was downloaded from an
>>>>>>> Anaconda distribution compiled by a third party.
>>>>>>>
>>>>>>> Thanks for your time,
>>>>>>> Evan Kao
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Powered by www.kitware.com
>>>>>>>
>>>>>>> Visit other Kitware open-source projects at
>>>>>>> http://www.kitware.com/opensource/opensource.html
>>>>>>>
>>>>>>> Please keep messages on-topic and check the VTK FAQ at:
>>>>>>> http://www.vtk.org/Wiki/VTK_FAQ
>>>>>>>
>>>>>>> Search the list archives at: http://markmail.org/search/?q=vtkusers
>>>>>>>
>>>>>>> Follow this link to subscribe/unsubscribe:
>>>>>>> https://vtk.org/mailman/listinfo/vtkusers
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>