[vtkusers] Large performance difference between vtkResampleWithDataSet in VTK 8.1.0 and Resample with Dataset filter in Paraview 5.5

Tue May 15 16:47:44 EDT 2018

Hello Sujin,

Using the TimerLog, I got the following time from ParaView:

Execute vtkResampleWithDataSet id: 10345, 0.42 seconds

As for the timeit module, you can see how I use it in the attached Python
script.  I only use timeit's default_timer function to grab the time
immediately before and after the vtkResampleWithDataSet Update() call and
take the difference as the time elapsed, so nothing is run multiple times.
Regardless, qualitatively ParaView is near-instant while VTK takes minutes.
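
A minimal sketch of that timing approach follows. It is not the attached
script, which is not reproduced here. The helper name time_once and the
stand-in workload busy_sum are hypothetical, chosen so the example runs
without VTK installed. Unlike timeit.timeit(), whose default is to repeat the
statement 1,000,000 times, this wrapper calls the function exactly once.

```python
from timeit import default_timer


def time_once(func, *args, **kwargs):
    """Call func exactly once and return (result, elapsed_seconds)."""
    start = default_timer()              # timestamp just before the call
    result = func(*args, **kwargs)
    elapsed = default_timer() - start    # difference = wall-clock time elapsed
    return result, elapsed


def busy_sum(n):
    """Hypothetical stand-in workload; replace with the real operation."""
    return sum(range(n))


total, seconds = time_once(busy_sum, 100_000)
print(f"busy_sum returned {total} in {seconds:.6f} s")
```

With VTK, the callable would presumably be the filter's Update method, e.g.
time_once(resampler.Update), where resampler is an already-configured
vtkResampleWithDataSet instance (an assumed variable name).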

Google Drive links to the datasets themselves are here (hopefully this
doesn't trigger any mailing list filters):

Unstructured Grid (35MB)
<https://drive.google.com/open?id=1jvjiDlMJEJihB8OQneOeBzJXFiZKKYsR>
Structured Grid (70MB)
<https://drive.google.com/open?id=1RYz4eORPWWf23n6G5am-9_F44zxEOHMl>

If I get a chance, I'll take a look at using smaller data sets.

- Evan

On Tue, May 15, 2018 at 12:32 PM, Sujin Philip <sujin.philip at kitware.com>
wrote:

> Hi Evan,
>
> I tried testing this on my end and I am seeing expected performance from
> VTK and ParaView. But the performance is dependent on the datasets used. Is
> it possible for you to share your datasets and scripts with us? Could you
> try this with smaller versions of your datasets and see if you are able to
> reproduce this?
>
> I am not familiar with the timeit module in Python. From the documentation
> it looks like it runs the code multiple times by default and prints the
> total time. Can you confirm if you have taken this into consideration in
> your script?
>
> A simple way to time operations in ParaView is to refer to the "Timer Log"
> under the "Tools" menu. You should see a line like:
>
> Execute vtkResampleWithDataSet id: 6788, 2.70556 seconds
>
>
> Thanks
> Sujin
>
>
> On Tue, May 15, 2018 at 1:05 PM, Evan Kao <tossin at gmail.com> wrote:
>
>> Hi Shawn and Sujin,
>>
>> Thanks for the quick responses.  The CPU on the computer I'm using is an
>> i7-6700
>> <https://ark.intel.com/products/88196/Intel-Core-i7-6700-Processor-8M-Cache-up-to-4_00-GHz>
>> with 4 cores, 8 threads, and 3.4 GHz frequency.
>>
>> Multi-threading may be a factor, but it's hard to tell because resampling
>> in ParaView is so quick.  ParaView is capable of using 100% of the CPU,
>> while VTK (in Python) will max out at 12-13%.  However, for these
>> particular datasets, resampling doesn't appear to stress ParaView that much
>> (11-16% when observing the Windows Task Manager, and some of that may be
>> because of the rendering).  I was also under the impression that, at
>> best, multi-threading could only reduce the time by a factor equal to the
>> number of threads (i.e. 8x), while the speed difference here is almost
>> 1000x.  I measured the times
>> for ParaView 5.5, VTK 8.1 (compiled elsewhere), and VTK 7.1 (compiled by
>> our group):
>>
>>    1. ParaView 5.5 - 1.1s, using a stopwatch, multiple trials. Timing
>>    started the moment I clicked "Apply".
>>    2. VTK 8.1 - 922.47s, timed using Python's timeit module, measuring
>>    only the vtkResampleWithDataSet.Update() method.
>>    3. VTK 7.1 - 950.47s, timed the same way as above.
>>
>> I'm aware of the difference in labeling between VTK and ParaView for
>> Source and Input (which confuses me all the time).  I can verify the
>> correct data sets were assigned by saving the output (which should be an
>> unstructured grid) and viewing it in ParaView - it looks identical to the
>> resampled data generated in ParaView (although it overwrites the point
>> scalars array and adds some ghost information that needs to be removed).
>>
>> Thanks,
>> Evan
>>
>> On Tue, May 15, 2018 at 7:38 AM, Sujin Philip <sujin.philip at kitware.com>
>> wrote:
>>
>>> Hi Evan,
>>>
>>> As Shawn mentioned it could be due to lack of multi-threading. Could you
>>> provide us the configuration of the system you are using? Like the number
>>> of cores/threads and the CPU frequency? Also please share the actual time
>>> that ParaView and VTK are taking. Is it possible for you to try out a
>>> slightly older VTK version and see if the performance difference is still
>>> there?
>>>
>>> Which dataset are you setting as input and which as source? The names
>>> are unfortunately opposite between VTK and ParaView due to legacy
>>> reasons. Probing with the unstructured grid as the source is much slower
>>> than probing with the structured grid as the source. So please confirm that
>>> the VTK pipeline is set up properly.
>>>
>>> Please let me know if none of these seem to be the cause of your problem
>>> and I will dig deeper.
>>>
>>> Thanks
>>> Sujin
>>>
>>>
>>>
>>> On Tue, May 15, 2018 at 9:52 AM, Shawn Waldon <shawn.waldon at kitware.com>
>>> wrote:
>>>
>>>> Hi Evan,
>>>>
>>>> I suspect the difference is that the ParaView binaries were compiled
>>>> with TBB multithreading support and the Anaconda VTK was not.
>>>> vtkResampleWithDataSet is set up to use TBB multithreading if available.
>>>> Check the utilization of the cores on your computer when running each and
>>>> you will see ParaView using all available cores and Anaconda's VTK probably
>>>> only using one.  It is also possible the cell locator change improved
>>>> things further but I'm not familiar with that.
>>>>
>>>> HTH,
>>>>
>>>> Shawn
>>>>
>>>> On Mon, May 14, 2018 at 7:54 PM, Evan Kao <tossin at gmail.com> wrote:
>>>>
>>>>> Hello all,
>>>>>
>>>>> I am trying to resample a structured grid dataset (~1.4M points, 1.3M
>>>>> cells) with an unstructured grid (~320K points, 480K cells).  In Paraview
>>>>> 5.5, this resampling is nearly instant with the Resample With Dataset
>>>>> filter.  Yet in a Python script using vtkResampleWithDataSet from VTK
>>>>> 8.1.0, the same operation takes about 15 minutes (>2 orders of magnitude
>>>>> difference in speed).  As far as I can tell from the VTK repository on
>>>>> Gitlab, the only difference between the Paraview/release version and the
>>>>> 8.1.0 or 8.1.1 tagged releases is a switch in the cell locator.  Is this
>>>>> enough to explain the difference in the performance?  If not, could someone
>>>>> enlighten me as to what the possible factors are here?
>>>>>
>>>>> Also, if it matters, this is all on a Windows 7 64-bit machine.
>>>>> Paraview is installed from binaries, while VTK was downloaded from an
>>>>> Anaconda distribution compiled by a third party.
>>>>>
>>>>> Thanks for your time,
>>>>> Evan Kao
>>>>>
>>>>> _______________________________________________
>>>>> Powered by www.kitware.com
>>>>>
>>>>> Visit other Kitware open-source projects at
>>>>> http://www.kitware.com/opensource/opensource.html
>>>>>
>>>>> Please keep messages on-topic and check the VTK FAQ at:
>>>>> http://www.vtk.org/Wiki/VTK_FAQ
>>>>>
>>>>> Search the list archives at: http://markmail.org/search/?q=vtkusers
>>>>>
>>>>> Follow this link to subscribe/unsubscribe:
>>>>> https://vtk.org/mailman/listinfo/vtkusers
>>>>>
>>>>>
>>>>
>>>
>>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test_vtkresamplewithdataset.py
Type: application/octet-stream
Size: 1287 bytes
Desc: not available
URL: <https://vtk.org/pipermail/vtkusers/attachments/20180515/51575cc2/attachment.obj>