[vtkusers] timings for vtkImageReader2

Fri Oct 8 17:28:32 EDT 2004

Thank you all for the comments.

1.  I am developing this app for Windows because the final user will be
using Windows, although I much prefer Linux :)

2.  I understand that I don't have a lot of memory to run such a large
dataset through, but as I said in an earlier message, I have read the same
dataset with AMIRA, and it is significantly faster (sorry, I cannot remember
the exact time, but if you look back through the list you can find it).  So
why is AMIRA faster?  I suspect that AMIRA takes advantage of memory mapping
in Windows, although it is hard to tell because it is a commercial app and
does not provide source code.  Is there any way to do this with VTK, or is
there a VTK class that already does this?

kate
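
For readers unfamiliar with the term: memory mapping lets a program treat a
file as if it were an array in memory, with the operating system paging in
only the parts that are actually touched. The following is a minimal sketch
in Python using the standard `mmap` module. It is not a VTK class, and it says
nothing about how AMIRA actually works. The helper names `write_ramp_volume`
and `read_voxel`, and the raw little-endian unsigned-short layout with x
varying fastest, are assumptions made for illustration.

```python
import mmap
import os
import struct
import tempfile

VOXEL = struct.Struct("<H")  # assumed layout: little-endian unsigned short


def write_ramp_volume(path, dims):
    """Write a tiny raw volume whose voxel values equal their linear index."""
    nx, ny, nz = dims
    with open(path, "wb") as f:
        for i in range(nx * ny * nz):
            f.write(VOXEL.pack(i % 65536))


def read_voxel(path, dims, x, y, z):
    """Fetch one voxel through a read-only memory map (x varies fastest)."""
    nx, ny, _ = dims
    offset = ((z * ny + y) * nx + x) * VOXEL.size
    with open(path, "rb") as f:
        # Length 0 maps the whole file; pages are loaded only when touched.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            return VOXEL.unpack_from(view, offset)[0]


if __name__ == "__main__":
    dims = (4, 4, 4)
    fd, path = tempfile.mkstemp(suffix=".img")
    os.close(fd)
    try:
        write_ramp_volume(path, dims)
        # Linear index of (1, 2, 3) is (3*4 + 2)*4 + 1 = 57.
        print(read_voxel(path, dims, 1, 2, 3))
    finally:
        os.remove(path)
```

Whether this helps depends on the access pattern: if the whole volume is
needed at once, every page still has to come off the disk, so mapping mainly
pays off when only part of the data is touched at a time.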

-----Original Message-----
From: Sean McInerney [mailto:seanm at nmr.mgh.harvard.edu] 
Sent: Friday, October 08, 2004 1:15 PM
To: Mathieu Malaterre
Cc: Budd Hirons; vtk-users; kerekes at fastmail.fm
Subject: Re: [vtkusers] timings for vtkImageReader2

Hi Kate,

   Again I'll second Mathieu. You have far too little memory to be 
handling that much data. In memory your data alone occupies 268 MB before 
you even do anything to it. The memory occupied by your application and 
the VTK libraries that it links in is also significant ... so your 
application and data may be occupying around 350 MB in memory! ... and 
that is a conservative estimate.
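
The figures in this thread can be checked with a few lines of arithmetic.
The sketch below, in Python, uses only numbers quoted in the thread: the
512x512x512 volume of 2-byte voxels, the 512 MB of RAM, the 53.33 MB/sec
`hdparm` result, the 100 MB/s UltraATA limit, and the 3.021 s `time` result.
The function names are made up for illustration, and transfer rates are
treated as multiples of 10^6 bytes per second, which is an assumption about
how those tools report them.

```python
def volume_bytes(nx, ny, nz, bytes_per_voxel):
    """Size in bytes of a raw volume with the given dimensions."""
    return nx * ny * nz * bytes_per_voxel


def min_read_seconds(n_bytes, mb_per_second):
    """Lower bound on read time at a sustained rate in MB/s (10**6 bytes/s)."""
    return n_bytes / (mb_per_second * 1e6)


size = volume_bytes(512, 512, 512, 2)
print(size)                                     # 268435456
print(size / 2**20)                             # 256.0 (MiB)
print(size / (512 * 2**20))                     # 0.5 of a 512 MiB machine
print(round(min_read_seconds(size, 53.33), 2))  # 5.03 s at the hdparm rate
print(round(min_read_seconds(size, 100), 2))    # 2.68 s at the UltraATA limit
print(round(size / 3.021 / 1e6, 1))             # 88.9 MB/s implied by a 3.021 s read
```

Under these assumptions the data alone fills half of a 512 MB machine, and a
3 s read implies a rate above what `hdparm` measured on the same disk, which
suggests that at least part of that read was served from the file cache
rather than from the disk itself.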

   Since Windows XP is not known for being particularly lightweight, you 
are certainly causing a whole lot of its processes to be written out to 
swap. This massive slowdown is compounded further if the swap space is on 
the same disk from which you are reading.

   Using Linux will certainly improve your quality of life, but I think 
investing in at least another 512 MB of RAM is compulsory. I used to 
use 1 GB of RAM as standard for any machine expected to do any serious 
imaging work. With the advent of higher-field scanners with increased 
resolutions, and considering the relatively low price of RAM, I would 
suggest using 2 GB if possible. That will give you the room you need to 
support pipeline operations on your data once it is read in.

-Sean

Mathieu Malaterre wrote:
> Budd,
> 
>     512x512x512x2 = 268,435,456 bytes, so I don't see any problem with a 
> 3s reading time. I admit I have 1 GB of memory, so there is very little 
> chance that I start swapping. Also, to be honest, the first time I run the 
> C++ code it takes 8s; the second time it takes 3s.
> 
> $ time ./bench
> ./bench  1.05s user 1.97s system 99% cpu 3.021 total
> 
> 
>     My guess is:
> - you are swapping a lot (if you haven't run defrag in a while, swapping 
> will be very slow).
> - Close any other apps you are using to avoid having to swap.
> - Use Linux :P
> 
> Mathieu
> PS: sys info:
> 
> $ cat /proc/cpuinfo
> processor       : 0
> vendor_id       : GenuineIntel
> cpu family      : 15
> model           : 2
> model name      : Intel(R) Pentium(R) 4 CPU 2.80GHz
> stepping        : 9
> cpu MHz         : 2793.067
> cache size      : 512 KB
> physical id     : 0
> siblings        : 2
> fdiv_bug        : no
> hlt_bug         : no
> f00f_bug        : no
> coma_bug        : no
> fpu             : yes
> fpu_exception   : yes
> cpuid level     : 2
> wp              : yes
> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe cid
> runqueue        : 0
> 
> bogomips        : 5570.56
> 
> processor       : 1
> vendor_id       : GenuineIntel
> cpu family      : 15
> model           : 2
> model name      : Intel(R) Pentium(R) 4 CPU 2.80GHz
> stepping        : 9
> cpu MHz         : 2793.067
> cache size      : 512 KB
> physical id     : 0
> siblings        : 2
> fdiv_bug        : no
> hlt_bug         : no
> f00f_bug        : no
> coma_bug        : no
> fpu             : yes
> fpu_exception   : yes
> cpuid level     : 2
> wp              : yes
> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe cid
> runqueue        : 0
> 
> bogomips        : 5583.66
> 
> 
> /sbin/hdparm -t /dev/sda2
> 
> /dev/sda2:
>  Timing buffered disk reads:  160 MB in  3.00 seconds =  53.33 MB/sec
> 
> 
> 
> Budd Hirons wrote:
> 
>> 3 seconds! you have to let me in on your secret!
>>
>> The maximum UltraATA transfer rate in DMA mode 5 is only 100 MB/s, so I 
>> guess it is possible, but unlikely, to read this in 3 seconds.  I ran 
>> this exercise on two machines, compiled from C++.
>>
>> machine 1 (p4,ultraATA drive on board,DMA,DDR ram) - 18s read
>> machine 2 (dual Xeon,SCSI on PCI adapter,no DMA,pc133 ram) - 66s read
>>
>> We regularly deal with volumes that exceed this one in size, and range 
>> to 800 megs on disk.  Load times to get very large chunks of data into 
>> RAM can vary wildly, but it should not be expected to happen 
>> instantaneously.  45 seconds for larger datasets would not be off the 
>> mark in release on random equipment and with other processes likely 
>> running.
>>
>> Cheers,
>> Budd.
>>
>>
>>
>> Kate Kerekes wrote:
>>
>>> Hello again,
>>>
>>> I apologize for the length of this message.  I am still having problems
>>> with the time it takes vtkImageReader2 to read 268 MB of data (a series
>>> of 512 files with 512x512 shorts each).  I ran the following test program
>>> (from Mathieu Malaterre, translated to Tcl by me) to see how long a read
>>> takes:
>>>
>>> #!/bin/sh
>>> load vtkCommonTCL.dll
>>> load vtkFilteringTCL.dll
>>> load vtkGraphicsTCL.dll
>>> load vtkIOTCL.dll
>>> load vtkImagingTCL.dll
>>> load vtkRenderingTCL.dll
>>> load vtkHybridTCL.dll
>>> source Interactor.tcl
>>> source bindings-rw.tcl
>>> source bindings-iw.tcl
>>> source bindings.tcl
>>> source setget.tcl
>>>  
>>> console show
>>>
>>> set createImage yes
>>>
>>> if {$createImage=="yes"} {
>>> vtkImageNoiseSource noise
>>>    noise SetWholeExtent 0 511 0 511 0 511
>>>    noise SetMinimum 0.0
>>>    noise SetMaximum 1.0
>>>    [noise GetOutput] ReleaseDataFlagOn
>>>    puts "about to update vtkImageNoiseSource"
>>>    noise Update
>>>    puts "finished updating vtkImageNoiseSource"
>>>  
>>>    vtkImageCast cast
>>>    cast SetInput [noise GetOutput]
>>>    cast SetOutputScalarTypeToUnsignedShort
>>>
>>>    vtkImageWriter writer
>>>    writer SetInput [cast GetOutput]
>>>    writer SetFileDimensionality 3
>>>    writer SetFileName "bench.img"
>>>    puts "about to update writer"
>>>    writer Write
>>>    puts "finished updating writer"
>>> }
>>>  
>>>    vtkImageReader2 reader
>>>    [reader GetOutput] ReleaseDataFlagOn
>>>    reader SetDataExtent 0 511 0 511 0 511
>>>    reader SetDataScalarTypeToUnsignedShort
>>>    reader SetDataByteOrderToLittleEndian
>>>    reader SetFileName "bench.img"
>>>
>>>    puts "about to update reader"
>>>    set time1 [clock clicks -milliseconds]
>>>    reader Update
>>>    set time2 [clock clicks -milliseconds]
>>>    puts "finished updating reader"
>>>    puts "reader update took [expr {$time2 - $time1}] milliseconds"
>>>
>>> The read takes 15 seconds on my machine (Windows XP with Service Pack 2,
>>> AMD Athlon 64 chipset 3000+, with 512 MB of memory).  I am reading data
>>> from the IDE hard drive.  Mathieu's timing was 3 seconds.  This of course
>>> is not the newest machine out there, but it is not that old either.
>>>
>>> Then when I try to read my series of slices, my machine takes 45 seconds.
>>> The only differences between reading the slice files and the bench.img
>>> created by vtkImageNoiseSource are that bench.img has all three
>>> dimensions in one file (rather than a series of slices), and that
>>> bench.img uses unsigned shorts rather than shorts (but both are 2 bytes,
>>> right?).
>>>
>>> So here are my thoughts as to why the problem could be occurring; any
>>> feedback would be appreciated:
>>> 1.  my computer is too slow (I don't think this is it)
>>> 2.  the bench.img file only reads faster because it has just been written
>>> and is probably still in the cache, and therefore takes less time to
>>> access
>>> 3.  the slowdown comes from my slices being in a series, so that 512
>>> different files have to be read
>>> 4.  the code running in Tcl is slower than code running in C (I can fix
>>> this by calling a C function from Tcl, but only if I think this is really
>>> the problem)
>>> 5.  there is some other problem that I am unaware of (this is my worst
>>> fear :( )
>>>
>>> Please help!  All of the other feedback I have gotten before says that
>>> the read should take no more than a few seconds.  45 seconds is way too
>>> long for our customer, and this is not even the largest file size that
>>> will be used.
>>>
>>> Thanks,
>>> Kate Kerekes
>>>
>>>
>>>
>>> _______________________________________________
>>> This is the private VTK discussion list. Please keep messages 
>>> on-topic. Check the FAQ at: <http://public.kitware.com/cgi-bin/vtkfaq>
>>> Follow this link to subscribe/unsubscribe:
>>> http://www.vtk.org/mailman/listinfo/vtkusers
>>>
>>
> 
> 
> 
>
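
Two of the hypotheses raised in this thread, that a freshly written file is
read back from the cache and that 512 separate slice files are slower to read
than one volume file, can be tested directly. The sketch below is one way to
set up such a measurement in Python, without VTK. The helper names
`write_test_data` and `timed_read`, the `slice.NNN` file naming, and the
scaled-down sizes are all assumptions for illustration. At this small size
everything fits in the file cache, so the timings it prints will not
reproduce the numbers reported above.

```python
import os
import tempfile
import time


def write_test_data(directory, n_slices, slice_bytes):
    """Write the same random bytes as one volume file and as n_slices files."""
    volume_path = os.path.join(directory, "bench.img")
    slice_paths = []
    with open(volume_path, "wb") as volume:
        for i in range(n_slices):
            data = os.urandom(slice_bytes)
            volume.write(data)
            path = os.path.join(directory, "slice.%03d" % i)
            with open(path, "wb") as f:
                f.write(data)
            slice_paths.append(path)
    return volume_path, slice_paths


def timed_read(paths):
    """Read every file in paths fully; return (total_bytes, elapsed_seconds)."""
    start = time.perf_counter()
    total = 0
    for path in paths:
        with open(path, "rb") as f:
            total += len(f.read())
    return total, time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        # 64 slices of 64x64 two-byte voxels: a small stand-in for 512x512x512.
        volume_path, slice_paths = write_test_data(d, 64, 64 * 64 * 2)
        cases = (("one file", [volume_path]), ("64 slice files", slice_paths))
        for label, paths in cases:
            for attempt in (1, 2):
                nbytes, seconds = timed_read(paths)
                print("%s, pass %d: %d bytes in %.4f s"
                      % (label, attempt, nbytes, seconds))
```

To say anything about the real case, the same measurement would need to be
run on the full-size data, with the first pass made after a reboot or after
reading enough unrelated data to push the files out of the cache.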