[Insight-users] Performance regression ImageSeriesReader? (with test)

Tue Mar 23 15:12:08 EDT 2010

Bill,

Going back would have horrible effects for streaming. It would make slice-by-slice streaming an O(n^2) algorithm, which is far worse than the current O(n) hindrance for normal Updates. We must make some improvements over 2.8.
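
As a rough counting model of this argument, assume that with the old behavior each of the n streamed slice requests would re-read all n headers, while the current behavior costs one extra header pass on top of the normal read of each file. Both function names are made up for this illustration, and the counts are a simplification, not measurements.

```cpp
#include <cstddef>

// Assumption: every one of the n streamed slice requests re-reads all n headers.
constexpr std::size_t HeaderReadsPerRequestRefresh(std::size_t n)
{
  return n * n;
}

// Assumption: one extra pass over all n headers, plus one header read per file
// when its pixel data is loaded.
constexpr std::size_t HeaderReadsSingleExtraPass(std::size_t n)
{
  return 2 * n;
}
```

Under these assumptions a 200-slice series would cost 40,000 header reads in the first case and 400 in the second.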

If we declare that the MetaDataDictionary is supposed to be updated in the update data phase (the ImageFileReader does it in the UpdateOutputInformation phase), then the previously stated design requirement 1) is gone, and the following solution comes to mind:

1) Modify the GetMDDA methods to produce a warning if the output data has not been updated. This is to be nice to users who now expect UpdateOutputInformation to produce the MDDA.
2) Add a time stamp for the MDDA, so that when streaming the MDDA is only updated once and not every time a region is requested.
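
The two points above could be sketched roughly as follows. This is a standalone illustration and not ITK code: `SeriesReaderSketch`, `UpdateRegion`, `ReadAllHeaders`, and the counter-based modification times are hypothetical stand-ins for the series reader, its update-data phase, the header pass, and a time stamp mechanism.

```cpp
#include <cstddef>
#include <iostream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a series reader; not the ITK class.
class SeriesReaderSketch
{
public:
  explicit SeriesReaderSketch(std::vector<std::string> fileNames)
    : m_FileNames(std::move(fileNames))
  {}

  // Changing the inputs makes the cached MDDA out of date.
  void SetFileNames(std::vector<std::string> fileNames)
  {
    m_FileNames = std::move(fileNames);
    m_InputMTime = ++m_Clock;
    m_DataUpdated = false;
  }

  // Stand-in for the update-data phase, called once per streamed region.
  // Point 2: the headers are only re-read when the MDDA is older than the inputs.
  void UpdateRegion(std::size_t /*slice*/)
  {
    if (m_ArrayMTime < m_InputMTime)
    {
      ReadAllHeaders();
      m_ArrayMTime = ++m_Clock;
    }
    m_DataUpdated = true;
  }

  // Point 1: warn if the array is requested before any data update.
  const std::vector<std::string> & GetMDDA() const
  {
    if (!m_DataUpdated)
    {
      std::cerr << "Warning: MDDA requested before the output data was updated\n";
    }
    return m_Array;
  }

  std::size_t GetHeaderReads() const { return m_HeaderReads; }

private:
  // Pretend to read one header per file and count the reads.
  void ReadAllHeaders()
  {
    m_Array.clear();
    for (const auto & name : m_FileNames)
    {
      m_Array.push_back("header of " + name);
      ++m_HeaderReads;
    }
  }

  std::vector<std::string> m_FileNames;
  std::vector<std::string> m_Array;
  std::size_t              m_HeaderReads = 0;
  bool                     m_DataUpdated = false;
  unsigned long            m_Clock = 1;
  unsigned long            m_InputMTime = 1; // inputs start newer than the array
  unsigned long            m_ArrayMTime = 0;
};
```

With three file names, calling `UpdateRegion` once per slice reads three headers in total rather than nine, and the headers are only read again after `SetFileNames` changes the inputs.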

Additionally, I believe that we need better DICOM test data that includes more tags, similar to real-world data.

Brad

On Mar 23, 2010, at 2:54 PM, Bill Lorensen wrote:

> The UpdateInformation is supposed to update origin, spacing,
> direction, pixel type, etc. I don't think it is supposed to completely
> populate the meta data dictionary. At least until itk 2.8 it did not.
> Why not revert to the old behavior as a short-term fix?
> 
> I think this performance hit needs to be repaired before we release
> 3.16. This has been causing major pain for Slicer3 users who
> frequently use DICOM. Fortunately for us, Roger brought it to light.
> We missed it because our performance testing is weak.
> 
> There are other issues for sure.
> 
> Bill
> 
> On Tue, Mar 23, 2010 at 11:45 AM, Bradley Lowekamp
> <blowekamp at mail.nih.gov> wrote:
>> Bill,
>> After my tests I agree that reading the headers in DICOM files is a
>> surprisingly expensive operation and, as such, should be minimized. The
>> copying of the MDDAs is insignificant performance-wise. I believe that the
>> best solution would be to have a dedicated DICOM series reader, which would
>> also remove the extra header reads needed for the name generation as well
>> as the extra one in UpdateOutputInformation.
>> If we assume that the usual way to utilize the reader is to just Update,
>> or stream Update, then the additional read of the headers appears
>> unnecessary.
>> I believe a solution would be to make the GetMDDA method smarter, and by
>> default update this MDDA in UpdateData. A time stamp would need to be
>> used for the MDDA to check when it needs to be updated in the UpdateData
>> methods. For streaming, the first pass would require reading all of
>> the headers for the MDDA, which should bring the time stamp up to date. The
>> GetMDDA methods could also check this time stamp and read
>> the headers if it is out of date. This is my best current idea on how to
>> maintain the 1) and 2) I previously mentioned.
>> Brad
>> On Mar 23, 2010, at 12:33 PM, Bill Lorensen wrote:
>> 
>> Brad,
>> 
>> I have an ITK 2.8 checkout. The difference is due to the processing of
>> all files in the GenerateOutputInformation method. In the past, only
>> two files were processed. If I restrict the number of files to 2
>> rather than the full number of files, I get pretty reasonable speeds.
>> 
>> Roger,
>> 
>> As an experiment (and definitely not a fix!), can you in the method
>> void ImageSeriesReader<TOutputImage>
>> ::GenerateOutputInformation(void)
>> 
>> change the line:
>> for ( int i = 0; i != numberOfFiles; ++i )
>> to
>> for ( int i = 0; i != 2; ++i )
>> 
>> and rerun your tests.
>> 
>> Bill
>> 
>> 
>> On Tue, Mar 23, 2010 at 8:59 AM, Bradley Lowekamp
>> <blowekamp at mail.nih.gov> wrote:
>> 
>> Bill,
>>
>> That is only the half of it. Every time an ImageFileReader is used, 3 MDDs
>> (meta data dictionaries) are created: one in the ImageIO, one in the
>> ImageFileReader, and one in the output Image. This is in addition to the two
>> copies you pointed out in ImageSeriesReader. Clearly, reading with an
>> ImageFileReader scales very poorly as the size of the MDD increases. I
>> still have the remaining performance questions:
>>
>> How much time is spent copying the MDD vs. reading? (leaning towards reading
>> as very expensive)
>>
>> As pointed out in Roger's most recent performance tests, there appear to be
>> some additional performance problems in the UpdateData part. This is
>> independent of the additional MDD read in UpdateOutputInformation. This
>> is definitely another problem, perhaps inside the DICOM library.
>>
>> The change that moved (apparently duplicated) the copying of MDDs to the MDD
>> array was made over a year ago, when streaming support was added. If I
>> recall correctly the two motivating factors were 1) the MDD array is output
>> information and logically should be updated during the
>> UpdateOutputInformation part of the pipeline 2) when streaming each file
>> should not need to be read to create the MDD array. I don't recall where
>> this discussion took place right now.
>>
>> I will run some performance tests to try to figure out where the time is
>> being spent. Without changing 1 from above, I am not sure how much could be
>> gained.
>>
>> Looking at the performance numbers of the Read Directory part, I would guess
>> that the meta data is also read there. I believe that an ideal solution would
>> only read this information once. But that is beyond this scope.
>>
>> Brad
>> 
>> On Mar 22, 2010, at 11:20 PM, Bill Lorensen wrote:
>> 
>> Brad,
>>
>> It looks like the meta data array is populated in both the
>> GenerateOutputInformation and GenerateData. Also all slices are
>> processed in GenerateOutputInformation. In 2.8, only 2 slices were
>> processed.
>>
>> Why were these changes made? We are also seeing bad dicom performance
>> in Slicer3.
>>
>> Bill
>> 
>> On Mon, Mar 22, 2010 at 6:24 AM, Bradley Lowekamp
>> <blowekamp at mail.nih.gov> wrote:
>>
>> Hello,
>>
>> Can you please tell us a little more about your test data and computer? What
>> kind of file system is the data on (local or network)? How much memory
>> does the computer have? What is the size of the data? What is the native
>> pixel type of the data? What are the actual timings? Does the execution seem
>> to be CPU or IO bound?
>>
>> One of the changes that was made to the class was to populate the
>> MetaDataArray in the UpdateOutputInformation phase of the pipeline instead of
>> the UpdateOutputData part. This should be just reading the headers of the
>> files in the series. There were several reasons this change was made. To help
>> determine the cause of your slowness, let's break up the timing a little
>> further.
>>
>> Could you please call:
>>
>> start timer
>> reader->UpdateOutputInformation();
>> lap timer
>> reader->UpdateLargestPossibleRegion();
>> stop timer
>>
>> And post the timing results.
>>
>> Thanks,
>>
>> Brad
>> 
>> On Mar 21, 2010, at 2:52 PM, Roger Bramon Feixas wrote:
>> 
>> This week we updated our ITK version from 2.8 to 3.16 and we noticed that
>> medical models are loading 2x slower using ITK 3.16. We use
>> itk::ImageSeriesReader and the problem is concentrated in its Update() method.
>>
>> I attached a simple test program which reproduces the problem and where we
>> can see that the Update() method is 2 times slower using ITK 3.16 vs. ITK
>> 2.8.
>>
>> We compiled both versions using Visual Studio 2008 on 32-bit Windows XP and
>> we don't know if this problem also occurs on other platforms.
>>
>> I wonder if other ITK users have this same performance problem and if
>> anybody can help us solve it.
>>
>> Thanks!
>>
>> Roger

========================================================
Bradley Lowekamp  
Lockheed Martin Contractor for
Office of High Performance Computing and Communications
National Library of Medicine 
blowekamp at mail.nih.gov
