[Insight-users] Performance regression ImageSeriesReader? (with test)

Tue Mar 23 18:05:38 EDT 2010

Roger,

Thanks for all of your help. We're getting close to a solution.

Bill

On Tue, Mar 23, 2010 at 2:57 PM, Roger Bramon Feixas
<rogerbramon at gmail.com> wrote:
> Bill,
> Now I'm at home. You will have the results of the experiment tomorrow
> morning.
> Thanks,
> Roger
>
> On Tue, Mar 23, 2010 at 8:44 PM, Bill Lorensen <bill.lorensen at gmail.com>
> wrote:
>>
>> Sounds good. So you are saying that in the
>> ImageSeriesReader<TOutputImage>
>> ::GenerateOutputInformation
>>
>> that the number of files that are processed will be only 2.
>>
>> Bill
>>
>> On Tue, Mar 23, 2010 at 12:39 PM, Bradley Lowekamp
>> <blowekamp at mail.nih.gov> wrote:
>> > Bill:
>> > My proposed solution is to use the old behavior but use a time stamp to
>> > avoid the extra updates of the MDDA when streaming.
>> > The requested region is set after UpdateOutputInformation is executed,
>> > so it
>> > can't be toggled during the this phase of execution.
>> >
>> > On Mar 23, 2010, at 3:28 PM, Bill Lorensen wrote:
>> >
>> > Can we check if streaming is on and revert to the old behavior if it is
>> > off?
>> >
>> > This is a huge performance penalty to support streaming which is
>> > important but no the usual use case.
>> >
>> >
>> > On Tue, Mar 23, 2010 at 12:12 PM, Bradley Lowekamp
>> > <blowekamp at mail.nih.gov> wrote:
>> >
>> > Bill,
>> >
>> > Going back would have horrible effects for streaming. It would make
>> > slice by
>> >
>> > slice streaming an n^2 algorithm, which is far worse then the current
>> > order
>> >
>> > of N hindrance for normals Updates. We must make some improvements from
>> > 2.8.
>> >
>> > If we declare the the MetaDataDictionary is suppose to be updated in the
>> >
>> > update data phase. ( the ImageFileReader does it in the
>> >
>> > UpdateOutputInformation phase ) Then the prior stated point 1 design
>> >
>> > requirement is gone. And the following solution come to mind:
>> >
>> > 1) Modify the GetMMDA methods to produce a warning if the update output
>> > data
>> >
>> > has not been called. This is to be nice if some users now expect
>> >
>> > UpdateOutputInformation to produce the MDDA.
>> >
>> > 2) Add a time stamp for the MMDA, so that when streaming the MMDA is
>> > only
>> >
>> > updated once and not every time a region is requested.
>> >
>> > Additionally I believe that we need better DICOM test data which include
>> >
>> > more tags similar to real world data.
>> >
>> > Brad
>> >
>> > On Mar 23, 2010, at 2:54 PM, Bill Lorensen wrote:
>> >
>> > The UpdateInformation is supposed to update origin, spacing,
>> >
>> > direction, pixel type, etc. I don't think it is supposed to completely
>> >
>> > populate the meta data dictionary. At least until itk 2.8 it did not.
>> >
>> > Why not revert back to the old behavior as a sort term fix.
>> >
>> > I think this performance hit needs to be repaired before we release
>> >
>> > 3.16. This has been causing major pain for Slicer3 users who
>> >
>> > frequently use dicom. Fortunately for us, Roger brought it to light.
>> >
>> > We missed it because our performance testing is weak.
>> >
>> > There are other issues for sure.
>> >
>> > Bill
>> >
>> > On Tue, Mar 23, 2010 at 11:45 AM, Bradley Lowekamp
>> >
>> > <blowekamp at mail.nih.gov> wrote:
>> >
>> > Bill,
>> >
>> > After my tests I agree that reading the headers in DICOM files is a
>> >
>> > surprisingly expensive operation as such it should be minimized. The
>> > coping
>> >
>> > of the MDAs is insignificant performance wise.  I believe that the best
>> >
>> > solution would be to have a dedicated DICOM series readers, which also
>> >
>> > removes the extra header reads needed for the name generation as well as
>> > the
>> >
>> > extra one in the UpdateOutputInformation.
>> >
>> > If we assume that the usually way to utilize the reader is to just
>> > Update,
>> >
>> > or stream Update, then the additional read of the headers appears
>> >
>> > unnecessary.
>> >
>> > I believe a solution would be to make the GetMDDA method smarter, and by
>> >
>> > default update this MDDA in the UpdateData. A time stamp would need to
>> > be
>> >
>> > used for the MDDA to check when it needs to be updated in the UpdateData
>> >
>> > methods. For streaming, the first time through would require reading all
>> > of
>> >
>> > the headers for the MDDA, this should bring the time stamp up to date.
>> > The
>> >
>> > GetMDDA methods could also check this timestamp and perform the reading
>> > of
>> >
>> > the headers if it's out of date. This is my best current idea on how to
>> >
>> > maintain the 1) and 2) I previously mentioned.
>> >
>> > Brad
>> >
>> > On Mar 23, 2010, at 12:33 PM, Bill Lorensen wrote:
>> >
>> > Brad,
>> >
>> > I have an itk 2.8 checkout. The difference is due to the processing of
>> >
>> > all files in the GenerateOutputInformation method. In the past, only
>> >
>> > two files were processed. If I restrict the number of files to 2
>> >
>> > rather that number of files, I get pretty reasonable speeds.
>> >
>> > Roger,
>> >
>> > As an experiment (and definitely not a fix!), can you in the method
>> >
>> > void ImageSeriesReader<TOutputImage>
>> >
>> > ::GenerateOutputInformation(void)
>> >
>> > change the line:
>> >
>> > for ( int i = 0; i != numberOfFiles; ++i )
>> >
>> > to
>> >
>> > for ( int i = 0; i != 2; ++i )
>> >
>> > and rerun your tests.
>> >
>> > Bill
>> >
>> >
>> > On Tue, Mar 23, 2010 at 8:59 AM, Bradley Lowekamp
>> >
>> > <blowekamp at mail.nih.gov> wrote:
>> >
>> > Bill,
>> >
>> > That is only the half of it. Every time an ImageFileReader is used 3
>> > MDDs
>> >
>> > (meta data dictionaries) are created, one in the ImageIO, one in the
>> >
>> > ImageFileReader, and one in the output Image. This is in addition to the
>> > two
>> >
>> > copies, you pointed out in ImageSeriesReader. Clearly reading with an
>> >
>> > ImageFileReader the MDD scales very poorly as the it's size increases. I
>> >
>> > still have the remaining performance questions:
>> >
>> > How much time is spent coping the MDD vs reading? (leaning towards
>> > reading
>> >
>> > as very expensive)
>> >
>> > As pointed out in Roger's most recent performance tests, there appears
>> > to be
>> >
>> > some additional performance problems in the UpdateData, part. This is
>> >
>> > independent of the additional MDD read in the UpdateOutputInformation.
>> > This
>> >
>> > is definitely another problem, perhaps inside the DICOM library.
>> >
>> > The change of moving (apparently duplicating) the copying to MDDs to the
>> > MDD
>> >
>> > array was added over a year ago, when streaming support was added. If I
>> >
>> > recall correctly the two motivating factors were 1) the MDD array is
>> > output
>> >
>> > information and logically should be updating during the
>> >
>> > UpdateOutputInformation part of the pipeline 2) when streaming each file
>> >
>> > should not need to be read to create the MMD array. I don't recall where
>> >
>> > this discussion took place right now.
>> >
>> > I will run some performance test to try to figure out where the time is
>> >
>> > being spent. Without changing 1 from above, I am not sure how much could
>> > be
>> >
>> > gained.
>> >
>> > Looking at the performance numbers of the Read Directory part, I would
>> > guess
>> >
>> > that the meta data is also read there. I believe that an idea solution
>> > would
>> >
>> > only read this information once. But that is beyond this scope.
>> >
>> > Brad
>> >
>> > On Mar 22, 2010, at 11:20 PM, Bill Lorensen wrote:
>> >
>> > Brad,
>> >
>> > It looks like the meta data array is populated in both the
>> >
>> > GenerateOutputInformation and GenerateData. Also all slices are
>> >
>> > processed in GenerateOutputInformation. In 2.8, only 2 slices were
>> >
>> > processed.
>> >
>> > Why were these changes made? We are also seeing bad dicom performance
>> >
>> > in Slicer3.
>> >
>> > Bill
>> >
>> > On Mon, Mar 22, 2010 at 6:24 AM, Bradley Lowekamp
>> >
>> > <blowekamp at mail.nih.gov> wrote:
>> >
>> > Hello,
>> >
>> > Can you please tell us a little more about your test data and computer.
>> > What
>> >
>> > kind of file system is the data on ( locale or network)? How much memory
>> >
>> > does the computer have? What is the size of the data? What is the native
>> >
>> > pixel type of the data? What are the actual timings? Does the execution
>> > seem
>> >
>> > to be CPU or IO bound?
>> >
>> > One of the changes that was made to the class was to populate the
>> >
>> > MetaDataArray in the UpdataOutputInformation phase of the instead of the
>> >
>> > UpdateOutputData part. This should be just reading the headers of the
>> > files
>> >
>> > in the series. There were several reasons this change was made. To help
>> >
>> > determine the cause of your slowness, lets break up the timing a little
>> >
>> > further.
>> >
>> > Could you please call:
>> >
>> > start timer
>> >
>> > reader->UpdateOutputInformation();
>> >
>> > lap timer
>> >
>> > reader->UpdateLargestPossibleRegion();
>> >
>> > stop timer
>> >
>> > And post the timing results.
>> >
>> > Thanks,
>> >
>> > Brad
>> >
>> > On Mar 21, 2010, at 2:52 PM, Roger Bramon Feixas wrote:
>> >
>> > This week we updated our ITK version from 2.8 to 3.16  and we noticed
>> > the
>> >
>> > medical models are loading 2x slower using the 3.16 ITK version. We use
>> >
>> > itk::ImageSeriesReader and the problem is focused in its Update()
>> > method.
>> >
>> > I attached a simple test program which reproduces the problem and where
>> > we
>> >
>> > can see that the Update() method is 2 times slower using ITK 3.16 vs.
>> > ITK
>> >
>> > 2.8.
>> >
>> > We compiled both versions using Visual Studio 2008 on Windows XP 32bits
>> > and
>> >
>> >  we don't known if this problem also occurs in other platforms.
>> >
>> > I wonder if other itk users have this same performance problem and if
>> > there
>> >
>> > is anybody can help us in order to solve it.
>> >
>> > Thanks!
>> >
>> > Roger
>> >
>> >
>> > ========================================================
>> >
>> > Bradley Lowekamp
>> >
>> > Lockheed Martin Contractor for
>> >
>> > Office of High Performance Computing and Communications
>> >
>> > National Library of Medicine
>> >
>> > blowekamp at mail.nih.gov
>> >
>> >
>> >
>> > ========================================================
>> >
>> > Bradley Lowekamp
>> >
>> > Lockheed Martin Contractor for
>> >
>> > Office of High Performance Computing and Communications
>> >
>> > National Library of Medicine
>> >
>> > blowekamp at mail.nih.gov
>> >
>> >
>
>