[Insight-developers] Going from Non-Streaming to Streaming I/O for a class

Tue Apr 26 09:54:06 EDT 2011

Hello Kent,

I assume you have read ImageFileWriter and ImageFileReader to see how the region negotiations occur to get streaming to work?

On Apr 26, 2011, at 5:47 AM, Gaëtan Lehmann wrote:

> 
> On Apr 25, 2011, at 22:57, Luis Ibanez wrote:
> 
>> Hi Kent,
>> 
>> Brad L. will probably have a lot to say about this topic,
>> but let me just contribute some quick guesses here.
>> 
>> On Mon, Apr 25, 2011 at 2:44 PM, Williams, Norman K
>> <norman-k-williams at uiowa.edu> wrote:
>>> For HDF5, I've implemented writing and reading images in chunks, as a
>>> pre-requisite for Streaming I/O.  Now I have a
>>> couple of questions about how that works.
>>> 
>>> 1.Is there any particular reason for inheriting from  
>>> itk::StreamingIOBase?
>>> I notice that some streamable I/O classes
>>> inherit itk::ImageIOBase -- in fact most of them.
>>> 
>> 
>> This is due to historical reasons.
>> 
>> The itk::StreamingIOBase class was added recently.
>> 
>> We probably should re-root the other ImageIO classes
>> that do streaming, to derive from this new class.

Yes, the classes that stream read and write should be derived from StreamingImageIOBase. 

Due to backwards compatibility issues, ImageIOBase needed to support both streaming and non-streaming I/O, so the implementations of the region methods in ImageIOBase are a little complex and multipurpose. Ideally I would like to see ImageIOBase implement just the non-streaming versions of the streaming methods, and StreamingImageIOBase implement the standard streaming versions.

>> 
>>> 2. It appears that streamed reading has two requirements -- are  
>>> there any
>>> others?:
>>>  A) implement GenerateStreamableReadRegionFromRequestedRegion.
>>>  B) In Read, use this->GetIORegion() to decide where in the file  
>>> to read.
>>> 
>> 
>> You will also need to implement:
>> 
>>     virtual bool CanStreamRead()
>> 
>> 
> 
> If your ImageIO is a subclass of StreamingIOBase, you only have to use  
> GetIORegion() in Read.
> The other methods are already implemented.
> 
>> (that probably should be "const"...)

I tried to document these methods clearly; please let me know if this information could be made clearer.

  /** Determine if the ImageIO can stream reading from the
   *  current settings. Default is false. If this is queried after
   *  the header of the file has been read then it will indicate if
   *  that file can be streamed */
  virtual bool CanStreamRead();

  /** Determine if the ImageIO can stream write from the
   *  current settings.
   *
   * There are two types of non-exclusive streaming: pasting sub-regions, and iterative
   *
   */
  virtual bool CanStreamWrite();

As these methods may need to interact with the underlying IO interface, it is not clear to me that they could be made const for all ImageIOs.

If HDF5 can stream read and write for all files, StreamingImageIOBase's implementations should not need to be overridden.
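
To make the "use GetIORegion() in Read" point concrete, here is a minimal, ITK-free sketch. Region3 and ReadRegion are hypothetical names, and the std::vector merely stands in for the file; a real HDF5 ImageIO would presumably select the matching part of the dataset rather than loop over voxels. The only idea shown is that Read fills the buffer with exactly the voxels of the IO region, in x-fastest order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for itk::ImageIORegion, limited to 3-D.
struct Region3
{
  std::size_t index[3]; // start of the region (x, y, z)
  std::size_t size[3];  // extent of the region (x, y, z)
};

// Copy only the voxels of `io` out of a full image stored x-fastest in
// `file` (a stand-in for the data on disk). `dims` is the full image size.
std::vector<float> ReadRegion(const std::vector<float> & file,
                              const std::size_t dims[3],
                              const Region3 & io)
{
  assert(file.size() == dims[0] * dims[1] * dims[2]);
  for (int d = 0; d < 3; ++d)
  {
    assert(io.index[d] + io.size[d] <= dims[d]); // region must fit the image
  }

  std::vector<float> out;
  out.reserve(io.size[0] * io.size[1] * io.size[2]);
  for (std::size_t z = io.index[2]; z < io.index[2] + io.size[2]; ++z)
  {
    for (std::size_t y = io.index[1]; y < io.index[1] + io.size[1]; ++y)
    {
      for (std::size_t x = io.index[0]; x < io.index[0] + io.size[0]; ++x)
      {
        out.push_back(file[(z * dims[1] + y) * dims[0] + x]);
      }
    }
  }
  return out;
}
```

With a 4x3x2 image and an IO region of index (1,1,1) and size (2,2,1), only four voxels are touched.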

Implementing streamed reading with compression is more challenging. You don't want to have to decompress a large region just for one pixel. Below you describe a complicated layout in which the compression is done in blocks. Do you plan on implementing streaming with compression? When streaming compressed files, do you plan on supporting arbitrary regions, or on enlarging them to the compressed block? Will this depend on the type of compression?

Perhaps disable streaming with compression?
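
If the "enlarge to the compressed block" route were taken, the region arithmetic might look like the sketch below. This is only an illustration under assumptions: Region3, EnlargeToChunks and the chunk parameter are hypothetical, and in ITK the natural place for such logic would be GenerateStreamableReadRegionFromRequestedRegion. It snaps the requested region outward to chunk boundaries and clamps it to the image, so whole chunks are decompressed once.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for itk::ImageIORegion, limited to 3-D.
struct Region3
{
  std::size_t index[3];
  std::size_t size[3];
};

// Grow `requested` so that it starts and ends on chunk boundaries,
// then clamp the result to the image extent `dims`.
Region3 EnlargeToChunks(const Region3 & requested,
                        const std::size_t dims[3],
                        const std::size_t chunk[3])
{
  Region3 out;
  for (int d = 0; d < 3; ++d)
  {
    assert(chunk[d] > 0 && requested.size[d] > 0);
    assert(requested.index[d] + requested.size[d] <= dims[d]);

    // First voxel of the chunk containing the start of the request.
    const std::size_t begin = (requested.index[d] / chunk[d]) * chunk[d];
    // One past the last voxel of the chunk containing the end of the request.
    const std::size_t last = requested.index[d] + requested.size[d] - 1;
    const std::size_t end = std::min((last / chunk[d] + 1) * chunk[d], dims[d]);

    out.index[d] = begin;
    out.size[d] = end - begin;
  }
  return out;
}
```

The trade-off is that a one-voxel request still costs a full chunk of decompression, which is why the chunk size matters for streaming.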

>> 
>> 
>>> 3. Streamed Writing seems to depend on implementation of
>>> GetActualNumberOfSplitsForWriting, but it's a bit
>>>  confusing what to implement.  What seems to be happening is this:
>>>  A) Implement GetActualNumberOfSplitsForWriting, and return the # of
>>> chunks to write that the driver is capable of.
>>>     In the case of HDF5, this can be anything from a single voxel  
>>> to the
>>> whole image.
> 

This method needs to handle a lot of cases: sequential streamed writing, pasting of arbitrary regions, and combinations of these with dimensions, compression, etc.

Do you plan on implementing and testing both? There are a lot of cases.

  /** Before this method is called all the configuration will be done,
   * that is Streaming/PasteRegion/Compression/Filename etc
   * If pasting is being used the number of requested splits is for that
   * region not the largest. The derived ImageIO class should verify that
   * the file is capable of being written with this configuration.
   * If pasting is enabled and is not supported or does not work with the file,
   * then an exception should be thrown.
   *
   * The default implementation depends on CanStreamWrite.
   * If false then 1 is returned (unless pasting is indicated), so that the whole file will be updated in one region.
   * If true then it's assumed that any arbitrary region can be written
   * to any file. So the user's request will be respected. If a derived
   * class has more restrictive conditions then they should be checked
   */
  virtual unsigned int GetActualNumberOfSplitsForWriting(unsigned int numberOfRequestedSplits,
                                                         const ImageIORegion & pasteRegion,
                                                         const ImageIORegion & largestPossibleRegion);

This method's job is to return the number of regions that will be streamed through the pipeline. The other methods will split the image along the highest non-1 dimension. So, for example, if you have a 10x10x10 volume but the user's numberOfRequestedSplits is 12, this method will return 10.
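
The 10x10x10 example can be checked with a small standalone sketch. ActualNumberOfSplits is a hypothetical name, and the ceiling-division scheme is my understanding of how the splitter behaves, not code taken from ITK.

```cpp
#include <cassert>

// Number of pieces actually produced when a 3-D region of the given size is
// cut into at most `requested` slabs along its highest axis whose extent
// is not 1.
unsigned int ActualNumberOfSplits(const unsigned int size[3], unsigned int requested)
{
  assert(requested > 0);

  // Find the highest axis that can be split.
  int axis = 2;
  while (axis > 0 && size[axis] == 1)
  {
    --axis;
  }
  const unsigned int range = size[axis];
  if (range <= 1)
  {
    return 1; // nothing to split
  }

  // Slab thickness, rounded up, then the number of slabs of that thickness.
  const unsigned int perPiece = (range + requested - 1) / requested;
  return (range + perPiece - 1) / perPiece;
}
```

Note that the result can be smaller than the request even when the request is below the axis length: asking for 6 splits of a length-10 axis gives slabs of thickness 2 and therefore 5 pieces.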

Also, the implementations of GetSplitRegionForWriting and GetActualNumberOfSplitsForWritingCanStreamWrite are very important to streaming.

  /** returns the ith IORegion
   *
   * numberOfActualSplits should be the value returned from GetActualNumberOfSplitsForWriting with the same parameters
   *
   * Derived classes should overload this method to return a compatible region
   */
  virtual ImageIORegion GetSplitRegionForWriting(unsigned int ithPiece,
                                                 unsigned int numberOfActualSplits,
                                                 const ImageIORegion & pasteRegion,
                                                 const ImageIORegion & largestPossibleRegion);

The default implementation splits along the highest non-1 dimension. It is the same algorithm used to split an image for the multi-threader and in ImageRegionSplitter, only with ImageIORegion instead of ImageRegion.

The two methods below are called from the above methods when CanStreamWrite is true. If you need a different algorithm, these are the methods to overload.

  /** an implementation of ImageRegionSplitter::GetNumberOfSplits
   */
  virtual unsigned int GetActualNumberOfSplitsForWritingCanStreamWrite(unsigned int numberOfRequestedSplits,
                                                                       const ImageIORegion & pasteRegion) const;

  /** an implementation of ImageRegionSplitter::GetSplit
   */
  virtual ImageIORegion GetSplitRegionForWritingCanStreamWrite(unsigned int ithPiece,
                                                               unsigned int numberOfActualSplits,
                                                               const ImageIORegion & pasteRegion) const;
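
As a rough illustration of the default split described above, the ith piece of a paste region can be computed as follows. Region3 and SplitRegion are hypothetical names; this is a sketch of the slab-splitting idea, assuming numberOfSplits is the value already returned by the number-of-splits query for the same region, and it is not the ITK implementation.

```cpp
#include <cassert>

// Hypothetical stand-in for itk::ImageIORegion, limited to 3-D.
struct Region3
{
  unsigned int index[3];
  unsigned int size[3];
};

// Return the i-th slab of `region` when it is cut into `numberOfSplits`
// pieces along its highest axis whose extent is not 1.
Region3 SplitRegion(unsigned int i, unsigned int numberOfSplits, Region3 region)
{
  assert(numberOfSplits > 0 && i < numberOfSplits);

  int axis = 2;
  while (axis > 0 && region.size[axis] == 1)
  {
    --axis;
  }
  const unsigned int range = region.size[axis];
  if (range <= 1)
  {
    return region; // cannot be split any further
  }

  const unsigned int perPiece = (range + numberOfSplits - 1) / numberOfSplits;
  const unsigned int lastPiece = (range + perPiece - 1) / perPiece - 1;
  assert(i <= lastPiece); // holds when numberOfSplits is the actual count

  // Every slab has thickness perPiece except the last, which takes the rest.
  region.index[axis] += i * perPiece;
  region.size[axis] = (i < lastPiece) ? perPiece : range - i * perPiece;
  return region;
}
```

For a 10x10x10 region and 4 splits this yields slabs of thickness 3, 3, 3 and 1 along z, with x and y untouched.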

> 
>>> I'm not sure though, what
>>>     the pastedRegion and largestPossibleRegion are for -- are they
>>> simply advisory, saying 'I'd like to be able to
>>>     paste the region pastedRegion, and oh by the way, here's the
>>> largestPossibleRegion for the file I want to write',
>>>     or ... what?

The largestPossibleRegion should be exactly the size reported by GetDimensions, and is guaranteed to start at index 0. Having this information in the form of an ImageIORegion is a convenience.
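
One thing a derived class can do with the two regions is verify that the paste region lies inside the file before accepting the configuration. The sketch below is standalone and uses hypothetical names (Region3, LargestRegionFromDimensions, CheckPasteRegion); throwing std::out_of_range is just a placeholder for whatever exception the ImageIO would really raise.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for itk::ImageIORegion, limited to 3-D.
struct Region3
{
  unsigned int index[3];
  unsigned int size[3];
};

// The largest possible region: index 0 everywhere, size equal to the
// image dimensions.
Region3 LargestRegionFromDimensions(const unsigned int dims[3])
{
  Region3 r;
  for (int d = 0; d < 3; ++d)
  {
    r.index[d] = 0;
    r.size[d] = dims[d];
  }
  return r;
}

// Throw if `paste` is not fully contained in `largest`.
void CheckPasteRegion(const Region3 & paste, const Region3 & largest)
{
  for (int d = 0; d < 3; ++d)
  {
    assert(largest.index[d] == 0); // guaranteed for the largest region
    if (paste.index[d] + paste.size[d] > largest.size[d])
    {
      throw std::out_of_range("paste region extends beyond the image");
    }
  }
}
```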

>> 
>> you will also need:
>> 
>> virtual bool CanStreamWrite()
>> 
>> 
>> The "paste" functionality is intended for a case that is a bit more
>> generic than streaming. E.g. in Streaming, you could imagine the
>> output file to be growing progressively as you write data out.
>> In the "paste" mode, the full file may already be out there in disk,
>> and we are patching a section of it.

Most of the ImageIOs that stream write currently write the whole image twice: once when the first region is written, and then again region by region. This is because it is not always clear whether the ImageIO will be pasting just a sub-region or streaming the whole image chunk by chunk. Either way, the file should always be valid.

>> 
>> 
>>>  B) In Write, write out the region requested by m_IORegion, if  
>>> possible
>>> or throw an exception?
>>> 
>> 
>> Yeap, This is more or less what is done in the MetaImageIO class
>> for example.

Yes, you must write out exactly the m_IORegion, no more, no less.
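
A minimal standalone sketch of "exactly the IO region" on the write side, with the same caveats as any toy example: Region3 and WriteRegion are hypothetical names, the std::vector stands in for an already allocated file, and the exception types are placeholders. The buffer handed to Write must hold exactly the voxels of the region, and only those voxels in the file are modified.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for itk::ImageIORegion, limited to 3-D.
struct Region3
{
  std::size_t index[3];
  std::size_t size[3];
};

// Paste `pixels` (x-fastest, exactly the size of `io`) into `file`, which
// holds the full image of extent `dims`. Voxels outside `io` are untouched.
void WriteRegion(std::vector<float> & file,
                 const std::size_t dims[3],
                 const Region3 & io,
                 const std::vector<float> & pixels)
{
  assert(file.size() == dims[0] * dims[1] * dims[2]);
  for (int d = 0; d < 3; ++d)
  {
    if (io.index[d] + io.size[d] > dims[d])
    {
      throw std::out_of_range("IO region extends beyond the image");
    }
  }
  if (pixels.size() != io.size[0] * io.size[1] * io.size[2])
  {
    throw std::invalid_argument("buffer size does not match the IO region");
  }

  std::size_t n = 0;
  for (std::size_t z = io.index[2]; z < io.index[2] + io.size[2]; ++z)
  {
    for (std::size_t y = io.index[1]; y < io.index[1] + io.size[1]; ++y)
    {
      for (std::size_t x = io.index[0]; x < io.index[0] + io.size[0]; ++x)
      {
        file[(z * dims[1] + y) * dims[0] + x] = pixels[n++];
      }
    }
  }
}
```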

>> 
>> 
>>> A good thing about HDF5 is that it can handle scatter/gather I/O --  
>>> you
>>> set up the chunk size, and then you can write the image data all at  
>>> once
>>> and it divides it into chunks and writes it, optionally compressing  
>>> each
>>> chunk. Or you can write out a chunk at a time, out of order.
>>> --

Kent, this sounds very ambitious, and like it could have a lot of very nice features. I would recommend first getting streamed reading working, then moving on to streamed writing. As I believe I am still the only one to have implemented streamed writing, we may need a TCON to discuss the issue.

Hope I was helpful,
Brad

>> 
>> 
>>> Kent Williams norman-k-williams at uiowa.edu
>>> 
>>> _______________________________________________
>>> Powered by www.kitware.com
>>> 
>>> Visit other Kitware open-source projects at
>>> http://www.kitware.com/opensource/opensource.html
>>> 
>>> Kitware offers ITK Training Courses, for more information visit:
>>> http://kitware.com/products/protraining.html
>>> 
>>> Please keep messages on-topic and check the ITK FAQ at:
>>> http://www.itk.org/Wiki/ITK_FAQ
>>> 
>>> Follow this link to subscribe/unsubscribe:
>>> http://www.itk.org/mailman/listinfo/insight-developers
>>> 
> 
> -- 
> Gaëtan Lehmann
> Biologie du Développement et de la Reproduction
> INRA de Jouy-en-Josas (France)
> tel: +33 1 34 65 29 66    fax: 01 34 65 29 09
> http://voxel.jouy.inra.fr  http://www.itk.org
> http://www.mandriva.org  http://www.bepo.fr
> 

========================================================
Bradley Lowekamp  
Lockheed Martin Contractor for
Office of High Performance Computing and Communications
National Library of Medicine 
blowekamp at mail.nih.gov
