[Insight-developers] Going from Non-Streaming to Streaming I/O for a class

Bradley Lowekamp blowekamp at mail.nih.gov
Tue Apr 26 11:39:57 EDT 2011


On Apr 26, 2011, at 10:23 AM, Gaëtan Lehmann wrote:

> 
> On Apr 26, 2011, at 15:54, Bradley Lowekamp wrote:
> 
>> If HDF5 can stream read and write for all files, the StreamingImageIOBase implementation should not need to be overridden.
>> 
>> Implementing streamed reading with compression is more challenging. You don't want to have to decompress large regions just for one pixel. Below you describe a complicated layout for how the compression is done in blocks. Do you plan on implementing streaming with compression? When streaming compressed files, do you plan on supporting arbitrary regions, or enlarging them to the compressed block? Will this depend on the type of compression?
>> 
>> Perhaps disable streaming with compression?
>> 
> 
> Brad,
> 
> The HDF5 library handles reading an arbitrary region, independently of the chunks used and of the compression.
> So this should be quite easy to implement in the ITK ImageIO. The only problems will be with performance, not with the coding.
> 
> For performance, the chunks used should be small enough that there is no need to read the whole file to extract a small part.
> I think that restricting the chunks to
> 
>  SX * SY * 1 * 1 * 1 * ...
> 
> is fine. That's what bioformats does, and it works quite well (except that SX and SY are necessarily the size of the image on X and Y).
> The problem is to choose SX and SY.
> Too small, it would make hdf5 store a lot of chunks, and this may be inefficient.
> Too big, it would force hdf5 to read a lot of the file while streaming, and it may be inefficient.
> 
> Maybe some experiments with
> 
>  SX == SY == 256
>  SX == SY == 512
>  SX == SY == 1024
> 
> or other values, would help to make a decision.

I like the idea of a streaming IO performance test!

For basic image processing where I am streaming IO on both ends, I am usually IO bound, and compression or random access slows things down. I usually try to stream 1 slice or 100MB at a time, and a nearly raw file gives the best performance on my network file system.

In the above test cases, also 

SX = SX_MAX

should be tested; that should maximize continuity.
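
To get a rough feel for the trade-off before timing anything, here is a minimal sketch that counts how many chunks a requested region overlaps and how many pixels would have to be read relative to the pixels requested. Everything in it is an assumption for illustration: the function names, the 4096 x 4096 x 100 image, the requested region, and pairing SX = SX_MAX with SY = 1 (SY is not specified above). It treats every chunk as full size, so it is an upper bound, and it says nothing about how HDF5 actually schedules its reads.

```python
def chunks_touched(region_start, region_size, chunk_size):
    """Number of chunks overlapped by a region (product of per-axis counts)."""
    total = 1
    for start, size, chunk in zip(region_start, region_size, chunk_size):
        first = start // chunk               # index of first chunk hit on this axis
        last = (start + size - 1) // chunk   # index of last chunk hit on this axis
        total *= last - first + 1
    return total


def read_amplification(region_start, region_size, chunk_size):
    """Pixels in all touched chunks divided by pixels actually requested."""
    chunk_pixels = 1
    for c in chunk_size:
        chunk_pixels *= c
    requested = 1
    for s in region_size:
        requested *= s
    n_chunks = chunks_touched(region_start, region_size, chunk_size)
    return n_chunks * chunk_pixels / requested


if __name__ == "__main__":
    # Hypothetical image extent and requested region (x, y, z).
    sx_max = 4096
    start, size = (100, 100, 10), (300, 300, 1)
    candidates = [(256, 256, 1), (512, 512, 1), (1024, 1024, 1), (sx_max, 1, 1)]
    for chunk in candidates:
        n = chunks_touched(start, size, chunk)
        amp = read_amplification(start, size, chunk)
        print(f"chunk {chunk}: {n} chunks touched, amplification {amp:.2f}")
```

Small chunks raise the chunk count, large chunks raise the amplification; the timing test is still needed to see which effect dominates on a given file system.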

Also, please remember that to be a fair IO test the file should be larger than memory, to prevent OS caching.
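
A minimal sketch of what such a timing harness could look like, using only the Python standard library and a plain raw file rather than HDF5 or ITK. The function names and sizes are hypothetical; an HDF5 test would swap in a reader that requests regions from a chunked dataset. The demo size is deliberately tiny so the script can be checked quickly; for a meaningful measurement `total_bytes` would have to exceed physical memory, as noted above.

```python
import os
import tempfile
import time


def write_test_file(path, total_bytes, block_bytes=1 << 20):
    """Write total_bytes of incompressible data to path, one block at a time."""
    block = os.urandom(block_bytes)
    written = 0
    with open(path, "wb") as f:
        while written < total_bytes:
            n = min(block_bytes, total_bytes - written)
            f.write(block[:n])
            written += n


def time_streamed_read(path, request_bytes):
    """Read the whole file sequentially in requests of request_bytes.

    Returns (bytes_read, elapsed_seconds). Python-level buffering is
    disabled, but the OS page cache is NOT bypassed.
    """
    total = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:
        while True:
            data = f.read(request_bytes)
            if not data:
                break
            total += len(data)
    return total, time.perf_counter() - start


if __name__ == "__main__":
    demo_bytes = 8 << 20  # far too small for a fair test; sanity check only
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "stream_test.raw")
        write_test_file(path, demo_bytes)
        for request in (64 << 10, 1 << 20, 4 << 20):
            n, secs = time_streamed_read(path, request)
            rate = n / secs / 1e6 if secs > 0 else float("inf")
            print(f"request {request:>8} bytes: {rate:.1f} MB/s")
```

Varying `request_bytes` stands in for streaming a slice or a fixed number of megabytes at a time.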

Brad

> 
>> 
>>>> 
>>>> 
>>>>> A good thing about HDF5 is that it can handle scatter/gather I/O --
>>>>> you
>>>>> set up the chunk size, and then you can write the image data all at
>>>>> once
>>>>> and it divides it into chunks and writes it, optionally compressing
>>>>> each
>>>>> chunk. Or you can write out a chunk at a time, out of order.
>>>>> --
>> 
>> Kent, this sounds very ambitious, and like it could have a lot of very nice features. I would recommend first getting the streamed reading working, then moving on to streamed writing. As I believe I am still the only one to have implemented streamed writing, we may need a TCON to discuss the issue.
> 
> The HDF5 lib should handle that quite smoothly, so I wouldn't qualify that as very ambitious.
> But that would be very very useful for sure.
> 
> Gaëtan
> 
> -- 
> Gaëtan Lehmann
> Biologie du Développement et de la Reproduction
> INRA de Jouy-en-Josas (France)
> tel: +33 1 34 65 29 66    fax: 01 34 65 29 09
> http://voxel.jouy.inra.fr  http://www.itk.org
> http://www.mandriva.org  http://www.bepo.fr
> 

========================================================
Bradley Lowekamp  
Lockheed Martin Contractor for
Office of High Performance Computing and Communications
National Library of Medicine 
blowekamp at mail.nih.gov

