[Insight-users] moving image and memory

Fri Jul 16 11:05:41 EDT 2004

Hi Michael,

1) The BSplineInterpolator also uses GetPixel()
    in order to get the image values that are
    used for computing the interpolated intensity.

2) You could override the GetPixel() method by
    creating a new class (e.g. itk::CachedImage<>)
    deriving from the itk::Image<>. For this new
    image type you could also develop a customized
    variant of the itk::ImportImageContainer.

    The current implementation of the ImageContainer
    assumes a contiguous block of memory that is
    arranged in rows. You can crate a different
    implementation where the memory is allocated
    in blocks.

3) A natural concern at this point is that you seem to
    be solving a hypothetical problem. The algorithmic
    choices on the scheme that you are proposing are
    quite depending on assumptions such as:

     - How much will the moving image move from
       its original position.

     - How big the images are with respect to
       the main memory

     - How efficient should the block access be.

    Small changes in these assumptions may make your
    algorithms to be an excellent fit or a catastrophic
    mismatch.

    You may want to narrow the condition for which
    your algorithm is going to be employed, in order
    to make your that your are addressing problems
    that you really have and not being mislead by
    situations that *may eventually* happen in a
    generic case but are not likely in your current
    arrangement.

Regards,

    Luis

---------------
Michael wrote:
> Hi Luis,
> 
> first of all, thanks for your reply.
> 
> Of course I was thinking about computing some kind of bounding box as 
> well. However, I then thought that some sophisticated memory handling 
> would be more appropriate. First of all, it's rather difficult to 
> estimate the size of the security margin. If the region should be the 
> same during the whole registration run and if we consider that there 
> might be regions where the region covered by the initial transform and 
> the region covered by the final transform don't even overlap, I can't 
> really think about a good way to estimate the security margin.
> 
> Calculating the region for every iteration and reloading it every time 
> is probably not quite efficient with respect to performance. Only 
> loading that part from disk that has changed from iteration n-1 to 
> iteration n is rather difficult, because it can't just be "appended" to 
> the image (or can it?). So I would have to allocate a new region for the 
> image and to copy the old and the new data (i.e the data loaded from 
> disk in iteration n) in there. This is, however, time consuming as well...
> 
> Therefore I thought about using some caching strategy. Caching (between 
> harddisk and memory) is already provided by the operating system in the 
> form of demand paging. However, I guess this won't work well with our 
> images as I've explained in the first mail. Therefore I guess it's 
> better to implement some caching methods either independent from or 
> supporting the OS paging.
> 
> Here some ideas that I've had concerning the problem:
> 
> 1) A metric takes a fixed and a moving image mask. In my opinion it 
> would be reasonable to have an option that ensures that the requested 
> region of the fixed and the moving image is no larger than the bounding 
> box of this mask. That way, a user has a chance to control the region of 
> the moving image that has to be in memory. Maybe it's better to add a 
> member "moving image region" that does this task (in order not to misuse 
> the mask). Of course, the same can be reached by the user by inserting 
> some filter that does this job right before connecting the pipeline to 
> the registration method.
> 
> 2) Concerning caching I thought about the following:
> If I'm not mistaken, all metric functions in the basic registration 
> framework indirectely access the moving image by the GetPixel() method. 
> Indirectely, because it's acctually the interpolator that does it (in 
> the case of the bspline interpolator not the moving but the coefficient 
> image is accessed, as far as I've seen). If this is not correct, please 
> let me know. I didn't go into too much detail yet, so I might be wrong. 
> Considering this way of accessing pixels, it would be possible to write 
> an image class that stores its data in blocks rather than row by row:
> 
> 
> -------------------
> ¦ A  ¦ B  ¦ C  ¦ D  ¦
> ¦1 2 ¦5 6 ¦... ¦    ¦
> ¦3 4 ¦7 8 ¦    ¦    ¦
> ¦-------------------¦
> ¦... ¦    ¦    ¦    ¦
> ¦    ¦    ¦    ¦    ¦
> ¦    ¦    ¦    ¦    ¦
> ¦-------------------¦
> ¦    ¦    ¦    ¦    ¦
> ¦    ¦    ¦    ¦    ¦
> ¦    ¦    ¦    ¦    ¦
> -------------------
> 
> It shouldn't be too much work to write a GetPixel() method that can 
> access such a data structure. The advantage of such a structure is, that 
> regions that are close to each other in the image are also close to each 
> other in memory.
> 
> Using such a structure, I see basically see two ways to improve the 
> situation. A first way is to rely on the OS paging. Since, less pages 
> are needed to store a (typical) region of interest, paging would 
> probably work much better with such a structure than with the original 
> structure. Maybe a memory mapped file could be used to store the image 
> on disk and maybe a more efficient ordering of the blocks (than A, B, C, 
> ...) could further increase locality of disk accesses.
> 
> Another option would be to store the blocks separately, that is not at 
> contiguous memory locations. This would make it easier to dynamically 
> load (and append) regions to the image. If a pixel on a block that is 
> not in main memory is accessed (by GetPixel()), this block could then be 
> loaded from disk and if a block has not been accessed for a long time 
> the memory could be freed.
> 
> Do you see any major problems connected with such a data structure or 
> any of the proposed options to use it? Of course, the GetPixel() method 
> would become a bit less efficient than the GetPixel() method in 
> itk::Image. However the lower number of disk accesses hopefully more 
> than outweighs this.
> 
> Thanks for any comments,
> 
> Michael
> 
> 
> 
> 
> 
> 
> Luis Ibanez wrote:
> 
>>
>> Hi Michael,
>>
>> A naive option is to compute the bounding box of the projection
>> of the fixed image onto the moving image given the current transform.
>>
>> Then add some security margins to that bounding box in order to
>> define a region and use that region for extracting that piece of the
>> moving image.
>>
>> If you are doing this in a distributed environment you could probably
>> have one or several machines dedicated to providing the right subregions
>> of the  moving image to the machines that are actually computing
>> contributions to the image metric.
>>
>> This naive method will not be very effective if your transform has
>> rotations with large angles. In that case you may even save some time
>> by creating an intermediate resampled version of the moving image
>> in order to get a better aligned dataset. At the end for the
>> registration you should compensate for the transfrom used for creating
>> the intermediate resampled image.
>>
>>
>> We will be quite interested to hear back from you regarding the
>> solution you find to this problem, since this situation will become
>> more and more common as dataset sizes grow bigger.
>>
>>
>>  Regards,
>>
>>
>>    Luis
>>
>>
>> --------------
>> Michael wrote:
>>
>>>  Hi,
>>>
>>> I'm thinking about the following problem... In a registration task, 
>>> there might be a fixed image that is much smaller than the 
>>> corresponding moving image, i.e. it contains only a part of what is 
>>> represented by the moving image. In this case, it's obviously a waste 
>>> of memory to keep the entire moving image in memory during the whole 
>>> registration run, since during each iteration only a small part is 
>>> accessed.
>>>
>>> In my case (which differs slightly from above explanation), I want to 
>>> calculate partial results of the metric value based on parts of the 
>>> fixed image (similar as done in the ThreadedGetValue of the 
>>> MatchCardinalityImageToImageMetric) on different computers. The case 
>>> is similar to above situation, because a small fixed image (a 
>>> subimage of the original fixed image) is mapped to a large moving 
>>> image on each host. However, the case differs in that it there might 
>>> be parts where the region of the initial mapping (i.e. initial 
>>> transform parameters applied) is entirely outside of the region of 
>>> the final mapping. Consider for example a 2D rigid transform: For a 
>>> slightly wrong angle, a (transformed) region around the center of the 
>>> transform is almost identical to the region for the correct angle. If 
>>> the region is far away from the center, however, they might not even 
>>> overlap. I'd say this situation is rather unusual for a registration 
>>> where a small fixed image is registered to a large moving image. In 
>>> my opinion, this fact makes it (even more) difficult to estimate the 
>>> region of the moving image that will be accessed during the whole 
>>> registration run.
>>>
>>> So, the question is, how is it efficiently possible to keep the right 
>>> parts of the moving image in memory? Considering the access 
>>> characteristics it seems to me that some kind of "caching" would be 
>>> appropriate. That is, keeping those regions in memory that have been 
>>> accessed recently and store regions that have not been accessed for a 
>>> long time on disk. If a region is accessed and not in main memory, it 
>>> would just have to be loaded from disk.
>>>
>>> Obviously, the operating system provides some support for this 
>>> (paging). However, I'm not quite sure whether this support is quite 
>>> optimal for the described problem. Consider the following (2D-) image:
>>>
>>> -----------------------------
>>> ¦01 02 03 04 05 06 07 08 09 10¦
>>> ¦                             ¦
>>> ¦11 12 13 14 15 16 17 18 19 20¦
>>> ¦                             ¦
>>> ¦21 22 23 24 25 26 27 28 29 30¦
>>> ¦                             ¦
>>> ¦31 32 33 34 35 36 37 38 39 40¦
>>> ¦                             ¦
>>> ¦41 42 43 44 45 46 47 48 49 50¦
>>> ¦                             ¦
>>> ¦51 52 53 54 55 56 57 58 59 60¦
>>> ¦                             ¦
>>> ¦61 62 63 64 65 66 67 68 69 70¦
>>> ¦-----------------------------¦
>>>
>>> and the indicated subimage:
>>>
>>> ------------------------------
>>> ¦01 02 03 04 05 06 07 08 09 10¦
>>> ¦         -----               ¦
>>> ¦11 12 13¦14 15¦16 17 18 19 20¦
>>> ¦        ¦     ¦              ¦
>>> ¦21 22 23¦24 25¦26 27 28 29 30¦
>>> ¦        ¦     ¦              ¦
>>> ¦31 32 33¦34 35¦36 37 38 39 40¦
>>> ¦        ¦     ¦              ¦
>>> ¦41 42 43¦44 45¦46 47 48 49 50¦
>>> ¦        ¦     ¦              ¦
>>> ¦51 52 53¦54 55¦56 57 58 59 60¦
>>> ¦         -----               ¦
>>> ¦61 62 63 64 65 66 67 68 69 70¦
>>> ¦-----------------------------¦
>>>
>>> I expect that the data is stored as a contiugous block in the order 
>>> of the numbers in the pictures.
>>>
>>> Say one of the numbers in above images represents x bytes and we have 
>>> a page size of 8*x bytes. In order to keep the indicated area in 
>>> memory, we have to have 5 pages (one for each line of the region of 
>>> interest) in main memory, that is 5*8*x bytes. However, we only 
>>> require 5*2*x bytes (size of the region), which is 4 times less.
>>>
>>> Another problem with the OS support is, how can I avoid that large 
>>> amounts of swap-space are used to store data that might never be 
>>> accessed? Maybe the image could be stored on disk as an ordinary file 
>>> instead of having (parts of) it swapped out on disk. That way swap 
>>> space is free for other processes. Are there any reasons that argue 
>>> for having the data in swap space instead of having it in an ordinary 
>>> file?
>>>
>>> Because of above considerations, I think that it might be better to 
>>> manually (that is, not by the OS) control which blocks have to be 
>>> stored on disk and which ones have to be kept in memory. This would 
>>> maybe even allow to apply some prediction methods (that might be more 
>>> appropriate than those of the OS).
>>>
>>> Is there some built-in support in ITK, that helps to solve this 
>>> problem (e.g. a way to store an image in blocks that can be loaded 
>>> from disk if necessary)? Could maybe memory mapped files help to 
>>> provide optimal access to the image data (it seems to me that the 
>>> problem about page size remains...)? If yes, is there some support 
>>> for this in ITK? Does anybody see a better solution than what I've 
>>> called "caching" or any fundamental problems with that approach?
>>>
>>> Any comments are appreciated.
>>>
>>> Thanks,
>>>
>>> Michael
>>>
>>> _______________________________________________
>>> Insight-users mailing list
>>> Insight-users at itk.org
>>> http://www.itk.org/mailman/listinfo/insight-users
>>>
>>
>>
>>
>> _______________________________________________
>> Insight-users mailing list
>> Insight-users at itk.org
>> http://www.itk.org/mailman/listinfo/insight-users
>>
>>
> 
> 
> _______________________________________________
> Insight-users mailing list
> Insight-users at itk.org
> http://www.itk.org/mailman/listinfo/insight-users
>