[Insight-developers] Data submodule was reverted again (staging check needed?)

Gaëtan Lehmann gaetan.lehmann at jouy.inra.fr
Wed Jan 26 14:39:41 EST 2011


On 26 Jan 2011, at 14:29, Brad King wrote:

> On 01/26/2011 04:22 AM, Gaëtan Lehmann wrote:
>>   ITKData repository takes 74 MB.
>>   ITK repository takes 154 MB.
>
> After using
>
> $ git repack -a -f -d --window=250 --depth=250
>
> to pack both repositories tightly I get
>
> $ du -sk ITK.git/objects ITKData.git/objects
> 59354   ITK.git/objects
> 38005   ITKData.git/objects
>
> The main problem is that data files like png images do not compress well
> in Git's pack format because they cannot be represented as a small delta
> against another png file.  All the images need to be carried around in
> whole in history even if they've already been removed.  Contrast this to
> source files which are usually updated by small patches and compress
> very well.  The data-to-source ratio will only grow over time.
>
> Note also that the above sizes are not representative of pure-source
> v. pure-data because ITK had some data files in its history that were
> not in Testing/Data and thus ended up in the history of the main source.

Ok. Fortunately, testing data doesn't change that often, so this  
shouldn't really be a problem.

>
>>   ITK build directory takes 1.3 GB – 8.4 GB if we don't take care to
>> remove the temporary data after running the tests.
>>   ITK build with wrapping takes 5.3 GB.
>
> The build directory sizes don't count.  We're talking about source sizes.

I'm not sure why you say that. Sources are meant to be built, so the
size of the build tree shows that the testing data may not amount to
much compared to the other important part needed for the tests: the
binaries.

>
>> The large data is actually a problem. Midas can be a solution for
>> that, and we can put a file limit for the main repository.
>
> There is already a limit on blob size in the ITK repo and a bigger limit
> in the ITKData repo.

I know, so there is no real problem with adding some data to the
repository: it can't grow that fast.
Large data can go elsewhere, in Midas for example.

>
>> So I still think, at this time, that the extra complexity of the
>> submodule management is not compensated by the size gain in the main
>> repository.
>
> Yes, but once the files appear in the main history we can never go back.
> The submodule approach keeps us treading water while something better
> is developed.

OK, so this is why the testing data was moved to a submodule: so that
it can easily be replaced by something else. It makes sense that way.
Unfortunately, at this time the usage is quite complex, and Midas
doesn't look much simpler.

Gaëtan



>
> Bill Lorensen wrote:
>> But the current setup with data as a submodule adds complexity to
>> checkins and is subject to unexpected abuse as recently reported by
>> Brad L.
>
> I can address this with a commit check that ensures no ITK commit's
> Testing/Data submodule references an older version than one of its
> parents.
>
>> The midas solution adds even more complexity, especially for baselines.
>>
>> I'm looking forward to a solution that keeps the footprint low but
>> keeps the workload on a developer at or near what it was in ITK 3.
>
> That's a design goal.
>
> -Brad K
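
The commit check Brad describes in the quoted text above could be
sketched roughly as follows. This is only an illustration in Python,
not the hook actually used for ITK: the function names and the
in-memory parent map (`history`) are hypothetical, and a real hook
would have to read the Testing/Data submodule commits and their
ancestry from Git instead.

```python
from typing import Dict, Iterable, List, Set


def ancestors(history: Dict[str, List[str]], commit: str) -> Set[str]:
    """Return every commit reachable from `commit`, including itself.

    `history` is a hypothetical stand-in for the data repository's
    history: it maps each commit id to the list of its parent ids.
    """
    seen: Set[str] = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(history.get(current, []))
    return seen


def submodule_moves_backward(
    history: Dict[str, List[str]],
    new_ref: str,
    parent_refs: Iterable[str],
) -> bool:
    """Return True if `new_ref` is strictly older than a parent's reference.

    `new_ref` is the data commit referenced by the commit being checked;
    `parent_refs` are the data commits referenced by each of its parents.
    "Older" is taken here to mean "a proper ancestor of", which is an
    assumption about the intended rule.
    """
    for parent_ref in parent_refs:
        if new_ref != parent_ref and new_ref in ancestors(history, parent_ref):
            return True
    return False


# Toy data history: a <- b <- c (c is the newest commit).
history = {"a": [], "b": ["a"], "c": ["b"]}

# A commit that moves the submodule from c back to a should be rejected.
rejected = submodule_moves_backward(history, "a", ["c"])

# A commit that moves the submodule forward from b to c should be accepted.
accepted = not submodule_moves_backward(history, "c", ["b"])
```

With more than one parent (a merge), the sketch rejects the commit as
soon as its data reference is older than the reference of any single
parent, which matches the "one of its parents" wording.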

-- 
Gaëtan Lehmann
Biologie du Développement et de la Reproduction
INRA de Jouy-en-Josas (France)
tel: +33 1 34 65 29 66    fax: 01 34 65 29 09
http://voxel.jouy.inra.fr  http://www.itk.org
http://www.mandriva.org  http://www.bepo.fr
