https://public.kitware.com/Wiki/api.php?action=feedcontributions&user=Mstauff&feedformat=atomKitwarePublic - User contributions [en]2024-03-28T09:31:25ZUser contributionsMediaWiki 1.38.6https://public.kitware.com/Wiki/index.php?title=ITK/Git/Develop/Data&diff=41775ITK/Git/Develop/Data2011-07-21T18:02:39Z<p>Mstauff: Added instructions to build ITKData to initiate the download of test data files.</p>
<hr />
<div>This page documents how to add test data while developing ITK.<br />
See our [[ITK/Git|table of contents]] for more information.<br />
__TOC__<br />
= Setup =<br />
<br />
The workflow below depends on local hooks to function properly.<br />
Follow the main [[ITK/Git/Develop#Setup|developer setup instructions]] before proceeding.<br />
In particular, run<br />
[http://itk.org/gitweb?p=ITK.git;a=blob;f=Utilities/SetupForDevelopment.sh;hb=HEAD <code>SetupForDevelopment.sh</code>]:<br />
<br />
$ ./Utilities/SetupForDevelopment.sh<br />
<br />
''The setup script was last modified for this workflow on '''May 9, 2011'''. Be sure to run it in your work tree with a checkout more recent than that.''<br />
<br />
= Workflow =<br />
<br />
Our workflow for adding data integrates with our standard Git [[ITK/Git/Develop|development process]].<br />
Start by [[ITK/Git/Develop#Create_a_Topic|creating a topic]].<br />
Return here when you reach the "edit files" step.<br />
<br />
These instructions follow a typical use case of adding a new test with a baseline image.<br />
<br />
== Add Data ==<br />
<br />
{| style="width: 100%"<br />
|-<br />
|width=60%|<br />
Copy the data file into your local source tree.<br />
|-<br />
|<br />
:<code>$ mkdir -p Modules/.../test/Baseline</code><br />
:<code>$ cp ~/''MyTest.png'' Modules/.../test/Baseline/''MyTest.png''</code><br />
|<br />
|}<br />
<br />
== Add Test ==<br />
<br />
{| style="width: 100%"<br />
|-<br />
|width=60%|<br />
Edit the test CMakeLists.txt file and reference the data file in an <code>itk_add_test</code> call.<br />
Specify the file inside <code>DATA{...}</code> using a path relative to the test directory:<br />
:<code>$ edit Modules/.../test/CMakeLists.txt</code><br />
:{|<br />
|<br />
itk_add_test(NAME MyTest COMMAND ... --compare DATA{Baseline/''MyTest.png''} ...)<br />
|}<br />
|align="center"|<br />
[http://itk.org/gitweb?p=ITK.git;a=blob;f=CMake/ExternalData.cmake;hb=HEAD <code>ExternalData.cmake</code>]<br />
|}<br />
<br />
== Run CMake ==<br />
<br />
{| style="width: 100%"<br />
|-<br />
|width=60%|<br />
''CMake will [[#ExternalData|move the original file]]. Keep your own copy if necessary.''<br />
<br />
Run cmake on the build tree:<br />
:<code>$ cd ../ITK-build</code><br />
:<code>$ cmake .</code><br />
:''(Or just run "make" to do a full configuration and build.)''<br />
:<code>$ cd ../ITK</code><br />
|align="center"|<br />
[[#Recover_Data_File|Need to recover the original file]]?<br />
|-<br />
|<br />
During configuration CMake will display a message such as:<br />
:{|<br />
|<br />
Linked Modules/.../test/Baseline/''MyTest.png''.md5 to ExternalData MD5/...<br />
|}<br />
This means that CMake converted the file into a data object referenced by a "content link".<br />
|align="center"|<br />
[http://itk.org/gitweb?p=ITK.git;a=blob;f=CMake/ExternalData.cmake;hb=HEAD <code>ExternalData.cmake</code>]<br />
|}<br />
<br />
== Commit ==<br />
<br />
{| style="width: 100%"<br />
|-<br />
|width=60%|<br />
Continue to [[ITK/Git/Develop#Create_a_Topic|create the topic]] and edit other files as necessary.<br />
Add the content link and commit it along with the other changes:<br />
:<code>$ git add Modules/.../test/Baseline/''MyTest.png''.md5</code><br />
:<code>$ git add Modules/.../test/CMakeLists.txt</code><br />
:<code>$ git commit</code><br />
|align="center"|<br />
[http://www.kernel.org/pub/software/scm/git/docs/git-add.html <code>git help add</code>]<br />
<br/><br />
[http://www.kernel.org/pub/software/scm/git/docs/git-commit.html <code>git help commit</code>]<br />
|-<br />
|<br />
The local <code>pre-commit</code> hook will display a message such as:<br />
:{|<br />
|<br />
Modules/.../test/Baseline/''MyTest.png''.md5: Added content to Git at refs/data/MD5/...<br />
Modules/.../test/Baseline/''MyTest.png''.md5: Added content to local store at .ExternalData/MD5/...<br />
Content link Modules/.../test/Baseline/''MyTest.png''.md5 -> .ExternalData/MD5/...<br />
|}<br />
This means that the pre-commit hook recognized that the content link references a new data object and [[#pre-commit|prepared it for upload]].<br />
|align="center"|<br />
[http://itk.org/gitweb?p=ITK.git;a=blob;f=Utilities/Hooks/pre-commit;hb=HEAD <code>pre-commit</code>]<br />
|}<br />
<br />
== Push ==<br />
{| style="width: 100%"<br />
|-<br />
|width=60%|<br />
Follow the instructions to [[ITK/Git/Develop#Share_a_Topic|share the topic]].<br />
When you push it to Gerrit for review using<br />
:<code>$ git gerrit-push</code><br />
|align="center"|<br />
[http://itk.org/gitweb?p=ITK.git;a=blob;f=Utilities/Git/git-gerrit-push;hb=HEAD <code>git-gerrit-push</code>]<br />
|-<br />
|<br />
part of the output will be of the form<br />
:{|<br />
|<br />
* ...:refs/data/commits/... [new branch]<br />
* HEAD:refs/for/master/''my-topic'' [new branch]<br />
Pushed refs/data and removed local copy:<br />
MD5/...<br />
|}<br />
This means that the git-gerrit-push script pushed the topic and [[#git-gerrit-push|uploaded the data]] it references.<br />
|}<br />
<br />
== Downloading / Building ==<br />
{| style="width: 100%"<br />
|-<br />
|width=60%|<br />
For the test data to be downloaded into your build directory, the <code> ITKData </code> module must be built.<br />
If you don't perform a complete build, you can build the module directly, e.g. <code> make ITKData </code>. The output will look something like:<br />
<br />
:{|<br />
|<br />
* -- Fetching "<ITK-source>/.ExternalData/MD5/..." <br />
* -- [download 100% complete] <br />
* -- Downloaded object: "<ITK-build>/ExternalData/Objects/MD5/..."<br />
|}<br />
|-<br />
|<br />
Once the test data has been downloaded, it's in <code> <ITK-build-directory>/ExternalData </code>.<br />
|}<br />
<br />
= Discussion =<br />
<br />
An ITK test data file is not stored in the main source tree under version control.<br />
Instead the source tree contains a "content link" that refers to a data object by a hash of its content.<br />
At build time the<br />
[http://itk.org/gitweb?p=ITK.git;a=blob;f=CMake/ExternalData.cmake;hb=HEAD <code>ExternalData.cmake</code>]<br />
module fetches data needed by enabled tests.<br />
This allows arbitrarily large data to be added and removed without bloating the version control history.<br />
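
The relationship between a content link and its data object can be checked by hand. The following is a minimal sketch, not part of ITK: <code>verify_content_link</code>, the scratch directory, and the sample file are hypothetical names used only for illustration. It assumes GNU <code>md5sum</code> is available and that objects are stored as <code>MD5/&lt;hash&gt;</code> under a store directory, as in the messages shown in the workflow above.

```shell
#!/bin/sh
# Sketch: check that a content link and a stored data object agree.
# verify_content_link and the scratch data are hypothetical; only the
# ".md5" suffix and the MD5/<hash> layout are taken from this page.
set -eu

verify_content_link() {
    link=$1    # path to a content link such as MyTest.png.md5
    store=$2   # directory that holds objects as MD5/<hash>
    expected=$(cat "$link")
    object="$store/MD5/$expected"
    [ -f "$object" ] || { echo "missing object: $object" >&2; return 1; }
    actual=$(md5sum < "$object" | cut -d' ' -f1)
    [ "$actual" = "$expected" ]
}

# Demonstration on scratch data.
demo=$(mktemp -d)
mkdir -p "$demo/store/MD5"
printf 'baseline pixels\n' > "$demo/data.bin"
md5=$(md5sum < "$demo/data.bin" | cut -d' ' -f1)
cp "$demo/data.bin" "$demo/store/MD5/$md5"
printf '%s\n' "$md5" > "$demo/MyTest.png.md5"

if verify_content_link "$demo/MyTest.png.md5" "$demo/store"; then
    echo "content link OK"
fi
```

Because the object is named by the hash of its content, the check needs nothing but the link and the store: a mismatch means the object was altered or the link points at something else.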
<br />
The above [[#Workflow|workflow]] allows developers to add a new data file almost as if committing it to the source tree.<br />
The following subsections discuss details of the workflow implementation.<br />
<br />
== ExternalData ==<br />
<br />
While [[#Run_CMake|CMake runs]], the<br />
[http://itk.org/gitweb?p=ITK.git;a=blob;f=CMake/ExternalData.cmake;hb=HEAD ExternalData]<br />
module evaluates [[#Add_Test|DATA{} references]].<br />
ITK [http://itk.org/gitweb?p=ITK.git;a=blob;f=CMake/ITKExternalData.cmake;hb=HEAD sets]<br />
the <code>ExternalData_LINK_CONTENT</code> option to <code>MD5</code> to enable automatic conversion of raw data files into content links.<br />
When the module detects a real data file in the source tree, it performs the following transformation, as specified in the module documentation:<br />
* Compute the MD5 hash of the file<br />
* Store the <code>${hash}</code> in a file with the original name plus <code>.md5</code><br />
* Rename the original file to <code>.ExternalData_MD5_${hash}</code><br />
The real data now sit in a file that we [http://itk.org/gitweb?p=ITK.git;a=blob;f=.gitignore;hb=HEAD tell Git to ignore].<br />
For example:<br />
<br />
$ '''cat Modules/.../test/Baseline/.ExternalData_MD5_477e602800c18624d9bc7a32fa706b97 |md5sum'''<br />
477e602800c18624d9bc7a32fa706b97 -<br />
$ '''cat Modules/.../test/Baseline/''MyTest.png''.md5'''<br />
477e602800c18624d9bc7a32fa706b97<br />
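
The three steps listed above can be re-enacted by hand on a scratch file. This is only an illustration of the resulting file layout, assuming GNU <code>md5sum</code>; the real conversion is performed by <code>ExternalData.cmake</code>, whose implementation may differ in detail (for example in how the content link file is written).

```shell
#!/bin/sh
# Re-enactment of the content-link conversion on a scratch file.
# Illustration only: ExternalData.cmake does the real work during configuration.
set -eu
scratch=$(mktemp -d)
cd "$scratch"
printf 'pretend this is a PNG\n' > MyTest.png

md5=$(md5sum < MyTest.png | cut -d' ' -f1)   # 1. compute the MD5 hash
printf '%s\n' "$md5" > MyTest.png.md5        # 2. store it in <name>.md5
mv MyTest.png ".ExternalData_MD5_${md5}"     # 3. rename the original file

ls -A   # shows the hidden data file and the content link
```

After the script runs, the directory holds only the content link and the hidden, renamed data file, which is the state the <code>pre-commit</code> hook expects to find.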
<br />
=== Recover Data File ===<br />
<br />
To recover the original file after running CMake but before committing, undo the operation:<br />
<br />
$ '''cd Modules/.../test/Baseline'''<br />
$ '''mv .ExternalData_MD5_$(cat MyTest.png.md5) MyTest.png'''<br />
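
A slightly more defensive variant checks that the hidden file still matches the content link before moving it back. This is a sketch under the same assumptions as the commands above; <code>recover_data_file</code> is a hypothetical helper, not something ITK provides, and the scratch directory only mimics the state left by CMake.

```shell
#!/bin/sh
# Sketch: restore a data file from its renamed copy, checking the hash first.
# recover_data_file is a hypothetical helper, not part of ITK.
set -eu

recover_data_file() {
    name=$1                          # e.g. MyTest.png; run in its directory
    md5=$(cat "$name.md5")
    hidden=".ExternalData_MD5_$md5"
    [ -f "$hidden" ] || { echo "no such file: $hidden" >&2; return 1; }
    [ "$(md5sum < "$hidden" | cut -d' ' -f1)" = "$md5" ] || {
        echo "hash mismatch for $hidden" >&2; return 1; }
    mv "$hidden" "$name"
}

# Demonstration in a scratch directory that mimics the post-CMake state.
scratch=$(mktemp -d)
cd "$scratch"
printf 'pretend this is a PNG\n' > original
sum=$(md5sum < original | cut -d' ' -f1)
printf '%s\n' "$sum" > MyTest.png.md5
mv original ".ExternalData_MD5_$sum"

recover_data_file MyTest.png
ls -A
```

Like the two-line recipe above, the helper leaves the <code>.md5</code> file in place.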
<br />
== pre-commit ==<br />
<br />
While [[#Commit|committing]] a new or modified content link, the<br />
[http://itk.org/gitweb?p=ITK.git;a=blob;f=Utilities/Hooks/pre-commit;hb=HEAD <code>pre-commit</code>]<br />
hook moves the real data object from the <code>.ExternalData_MD5_${hash}</code> file left by the ExternalData module<br />
to a local object repository stored in a <code>.ExternalData</code> directory at the top of the source tree.<br />
<br />
The hook also uses Git plumbing commands to store the data object as a blob in the local Git repository.<br />
The blob is not referenced by the new commit but instead by <code>refs/data/MD5/${hash}</code>.<br />
This keeps the blob alive in the local repository but does not add it to the project history.<br />
For example:<br />
$ '''git for-each-ref --format="%(refname)" refs/data'''<br />
refs/data/MD5/477e602800c18624d9bc7a32fa706b97<br />
$ '''git cat-file blob refs/data/MD5/477e602800c18624d9bc7a32fa706b97 | md5sum'''<br />
477e602800c18624d9bc7a32fa706b97 -<br />
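
The two effects described in this section can be reproduced in a throwaway repository. The sketch below is not the hook's code: the choice of <code>git hash-object</code> and <code>git update-ref</code> as the plumbing commands is an assumption made for illustration, and the file names are placeholders. It assumes <code>git</code> and GNU <code>md5sum</code> are installed.

```shell
#!/bin/sh
# Sketch of what the pre-commit hook achieves, in a throwaway repository.
# Assumption: git hash-object and git update-ref stand in for whatever
# plumbing commands the real hook uses.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .

# State left behind by CMake: a content link plus the renamed data file.
printf 'pretend this is a PNG\n' > data
md5=$(md5sum < data | cut -d' ' -f1)
printf '%s\n' "$md5" > MyTest.png.md5
mv data ".ExternalData_MD5_$md5"

# Store the object as a blob and keep it alive with a ref outside history.
blob=$(git hash-object -w --no-filters ".ExternalData_MD5_$md5")
git update-ref "refs/data/MD5/$md5" "$blob"

# Move the object into the local store at the top of the source tree.
mkdir -p .ExternalData/MD5
mv ".ExternalData_MD5_$md5" ".ExternalData/MD5/$md5"

git for-each-ref --format="%(refname)" refs/data
```

No commit references the blob, so it never enters the project history; only the ref under <code>refs/data/</code> prevents Git from garbage-collecting it.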
<br />
== git gerrit-push ==<br />
<br />
The "<code>git gerrit-push</code>" command is actually an<br />
[http://itk.org/gitweb?p=ITK.git;a=blob;f=Utilities/DevelopmentSetupScripts/SetupGitAliases.sh;hb=HEAD alias]<br />
for the<br />
[http://itk.org/gitweb?p=ITK.git;a=blob;f=Utilities/Git/git-gerrit-push;hb=HEAD <code>Utilities/Git/git-gerrit-push</code>]<br />
script.<br />
In addition to pushing the topic branch to Gerrit, the script detects content links added or modified by the commits in the topic.<br />
It reads the data object hashes from the content links and looks for matching <code>refs/data/</code> entries in the local Git repository.<br />
<br />
The script pushes the matching data objects to Gerrit inside a temporary commit object disjoint from the rest of history.<br />
For example:<br />
<br />
$ '''git gerrit-push --dry-run --no-topic'''<br />
* f59717cfb68a7093010d18b84e8a9a90b6b42c11:refs/data/commits/f59717cfb68a7093010d18b84e8a9a90b6b42c11 [new branch]<br />
Pushed refs/data and removed local copy:<br />
MD5/477e602800c18624d9bc7a32fa706b97<br />
$ '''git ls-tree -r --name-only f59717cf'''<br />
MD5/477e602800c18624d9bc7a32fa706b97<br />
$ '''git log --oneline f59717cf'''<br />
f59717c data<br />
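
To see how a commit can carry data yet stay disjoint from the rest of history, the following sketch builds one in a throwaway repository. It is not taken from <code>git-gerrit-push</code>: the temporary-index approach, the <code>demo</code> identity, and the file names are assumptions for illustration, and the real script may construct its commit differently.

```shell
#!/bin/sh
# Sketch: wrap a data blob in a parentless commit, in a throwaway repository.
# Assumption: a temporary index plus git commit-tree stands in for whatever
# the real git-gerrit-push script does.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL

# A data object as the pre-commit hook would have stored it.
printf 'pretend this is a PNG\n' > data
md5=$(md5sum < data | cut -d' ' -f1)
blob=$(git hash-object -w --no-filters data)
git update-ref "refs/data/MD5/$md5" "$blob"

# Build a tree holding MD5/<hash> in a temporary index, leaving the real
# index untouched, then commit that tree with no parent.
GIT_INDEX_FILE="$repo/.git/data-index"
export GIT_INDEX_FILE
git update-index --add --cacheinfo 100644 "$blob" "MD5/$md5"
tree=$(git write-tree)
unset GIT_INDEX_FILE
commit=$(git commit-tree "$tree" -m data)

git ls-tree -r --name-only "$commit"
git log --oneline "$commit"
```

The commit has no parent and no branch points at it, so pushing it to a ref of the form <code>refs/data/commits/...</code>, as in the output above, transfers the data without touching the topic's history.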
<br />
A robot runs every few minutes to fetch the objects from Gerrit and upload them to a<br />
[http://www.itk.org/files/ExternalData location] that we<br />
[http://itk.org/gitweb?p=ITK.git;a=blob;f=CMake/ITKExternalData.cmake;hb=HEAD tell ExternalData to search]<br />
at build time.</div>