Sun Apr 3 09:54:08 EDT 2011

Removing a content link is equivalent to removing the real data file.
In the latter case you would have to copy the file back in again too.

A key concept in this whole approach is that the content link and real
data object are equivalent and interchangeable in the source tree.  Once
one has the content link there should never be a reason to put the real
file back (unless you try to mess with the .ExternalData* files, which
are implementation details).  If you've found a way to wander off the
documented workflow in a way that requires putting the real file back,
then we need to address that case.
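
For concreteness, here is roughly what that workflow looks like on the
CMake side.  This is only a minimal sketch; the target name, test name,
driver, file names, and server URL are all illustrative, not the real
ITK configuration:

  include(ExternalData)
  set(ExternalData_URL_TEMPLATES
    "http://www.example.com/ExternalData/%(algo)/%(hash)"
    )
  # DATA{} names the data file.  In the source tree it may be either the
  # real file Input/image.png or the content link Input/image.png.md5, a
  # tiny text file holding only the data's MD5 hash; the two forms are
  # interchangeable.
  ExternalData_Add_Test(MyData
    NAME MyTest
    COMMAND MyTestDriver DATA{Input/image.png}
    )
  ExternalData_Add_Target(MyData)  # fetches the real data at build time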

> It would be nice if the test data could be left in place

I originally had this goal in mind.  However, I later realized that it
is both hard to do and conceptually incorrect:

- It is hard to .gitignore data files (see the pre-commit response
  below), so this requires a separate "git add" for every content link
  rather than "git add ." in the directory.

- The trigger for CMake to do the data->link conversion is the presence
  of the real file named by DATA{}.  Since developers may "mv" the files
  into the source tree we cannot rely on modification timestamps to
  know if the content links are out of date.  Therefore we would have
  to transform every time CMake runs, and the transform is expensive
  compared to reading a content link.

- The ExternalData staging ".ExternalData_${algo}_${hash}" file must
  be created atomically.  A cheap, easy, and reliable way to do this
  is to rename the original file (see the sketch after this list).

- Since the real data file and content link are equivalent (see above)
  it makes sense to have only one present at a time.  If you copy in
  the original data file again, CMake will transform it again to
  preserve this equivalence.
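
To make the transform and staging points concrete, the data->link
conversion amounts to something like this.  It is only a sketch, not
the module's actual code; the file location and name are hypothetical,
and the staging name follows the ".ExternalData_${algo}_${hash}"
convention above:

  set(dir "${CMAKE_CURRENT_SOURCE_DIR}/Input")   # hypothetical location
  set(data "${dir}/image.png")                   # hypothetical file
  if(EXISTS "${data}" AND NOT IS_DIRECTORY "${data}")
    file(MD5 "${data}" hash)        # hashing is the expensive step
    # Atomic staging: rename so the staged object either exists or not.
    file(RENAME "${data}" "${dir}/.ExternalData_MD5_${hash}")
    # The content link replaces the real file in the source tree.
    file(WRITE "${data}.md5" "${hash}\n")
  endif()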

> Perhaps a pre-commit check could be added to prevent this original test
> data from being committed.

It is hard to write such a check.  The hook has no way to know what the
file names will be (data can have many possible extensions).  One may
also add legitimate images, like tiny icons, in documentation.  My plan
is to rely on Gerrit reviews to reject real data files from commits,
along with a simple max-blob-size limit on the server.
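
The server-side limit could look something like the following, sketched
here as a standalone CMake script just for illustration.  The 1 MiB
limit and the use of HEAD are assumptions of mine; a real hook would
run in the push path and examine only the newly pushed revisions:

  # Hypothetical check: run as "cmake -P check-blob-size.cmake" in a clone.
  set(max_blob_size 1048576)
  execute_process(COMMAND git rev-list --objects HEAD
    OUTPUT_VARIABLE objects OUTPUT_STRIP_TRAILING_WHITESPACE)
  string(REPLACE "\n" ";" objects "${objects}")
  foreach(entry IN LISTS objects)
    string(REGEX MATCH "^[0-9a-f]+" sha "${entry}")   # object name
    execute_process(COMMAND git cat-file -s "${sha}"  # object size in bytes
      OUTPUT_VARIABLE size OUTPUT_STRIP_TRAILING_WHITESPACE)
    if(size GREATER max_blob_size)
      message(FATAL_ERROR "Object too large: ${entry} (${size} bytes)")
    endif()
  endforeach()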

> Another point of concern: It seems possible that the gerrit builds
> could begin before the data objects are transferred to the web server
> if the processes are not synchronized. Is it set up so that the
> transfer of the objects always happens prior to running the gerrit
> builds?

Yes, that is a potential problem.  The data sync robot runs every
minute, but it is possible for the test builds to jump in first.  I
still need to work with Marcus (who runs the auto-build driver) to
combine the two robots so that things go in order.

-Brad

