[Cdash] ITK appears in GDCM cdash

David Cole david.cole at kitware.com
Thu Sep 3 13:41:42 UTC 2009


Julien's not only worried about the speed of the SQL query. He's also (and
probably more) worried about the speed of computing the hash on large input
data (test output ranges from a few bytes up to several megabytes depending
on the test...)

For every test.xml file submitted that contains hundreds (or even thousands)
of tests, we have to compute the hash on the test output, look it up to see
if it's already in the db and then insert it if it's not.

Computing the hash on the test output is the dominant thing to worry about
at test submission processing time.


On Thu, Sep 3, 2009 at 9:35 AM, Eric Noulard <eric.noulard at gmail.com> wrote:

> 2009/9/3 Amitha Perera <amitha.perera at kitware.com>:
> > On Thu, Sep 3, 2009 at 8:43 AM, Julien Jomier<julien.jomier at kitware.com>
> wrote:
> >> Sorry that's what I meant: could we find a hashing algorithm which
> return
> >> value fits in one integer?. Also, what are the complexity of the
> different
> >> hashing algorithms? in another word, which one will be fast enough with
> the
> >> minimal chance of collisions?
> >
> > I'm pretty sure the SHA-1 and MD5 algorithms will prove to be fast
> > enough for your needs. And I'm sure there are implementations of them
> > in any language you want.
> >
> > Also, I suspect that
> >  SELECT * WHERE CRC=myCRC;
> > will run pretty much just as fast as
> >  SELECT * WHERE HASH_INT1=myInt1 AND HASH_INT2=myInt2 ... AND
> HASH_INT4=myInt4;
> >
> > It's been a while since I did any serious SQL, but I think a SQL CHAR
> > can represent any value between 0 and 255, so it may be just as easy
> > to represent the hash as a CHAR(16) or what ever.  And again, I'm
> > pretty sure the SELECT statement would execute without any noticeable
> > difference in speed.
>
> Julien seems to think
> that string comparison was slow (compared to int comparison)
>   >> "we will need to do a string matching in SQL and I'd like to
> avoid this if we can for speed reasons"
>
> However since as far as I understand you don't need cryptographic hash
> function
> you may look into some non-cryptographic hash function
> that has been designed to be fast, like that one
> http://en.wikipedia.org/wiki/Fowler_Noll_Vo_hash
>
> Seems to be used with MySQL too:
>
> http://www.xaprb.com/blog/2008/03/09/a-very-fast-fnv-hash-function-for-mysql/
> http://www.radwin.org/michael/blog/2007/03/mysql_user_defined_functio.html
>
> Note that FNV may not be the sole choice in this area.
> --
> Erk
> Membre de l'April - « promouvoir et défendre le logiciel libre » -
> http://www.april.org
> _______________________________________________
> Cdash mailing list
> Cdash at public.kitware.com
> http://public.kitware.com/cgi-bin/mailman/listinfo/cdash
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://public.kitware.com/pipermail/cdash/attachments/20090903/f4612b06/attachment-0003.htm>


More information about the CDash mailing list