
[Wish] Compare by Hash

By: Walter Rassbach       
Date: Oct 04,2009 at 13:58
In Response to: [Wish] Compare by Hash (Gregg Lentz)

It doesn't matter what kind of hash you use: the probability of a false match depends exactly on the number of bits in the hash, i.e., a 32-bit hash has a 1-in-2**32 chance, a 64-bit hash has a 1-in-2**64 chance, etc. Mathematically, that is ALL that really matters; all the fancy this-and-that makes no real difference, e.g., MD5 ain't much better than a simple checksum, except in special cases like error-correcting codes or schemes designed specifically to handle something like burst errors. Trust me on this one, it is what I do for a living...
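
To put rough numbers on it (a minimal sketch in Python; the arithmetic assumes an ideal n-bit hash, which is the best any algorithm can do):

    # For an ideal n-bit hash, two differing inputs collide with
    # probability 2**-n, regardless of the algorithm's internals.
    for bits in (32, 64, 128):
        print(f"{bits}-bit hash: false-match odds = 2**-{bits}"
              f" ~ {2.0 ** -bits:.3e}")

That works out to about 2.3e-10 at 32 bits, 5.4e-20 at 64 bits, and 2.9e-39 at 128 bits (MD5's digest size).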

> Thank you for your replies.
>
> With all due respect, I think the objective here is being
> misunderstood. I envision the primary use of such a Hash Compare as
> a guard against sporadic bit errors, which could occur when
> migrating branches of large files to a different partition, or across a
> network to a different server or NAS box.
>
>
> Walter:
>
> One could create a Bloom-style filter,....
>
> If that would be a more efficient and reliable option, I'm all for it.
> As I see it, the main problem here is the segmentation that Binary
> Compare uses. Its non-sequential I/O (ping-ponging between the physical
> disc locations of the two files) incurs drive access waits for every
> buffer fetch, accumulating into unacceptably poor performance overall.
>
> if the hashes match, one really has to go and do a full
> compare...
>
> While it is well known that two totally dissimilar files can produce
> the same hash, we are starting with the expectation here that the files
> being compared are identical. As I understand it, the mathematical
> probability of an anomaly in the above scenario, whereby an algorithm
> such as MD5 produces an identical hash for two same-sized files with only
> a single-bit error or two, is so small as to be, in all practicality,
> zero. Anything above this threshold and you've got bigger problems to
> deal with anyway. ;-)
>
>
> Jürgen:
>
> if a file is changed without changing size and date/time you will
> never notice this...
>
> True. But I'm only wanting quick assurance that branches of files that
> I regularly move or mirror to other drives/partitions/servers got there
> intact. If a bit slips sometime after the fact, well, that's beyond the
> scope of file manager software such as ZTree anyway, IMHO.
>
>
> Liviu:
>
> What you propose could be achieved (albeit using more memory) simply
> by having ZTree use a larger "chunk size" during binary compare....
>
> Agreed--increasing the buffer size would reduce the disc
> thrashing that I'd mentioned. But at what size "chunk" does this become
> impractical? With multi-gigabyte files becoming common (and no
> slow-down of growth in sight), this seems to be a band-aid fix
> at best.
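
For what it's worth, the sequential-read approach under discussion would look something like this (a minimal sketch in Python; hashlib's MD5 is used only because it came up above, and the file names are illustrative):

    import hashlib

    def file_digest(path, chunk_size=1 << 20):
        # Hash one file with purely sequential 1 MiB reads; the drive
        # never ping-pongs between the two files' physical locations.
        h = hashlib.md5()
        with open(path, "rb") as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                h.update(chunk)
        return h.digest()

    # Each file is streamed start to finish on its own pass; only the
    # 16-byte digests are kept for the final comparison.
    same = file_digest("original.bin") == file_digest("copy.bin")

Memory stays at one buffer per file no matter how large the files get, which is exactly where an ever-bigger binary-compare chunk eventually loses out.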
