ZTree.com  | ZEN  | About...  

 Index   Back

[Discuss] 23, 302, 77164   [Discuss]

By: Liviu       
Date: Oct 22,2000 at 13:51
In Response to: [Discuss] Probabilities and the CRC (John Gruener)

> (How's this for a new keyword?)

Simple, self-explanatory, GOOD.

> Now with 26 people, we have 325 pairs, (26 * 25 / 2).
> [...] Where am I going wrong here?

Your pairs are not random. They are VERY correlated by being generated as the product of 2 base sets.
If they where RANDOM indeed, then the probability of a match would be 325/66340 = 4.9% (with 66340 = 365 * 364 / 2 the total number of possible pairs).
Instead, it's more like 50+%, and I think that's exactly the "paradox" part of the birthdays.

> To compute the probability of having a birthday match

The easiest way is to think in terms of probability of NOT having any match. The 1st person leaves 364 days open, so the 2nd person has a 364/365 chance of fitting a birthday among those. That leaves 363 days open, so the 3rd person's chance is 363/365. And so on, for P persons the probability of having distinct birthdays is (364/365)*(363/365)*(362/365)*...*(365-P+1)/365, and the probability of a match is 1 minus that. Incidentally, I believe the minimum number of persons to yield a match probability >50% is 23 (not 26) but I just base that on a quick Saturday-evening numbers run so don't bet your fortune on it.

> For 20,000 files [...] with CRC-32

...the chance of a CRC match would be about 4.5%

> However, in the context we are discussing,
> wouldn't the above results apply to 20,000 (or
> 1,000) files of the *exact* same *size*?

No, all of the above is random :) actually about arbitrary values (including CRC's) attached to random variables (including files).

As for numbers (from the same source disclaimed above), the chance of a false CRC match surpasses 50% for
- 302 files (CRC 16);
- 77164 files (CRC 32).

Now, 77k files sounds like a lot, but if all ZTree users do it over and again, it will happen. And that's without even getting into "intentional" CRC masquerading. There were viruses (virii ?) which would manipulate a few junk bytes to keep the overall CRC from changing for a couple of popular polynomials.

Besides, it's also a matter of image and of trust. Did you feel, a few years back, that Intel's floating-point bug would directly threaten you? ...Statistically not. Did you think it's a bug, nevertheless? ...I did.

Liviu

846 views      
Thread locked
 

Messages in this Thread

 
96,637 Postings in 12,231 Threads, 350 registered users, 79 users online (0 registered, 79 guests)
Index | Admin contact |   Forum Time: Mar 29, 2024 - 9:41 am UTC  |  Hits:62,400,260  (15,046 Today )
RSS Feed