## 也許是自打咀巴

2008/09/11

Fooled by randomness

Given n balls and k urns. All balls are uniformly distributed among all k urns. We calculate the probability of the first urn getting not more than t balls.

The random variable $X$ is the number of balls in the first urn, it can be approximated by a Poisson Distribution.
$Pr( X = t) \le \frac{ e^{-u} (u)^t }{ t! }$
where $u = \frac{n}{k}$ is the expected rate.

We can get the probability of the first urn getting not more than t balls by summing the probability of all events, since they are mutually excluded.

$Pr(X \le t) = \sum_{i = 0}^{t} Pr(X = i)$

Begin lazy, we ask matlab for results with the following command,

poisscdf(t, u)

Suppose we have 204030 balls and 5 urns, the expected number of each urn is 40806. The probability of the first urn getting less than 99% of the expected number (i.e. 40398) is

$Pr(X \le 40398) \le \frac{ e ^ {40806} (40806e) ^ {40398}}{ 40398 ^ {40398}} = 0.0217$

On the other hand we can get the probability of getting more than 1% of the expected number (i.e. 41214) similarly,

$Pr(X > 41214) = 1 - Pr( X \le 41214) = 1 - 0.9783 = 0.0217$

In fact a more precise approximation scheme can be achieved by using Chernoff Bound, however it depends on a moment generating function that i am not familiar with. This article gives a loose approximation of how unlikely an urn to get 1% fewer(more) balls than the expected value.

Thank you S for the useful discussion in Probability.

### 10 回應 to “也許是自打咀巴”

1. Alex Says:

From you equation

1 – P(X <= 41214) = 0.9783

it implies that

P(X <= 41214) = 0.0217

which is equal to P(X <= 40398) from your another result.

I think something seems to be wrong because logically you’d expect that P(X <= 41214) to be larger (actually much larger) than P(X 41214) is around 0.0217?

2. Alex Says:

For some reason my previous comment was truncated. This was what I wanted to say:

From you equation

1 – P(X <= 41214) = 0.9783

it implies that

P(X <= 41214) = 0.0217

which is equal to P(X <= 40398) from your another result.

I think something seems to be wrong because logically you’d expect that P(X <= 41214) to be larger (actually much larger) than P(X 41214) is around 0.0217?

3. Justin Says:

that was a typo. thx!

P(X <= 41214) = 0.9873.

should be 1-P(X <= 41214) = 1-0.9873 = 0.0217

4. Justin Says:

(I posted this for Alex (from http://www.alexweblog.com/), there are some strange errors that prevent his comments to display correctly)

In the last sentence,
*** P(X 41214) ***
is actually
*** P(X < 41214) ***

5. cowmoo Says:

it seems you don’t really need a Poisson approximation here

basically X is a binomial random variable with parameters n and p=1/k
then for n and t large, you can just use central limit theorem and approximate with normal distribution
(Poisson approximation holds for n large)

6. :P Says:

堆數好似亂晒坑…

Pr( X <= 41214) 應該係等於 Pr( X=0 ) + Pr( X= 1) + … +P(X= 41213) ,所以0.0217粒數好似錯左.

正如cowmoo講，應該用Lindonberg-Levy ClT approximate 最方便：
mean = 204030 *.2 = 40806
variance = 204030 *.2 *.8 = 32644.8

z = (40398-40806) / sqrt(32644.8) = -2.25815 ,

P( Z N(0,1)

所以一張名單lor 到少過99% of mean 既機會大約係 0.012
P( -2.25815 < Z < 2.25815) 就係0.976064。

切勿忘記，我地假設投票人係亂投的，呢個係一個好大的assumption. 根據投票結果，我地應該能夠否定是假設。

7. :P Says:

sorry.. truncated words

**P( z < Z = -2.25815) = 0.012 , where Z converges to standard normal. **

8. Justin Says:

Cow, 😛

The fact is I have forgotten CLT can be applied.

I got 0.0217 probably becoz poisson approx is not as accurate as clt.

>切勿忘記，我地假設投票人係亂投的，呢個係一個好大的assumption. 根據投票結果，我地應該能夠否定是假設。

this is what i am trying to say in this post.

9. :P Says:

As a note of clarification, in fact, the Poisson approximation of binomial distribution cannot be applied in this case. The reason is the following:

Note the the moment generating fuction of Binomial distribution B(n, p) is (1-p +pe^t)^n = (1+p(e^t-1))^n

where as the MGF of poisson P(入) is exp(入(exp(t) -1)).

If we make two assumptions here — 1) 入 = np = constant mean, 2) n tends to infinity

then the MGF of B(n,p) becomes (1+ 入(e^t -1) /n ) ^n
which tends to exp(入(exp(t) -1)) , then Poisson MGF, as n tends to infinity.

Here, in this case, n = 204030 is sufficiently large, but np is increasing with n, as p is fixed at 0.2. The first assumption is not satisfied, and thus we cannot use the Poisson approximation in this case.

That can perhaps explain to a great extent the discrepancy of the results. 😛

10. Justin Says:

😛
Thank you so much for your detailed explanation!

I should pay more attention in math class!

🙂