也許是自打咀巴

2008/09/11

星期日晚看見選舉結果，望見兩大選區（新界東／新界西）各泛民名單的得票如此相近，心想「噢，law of large number…」，然後跟據幾個模糊的概念，草草地寫了那篇
Fooled by randomness。

事實上該文很不準確。先是兆安的幾個留言提醒我對量子力學的理解有錯，然後是朋友S私底下指出這不可能是純綷隨機結果。我自己計了一次，對，如果純粹的隨機投票，五張名單的得票應該是極之接近（七成機會正負一百）。多謝賜正。關於量子力學，我的理解只是大學通識程度，不多提了。數學還有點點能力，下面是我的算法。對數學沒有興趣的，可跳至尾二段。

Given n balls and k urns. All balls are uniformly distributed among all k urns. We calculate the probability of the first urn getting not more than t balls.

The random variable $X$ is the number of balls in the first urn, it can be approximated by a Poisson Distribution.
$Pr( X = t) \le \frac{ e^{-u} (u)^t }{ t! }$
where $u = \frac{n}{k}$ is the expected rate.

We can get the probability of the first urn getting not more than t balls by summing the probability of all events, since they are mutually excluded.

$Pr(X \le t) = \sum_{i = 0}^{t} Pr(X = i)$

Begin lazy, we ask matlab for results with the following command,

poisscdf(t, u)

Suppose we have 204030 balls and 5 urns, the expected number of each urn is 40806. The probability of the first urn getting less than 99% of the expected number (i.e. 40398) is

$Pr(X \le 40398) \le \frac{ e ^ {40806} (40806e) ^ {40398}}{ 40398 ^ {40398}} = 0.0217$

On the other hand we can get the probability of getting more than 1% of the expected number (i.e. 41214) similarly,

$Pr(X > 41214) = 1 - Pr( X \le 41214) = 1 - 0.9783 = 0.0217$

In fact a more precise approximation scheme can be achieved by using Chernoff Bound, however it depends on a moment generating function that i am not familiar with. This article gives a loose approximation of how unlikely an urn to get 1% fewer(more) balls than the expected value.

Thank you S for the useful discussion in Probability.

把問題簡化，只考慮任何一張名單。上面的結果得出，假設有204030張選票和五張名單，若所以選民皆是隨機投票的話，平均每張名單應得40806張選票。然而，一張名單超過／低過平均數的百份之一的機會為百份之五；超過／低過平均數的百份之二更不足百份之零點零零零一。然而沒有一張名單的實際得票是平均正負百份之二內，意即不能以純粹隨機過程去解悉投票行為。

縱然該文錯漏百出，我仍然堅持該文的基本概念，每個人的喜惡不一，無可能準確配票，反而，隨機（於兩大選區）能容易地得出較平均的分配。

This entry was posted on 2008/09/11 於 01:59 and is filed under 社會.

10 回應 to “也許是自打咀巴”

Alex Says:

2008/09/11 於 02:54
From you equation

1 – P(X <= 41214) = 0.9783

it implies that

P(X <= 41214) = 0.0217

which is equal to P(X <= 40398) from your another result.

I think something seems to be wrong because logically you’d expect that P(X <= 41214) to be larger (actually much larger) than P(X 41214) is around 0.0217?

回應
Alex Says:

2008/09/11 於 02:56
For some reason my previous comment was truncated. This was what I wanted to say:

From you equation

1 – P(X <= 41214) = 0.9783

it implies that

P(X <= 41214) = 0.0217

which is equal to P(X <= 40398) from your another result.

I think something seems to be wrong because logically you’d expect that P(X <= 41214) to be larger (actually much larger) than P(X 41214) is around 0.0217?

回應
Justin Says:

2008/09/11 於 02:58
that was a typo. thx!

P(X <= 41214) = 0.9873.

should be 1-P(X <= 41214) = 1-0.9873 = 0.0217

回應
Justin Says:

2008/09/11 於 03:06
(I posted this for Alex (from http://www.alexweblog.com/), there are some strange errors that prevent his comments to display correctly)

In the last sentence,
*** P(X 41214) ***
is actually
*** P(X < 41214) ***

回應
cowmoo Says:

2008/09/11 於 03:27
it seems you don’t really need a Poisson approximation here

basically X is a binomial random variable with parameters n and p=1/k
then for n and t large, you can just use central limit theorem and approximate with normal distribution
(Poisson approximation holds for n large)

回應
:P Says:

2008/09/13 於 00:56
堆數好似亂晒坑…

Pr( X <= 41214) 應該係等於 Pr( X=0 ) + Pr( X= 1) + … +P(X= 41213) ,所以0.0217粒數好似錯左.

正如cowmoo講，應該用Lindonberg-Levy ClT approximate 最方便：
mean = 204030 *.2 = 40806
variance = 204030 *.2 *.8 = 32644.8

z = (40398-40806) / sqrt(32644.8) = -2.25815 ,

P( Z N(0,1)

所以一張名單lor 到少過99% of mean 既機會大約係 0.012
P( -2.25815 < Z < 2.25815) 就係0.976064。

切勿忘記，我地假設投票人係亂投的，呢個係一個好大的assumption. 根據投票結果，我地應該能夠否定是假設。

回應
:P Says:

2008/09/13 於 00:59
sorry.. truncated words

**P( z < Z = -2.25815) = 0.012 , where Z converges to standard normal. **

回應
Justin Says:

2008/09/13 於 01:59
Cow, 😛

The fact is I have forgotten CLT can be applied.

I got 0.0217 probably becoz poisson approx is not as accurate as clt.

>切勿忘記，我地假設投票人係亂投的，呢個係一個好大的assumption. 根據投票結果，我地應該能夠否定是假設。

this is what i am trying to say in this post.

回應
:P Says:

2008/09/13 於 04:29
As a note of clarification, in fact, the Poisson approximation of binomial distribution cannot be applied in this case. The reason is the following:

Note the the moment generating fuction of Binomial distribution B(n, p) is (1-p +pe^t)^n = (1+p(e^t-1))^n

where as the MGF of poisson P(入) is exp(入(exp(t) -1)).

If we make two assumptions here — 1) 入 = np = constant mean, 2) n tends to infinity

then the MGF of B(n,p) becomes (1+ 入(e^t -1) /n ) ^n
which tends to exp(入(exp(t) -1)) , then Poisson MGF, as n tends to infinity.

Here, in this case, n = 204030 is sufficiently large, but np is increasing with n, as p is fixed at 0.2. The first assumption is not satisfied, and thus we cannot use the Poisson approximation in this case.

That can perhaps explain to a great extent the discrepancy of the results. 😛

回應
Justin Says:

2008/09/15 於 03:25
😛
Thank you so much for your detailed explanation!

I should pay more attention in math class!

🙂

回應

凍啡走甜

腳毛

其他

回應

最近

從前