也許是自打咀巴

2008/09/11

星期日晚看見選舉結果,望見兩大選區(新界東/新界西)各泛民名單的得票如此相近,心想「噢,law of large number…」,然後跟據幾個模糊的概念,草草地寫了那篇
Fooled by randomness

事實上該文很不準確。先是兆安的留言提醒我對量子力學的理解有錯,然後是朋友S私底下指出這不可能是純綷隨機結果。我自己計了一次,對,如果純粹的隨機投票,五張名單的得票應該是極之接近(七成機會正負一百)。多謝賜正。關於量子力學,我的理解只是大學通識程度,不多提了。數學還有點點能力,下面是我的算法。對數學沒有興趣的,可跳至尾二段。

Given n balls and k urns. All balls are uniformly distributed among all k urns. We calculate the probability of the first urn getting not more than t balls.

The random variable X is the number of balls in the first urn, it can be approximated by a Poisson Distribution.
Pr( X = t) \le \frac{ e^{-u} (u)^t }{ t! }
where u = \frac{n}{k} is the expected rate.

We can get the probability of the first urn getting not more than t balls by summing the probability of all events, since they are mutually excluded.

Pr(X \le t) = \sum_{i = 0}^{t} Pr(X = i)

Begin lazy, we ask matlab for results with the following command,

poisscdf(t, u)

Suppose we have 204030 balls and 5 urns, the expected number of each urn is 40806. The probability of the first urn getting less than 99% of the expected number (i.e. 40398) is

Pr(X \le 40398) \le \frac{ e ^ {40806} (40806e) ^ {40398}}{ 40398 ^ {40398}} = 0.0217

On the other hand we can get the probability of getting more than 1% of the expected number (i.e. 41214) similarly,

Pr(X > 41214) = 1 - Pr( X \le 41214) = 1 - 0.9783 = 0.0217

In fact a more precise approximation scheme can be achieved by using Chernoff Bound, however it depends on a moment generating function that i am not familiar with. This article gives a loose approximation of how unlikely an urn to get 1% fewer(more) balls than the expected value.

Thank you S for the useful discussion in Probability.

把問題簡化,只考慮任何一張名單。上面的結果得出,假設有204030張選票和五張名單,若所以選民皆是隨機投票的話,平均每張名單應得40806張選票。然而,一張名單超過/低過平均數的百份之一的機會為百份之五;超過/低過平均數的百份之二更不足百份之零點零零零一。然而沒有一張名單的實際得票是平均正負百份之二內,意即不能以純粹隨機過程去解悉投票行為。

縱然該文錯漏百出,我仍然堅持該文的基本概念,每個人的喜惡不一,無可能準確配票,反而,隨機(於兩大選區)能容易地得出較平均的分配。

10 回應 to “也許是自打咀巴”

  1. Alex Says:

    From you equation

    1 – P(X <= 41214) = 0.9783

    it implies that

    P(X <= 41214) = 0.0217

    which is equal to P(X <= 40398) from your another result.

    I think something seems to be wrong because logically you’d expect that P(X <= 41214) to be larger (actually much larger) than P(X 41214) is around 0.0217?

  2. Alex Says:

    For some reason my previous comment was truncated. This was what I wanted to say:

    From you equation

    1 – P(X <= 41214) = 0.9783

    it implies that

    P(X <= 41214) = 0.0217

    which is equal to P(X <= 40398) from your another result.

    I think something seems to be wrong because logically you’d expect that P(X <= 41214) to be larger (actually much larger) than P(X 41214) is around 0.0217?

  3. Justin Says:

    that was a typo. thx!

    P(X <= 41214) = 0.9873.

    should be 1-P(X <= 41214) = 1-0.9873 = 0.0217

  4. Justin Says:

    (I posted this for Alex (from http://www.alexweblog.com/), there are some strange errors that prevent his comments to display correctly)

    In the last sentence,
    *** P(X 41214) ***
    is actually
    *** P(X < 41214) ***

  5. cowmoo Says:

    it seems you don’t really need a Poisson approximation here

    basically X is a binomial random variable with parameters n and p=1/k
    then for n and t large, you can just use central limit theorem and approximate with normal distribution
    (Poisson approximation holds for n large)

  6. :P Says:

    堆數好似亂晒坑…

    Pr( X <= 41214) 應該係等於 Pr( X=0 ) + Pr( X= 1) + … +P(X= 41213) ,所以0.0217粒數好似錯左.

    正如cowmoo講,應該用Lindonberg-Levy ClT approximate 最方便:
    mean = 204030 *.2 = 40806
    variance = 204030 *.2 *.8 = 32644.8

    z = (40398-40806) / sqrt(32644.8) = -2.25815 ,

    P( Z N(0,1)

    所以一張名單lor 到少過99% of mean 既機會大約係 0.012
    P( -2.25815 < Z < 2.25815) 就係0.976064。

    切勿忘記,我地假設投票人係亂投的,呢個係一個好大的assumption. 根據投票結果,我地應該能夠否定是假設。

  7. :P Says:

    sorry.. truncated words

    **P( z < Z = -2.25815) = 0.012 , where Z converges to standard normal. **

  8. Justin Says:

    Cow, 😛

    The fact is I have forgotten CLT can be applied.

    I got 0.0217 probably becoz poisson approx is not as accurate as clt.

    >切勿忘記,我地假設投票人係亂投的,呢個係一個好大的assumption. 根據投票結果,我地應該能夠否定是假設。

    this is what i am trying to say in this post.

  9. :P Says:

    As a note of clarification, in fact, the Poisson approximation of binomial distribution cannot be applied in this case. The reason is the following:

    Note the the moment generating fuction of Binomial distribution B(n, p) is (1-p +pe^t)^n = (1+p(e^t-1))^n

    where as the MGF of poisson P(入) is exp(入(exp(t) -1)).

    If we make two assumptions here — 1) 入 = np = constant mean, 2) n tends to infinity

    then the MGF of B(n,p) becomes (1+ 入(e^t -1) /n ) ^n
    which tends to exp(入(exp(t) -1)) , then Poisson MGF, as n tends to infinity.

    Here, in this case, n = 204030 is sufficiently large, but np is increasing with n, as p is fixed at 0.2. The first assumption is not satisfied, and thus we cannot use the Poisson approximation in this case.

    That can perhaps explain to a great extent the discrepancy of the results. 😛

  10. Justin Says:

    😛
    Thank you so much for your detailed explanation!

    I should pay more attention in math class!

    🙂


發表留言