Problems

These are solutions to selected exercises from Watanabe's green book. The majority of them are from Chapter 1. Please refer to the PDF of the book, which is freely available online, for the exercise statements.

Problem 1

(a) Let $w_0 = (1, 1, \dots, 1) \in \mathbb{R}^{10}$, and let $W$ be a random variable on $\mathbb{R}^{10}$ which is subject to

$$
p(w) = c \{ \exp(-\|w\|^2) + 100 \exp(-10 \|w - w_0\|^2) \}
$$

Let $w = x w_0 + v$, where $x \in \mathbb{R}$ and $v$ is orthogonal to $w_0$. Then,

$$
\|w\|^2 = \|w_0\|^2 x^2 + \|v\|^2 = 10 x^2 + \|v\|^2
$$

and

$$
\|w - w_0\|^2 = \|(x-1) w_0 + v\|^2 = 10(x-1)^2 + \|v\|^2
$$

Thus,

$$
p(x, v) = c \{ \exp( - (10 x^2 + \|v\|^2) ) + 100 \exp( -10 (10(x-1)^2 + \|v\|^2) ) \}
$$

To maximize $p(x, v)$ we can treat $x$ and $v$ separately. Both terms decrease as $\|v\|^2$ grows, so we minimize $\|v\|^2 \Rightarrow v = 0$.

$$
\text{argmax } p(x, v) = (\text{argmax } p(x, 0), 0)
$$

$$
p(x, 0) = c \{ \exp(-10 x^2) + 100 \exp(-100(x-1)^2) \}
$$

By checking the derivative, the maximizer satisfies $x \in (0, 1)$. As the function is dominated by the second term, $x \approx 1$. Then $w \approx w_0$.
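As a quick sanity check of the claim $x \approx 1$, the restricted density $p(x, 0)$ can be maximized by brute force on a grid. This is only a sketch: the search interval $[-1, 2]$, the grid spacing, and the names `p_on_axis` and `x_star` are choices made here for illustration, and the constant $c$ is dropped because it does not affect the argmax.

```python
import math


def p_on_axis(x):
    """Unnormalized p(x, 0): the density restricted to the line w = x * w_0."""
    return math.exp(-10 * x**2) + 100 * math.exp(-100 * (x - 1) ** 2)


# Grid over [-1, 2] with spacing 1e-4 (an arbitrary, illustrative search range).
grid = [-1 + i * 1e-4 for i in range(30001)]

# Brute-force argmax over the grid.
x_star = max(grid, key=p_on_axis)

print(f"argmax on grid: x = {x_star:.4f}")
print(f"p(x*, 0) / c   = {p_on_axis(x_star):.4f}")
print(f"p(0, 0) / c    = {p_on_axis(0.0):.4f}")
```

The peak near $x = 1$ is about a hundred times higher than the value at the origin, which is why the second term dominates the location of the maximum.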

(b) $E[W] \approx 0$

$$
E[W] = \int_{\mathbb{R}^{10}} w \, p(w) \, dw
$$

We know that $\int_{\mathbb{R}^{10}} p(w) \, dw = 1$, so

$$
c \left[ \int_{\mathbb{R}^{10}} e^{-\|w\|^2} dw + 100 \int_{\mathbb{R}^{10}} e^{-10 \|w - w_0\|^2} dw \right] = 1
$$

Remember:

$$
\int_{\mathbb{R}^d} \exp(-\alpha \|x - \mu\|^2) \, dx = \left( \frac{\pi}{\alpha} \right)^{d/2}
$$

Then $c \left[ \pi^5 + 100 \cdot \left( \frac{\pi}{10} \right)^5 \right] = 1$.

$$
\begin{aligned}
E[W] &= c \left[ \underbrace{\int_{\mathbb{R}^{10}} w \exp(-\|w\|^2) \, dw}_{= 0 \text{ (odd integrand)}} + 100 \int_{\mathbb{R}^{10}} w \exp(-10 \|w - w_0\|^2) \, dw \right] \\
&= 100 \, c \int_{\mathbb{R}^{10}} w \exp(-10 \|w - w_0\|^2) \, dw \\
&= 100 \, c \left( \frac{\pi}{10} \right)^5 w_0 = c \, (10^{-3} \pi^5) \, w_0 \approx 0.000999 \, w_0 \approx 0.
\end{aligned}
$$

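Even though the pdf peaks near $w_0$, the volume under that peak is very small: the bump around $w_0$ has variance $1/20$ in each of the 10 dimensions (versus $1/2$ for the bump at the origin), so it carries only about $0.1\%$ of the total probability mass. This is the curse of dimensionality at work.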
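The arithmetic above is easy to reproduce. The sketch below evaluates the two Gaussian integrals with the formula $(\pi/\alpha)^{d/2}$, solves for $c$, and computes the coefficient of $w_0$ in $E[W]$. The variable names are illustrative only.

```python
import math

d = 10  # dimension of the problem

# Mass of each unnormalized term, using the Gaussian integral (pi / alpha)^(d / 2).
mass_origin = math.pi ** (d / 2)            # integral of exp(-||w||^2)
mass_w0 = 100 * (math.pi / 10) ** (d / 2)   # integral of 100 * exp(-10 ||w - w0||^2)

# Normalizing constant from c * (mass_origin + mass_w0) = 1.
c = 1.0 / (mass_origin + mass_w0)

# The origin term contributes nothing to the mean (odd integrand), so
# E[W] = weight_w0 * w0, where weight_w0 is the probability mass near w0.
weight_w0 = c * mass_w0

print(f"c                  = {c:.6e}")
print(f"coefficient of w_0 = {weight_w0:.6f}")
```

The coefficient equals $0.001 / 1.001$, which is the fraction of the total probability mass sitting under the tall, narrow peak.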

The takeaway is that the MAP estimate can be a poor estimator of the right parameter. Notice that as we consider higher dimensions, the distance between the expected value and the MAP estimate grows without bound.
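To illustrate the remark about higher dimensions, here is a rough sketch that repeats the calculation in $\mathbb{R}^d$. It rests on two assumptions that are not part of the exercise: the same density is used with $w_0 = (1, \dots, 1) \in \mathbb{R}^d$, and the MAP estimate is approximated by $w_0$ itself. Under those assumptions the mean is exactly `weight_w0 * w_0`, so the gap is $(1 - \text{weight}) \sqrt{d}$. The function name `map_mean_gap` is hypothetical.

```python
import math


def map_mean_gap(d):
    """Approximate distance between the MAP estimate (taken to be w_0) and E[W] in R^d."""
    # Probability mass of the narrow bump at w_0; the pi^(d/2) factors cancel.
    weight_w0 = 100 * 10 ** (-d / 2) / (1 + 100 * 10 ** (-d / 2))
    # E[W] = weight_w0 * w_0 and ||w_0|| = sqrt(d).
    return (1 - weight_w0) * math.sqrt(d)


gaps = {d: map_mean_gap(d) for d in (2, 4, 10, 50, 100)}
for d, gap in gaps.items():
    print(f"d = {d:3d}: ||MAP - E[W]|| is roughly {gap:.4f}")
```

For large $d$ the weight of the bump at $w_0$ is negligible, so the gap behaves like $\sqrt{d}$.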

Problem 2 - Fluctuation Dissipation Theorem

Let $\beta > 0$, and $H(x) : \mathbb{R}^n \rightarrow \mathbb{R}$. Say $X \in \mathbb{R}^n$ is subject to the pdf

$$
p(x \mid \beta) = \frac{1}{Z(\beta)} \exp(-\beta H(x)) \quad \text{where } Z(\beta) = \int \exp(-\beta H(x)) \, dx
$$

We need to prove that $\dfrac{\partial E[H(X)]}{\partial \beta} = - \mathbb{V}[H(X)]$.

$$
E[H(X)] = \int \frac{1}{Z(\beta)} \exp(-\beta H(x)) H(x) \, dx = \frac{f(\beta)}{Z(\beta)}, \quad \text{where } f(\beta) = \int H(x) \exp(-\beta H(x)) \, dx
$$

Differentiating under the integral sign,

$$
\begin{aligned}
\frac{\partial Z(\beta)}{\partial \beta} &= \int \frac{\partial}{\partial \beta} \exp(-\beta H(x)) \, dx = \int -H(x) \exp(-\beta H(x)) \, dx = - Z(\beta) E[H(X)] \\
\frac{\partial f(\beta)}{\partial \beta} &= \int -H^2(x) \exp(-\beta H(x)) \, dx = - Z(\beta) E[H^2(X)]
\end{aligned}
$$

Thus,

$$
\begin{aligned}
\frac{\partial E[H(X)]}{\partial \beta} &= \frac{- Z^2(\beta) E[H^2(X)] + Z^2(\beta) E^2[H(X)]}{Z^2(\beta)} \\
&= E^2[H(X)] - E[H^2(X)] \\
&= - \mathbb{V}[H(X)]
\end{aligned}
$$

Thus, $\dfrac{\partial E[H(X)]}{\partial \beta} = - \mathbb{V}[H(X)]$.
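The identity can also be checked numerically. The sketch below is one possible check, not part of the exercise: it assumes $n = 1$, picks a hypothetical double-well energy $H(x) = x^4 - x^2$, replaces the integrals by Riemann sums on $[-6, 6]$, and compares a central finite difference of $E[H(X)]$ with $-\mathbb{V}[H(X)]$. The names `energy` and `energy_moments`, the value $\beta = 1.5$, and the step size are all illustrative choices.

```python
import math


def energy(x):
    """Hypothetical double-well energy H(x), chosen only for illustration."""
    return x**4 - x**2


def energy_moments(beta, lo=-6.0, hi=6.0, n=12001):
    """Return (E[H], V[H]) under p(x | beta), using Riemann sums on [lo, hi]."""
    dx = (hi - lo) / (n - 1)
    z = m1 = m2 = 0.0
    for i in range(n):
        h = energy(lo + i * dx)
        w = math.exp(-beta * h)  # unnormalized density at this grid point
        z += w
        m1 += w * h
        m2 += w * h * h
    mean = m1 / z  # the grid spacing dx cancels in these ratios
    return mean, m2 / z - mean**2


beta, step = 1.5, 1e-4

# Left-hand side: central finite difference of E[H] with respect to beta.
mean_plus, _ = energy_moments(beta + step)
mean_minus, _ = energy_moments(beta - step)
lhs = (mean_plus - mean_minus) / (2 * step)

# Right-hand side: minus the variance of H at beta.
_, variance = energy_moments(beta)
rhs = -variance

print(f"dE[H]/dbeta (finite difference) = {lhs:.8f}")
print(f"-V[H]                           = {rhs:.8f}")
```

Any energy for which the integrals converge should behave the same way; the double well is used only so that the check is not a trivially Gaussian case.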

Problem 3

Let $p(x \mid a)$ be a statistical model of $x \in \{0, 1\}$ defined by

$$
p(x \mid a) = a^x (1-a)^{1-x} \quad \text{where } 0 \leq a \leq 1, \quad \varphi(a) = 1 \text{ is the prior}.
$$

Let $X^n$ be independently subject to $p(x \mid a_0)$.

Let $n_1 = \sum_{i=1}^n x_i, \quad n_2 = n - n_1$.

  1. Find the MLE.

$$
\begin{aligned}
\log p(x_i \mid a) &= x_i \log a + (1-x_i) \log(1-a) \\
f(a) = \sum_{i=1}^n \log p(x_i \mid a) &= n_1 \log a + n_2 \log(1-a)
\end{aligned}
$$

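Better way: maximize the likelihood directly,

$$
\prod_{i=1}^n a^{x_i} (1-a)^{1-x_i} = a^{n_1} (1-a)^{n-n_1}
$$

As $0 \leq a \leq 1$, we can use AM-GM to maximize. For $0 < n_1 < n$, apply it to $n_1$ copies of $\frac{a}{n_1}$ and $n - n_1$ copies of $\frac{1-a}{n-n_1}$: these $n$ terms always sum to $1$, so their product is at most $n^{-n}$, with equality exactly when all the terms are equal. The argmax therefore satisfies:

$$
\begin{aligned}
\frac{a}{n_1} &= \frac{1-a}{n-n_1} \\
n a - n_1 a &= n_1 - n_1 a \\
\Rightarrow a &= \frac{n_1}{n}
\end{aligned}
$$

The boundary cases $n_1 = 0$ and $n_1 = n$ give $a = 0$ and $a = 1$ directly, in agreement with the same formula.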
  2. Estimated probability distribution $p(x \mid \hat{a})$ (frequentist estimation)

$$
p(1 \mid \hat{a}) = \hat{a} = \frac{n_1}{n}, \quad p(0 \mid \hat{a}) = 1 - \hat{a} = \frac{n - n_1}{n} = \frac{n_2}{n}
$$
  3. Bayesian predictive distribution $p(x \mid X^n)$

Let us calculate the posterior first. Since $\varphi(a) = 1$,

$$
\begin{aligned}
p(a \mid X^n) &= \frac{p(X^n \mid a) \varphi(a)}{\int_0^1 p(X^n \mid a) \varphi(a) \, da} \\
&= \frac{p(X^n \mid a)}{\int_0^1 p(X^n \mid a) \, da}
\end{aligned}
$$

where $p(X^n \mid a)$ is calculated by multiplying the individual likelihoods $p(x_i \mid a)$.

So,

$$
p(x \mid X^n) = \int_0^1 p(x \mid a) \, p(a \mid X^n) \, da = \frac{\int_0^1 p(x \mid a) \, p(X^n \mid a) \, da}{\int_0^1 p(X^n \mid a) \, da}
$$

Now,

$$
p(X^n \mid a) = \prod_{i=1}^n p(x_i \mid a) = a^{\sum x_i} (1-a)^{n - \sum x_i} = a^{n_1} (1-a)^{n_2}
$$

and

$$
\int_0^1 p(X^n \mid a) \, da = \int_0^1 a^{n_1} (1-a)^{n_2} \, da = \beta(n_1 + 1, n_2 + 1)
$$

where $\beta(\cdot, \cdot)$ denotes the Beta function.

$$
\begin{aligned}
p(1 \mid X^n) &= \frac{\int_0^1 a^{n_1+1} (1-a)^{n_2} \, da}{\beta(n_1+1, n_2+1)} = \frac{\beta(n_1+2, n_2+1)}{\beta(n_1+1, n_2+1)} \\
p(0 \mid X^n) &= \frac{\int_0^1 a^{n_1} (1-a)^{n_2+1} \, da}{\beta(n_1+1, n_2+1)} = \frac{\beta(n_1+1, n_2+2)}{\beta(n_1+1, n_2+1)}
\end{aligned}
$$

Now, $\beta(p, q) = \dfrac{\Gamma(p) \Gamma(q)}{\Gamma(p+q)}$ and $\Gamma(k+1) = k \, \Gamma(k)$, so

$$
p(1 \mid X^n) = \frac{\Gamma(n_1+2) \, \Gamma(n+2)}{\Gamma(n_1+1) \, \Gamma(n+3)} = \frac{n_1+1}{n+2}
$$

$$
p(0 \mid X^n) = 1 - \frac{n_1+1}{n+2} = \frac{n_2+1}{n+2}
$$
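The difference between the two estimates is easiest to see on concrete counts. The following sketch evaluates the Beta-function ratio with `math.lgamma` and compares it with the closed form $(n_1+1)/(n+2)$ and with the MLE $n_1/n$. The helper names (`log_beta`, `mle`, `bayes_predictive_one`) and the sample counts are made up for illustration.

```python
import math


def log_beta(p, q):
    """Logarithm of the Beta function, computed via log-gamma."""
    return math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)


def mle(n1, n):
    """Plug-in (frequentist) estimate of p(1): a_hat = n1 / n."""
    return n1 / n


def bayes_predictive_one(n1, n):
    """Bayesian predictive p(1 | X^n) under the uniform prior, as a Beta ratio."""
    n2 = n - n1
    return math.exp(log_beta(n1 + 2, n2 + 1) - log_beta(n1 + 1, n2 + 1))


# Illustrative counts (n1 ones out of n trials), including the edge case n1 = 0.
for n1, n in [(7, 10), (0, 10), (3, 3)]:
    closed_form = (n1 + 1) / (n + 2)
    print(
        f"n1={n1}, n={n}: MLE={mle(n1, n):.4f}, "
        f"predictive={bayes_predictive_one(n1, n):.4f}, "
        f"closed form={closed_form:.4f}"
    )
```

When $n_1 = 0$ the plug-in estimate assigns probability zero to observing a one, while the Bayesian predictive distribution still assigns it $1/(n+2)$.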