Source: D. Koller, Graphical Models
Suppose $P(X=1) = \theta_1$.
An estimator for $\theta_1$ is
$$ \hat \theta_1 = \frac{1}{m} \sum_{i=1}^{m} \mathbb 1_{x^{(i)}=1} $$
with the indicator function $\mathbb 1$ (1 if the subscript condition is true, 0 otherwise).
More generally, for any distribution $P$ and function $f(x)$, the empirical expectation $\mathbb{\hat E}_\mathcal D (f)$ on a data set $\mathcal D$ is
$$ \mathbb{\hat E}_\mathcal D (f) := \frac{1}{m} \sum_{i=1}^{m} f(x^{(i)}) $$
The empirical expectation is an approximation of the expectation under $P$:
$$ \mathbb{E}_P (f) \approx \mathbb{\hat E}_\mathcal D (f) $$
import numpy as np

# draw 100 samples from a categorical distribution with p = (0.5, 0.2, 0.3)
mn = np.random.multinomial(1, (0.5, 0.2, 0.3), size=100)
c = mn.argmax(axis=1)  # convert one-hot rows to category indices
for i in range(3):
    theta = (c == i).mean()  # relative frequency of category i
    print(theta)
0.43
0.25
0.32
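The same samples can be used to approximate the empirical expectation of any function $f$. A minimal sketch, reusing the samples `c` from above with an arbitrarily chosen $f(x) = x^2$:

def f(x):
    return x ** 2

E_hat = f(c).mean()  # (1/m) * sum_i f(x_i)
print(E_hat)
# compare with the exact expectation E_P(f) = sum_x P(x) f(x)
p = np.array([0.5, 0.2, 0.3])
print((p * f(np.arange(3))).sum())  # 1.4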
Hoeffding Bound (additive bound):
$$ \mathbb P_{\mathcal D} \left( \mathbb{\hat E}_\mathcal D (f) \notin [\theta-\epsilon, \theta+\epsilon] \right) \leq 2 e^{-2m \epsilon^2} $$
So the probability of drawing a data set whose empirical estimate is far from the "true" value shrinks exponentially with the number of data points $m$.
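A quick empirical check of the bound, as a sketch; the values of $\theta$, $m$ and $\epsilon$ are chosen arbitrarily:

import numpy as np

theta, m, eps = 0.3, 100, 0.1
runs = 10000
# draw `runs` data sets of size m from a Bernoulli(theta)
samples = np.random.rand(runs, m) < theta
theta_hat = samples.mean(axis=1)
# fraction of data sets whose estimate is more than eps off
p_bad = (np.abs(theta_hat - theta) > eps).mean()
print(p_bad, 2 * np.exp(-2 * m * eps ** 2))  # empirical value vs. bound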
Chernoff Bound (multiplicative bound):
$$ \mathbb P_{\mathcal D} \left( \mathbb{\hat E}_\mathcal D (f) \notin [\theta(1-\epsilon), \theta(1+\epsilon)] \right) \leq 2 e^{-m \theta \epsilon^2 / 3} $$
For the additive bound: if we want the empirical estimate to be at most $\epsilon$ away from the "true" expectation with probability at least $1-\delta$, we need $$ 2 e^{-2m \epsilon^2} \leq \delta $$
$$ m \geq \frac{\ln (2/\delta)}{2\epsilon^2} $$
Analogously for the Chernoff bound: $$ m \geq \frac{3 \ln (2/\delta)}{\theta \epsilon^2} $$
Note: $\theta$ appears in the denominator, so for small $\theta$-values we need more samples $m$.
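Both sample-size formulas translate directly into code; a small sketch with arbitrarily chosen $\epsilon$, $\delta$ and $\theta$:

import numpy as np

def m_hoeffding(eps, delta):
    # additive bound: m >= ln(2/delta) / (2 eps^2)
    return int(np.ceil(np.log(2 / delta) / (2 * eps ** 2)))

def m_chernoff(eps, delta, theta):
    # multiplicative bound: m >= 3 ln(2/delta) / (theta eps^2)
    return int(np.ceil(3 * np.log(2 / delta) / (theta * eps ** 2)))

print(m_hoeffding(0.01, 0.05))       # 18445
print(m_chernoff(0.1, 0.05, 0.01))   # small theta requires many samples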
With the conditional independence $\tilde y \perp y \mid \theta$, the posterior predictive distribution becomes:
$$ p(\tilde y \mid y) = \int p(\tilde y \mid \theta)\, p(\theta \mid y)\, d\theta $$
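As a concrete sketch (the Beta posterior and the numbers here are assumptions for illustration, not from the text): for a Bernoulli likelihood with a Beta posterior, the integral can be approximated on a grid and compared with the known analytic answer:

import numpy as np
from scipy import stats

# hypothetical example: Bernoulli likelihood, Beta(3, 5) posterior
# (e.g. 2 successes in 6 trials with a Beta(1, 1) prior)
a, b = 3, 5
thetas = np.linspace(1e-6, 1 - 1e-6, 10000)
posterior = stats.beta.pdf(thetas, a, b)
# p(y_tilde = 1 | y) = integral of theta * p(theta | y), grid approximation
p_tilde = np.sum(thetas * posterior) * (thetas[1] - thetas[0])
print(p_tilde)  # analytic result: a / (a + b) = 0.375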
A Markov chain defines a probabilistic transition model $T(x \rightarrow x')$ with
$$ \forall x: \sum_{x'} T(x \rightarrow x') = 1 $$
For large $t$:
$$ P(X^{(t)} = x') \approx P\left( X^{(t+1)} = x'\right) = \sum_x P( X^{(t)} = x)\, T(x\rightarrow x') $$
A stationary distribution $\pi$ satisfies
$$ \pi (x') = \sum_x \pi(x)\, T(x\rightarrow x') $$
Example: PageRank
Transition probability matrix for three states $a, b$ and $c$:
$$ T = \begin{pmatrix} T(a\rightarrow a) & T(a\rightarrow b) & T(a\rightarrow c) \\ T(b\rightarrow a) & T(b\rightarrow b) & T(b\rightarrow c) \\ T(c\rightarrow a) & T(c\rightarrow b) & T(c\rightarrow c) \end{pmatrix} = \begin{pmatrix} 0.3 & 0.7 & 0.0 \\ 0.5 & 0.05 & 0.45 \\ 0.0 & 0.45 & 0.55 \end{pmatrix} $$
Note: each row sums to 1.
import numpy as np

T = np.array([[0.3, 0.7, 0.],
              [0.5, 0.05, 0.45],
              [0.0, 0.45, 0.55]])
# state at t=0: we are at b
x_0 = np.array([0., 1., 0.])
x1 = x_0.dot(T)  # distribution after one step
print(x1)
[ 0.5 0.05 0.45]
The probabilities converge to a stationary distribution:
x = x_0
for i in range(20):
    x = x.dot(T)
    print(x)
    # print(x.sum())
    # assert np.isclose(x.sum(), 1.)
[ 0.5 0.05 0.45]
[ 0.175 0.555 0.27 ]
[ 0.33 0.27175 0.39825]
[ 0.234875 0.4238 0.341325]
[ 0.2823625 0.33919875 0.37843875]
[ 0.25430812 0.38491112 0.36078075]
[ 0.268748 0.35961258 0.37163942]
[ 0.26043069 0.37334197 0.36622734]
[ 0.26480019 0.36577089 0.36942892]
[ 0.2623255 0.36989169 0.36778281]
[ 0.2636435 0.3676247 0.36873181]
[ 0.2629054 0.368861 0.36823361]
[ 0.26330212 0.36818195 0.36851593]
[ 0.26308161 0.36855275 0.36836564]
[ 0.26320086 0.3683493 0.36844984]
[ 0.26313491 0.36846049 0.3684046 ]
[ 0.26317072 0.36839953 0.36842975]
[ 0.26315098 0.36843287 0.36841615]
[ 0.26316173 0.3684146 0.36842367]
[ 0.26315582 0.36842459 0.36841959]
Note that for the stationary distribution (steady state) the following equation holds: $$ \vec x\, T = \vec x $$ That is the (left) eigenvector equation of $T$ with eigenvalue 1.
eigenvalues, eigenvectors = np.linalg.eig(T.T)  # left eigenvectors of T
eigenvalues
array([-0.54749372, 0.44749372, 1. ])
i = eigenvalues.argmax()  # index of the largest eigenvalue
eigenvectors[:, i] / sum(eigenvectors[:, i])  # normalize to a distribution
array([ 0.26315789, 0.36842105, 0.36842105])
In general, neither property is guaranteed: a Markov chain need not have a unique stationary distribution, and it need not converge to one.
But if a Markov chain is regular, then both properties hold.
A Markov chain is regular if there exists a $k$ such that, for every pair of states $x, x'$, the probability of getting from $x$ to $x'$ in exactly $k$ steps is $>0$.
A regular Markov chain converges to a unique stationary distribution regardless of the start state.
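A minimal sketch of a non-regular (periodic) chain that has a stationary distribution but never converges to it:

import numpy as np

# deterministic two-state flip: period 2, hence not regular
T2 = np.array([[0., 1.],
               [1., 0.]])
x = np.array([1., 0.])  # start in the first state
for i in range(4):
    x = x.dot(T2)
    print(x)  # oscillates between [0, 1] and [1, 0]
# the stationary distribution [0.5, 0.5] is never reached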
How do you know whether a chain has mixed? In practice you can never be sure that it has mixed; you can only sometimes detect that it has not mixed, e.g. when statistics computed from chains with different start states disagree.
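A simple diagnostic sketch along these lines (reusing the transition matrix from the PageRank example; the chain lengths are chosen arbitrarily): simulate trajectories from every start state and compare the visit frequencies.

import numpy as np

np.random.seed(0)
T = np.array([[0.3, 0.7, 0.], [0.5, 0.05, 0.45], [0., 0.45, 0.55]])

def visit_frequencies(start, steps):
    # simulate one trajectory and count how often each state is visited
    x, counts = start, np.zeros(3)
    for _ in range(steps):
        x = np.random.choice(3, p=T[x])
        counts[x] += 1
    return counts / steps

for start in range(3):
    print(start, visit_frequencies(start, 2000))
# similar frequencies from all start states are consistent with mixing;
# clearly different frequencies would show the chains have not mixed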
Markov Chain: $$ \theta_0 \rightarrow \theta_1 \rightarrow \theta_2 \rightarrow \theta_3 \rightarrow \theta_4 \rightarrow \theta_5 \dots $$
where $\theta_0 \sim p(\theta_0), \theta_1 \sim p(\theta_1) \dots$
and with the property
Detailed Balance:
$$ p^*(\theta')\, T(\theta'\rightarrow \theta) = p^*(\theta)\, T(\theta\rightarrow \theta') $$
Theorem: If detailed balance holds and $T$ is regular, then $T$ has the unique stationary distribution $p^*$.
Proof:
Summing the detailed balance equation over $x'$:
$$ \sum_{x'} p(x')\, T(x'\rightarrow x) = \sum_{x'} p(x)\, T(x\rightarrow x') $$
The right-hand side is
$$ \sum_{x'} p(x)\, T(x\rightarrow x') = p(x) \sum_{x'} T(x\rightarrow x') = p(x) $$
so this is exactly the definition of a stationary distribution:
$$ \sum_{x'} p(x')\, T(x'\rightarrow x) = p(x) $$
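This can be verified numerically for a small chain; a sketch with a hand-constructed $T$ that satisfies detailed balance with respect to an arbitrarily chosen $p$:

import numpy as np

p = np.array([0.2, 0.3, 0.5])
T = np.array([[0.,    0.5,  0.5],
              [1/3.,  1/6., 0.5],
              [0.2,   0.3,  0.5]])
# detailed balance: p(x) T(x -> x') == p(x') T(x' -> x) for all pairs
F = p[:, None] * T
print(np.allclose(F, F.T))       # True
# detailed balance implies stationarity: p T == p
print(np.allclose(p.dot(T), p))  # True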
Metropolis-Hastings: choose a proposal distribution $Q(\vec x \rightarrow \vec x')$ and an acceptance probability $A(\vec x \rightarrow \vec x')$.
At each state $\vec x$,
for $\vec x \neq \vec x'$: $$ T(\vec x \rightarrow \vec x') = Q(\vec x \rightarrow \vec x')\, A(\vec x \rightarrow \vec x') $$
for $\vec x = \vec x'$: $$ T(\vec x \rightarrow \vec x) = Q(\vec x \rightarrow \vec x) + \sum_{\vec x' \neq \vec x} Q(\vec x \rightarrow \vec x') \left(1-A(\vec x \rightarrow \vec x')\right) $$
from detailed balance:
$$ \pi(\vec x')\, T(\vec x'\rightarrow \vec x) = \pi(\vec x)\, T(\vec x \rightarrow \vec x') $$
and for $\vec x \neq \vec x'$ the following must hold:
$$ \pi(\vec x') Q(\vec x' \rightarrow \vec x) A(\vec x' \rightarrow \vec x) = \pi(\vec x) Q(\vec x \rightarrow \vec x') A(\vec x \rightarrow \vec x') $$
from which a constraint on the $A$-ratio follows:
$$ \frac{A(\vec x \rightarrow \vec x')}{A(\vec x' \rightarrow \vec x)} = \frac{\pi(\vec x')\, Q(\vec x' \rightarrow \vec x) }{\pi(\vec x)\, Q(\vec x \rightarrow \vec x') } $$
Without loss of generality assume the numerator is smaller than the denominator; then the constraint is satisfied by setting $A(\vec x \rightarrow \vec x')$ to this ratio and $A(\vec x' \rightarrow \vec x) = 1$. This gives the general rule:
$$ A(\vec x \rightarrow \vec x') = \min\left[ 1, \frac{\pi(\vec x')\, Q(\vec x' \rightarrow \vec x) }{\pi(\vec x)\, Q(\vec x \rightarrow \vec x')} \right] $$
$Q$ must be reversible: $Q(\vec x \rightarrow \vec x') > 0$ if and only if $Q(\vec x' \rightarrow \vec x) > 0$, otherwise the ratio is undefined.
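As a sketch, the acceptance rule can be checked on a small discrete example (the target $\pi$ and the uniform proposal $Q$ are arbitrary choices for illustration): build $T$ from $Q$ and $A$ as defined above and verify detailed balance.

import numpy as np

pi = np.array([0.2, 0.3, 0.5])      # arbitrary target distribution
Q = np.full((3, 3), 1/3.)           # uniform (symmetric) proposal
A = np.minimum(1, pi[None, :] / pi[:, None])  # Q-ratio cancels here
T = Q * A                            # off-diagonal transitions
np.fill_diagonal(T, 0)
T += np.diag(1 - T.sum(axis=1))      # remaining mass: stay at x
print(T.sum(axis=1))                 # each row sums to 1
F = pi[:, None] * T
print(np.allclose(F, F.T))           # detailed balance holds: True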
Opposing forces: the proposal $Q$ should take large steps to explore the state space quickly, but larger steps tend to have a lower acceptance probability $A$; the step size therefore trades off exploration against acceptance rate.
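A minimal random-walk Metropolis-Hastings sketch illustrating this trade-off (assumptions: a standard normal target and a symmetric Gaussian proposal, so the $Q$-ratio cancels; the step sizes are arbitrary):

import numpy as np

def metropolis_hastings(log_pi, steps, scale):
    # random-walk MH with a symmetric Gaussian proposal (Q-ratio cancels)
    x, chain, accepted = 0.0, [], 0
    for _ in range(steps):
        x_new = x + np.random.normal(0, scale)
        # A = min(1, pi(x') / pi(x)), computed in log space for stability
        if np.log(np.random.rand()) < log_pi(x_new) - log_pi(x):
            x, accepted = x_new, accepted + 1
        chain.append(x)
    return np.array(chain), accepted / float(steps)

log_pi = lambda x: -0.5 * x ** 2  # unnormalized log density of N(0, 1)
np.random.seed(0)
for scale in (0.1, 1.0, 10.0):
    chain, rate = metropolis_hastings(log_pi, 10000, scale)
    # small scale: high acceptance, slow exploration;
    # large scale: bold proposals, low acceptance
    print(scale, rate, chain.std())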