(Source: Pieter Abbeel, CS188)

- Sampling from a uniform distribution: $x \sim Uniform [0,1)$
- Convert the sample in an outcome for the given distribution by having each outcome associated with a sub-intervall of $[0,1)$ with sub-interval size equal to the probability.

e.g. for

C | P(C) |
---|---|

red | 0.2 |

green | 0.3 |

blue | 0.5 |

In [18]:

```
# principle of sampling from the above probability table:
def get_a_C():
#step 1
x = np.random.rand()
#step 2
if x < 0.2:
C = 'red'
elif x < 0.5:
C='green'
else:
C='blue'
return C
print get_a_C()
```

blue

- for $i = 1,2, ..,n$:
- sample $x_i$ from $P(x_i | Parents(x_i))$

- return $(x_1, x_2, .., x_n)$

We are sampling from the right distribution (can easyly be shown).

If the numbers of sample goes to infinity, we get as estimate the correct join probabilities.

e.g. estimate of $P(C \mid r^1, d^0)$:

- select all samples consistent with the evidence, the number of such samples is $n_e$
- estimate the probabilities of different $C$ assignments by counting the corresponding number and dividing it through $n_e$

If we have a low number of counts we have a bad estimate of $P$. In the worste case: no counts for numerator and denominator $\rightarrow P \approx \frac{0}{0}$

Process:

- input: evidence instantiation
- for $i = 1,2, ..,n$:
- sample $x_i$ from $P(x_i | Parents(x_i))$
- if $x_i$ non consistence with evidence: reject sample, i.e. no sample is generated in this cycle

- return $(x_1, x_2, .., x_n)$

Problem:

- if the evidence is very unlikely we reject a lot of samples
- evidence not exploited when we are sampling

Idea: fix evidence variable and sample the rest.

- Problem: Don't sample from the right distribution.
- Solution: Weight the sample by the probabilities of the evidence.

e.g.: shape $\rightarrow$ color

and

square | circle | triangle | |
---|---|---|---|

red | 0.2 | 0.5 | 0.3 |

green | 0.3 | 0.2 | 0.3 |

blue | 0.5 | 0.3 | 0.4 |

and $Q=P( Shape \mid Color=blue)$

Samples with weights:

- square: 0.5
- circle: 0.3
- square: 0.5
- triangle: 0.4
- ...

- input: evidence instantiation
- $weight=1$
- for $i = 1,2, ..,n$:
- if $X_i$ is evidence variable:
- $X_i$ is observation of $x_i$ of $X_i$
- set $weight. \leftarrow weight * P(x_i \mid Par(x_i))$

- else:
- sample $x_i$ from $P(x_i | Parents(x_i))$

- if $x_i$ non consistence with evidence: reject sample, i.e. no sample is generated in this cycle

- if $X_i$ is evidence variable:
- return $((x_1, x_2, .., x_n), weight)$

- Evidence influences the choice of downstream variables, but not upstream variables.
- unlikely samples have very low weights
- maybe just small number of samples with high weights $\rightarrow$ bad estimate

- keep track of a full instantiation - start with a random instantiation
- sample one variable condition on all the other variables, except the evidence variables

Process:

- Fix Evidence:
- Initialite the other variables randomly
- Repeat:
- Choose a non-evidence variable $X_i$ (randomly!)
- Resample $X_i$ from $P(X_i \mid x_1, x_2, \dots , x_{i-1}, x_{i+1}, \dots , x_{n}) $

How to compute $X_i$ from $P(X_i \mid x_1, x_2, \dots , x_{i-1}, x_{i+1}, \dots , x_{n}) $ efficently?

$$ P(X_i \mid x_1, x_2, \dots , x_{i-1}, x_{i+1}, \dots , x_{n}) = $$$$ \frac{P(x_1, x_2, \dots , x_{i-1}, X_i, x_{i+1}, \dots ,x_{n})}{P(x_1, x_2, \dots x_{i-1}, x_{i+1}, \dots , x_{n})} = $$$$ =\frac{P(x_1, x_2, \dots , x_{i-1}, X_i, x_{i+1}, \dots ,x_{n})} {\sum_{x'_i} P(x_1, x_2, \dots x_{i-1}, x'_i, x_{i+1}, \dots , x_{n})} $$Factorization of $P(X_i \mid X_1, X_2, \dots , X_{i-1}, X_i, X_{i+1}, \dots , X_{n})$ by the topology of the Bayes net: e.g.

$$ P(X_i \mid X_1, X_2, \dots , X_{i-1}, X_i, X_{i+1}, \dots , X_{n}) = P(X_1) P(X_2 \mid X_1) P(X_3 \mid X_1) \dots P(X_i \mid \dots) P( \dots \mid \dots X_i \dots ) P( \dots \mid \dots X_i \dots )\dots $$The important point is that there are a lot of factors without any dependencies on $X_i$ (either on the left or right side of the bar '$\mid$'). So the fraction above can be reduced from this terms:

$$ P(X_i \mid x_1, x_2, \dots , x_{i-1}, x_{i+1}, \dots , x_{n}) = $$```
```

$$
=\frac{P(X_i \mid \dots) P( \dots \mid \dots X_i \dots ) P( \dots \mid \dots X_i \dots )}
{\sum_{x'_i} P(x'_i \mid \dots) P( \dots \mid \dots x'_i \dots ) P( \dots \mid \dots x'_i \dots ) }
$$
- S. Russel, P. Norvig: Artifical Intelligence a Modern Approach, 3th edition, 2010
- Pieter Abbeel: Berkley CS188, Video https://www.youtube.com/watch?v=ogYaTTRguIw