sampling_from_bayesnets slides

### Sampling from a Bayes net

(Source: Pieter Abbeel, CS188)

#### Sampling from a given distribution

1. Sample from a uniform distribution: $x \sim Uniform[0,1)$
2. Convert the sample into an outcome of the given distribution by associating each outcome with a sub-interval of $[0,1)$ whose size equals the outcome's probability.

e.g. for

| C     | P(C) |
|-------|------|
| red   | 0.2  |
| green | 0.3  |
| blue  | 0.5  |
```python
import numpy as np

# principle of sampling from the above probability table:
def get_a_C():
    # step 1: draw a uniform sample from [0, 1)
    x = np.random.rand()
    # step 2: map it to an outcome via the cumulative sub-intervals
    if x < 0.2:        # [0, 0.2)   -> red
        C = 'red'
    elif x < 0.5:      # [0.2, 0.5) -> green
        C = 'green'
    else:              # [0.5, 1)   -> blue
        C = 'blue'
    return C

print(get_a_C())  # e.g. 'blue'
```

### Prior Sampling

#### Process:

• for $i = 1,2, \dots, n$:
  • sample $x_i$ from $P(x_i \mid Parents(x_i))$
• return $(x_1, x_2, \dots, x_n)$

We are sampling from the right distribution (this can easily be shown).
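As a sketch, here is prior sampling for a minimal hypothetical two-node net $Cloudy \rightarrow Rain$ (all CPT values below are made up for illustration):

```python
import numpy as np

# hypothetical two-node net Cloudy -> Rain; all CPT values are made up
P_CLOUDY = 0.5                            # P(Cloudy = 1)
P_RAIN_GIVEN_CLOUDY = {1: 0.8, 0: 0.2}    # P(Rain = 1 | Cloudy)

def prior_sample(rng=np.random):
    # sample each variable in topological order, given its parents
    c = int(rng.rand() < P_CLOUDY)
    r = int(rng.rand() < P_RAIN_GIVEN_CLOUDY[c])
    return (c, r)
```

With many samples the empirical frequencies approach the joint probabilities; for these CPTs, e.g., $P(Rain=1) = 0.5 \cdot 0.8 + 0.5 \cdot 0.2 = 0.5$.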

#### Consistency:

As the number of samples goes to infinity, the estimates converge to the correct joint probabilities.

e.g. estimate of $P(C \mid r^1, d^0)$:

• select all samples consistent with the evidence; the number of such samples is $n_e$
• estimate the probabilities of the different $C$ assignments by counting the corresponding samples and dividing by $n_e$
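A sketch of this counting estimate. The sample set here is just independent uniform binary variables C, R, D, made up purely for illustration:

```python
import numpy as np

def estimate_conditional(samples, query_var, evidence):
    # keep only samples consistent with the evidence
    consistent = [s for s in samples
                  if all(s[v] == val for v, val in evidence.items())]
    n_e = len(consistent)
    if n_e == 0:
        return {}, 0   # the 0/0 problem: no consistent samples at all
    counts = {}
    for s in consistent:
        counts[s[query_var]] = counts.get(s[query_var], 0) + 1
    # normalize the counts by the number of consistent samples
    return {val: c / n_e for val, c in counts.items()}, n_e

# hypothetical sample set: independent uniform binary C, R, D
rng = np.random.RandomState(1)
samples = [{'C': rng.randint(2), 'R': rng.randint(2), 'D': rng.randint(2)}
           for _ in range(10000)]
est, n_e = estimate_conditional(samples, 'C', {'R': 1, 'D': 0})
```

Only about a quarter of the samples are consistent with this evidence, which already hints at the problem below: rare evidence leaves few usable samples.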

#### Problem:

With a low number of counts we get a bad estimate of $P$. In the worst case there are no counts for either numerator or denominator: $P \approx \frac{0}{0}$.

### Rejection Sampling

Process:

• input: evidence instantiation
• for $i = 1,2, \dots, n$:
  • sample $x_i$ from $P(x_i \mid Parents(x_i))$
  • if $x_i$ is not consistent with the evidence: reject the sample, i.e. no sample is generated in this cycle
• return $(x_1, x_2, \dots, x_n)$
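A minimal sketch for a hypothetical net $Cloudy \rightarrow Rain$ (made-up CPTs) with evidence $Rain = 1$:

```python
import numpy as np

# hypothetical net Cloudy -> Rain; CPT values are made up
P_CLOUDY = 0.5
P_RAIN_GIVEN_CLOUDY = {1: 0.8, 0: 0.2}

def rejection_sample(rain_evidence, rng=np.random):
    # sample in topological order, rejecting when the evidence is contradicted
    c = int(rng.rand() < P_CLOUDY)
    r = int(rng.rand() < P_RAIN_GIVEN_CLOUDY[c])
    if r != rain_evidence:
        return None           # sample rejected in this cycle
    return (c, r)
```

Estimating $P(Cloudy=1 \mid Rain=1)$ from the accepted samples gives $\approx 0.8$ for these CPTs; note that about half of all generated samples are rejected here, and the rejection rate grows as the evidence gets less likely.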

Problem:

• if the evidence is very unlikely, we reject a lot of samples
• the evidence is not exploited while sampling

### Likelihood Weighting

Idea: fix the evidence variables and sample the rest.

• Problem: We don't sample from the right distribution.
• Solution: Weight each sample by the probabilities of the evidence.

e.g. the net $Shape \rightarrow Color$ with $P(Color \mid Shape)$:

|       | square | circle | triangle |
|-------|--------|--------|----------|
| red   | 0.2    | 0.5    | 0.3      |
| green | 0.3    | 0.2    | 0.3      |
| blue  | 0.5    | 0.3    | 0.4      |

and the query $Q = P(Shape \mid Color=blue)$.

Samples with weights:

• square: 0.5
• circle: 0.3
• square: 0.5
• triangle: 0.4
• ...

#### Process

• input: evidence instantiation
• $weight = 1$
• for $i = 1,2, \dots, n$:
  • if $X_i$ is an evidence variable:
    • set $x_i$ to the observed value of $X_i$
    • set $weight \leftarrow weight \cdot P(x_i \mid Parents(x_i))$
  • else:
    • sample $x_i$ from $P(x_i \mid Parents(x_i))$
• return $((x_1, x_2, \dots, x_n), weight)$
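The process above, sketched for the hypothetical net $Cloudy \rightarrow Rain$ (made-up CPTs) with evidence $Rain = 1$: Cloudy is sampled, Rain is fixed and contributes its probability to the weight.

```python
import numpy as np

# hypothetical net Cloudy -> Rain; CPT values are made up
P_CLOUDY = 0.5
P_RAIN_GIVEN_CLOUDY = {1: 0.8, 0: 0.2}

def weighted_sample(rain_evidence, rng=np.random):
    weight = 1.0
    # Cloudy is not evidence: sample it from P(Cloudy)
    c = int(rng.rand() < P_CLOUDY)
    # Rain is evidence: fix it and multiply the weight by P(rain | cloudy)
    p = P_RAIN_GIVEN_CLOUDY[c]
    weight *= p if rain_evidence == 1 else 1.0 - p
    return (c, rain_evidence), weight

def lw_estimate(n, rng=np.random):
    # weighted estimate of P(Cloudy = 1 | Rain = 1)
    num = den = 0.0
    for _ in range(n):
        (c, _), w = weighted_sample(1, rng)
        num += w * c
        den += w
    return num / den
```

For these CPTs the exact answer is $P(Cloudy=1 \mid Rain=1) = \frac{0.5 \cdot 0.8}{0.5 \cdot 0.8 + 0.5 \cdot 0.2} = 0.8$, and no samples are wasted on rejection.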

#### Problems

• Evidence influences the choice of downstream variables, but not upstream variables.
• Unlikely samples get very low weights.
• Possibly only a small number of samples carry high weight $\rightarrow$ bad estimate.

### Gibbs Sampling

• keep track of a full instantiation; start with a random instantiation
• sample one variable conditioned on all the other variables, except the evidence variables

Process:

• Fix the evidence
• Initialize the other variables randomly
• Repeat:
  • Choose a non-evidence variable $X_i$ (randomly!)
  • Resample $X_i$ from $P(X_i \mid x_1, x_2, \dots , x_{i-1}, x_{i+1}, \dots , x_{n})$

How can we sample $X_i$ from $P(X_i \mid x_1, x_2, \dots , x_{i-1}, x_{i+1}, \dots , x_{n})$ efficiently?

$$P(X_i \mid x_1, x_2, \dots , x_{i-1}, x_{i+1}, \dots , x_{n}) = \frac{P(x_1, x_2, \dots , x_{i-1}, X_i, x_{i+1}, \dots , x_{n})}{P(x_1, x_2, \dots , x_{i-1}, x_{i+1}, \dots , x_{n})} = \frac{P(x_1, x_2, \dots , x_{i-1}, X_i, x_{i+1}, \dots , x_{n})}{\sum_{x'_i} P(x_1, x_2, \dots , x_{i-1}, x'_i, x_{i+1}, \dots , x_{n})}$$

The joint $P(x_1, x_2, \dots , x_{i-1}, X_i, x_{i+1}, \dots , x_{n})$ factorizes according to the topology of the Bayes net, e.g.

$$P(x_1, x_2, \dots , x_{i-1}, X_i, x_{i+1}, \dots , x_{n}) = P(x_1) P(x_2 \mid x_1) P(x_3 \mid x_1) \dots P(X_i \mid \dots) P( \dots \mid \dots X_i \dots ) P( \dots \mid \dots X_i \dots ) \dots$$

The important point is that many of these factors do not depend on $X_i$ at all (neither on the left nor on the right side of the bar '$\mid$'). They appear unchanged in both numerator and denominator, so the fraction above reduces to:

$$P(X_i \mid x_1, x_2, \dots , x_{i-1}, x_{i+1}, \dots , x_{n}) =$$

$$=\frac{P(X_i \mid \dots) P( \dots \mid \dots X_i \dots ) P( \dots \mid \dots X_i \dots )} {\sum_{x'_i} P(x'_i \mid \dots) P( \dots \mid \dots x'_i \dots ) P( \dots \mid \dots x'_i \dots ) }$$
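A sketch of Gibbs sampling for a hypothetical chain $A \rightarrow B \rightarrow C$ with evidence $C = 1$ (all CPT values are made up): when resampling one variable, only the factors that mention it survive the cancellation above.

```python
import numpy as np

# hypothetical chain A -> B -> C; CPT values are made up; evidence: C = 1
P_A = 0.3                       # P(A = 1)
P_B_GIVEN_A = {1: 0.9, 0: 0.2}  # P(B = 1 | A)
P_C_GIVEN_B = {1: 0.7, 0: 0.1}  # P(C = 1 | B)

def bernoulli(p, rng):
    return int(rng.rand() < p)

def gibbs(n_steps, c_obs=1, rng=np.random):
    # random initialization of the non-evidence variables
    a, b = bernoulli(0.5, rng), bernoulli(0.5, rng)
    samples = []
    for _ in range(n_steps):
        # resample A: only factors mentioning A survive, P(A | b, c) ∝ P(A) P(b | A)
        pb = lambda a_: P_B_GIVEN_A[a_] if b == 1 else 1.0 - P_B_GIVEN_A[a_]
        w1, w0 = P_A * pb(1), (1.0 - P_A) * pb(0)
        a = bernoulli(w1 / (w1 + w0), rng)
        # resample B: P(B | a, c) ∝ P(B | a) P(c | B)
        pc = lambda b_: P_C_GIVEN_B[b_] if c_obs == 1 else 1.0 - P_C_GIVEN_B[b_]
        w1 = P_B_GIVEN_A[a] * pc(1)
        w0 = (1.0 - P_B_GIVEN_A[a]) * pc(0)
        b = bernoulli(w1 / (w1 + w0), rng)
        samples.append((a, b))
    return samples
```

After a burn-in, the frequency of $A=1$ approaches the exact posterior; for these CPTs, $P(A=1 \mid C=1) = \frac{0.189 + 0.003}{0.189 + 0.003 + 0.098 + 0.056} \approx 0.555$.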