
Sampling from a Bayes net

(Source: Pieter Abbeel, CS188)

Sampling from a given distribution

  1. Sample from a uniform distribution: $x \sim \text{Uniform}[0,1)$
  2. Convert the sample into an outcome of the given distribution by associating each outcome with a sub-interval of $[0,1)$ whose size equals the outcome's probability.

e.g. for

C      P(C)
red    0.2
green  0.3
blue   0.5
In [18]:
import numpy as np

# principle of sampling from the above probability table:
def get_a_C():
    # step 1: draw a uniform sample from [0, 1)
    x = np.random.rand()
    # step 2: map the sample to an outcome via cumulative sub-intervals
    if x < 0.2:        # [0, 0.2)   -> red
        C = 'red'
    elif x < 0.5:      # [0.2, 0.5) -> green
        C = 'green'
    else:              # [0.5, 1)   -> blue
        C = 'blue'
    return C

print(get_a_C())
blue
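NumPy also provides this mapping directly; a one-line alternative to the manual interval check above:

C = np.random.choice(['red', 'green', 'blue'], p=[0.2, 0.3, 0.5])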

Prior Sampling

Process:

  • for $i = 1, 2, \dots, n$ (variables in topological order, parents before children):
    • sample $x_i$ from $P(X_i \mid Parents(X_i))$
  • return $(x_1, x_2, \dots, x_n)$

We are sampling from the right distribution (this can easily be shown).
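A minimal sketch of prior sampling, assuming a hypothetical two-variable net $Cloudy \rightarrow Rain$ with made-up CPTs (both the net and the numbers are for illustration only):

import numpy as np

# hypothetical CPTs: P(Cloudy) and P(Rain | Cloudy)
P_C = {True: 0.5, False: 0.5}
P_R_given_C = {True:  {True: 0.8, False: 0.2},   # P(Rain | Cloudy=true)
               False: {True: 0.1, False: 0.9}}   # P(Rain | Cloudy=false)

def prior_sample():
    # sample in topological order: parents before children
    c = bool(np.random.rand() < P_C[True])             # Cloudy ~ P(Cloudy)
    r = bool(np.random.rand() < P_R_given_C[c][True])  # Rain ~ P(Rain | Cloudy=c)
    return (c, r)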

Consistency:

As the number of samples goes to infinity, the estimates converge to the correct joint probabilities.

Answering Queries:

e.g. estimate of $P(C \mid r^1, d^0)$:

  • select all samples consistent with the evidence; let their number be $n_e$
  • estimate the probability of each $C$ assignment by counting the corresponding samples and dividing by $n_e$
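Continuing the hypothetical net from above, an estimate of e.g. $P(Cloudy{=}\text{true} \mid Rain{=}\text{true})$ by counting:

samples = [prior_sample() for _ in range(10000)]
# keep only the samples consistent with the evidence Rain = true
consistent = [c for (c, r) in samples if r]
n_e = len(consistent)
# relative frequency of Cloudy = true among the consistent samples
print(sum(consistent) / float(n_e))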

Problem:

If only few samples are consistent with the evidence, the estimate of $P$ is poor. In the worst case there are no counts for numerator and denominator: $P \approx \frac{0}{0}$.

Rejection Sampling

Process:

  • input: evidence instantiation
  • for $i = 1, 2, \dots, n$ (in topological order):
    • sample $x_i$ from $P(X_i \mid Parents(X_i))$
    • if $x_i$ is inconsistent with the evidence: reject the sample, i.e. no sample is generated in this cycle (see the sketch below)
  • return $(x_1, x_2, \dots, x_n)$
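A sketch of rejection sampling, reusing the hypothetical CPTs from the prior-sampling sketch, with evidence $Rain = \text{true}$:

def rejection_sample(evidence_rain=True):
    # sample in topological order; reject as soon as a sampled value
    # contradicts the evidence
    c = bool(np.random.rand() < P_C[True])
    r = bool(np.random.rand() < P_R_given_C[c][True])
    if r != evidence_rain:
        return None                    # sample rejected
    return (c, r)

kept = [s for s in (rejection_sample() for _ in range(10000)) if s is not None]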

Problem:

  • if the evidence is very unlikely, we reject a lot of samples
  • the evidence is not exploited while sampling

Likelihood Weighting

Idea: fix the evidence variables and sample only the rest.

  • Problem: we then no longer sample from the right distribution.
  • Solution: weight each sample by the probability of the evidence given its parents.

e.g. the net $Shape \rightarrow Color$ with the conditional probability table $P(Color \mid Shape)$:

       square  circle  triangle
red    0.2     0.5     0.3
green  0.3     0.2     0.3
blue   0.5     0.3     0.4

and the query $Q = P(Shape \mid Color=blue)$

Samples of $Shape$ (drawn from its prior) with weights $P(blue \mid shape)$:

  • square: 0.5
  • circle: 0.3
  • square: 0.5
  • triangle: 0.4
  • ...

Process

  • input: evidence instantiation
  • $weight=1$
  • for $i = 1, 2, \dots, n$ (in topological order):
    • if $X_i$ is an evidence variable:
      • set $x_i$ to the observed value of $X_i$
      • $weight \leftarrow weight \cdot P(x_i \mid Parents(X_i))$
    • else:
      • sample $x_i$ from $P(X_i \mid Parents(X_i))$
  • return $((x_1, x_2, \dots, x_n), weight)$
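A sketch of likelihood weighting for the shape/color example; the prior $P(Shape)$ is not given in the table, so a uniform prior is assumed here purely for illustration:

import numpy as np

shapes = ['square', 'circle', 'triangle']
P_shape = [1/3., 1/3., 1/3.]          # assumed uniform prior P(Shape)
P_blue_given = {'square': 0.5, 'circle': 0.3, 'triangle': 0.4}

def weighted_sample():
    s = np.random.choice(shapes, p=P_shape)  # Shape is not evidence: sample it
    w = P_blue_given[s]                      # Color=blue is evidence: weight by it
    return s, w

# estimate P(Shape | Color=blue) as normalized weighted counts
samples = [weighted_sample() for _ in range(10000)]
total = sum(w for (s, w) in samples)
for shape in shapes:
    print(shape, sum(w for (s, w) in samples if s == shape) / total)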

Problems

  • Evidence influences the choice of downstream variables, but not of upstream variables.
  • Unlikely samples get very low weights.
  • Possibly only a small number of samples carries most of the weight $\rightarrow$ bad estimate.

Gibbs Sampling

  • keep track of a full instantiation; start with a random one
  • repeatedly resample one variable at a time, conditioned on all other variables, while keeping the evidence variables fixed

Process:

  • Fix the evidence variables
  • Initialize the other variables randomly
  • Repeat:
    • choose a non-evidence variable $X_i$ (at random!)
    • resample $X_i$ from $P(X_i \mid x_1, x_2, \dots, x_{i-1}, x_{i+1}, \dots, x_n)$

How can we sample $X_i$ from $P(X_i \mid x_1, x_2, \dots, x_{i-1}, x_{i+1}, \dots, x_n)$ efficiently?

$$ P(X_i \mid x_1, x_2, \dots, x_{i-1}, x_{i+1}, \dots, x_n) = \frac{P(x_1, x_2, \dots, x_{i-1}, X_i, x_{i+1}, \dots, x_n)}{P(x_1, x_2, \dots, x_{i-1}, x_{i+1}, \dots, x_n)} = \frac{P(x_1, x_2, \dots, x_{i-1}, X_i, x_{i+1}, \dots, x_n)}{\sum_{x'_i} P(x_1, x_2, \dots, x_{i-1}, x'_i, x_{i+1}, \dots, x_n)} $$

The joint $P(X_1, X_2, \dots, X_n)$ factorizes according to the topology of the Bayes net, e.g.

$$ P(X_1, X_2, \dots, X_n) = P(X_1)\, P(X_2 \mid X_1)\, P(X_3 \mid X_1) \dots P(X_i \mid \dots)\, P(\dots \mid \dots X_i \dots)\, P(\dots \mid \dots X_i \dots) \dots $$

The important point is that many factors do not involve $X_i$ at all (neither on the left nor on the right side of the bar '$\mid$'). These factors are identical in numerator and denominator and cancel, so the fraction above reduces to:

$$ P(X_i \mid x_1, x_2, \dots, x_{i-1}, x_{i+1}, \dots, x_n) = \frac{P(X_i \mid \dots)\, P(\dots \mid \dots X_i \dots)\, P(\dots \mid \dots X_i \dots)}{\sum_{x'_i} P(x'_i \mid \dots)\, P(\dots \mid \dots x'_i \dots)\, P(\dots \mid \dots x'_i \dots)} $$
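A sketch of the resulting Gibbs update for the hypothetical $Cloudy \rightarrow Rain$ net from above, with evidence $Rain = \text{true}$; the only non-evidence variable is $Cloudy$, and its conditional is built from exactly the factors that mention it:

def gibbs_resample_cloudy(r):
    # unnormalized P(Cloudy=c | Rain=r): product of the factors
    # that mention Cloudy, i.e. P(Cloudy) * P(Rain | Cloudy)
    score = {c: P_C[c] * P_R_given_C[c][r] for c in (True, False)}
    z = score[True] + score[False]             # normalization constant
    return bool(np.random.rand() < score[True] / z)

# fix evidence Rain=true, initialize Cloudy randomly, then resample repeatedly
r, c, history = True, bool(np.random.rand() < 0.5), []
for _ in range(10000):
    c = gibbs_resample_cloudy(r)
    history.append(c)
print(sum(history) / float(len(history)))      # estimate of P(Cloudy=true | Rain=true)

(With a single non-evidence variable every Gibbs step is already an exact sample from the conditional; in general the chain only converges to it.)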
