introduction_into_bayesian_inference_with_pymc slides

Introduction into Bayesian Inference with PyMc

In a full bayesian approach the parameters of a model are considered also as random variables. In this way a Bayesian handles with the uncertainty about the values of the parameters.

Example: Tossing Thumtacks

In [246]:
from IPython.display import Image
Image(filename='./thumbtack.jpg', width=200, height=200)

Bernoulli trials

To simulate the tossing behaviour of the thumtack we need the Bernoulli distribution, with $\theta=0.3$ the probability of success ($X=1$; here: thumtack shows head):

$$ P(X=1) = \theta \\ P(X=0) = 1 - \theta $$

$n$-iid (independent and identically distributed) Benoulli trials can be simulated by:

In [247]:
import scipy.stats
scipy.stats.bernoulli.rvs(0.3, size=10)
array([1, 0, 1, 0, 1, 0, 0, 0, 1, 1])

Binomial Distribution

with $\theta$ the probability of successes of a Bernoulli trial the probability mass function (pmf) of $k$ successes of $n$ iid trials is given by

$$ P( X = k \mid \theta) = {n \choose k} \theta^k (1-\theta)^{n-k} $$
In [352]:
plot_binom(theta=0.3, n=10)

Beta Distribution

Probability density function (pdf) of the $Beta(\alpha, \beta)$ distribution is:

$$ \begin{align} p(\theta \mid \alpha,\beta) & = \frac{\theta^{\alpha-1}(1-\theta)^{\beta-1}}{\int_0^1 u^{\alpha-1} (1-u)^{\beta-1}\, du} \end{align} $$


  • $\theta \in { } [0,1]$
  • shape parameter: $\alpha > 0, \beta > 0$

Note: The denominator normalizes the pdf

Beta distribution in scipy

In [280]:
alpha=1; beta=1
r = np.arange(0.,1.01,.01)
plt.plot(r, scipy.stats.beta.pdf(r, alpha, beta), 'r-', lw=5, alpha=0.6, label='beta pdf')

Conjugate Prior

Bayes Rule for the parameters $\vec \theta$ given the observation $\mathcal D$ (data):

$$ p(\vec \theta | \mathcal D) = \frac{p({\mathcal D \mid \vec \theta}) p(\vec \theta)}{p(\mathcal D)} $$

In Bayesian probability theory, if the posterior distributions $p(\theta | \mathcal D)$ are in the same family as the prior probability distribution $p(\theta)$, the prior and posterior are then called conjugate distributions, and the prior is called a conjugate prior for the likelihood function.

Example: The Beta distribution is the conjugate prior if the likelihood function is the Binomial distribution.

Beta distribution as conjugate prior of the Binomial distribution

sufficient statistics for Binominal (model): $k, n$
(a statistic is sufficient with respect to a statistical model and its associated unknown parameter if "no other statistic that can be calculated from the same sample provides any additional information as to the value of the parameter".)

Bayes Rule:

$$ p(\theta \mid \mathcal D) = \frac{p(\mathcal D \mid \theta) p(\theta)}{p(\mathcal D)} $$

here a sufficent statistics of $\mathcal D$ is the number of successes $X=k$ and the number of trials $n$.

So $$ p(\theta \mid \mathcal D) \propto \theta^k (1-\theta)^{n-k} \theta^{\alpha-1}(1-\theta)^{\beta-1} = \theta^{k+\alpha-1}(1-\theta)^{n-k+\beta-1} $$

Beta distribution as conjugate prior of the Binomial distribution
  • prior for $\theta$: $Beta(\alpha, \beta)$
  • (new) observations: $k$-successes in $n$-trials
  • posterior: $Beta(\alpha + k, \beta + n - k)$
In [353]:
plot_prior_posterior_beta(alpha=1., beta=1., k=2, n=10)
Thumtack machine
  • think of a machine that produces thumtacks
  • the machine can be parameterized with $\alpha, \beta$
  • even with fixed $\alpha, \beta$ values the machine produces in general thumtacks with different $\theta$
In [357]:
# without any observation of a produced thumtack: k=0, n=0

# with machine parameter: alpha=2., beta=6.
# tacking one produced thumtack and throwing is 100 times
#   with 20 positive observations: 
plot_prior_posterior_beta(alpha=2., beta=6., k=20, n=100)

Random Variables in PyMc

Here we choose $X$ to be Beta distributed (conjugate prior to the Binomial distribution):

$$ X \sim Beta(\alpha=1,\beta=6) $$

Such a variable is called stochastic in the terminology of pymc.

In [315]:
alpha=1; beta=6;
X = pymc.Beta('X', alpha=alpha, beta=beta)

Drawing samples from a distribution

X.random(): Draws a new value for a stochastical conditional on its parents and returns it.

In [305]:
print X.random()
print X.random()

First PyMc Example: Binomial trials

Let's assume we have some data from iid binomial trials and we want to estimate the probability of observing a 1.

In [309]:
#simulate some observations, n = 100 :
theta_unk = 0.3 # unknown theta, that's the parameter we want to estimate
nb_data = n = 40
data = scipy.stats.bernoulli.rvs(theta_unk, size=nb_data)
print data
k = data.sum()
theta_ml = float(k)/float(n)
print "Maximum likelihood estimate of theta:", theta_ml
[0 1 0 1 0 0 0 1 0 1 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 1 0 0 0 1
 0 0 0]
Maximum likelihood estimate of theta: 0.25

Model in PyMc


We choose the Beta distribution as prior with $\alpha=1$ und $\beta=1$:

theta = pymc.Beta('theta', alpha=1, beta=1)

In [334]:
theta = pymc.Beta('theta', alpha=1, beta=1)
samples = [theta.random() for i in range(20000)]
plt.hist(samples, bins=100, normed=True )
plt.title("prior distribution for $\\theta$")

Observed Variable

Our observed variable is a Bernoulli random variable.
For a pymc Bernoulli variable we can use another pymc variable as value for p: our theta-Variable:

pymc.Bernoulli('bernoulli', p=theta, value=data, observed=True)

In [355]:
# we want to use the model with different data, so we have to specify a
# function in pymc
def create_model(data):
    #create a uniform prior, the lower and upper limits of which are 0 and 1
    theta = pymc.Beta('theta', alpha=1, beta=1)
    bernoulli = pymc.Bernoulli('bernoulli',p=theta, value=data, observed=True)
    model = pymc.Model([theta, bernoulli])
    return model

model = create_model(data)

as (directed) graphical model

In [339]:

from IPython.display import Image

Sampling from the model

In [340]:
mcmc = pymc.MCMC(model)
p_trace = mcmc.trace('theta')[:]
 [-----------------100%-----------------] 50000 of 50000 complete in 2.4 sec

Plotting the posterior of $\theta$

In [341]:
ax = plt.subplot(121)
hist, bins = np.histogram(p_trace, bins=np.arange(0.,1.01,0.01), normed=True)
width = 0.7 * (bins[1] - bins[0])
center = (bins[:-1] + bins[1:]) / 2, hist, align='center', width=width)
plt.xlim([0., 1.0])
plt.ylim([0., 6.0])
ax = plt.subplot(122)
ax.plot(bins, scipy.stats.beta.pdf(bins, k+1, n-k+1), 'r-', lw=5, alpha=0.6, label='beta pdf')
plt.xlim([0., 1.0])
plt.ylim([0., 6.0])

Comparing pymc's approximation with the exact solution:

In [272]:
plot_theta_histograms(histograms, bins, data)