Multinomial-Dirichlet slides

Categorical data / Multinomial distribution

Consider a random vector $\vec v$ whose elements are the outcomes of individual trials (categorical variables).

$v_j \in \{1,2, \dots , k\}$

  • $k$ possible outcomes (categories)

e.g. (with $k=3$): $\vec v = (3,1,2,2,3,3,2,1)$

$\theta_i$ probability to get the outcome $i$ in a trial, with:

  • $\theta_1 + \dots + \theta_k = 1$ and
  • $\theta_i \geq 0$.

e.g. (with $k=3$) each trial follows the categorical distribution $$ P(v_j = i \mid \vec \theta) = \theta_i, \qquad \vec\theta = (\theta_1, \theta_2, \theta_3) $$
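As a hypothetical illustration (the probabilities below are made up), such a categorical vector can be drawn with NumPy:

```python
import numpy as np

# Hypothetical parameters: k = 3 outcomes with probabilities theta_i
theta = [0.2, 0.5, 0.3]          # sums to 1
rng = np.random.default_rng(0)

# One realization of the random vector: 8 iid categorical trials
v = rng.choice([1, 2, 3], size=8, p=theta)
print(v)
```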

Multinomial distribution

with:

  • $k$: possible outcomes
  • $\theta_i$ probability to get the outcome $i$ in a trial, with: $\theta_1 + \dots + \theta_k = 1$
  • $i \in \{1, \dotsc, k\}$

The probability mass function for observing, in $n$ iid trials, exactly $x_i$ outcomes of category $i$ is given by the multinomial distribution:

$$ P(X_1=x_1, X_2=x_2, \dotsc, X_k=x_k)= \frac {n!}{x_1! \cdot x_2! \dotsm x_k!}\theta_1^{x_1} \cdot \theta_2^{x_2} \dotsm \theta_k^{x_k} $$

with: $x_1 + \dots + x_k = n$
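The pmf above can be checked against `scipy.stats.multinomial` (the counts and probabilities below are illustrative):

```python
import numpy as np
from math import factorial
from scipy.stats import multinomial

theta = [0.2, 0.5, 0.3]   # category probabilities (sum to 1)
x = [2, 5, 1]             # observed counts per category
n = sum(x)                # number of trials, n = 8

# pmf via scipy
p_scipy = multinomial.pmf(x, n=n, p=theta)

# pmf via the formula: n! / (x_1! ... x_k!) * theta_1^{x_1} ... theta_k^{x_k}
coef = factorial(n) / np.prod([factorial(xi) for xi in x])
p_formula = coef * np.prod([t**xi for t, xi in zip(theta, x)])

print(p_scipy)
```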

Dirichlet distribution as conjugate prior

The Dirichlet distribution is the conjugate prior of the multinomial distribution, i.e. if the prior distribution of the multinomial parameters is Dirichlet, then the posterior distribution is also a Dirichlet distribution (with parameters different from those of the prior). As a consequence:

  • the posterior distribution is easy to compute
  • in some sense it is possible to quantify how much our beliefs have changed after collecting the data.

Dirichlet pdf (probability density function):

$$ f(\theta_1,\dots, \theta_{k}; \alpha_1,\dots, \alpha_k) = \frac{1}{\mathrm{B}(\alpha)} \prod_{i=1}^k \theta_i^{\alpha_i - 1} $$

with

  • $\theta_i \geq 0$ and $\sum_i \theta_i = 1$
  • the multinomial Beta function $\mathrm{B}(\alpha)$ as normalizing constant
  • the concentration parameters $\alpha_1, \dots, \alpha_k > 0$

Dirichlet pdf in scipy
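A sketch of evaluating the density with `scipy.stats.dirichlet` and checking it against the formula above (the parameter values are illustrative):

```python
import numpy as np
from scipy.stats import dirichlet
from scipy.special import gammaln

alpha = np.array([10., .9, 3.])        # concentration parameters
theta = np.array([0.5, 0.2, 0.3])      # a point on the simplex (sums to 1)

# Density via scipy
density = dirichlet.pdf(theta, alpha)

# Density via the formula: log B(alpha) = sum_i log Gamma(alpha_i) - log Gamma(A)
logB = gammaln(alpha).sum() - gammaln(alpha.sum())
manual = np.exp(np.sum((alpha - 1) * np.log(theta)) - logB)

print(density)
```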

Visualization on the probability simplex in $\mathbb{R}^3$

Simplex: $\theta_i \geq 0$ and $\sum_i \theta_i = 1$
Example: $k=3$:

(figure: the probability simplex for $k=3$ in $\mathbb{R}^3$)

Dirichlet on simplex with $\alpha = (1, 1, 1)$

In [11]:
draw_pdf_contours(Dirichlet([1., 1., 1.]))

Dirichlet on simplex with $\alpha = (0.9999, 0.9999, 0.9999)$

In [12]:
draw_pdf_contours(Dirichlet([.9999, 0.9999, 0.9999]))
In [13]:
draw_pdf_contours(Dirichlet([1., 1., 1.]))
In [14]:
draw_pdf_contours(Dirichlet([10., .9, 3.]))
In [15]:
draw_pdf_contours(Dirichlet([1., 2., 5.]))
Expectation of the Dirichlet

$$ \mathbb{E}(\theta_i) = \frac{\alpha_i}{A} $$

with $$ A = \sum_i \alpha_i $$
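The formula agrees with `scipy.stats.dirichlet.mean`, e.g. for the last parameter vector plotted above:

```python
import numpy as np
from scipy.stats import dirichlet

alpha = np.array([1., 2., 5.])
A = alpha.sum()

print(alpha / A)                 # expectation from the formula
print(dirichlet.mean(alpha))     # same values from scipy
```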

Posterior given data

In [18]:
alpha = np.array([1., 1., 1.]) # prior parameter of Dirichlet
x = np.array([10, 3, 4]) # observations

alpha_posterior = plot_posterior_dirichlet(alpha, x)
print(alpha_posterior)
[ 11.   4.   5.]
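The helper `plot_posterior_dirichlet` is not shown here; assuming it performs the standard conjugate update, the posterior parameters can be computed directly by adding the observed counts to the prior parameters:

```python
import numpy as np

alpha = np.array([1., 1., 1.])   # prior parameters of the Dirichlet
x = np.array([10, 3, 4])         # observed counts per category

# Conjugacy: the posterior is Dirichlet with parameters alpha_i + x_i
alpha_posterior = alpha + x
print(alpha_posterior)           # [11.  4.  5.]
```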

Mode of the Dirichlet

For a continuous distribution, the mode is the point at which the probability density attains its maximum.

The mode of the Dirichlet is given by:

$$ \theta_i =\frac{\alpha_i - 1}{A - k} $$

with

  • $k$: number of different classes
  • $A = \sum_i \alpha_i$
  • the formula is valid when all $\alpha_i > 1$ (otherwise the density has no interior maximum)
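A quick sketch of the mode formula, using as an example the posterior parameters computed above:

```python
import numpy as np

alpha = np.array([11., 4., 5.])  # e.g. the posterior parameters from above
A, k = alpha.sum(), len(alpha)

mode = (alpha - 1) / (A - k)     # requires all alpha_i > 1
print(mode)
```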
In [23]:
import pymc 

p = pymc.Dirichlet('p', theta=alpha_posterior)
c = pymc.Categorical('c', p)

model = pymc.Model([p,c])
mcmc = pymc.MCMC(model)
mcmc.sample(iter=50000,burn=1000)
 [-----------------100%-----------------] 50000 of 50000 complete in 10.1 sec
In [24]:
c_samples = mcmc.trace("c")[:]
c_samples
Out[24]:
array([2, 0, 0, ..., 0, 0, 2])
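The PyMC model above uses the legacy PyMC 2 API. As a plain-NumPy sketch of the same idea, one can sample ancestrally: draw $\vec\theta$ from the posterior Dirichlet, then a category from the resulting categorical distribution; the empirical category frequencies then approximate the posterior mean $\alpha_i / A$:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha_posterior = np.array([11., 4., 5.])  # posterior parameters from above

# Ancestral sampling: theta ~ Dirichlet(alpha_posterior), then c ~ Categorical(theta)
thetas = rng.dirichlet(alpha_posterior, size=20000)
c = np.array([rng.choice(3, p=t) for t in thetas])

# Empirical category frequencies approximate the posterior mean alpha_i / A
print(np.bincount(c, minlength=3) / len(c))  # roughly [0.55, 0.20, 0.25]
```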