Consider a random vector $\vec v$: each element corresponds to one trial of a categorical variable with $k$ possible outcomes.
$v_j \in \{1,2, \dots , k\}$
e.g. (with k=3): $\vec v = (3,1,2,2,3,3,2,1) $
Let $\theta_i$ be the probability of getting outcome $i$ in a single trial:
$ P(v_j = i \mid \theta) = \theta_i $
with:
$\theta = (\theta_1, \theta_2, \dots, \theta_k)$
The probability mass function of observing $x_i$ outcomes of category $i$ in $n$ i.i.d. trials is given by the multinomial distribution:
$$ P(X_1=x_1, X_2=x_2, \dotsc, X_k=x_k)= \frac {n!}{x_1! \cdot x_2! \dotsm x_k!}\theta_1^{x_1} \cdot \theta_2^{x_2} \dotsm \theta_k^{x_k} $$
with: $x_1 + \dots + x_k = n$
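The multinomial pmf above can be evaluated directly with `scipy.stats.multinomial` and cross-checked against the factorial formula; the values of `theta` and `x` below are arbitrary example choices:

```python
import numpy as np
from math import factorial
from scipy.stats import multinomial

theta = [0.3, 0.45, 0.25]   # example category probabilities (sum to 1)
x = [2, 4, 2]               # example counts per category
n = sum(x)                  # n = 8 trials

# pmf from scipy
p_scipy = multinomial.pmf(x, n=n, p=theta)

# same value from the formula above
coef = factorial(n) / (factorial(x[0]) * factorial(x[1]) * factorial(x[2]))
p_manual = coef * theta[0]**x[0] * theta[1]**x[1] * theta[2]**x[2]

print(p_scipy, p_manual)
```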
The Dirichlet distribution is the conjugate prior of the multinomial distribution: if the prior distribution over the multinomial parameters is a Dirichlet, then the posterior distribution is also a Dirichlet (with parameters different from those of the prior).
Dirichlet pdf (probability density function):
$$ p(\theta_1,\dots, \theta_{K} \mid \alpha_1,\dots, \alpha_K) = \frac{1}{\mathrm{B}(\alpha)} \prod_{i=1}^K \theta_i^{\alpha_i - 1} $$
with $\mathrm{B}(\alpha)$ the multivariate beta function, which acts as the normalizing constant:
$$ \mathrm{B}(\alpha) = \frac{\prod_{i=1}^K \Gamma(\alpha_i)}{\Gamma\left(\sum_{i=1}^K \alpha_i\right)} $$
The distribution is supported on the simplex: $\theta_i \geq 0$ and $\sum_i \theta_i = 1$
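The pdf can be evaluated at a point on the simplex with `scipy.stats.dirichlet` and checked against the formula; `alpha` and `theta` below are arbitrary example values:

```python
import numpy as np
from scipy.stats import dirichlet
from scipy.special import gamma

alpha = np.array([10., 20., 3.])    # example concentration parameters
theta = np.array([0.3, 0.6, 0.1])   # an example point on the simplex

# pdf from scipy
p_scipy = dirichlet.pdf(theta, alpha)

# same value from the formula: B(alpha) = prod Gamma(alpha_i) / Gamma(sum alpha_i)
B = np.prod(gamma(alpha)) / gamma(alpha.sum())
p_manual = np.prod(theta ** (alpha - 1)) / B

print(p_scipy, p_manual)
```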
Example: $k=3$:
Dirichlet on simplex with $\alpha = (1, 1, 1)$
draw_pdf_contours(Dirichlet([1., 1., 1.]))
Dirichlet on simplex with $\alpha = (0.9999, 0.9999, 0.9999)$
draw_pdf_contours(Dirichlet([.9999, 0.9999, 0.9999]))
Dirichlet on simplex with $\alpha = (10, 20, 3)$
draw_pdf_contours(Dirichlet([10., 20., 3.]))
Dirichlet on simplex with $\alpha = (1, 2, 5)$
draw_pdf_contours(Dirichlet([1., 2., 5.]))
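`draw_pdf_contours` and `Dirichlet` are plotting helpers that are not defined in this notebook. A minimal sketch of such a contour plot, assuming matplotlib and scipy, could map the 2-simplex onto an equilateral triangle and evaluate the pdf on a refined mesh (function and variable names here are illustrative, not the original helper):

```python
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.tri as tri
from scipy.stats import dirichlet

# corners of the equilateral triangle the 2-simplex is drawn on
corners = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, np.sqrt(3) / 2]])

def xy_to_theta(x, y):
    """Map 2D triangle coordinates back to barycentric (theta_1, theta_2, theta_3)."""
    t3 = y / (np.sqrt(3) / 2)
    t2 = x - 0.5 * t3
    t1 = 1.0 - t2 - t3
    return np.stack([t1, t2, t3])

def draw_pdf_contours_sketch(alpha, subdiv=5, eps=1e-8):
    triangulation = tri.Triangulation(corners[:, 0], corners[:, 1])
    refined = tri.UniformTriRefiner(triangulation).refine_triangulation(subdiv=subdiv)
    theta = xy_to_theta(refined.x, refined.y)
    theta = np.clip(theta, eps, None)   # stay strictly inside the simplex
    theta = theta / theta.sum(axis=0)   # renormalize after clipping
    z = np.array([dirichlet.pdf(t, alpha) for t in theta.T])
    plt.tricontourf(refined, z)
    plt.axis('equal')
    plt.axis('off')

draw_pdf_contours_sketch(np.array([10., 20., 3.]))
```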
$$ \mathbb{E}(\theta_i) = \frac{\alpha_i}{A} $$ with $$ A = \sum_i \alpha_i $$
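The expectation formula can be verified against `scipy.stats.dirichlet.mean` and a Monte Carlo estimate; `alpha` is an example choice:

```python
import numpy as np
from scipy.stats import dirichlet

alpha = np.array([10., 20., 3.])    # example parameters
A = alpha.sum()

mean_formula = alpha / A            # E(theta_i) = alpha_i / A
mean_scipy = dirichlet.mean(alpha)  # same value from scipy

# Monte Carlo sanity check
samples = np.random.dirichlet(alpha, size=100_000)
mean_mc = samples.mean(axis=0)
print(mean_formula, mean_scipy, mean_mc)
```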
alpha = np.array([1., 1., 1.]) # prior parameter of Dirichlet
x = np.array([10, 3, 4]) # observations
alpha_posterior = plot_posterior_dirichlet(alpha, x)
print("posterior parameters: ", alpha_posterior)
posterior parameters: [11. 4. 5.]
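`plot_posterior_dirichlet` is a helper (not shown here) that also plots the posterior; by the Dirichlet–multinomial conjugacy stated above, the update it performs is just elementwise addition of prior parameters and observed counts. A minimal sketch:

```python
import numpy as np

alpha = np.array([1., 1., 1.])   # prior parameters of the Dirichlet
x = np.array([10, 3, 4])         # observed counts per category

# conjugate update: Dirichlet(alpha) prior + multinomial counts x
# -> Dirichlet(alpha + x) posterior
alpha_posterior = alpha + x
print(alpha_posterior)  # [11.  4.  5.]
```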
The mode of a continuous distribution is the value at which its probability density attains its maximum (for a set of data, it is the value that appears most often).
The mode of the Dirichlet is given by:
$$ \theta_i = \frac{\alpha_i - 1}{A - k} $$
with $A = \sum_i \alpha_i$; the formula is valid for $\alpha_i > 1$.
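Applied to the posterior parameters computed above, the mode gives the MAP estimate of $\theta$:

```python
import numpy as np

alpha_posterior = np.array([11., 4., 5.])   # posterior from above
A = alpha_posterior.sum()
k = len(alpha_posterior)

# mode (all alpha_i > 1, so the formula applies); this is the MAP estimate
mode = (alpha_posterior - 1) / (A - k)
print(mode)  # [10/17, 3/17, 4/17]
```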
theta_unk = [0.2, 0.5, 0.3]  # "unknown" parameters that inference should recover
#data = np.random.multinomial(20, theta_unk, size=1)
data = np.random.choice(3, 200, p=theta_unk)
data
array([2, 2, 2, 1, 2, 1, 1, 0, 1, 2, 2, 1, 1, 0, 2, 1, 0, 1, 1, 1, 0, 0, 2, 0, 1, 1, 0, 2, 1, 0, 1, 1, 1, 2, 0, 2, 2, 0, 1, 2, 2, 1, 1, 2, 0, 2, 2, 1, 0, 0, 2, 1, 2, 0, 1, 0, 0, 0, 2, 2, 0, 0, 1, 2, 1, 0, 2, 2, 1, 1, 2, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 2, 0, 2, 1, 2, 0, 0, 2, 0, 1, 0, 1, 2, 1, 2, 1, 1, 0, 1, 0, 0, 0, 2, 2, 2, 2, 2, 2, 1, 1, 2, 2, 0, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 0, 1, 0, 2, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 2, 2, 2, 0, 0, 1, 1, 2, 0, 2, 2, 0, 0, 2, 2, 1, 2, 2, 1, 0, 0, 1, 1, 0, 2, 2, 1, 1, 0, 2, 2, 2, 2, 1, 2, 0])
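The empirical category frequencies of the simulated data should already be close to `theta_unk`; a quick check (the seed is an assumption added here for reproducibility):

```python
import numpy as np

theta_unk = [0.2, 0.5, 0.3]
np.random.seed(0)                        # hypothetical seed, for reproducibility
data = np.random.choice(3, 200, p=theta_unk)

counts = np.bincount(data, minlength=3)  # occurrences of categories 0, 1, 2
freqs = counts / counts.sum()            # empirical frequencies
print(freqs)                             # close to theta_unk
```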
import numpy as np
import pymc3

def create_model(data):
    with pymc3.Model() as model:
        k = data.max() + 1              # number of categories
        a = np.ones(k)                  # flat Dirichlet prior
        p = pymc3.Dirichlet('p', a=a, shape=k)
        c = pymc3.Categorical('c', p=p, observed=data)
    return model
model = create_model(data)
with model:
    step = pymc3.Metropolis(model.vars)
    trace = pymc3.sample(20000, step=step)
a = pymc3.traceplot(trace)
Multiprocess sampling (2 chains in 2 jobs)
Metropolis: [p]
Sampling 2 chains: 100%|██████████| 41000/41000 [00:12<00:00, 3392.72draws/s]
The number of effective samples is smaller than 10% for some parameters.
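The MCMC estimate of `p` can be cross-checked against the analytic conjugate result: with the flat $\mathrm{Dirichlet}(1,1,1)$ prior used in the model, the posterior is $\mathrm{Dirichlet}(1 + \text{counts})$. A sketch of the analytic side, assuming data generated as above (the seed is an assumption):

```python
import numpy as np

np.random.seed(42)                          # hypothetical seed
theta_unk = [0.2, 0.5, 0.3]
data = np.random.choice(3, 200, p=theta_unk)

counts = np.bincount(data, minlength=3)
alpha_post = 1.0 + counts                   # flat prior + observed counts
posterior_mean = alpha_post / alpha_post.sum()
# the MCMC estimate trace['p'].mean(axis=0) should be close to posterior_mean
print(posterior_mean)
```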