Probability in a nutshell

Discrete Variables

Random experiments

The possible outcomes $a_i$ of a random experiment determine the value of a random variable $x \in \Omega_{\mathcal X} = \{ a_1, a_2, \dots ,a_l\}$

Examples:

  • dice:
    • $\Omega_{\mathcal X} = \{1,2,3,4,5,6\}$ or
    • $\Omega_{\mathcal X} = \{even, odd\}$
  • selection of a random character of a book:
    • $\Omega_{\mathcal X} = \{ a,b,\dots , - \}$

Probability

Every outcome $a_i$ has a probability $P(x = a_i)$ (short: $p_i$)

with

  • $0 \leq p_i \leq 1$
  • $P(\Omega_{\mathcal X}) = \sum_{a_i \in \Omega_{\mathcal X}} p_i = 1$
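
These axioms can be illustrated with a small Python sketch (not part of the original notes); it represents the fair-die distribution from the example above as a dictionary:

```python
# Discrete distribution as a dict mapping outcomes to probabilities
# (fair six-sided die from the example above).
die = {face: 1/6 for face in range(1, 7)}

# Axioms: every p_i lies in [0, 1] and the probabilities sum to 1.
assert all(0 <= p <= 1 for p in die.values())
assert abs(sum(die.values()) - 1.0) < 1e-12
```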

Subsets $T$ of $\Omega_{\mathcal X}$

The probability of a subset $T$ of $\Omega_{\mathcal X}$ is given by the sum of the probabilities of all elements in the subset:

$$ P(T) = \sum_{a_i \in T} p_i $$

Example:

Probability of a vowel $V$

  • $V = \{ {\tt a},{\tt e},{\tt i},{\tt o},{\tt u } \}$
  • $P(V) = P(x={\tt a}) + P(x={\tt e}) + P(x={\tt i}) + P(x={\tt o}) + P(x={\tt u})$
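
In the Python sketch above, the same subset rule can be applied to the die distribution, here with $T = \{2,4,6\}$ (the event "even"):

```python
# P(T) is the sum of the probabilities of the elements of T;
# here T is the event "even" for the die defined above.
T = {2, 4, 6}
P_T = sum(die[a] for a in T)
print(P_T)  # 0.5
```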

Joint Probabilities

The outcome of a random experiment can also be an ordered pair (or, in general, a tuple) of random variables $x_1, x_2$ with

  • $x_1 \in \Omega_{\mathcal X_1} = \{ a_1, a_2, \dots , a_l\}$
  • $x_2 \in \Omega_{\mathcal X_2} = \{ b_1, b_2, \dots , b_k\}$

$P(x_1, x_2)$ is the joint probability of $x_1$ and $x_2$

The random variables $x_1$ and $x_2$ need not be independent (see below).

Example:

Throw of a die with $x \in \{even, odd\}$ and $y \in \{prime, notPrime\}$

  • $P(even, prime) = 1/6$
  • $P(odd, prime) = 2/6$
  • $P(even, notPrime) = 2/6$
  • $P(odd, notPrime) = 1/6$

The specification of the probabilities of all possible states (outcomes) determines a probability distribution.
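
As a sketch, the joint distribution of the die example can be built in Python by enumerating the six faces (the variable names `joint` and `primes` are ad hoc):

```python
from collections import defaultdict

# Joint distribution P(x1, x2) for the die: x1 is the parity of the
# face, x2 indicates whether the face is a prime (2, 3, or 5).
primes = {2, 3, 5}
joint = defaultdict(float)
for face in range(1, 7):
    x1 = "even" if face % 2 == 0 else "odd"
    x2 = "prime" if face in primes else "notPrime"
    joint[(x1, x2)] += 1/6

# Reproduces the four probabilities listed above:
# odd/notPrime 1/6, even/prime 1/6, odd/prime 2/6, even/notPrime 2/6.
print(dict(joint))
```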

Marginalization

The marginal probability $P(x_1)$ can be obtained from the joint probabilities by summation:

$$ P(x_1 = a_i) \equiv \sum_{x_2 \in \Omega_{\mathcal X_2}} P(x_1 = a_i,x_2) . $$

and analogously for the marginal distribution $P(x_2)$ (in shorter notation):

$$ P(x_2) \equiv \sum_{x_1 \in \Omega_{\mathcal X_1}} P(x_1, x_2) $$

Example:

For the throw of the die (see above): $P(prime) = P(even, prime) + P(odd, prime) = 1/2$
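
Continuing the Python sketch, marginalization is a sum over the variable that is removed:

```python
# Marginal P(x2): sum the joint table over x1.
P_x2 = defaultdict(float)
for (x1, x2), p in joint.items():
    P_x2[x2] += p

print(P_x2["prime"])  # 0.5, as computed by hand above
```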

Conditional probability

The conditional probability is defined by: $$ P(x_1 = a_i| x_2 = b_j) \equiv \frac{P(x_1 = a_i, x_2 = b_j)}{P(x_2 = b_j)} $$

  • if $P(x_2 = b_j) \neq 0$
  • if $P(x_2 = b_j) = 0$, then $P(x_1 = a_i | x_2 = b_j)$ is not defined

Example: For the throw of the die (see above) $$ P(prime|even) = \frac{P(even, prime)}{P(even)} = \frac{1/6}{1/6 + 2/6} = 1/3 $$
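
In the running Python sketch this computation reads:

```python
# P(prime | even) = P(even, prime) / P(even).
P_even = joint[("even", "prime")] + joint[("even", "notPrime")]
P_prime_given_even = joint[("even", "prime")] / P_even
print(P_prime_given_even)  # 0.333... = 1/3
```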

Product rule

$$ P(x_1,x_2) = P(x_1 | x_2 ) P( x_2) = P(x_2 | x_1) P( x_1) $$

Sum rule

$$ P(x_1) = \sum_{x_2} P(x_1,x_2) = \sum_{x_2} P(x_1|x_2) P(x_2) $$
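
Both rules can be checked numerically on the die sketch, reusing the quantities computed above:

```python
# Product rule: P(even, prime) = P(prime | even) P(even).
assert abs(joint[("even", "prime")] - P_prime_given_even * P_even) < 1e-12

# Sum rule: P(prime) as a sum of joint probabilities over x1.
P_prime = sum(joint[(x1, "prime")] for x1 in ("even", "odd"))
assert abs(P_prime - 0.5) < 1e-12
```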

Bayes' rule

Bayes' rule follows from the product rule: $$ P(x_2 | x_1) = \frac{ P(x_1 | x_2 ) P( x_2 ) } { P(x_1 ) } = \frac{ P(x_1 | x_2 ) P( x_2 ) } { \sum_{x_2'} P(x_1 | x_2' ) P( x_2') } $$
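
A quick numerical check of Bayes' rule on the die sketch:

```python
# Bayes' rule: P(even | prime) = P(prime | even) P(even) / P(prime).
P_even_given_prime = P_prime_given_even * P_even / P_prime
print(P_even_given_prime)                  # 1/3
print(joint[("even", "prime")] / P_prime)  # 1/3, direct check
```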

(Statistical) independence

Two random variables are statistically independent if and only if (iff): $$ P(x_1, x_2) = P(x_1) P(x_2) $$

or equivalently (provided $P(x_1) \neq 0$): $$ P(x_2 \mid x_1) = P(x_2) $$

Notation: $x_1 \perp x_2$
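
For the die sketch, parity and primality are not independent, as a direct comparison of $P(x_1, x_2)$ with $P(x_1) P(x_2)$ shows:

```python
# Independence test: compare P(x1, x2) with P(x1) P(x2) for all pairs.
P_x1 = {"even": 0.5, "odd": 0.5}
for (x1, x2), p in joint.items():
    print(x1, x2, abs(p - P_x1[x1] * P_x2[x2]) < 1e-12)
# Prints False for every pair, so x1 and x2 are not independent.
```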

Chain Rule

The joint probability of $n$ random variables $x_1, x_2, \dots x_n$ can be decomposed with the chain rule:

$$ P(x_1, x_2, \dots x_n) = P(x_1) P(x_2 \mid x_1) P(x_3 \mid x_1, x_2)\dots P(x_n \mid x_1, x_2, \dots x_{n-1}) $$
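
For example, for $n = 3$:

$$ P(x_1, x_2, x_3) = P(x_1) P(x_2 \mid x_1) P(x_3 \mid x_1, x_2) $$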

Conditional independence

Two random variables $x_1, x_2$ are conditionally independent given a third variable $x_3$ iff:

$$ P(x_1, x_2 \mid x_3) = P(x_1 \mid x_3) P(x_2 \mid x_3) $$

Notation:
$x_1 \perp x_2 \mid x_3$

Expectation value

The expectation value of a function $f(x)$ is:

$$ \mathbb{E}_{\mathcal{X}}[f(x)] = \sum_x f(x) p(x) $$
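
For the die distribution from the sketch above, with $f(x) = x$:

```python
# E[x] = sum_x x p(x) for the fair die.
E_x = sum(x * p for x, p in die.items())
print(E_x)  # 3.5
```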

Interpretation of probabilities

  • Bayesian interpretation: Probability is a degree of belief
  • Frequentist interpretation: Probability is the long-run relative frequency of an outcome over many repetitions of the experiment

Continuous variables

Probability density

For continuous variables $p(x)$ is a probability density function (pdf): $$ p(x) \geq 0, \int_{-\infty}^\infty p(x) dx = 1 $$

The probability that the value of $x$ lies in the interval $[a,b]$ is:

$$ P(a \leq x \leq b) = \int_a^b p(x) dx $$
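
Both properties can be checked numerically for a concrete density; a Python sketch with the standard normal pdf, using a crude midpoint Riemann sum (the helper `integrate` is ad hoc, and the infinite range is truncated to $[-10, 10]$):

```python
import math

# Standard normal probability density function.
def p(x):
    return math.exp(-x**2 / 2) / math.sqrt(2 * math.pi)

# Midpoint Riemann sum as a simple stand-in for the integral.
def integrate(f, lo, hi, n=100_000):
    h = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * h) for i in range(n)) * h

print(integrate(p, -10, 10))  # ~ 1.0 (normalization)
print(integrate(p, -1, 1))    # ~ 0.6827 = P(-1 <= x <= 1)
```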

Sum rule

$$ p(x_1) = \int_{\mathcal X_2} p(x_1,x_2) dx_2 $$

Expectation value

The expectation value of a function $f(x)$ is:

$$ \mathbb{E}_{\mathcal{X}}[f(x)] = \int_{-\infty}^\infty f(x) p(x) dx = \int_\mathcal{X} f(x) dP(x) $$
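
Continuing the continuous sketch above, the expectation of $f(x) = x^2$ under the standard normal density:

```python
# E[x^2] = integral of x^2 p(x) dx, which equals 1 for the
# standard normal density.
print(integrate(lambda x: x**2 * p(x), -10, 10))  # ~ 1.0
```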

Expectation value of a joint probability distribution

The expectation value of a function $f(x_1, x_2)$ is: $$ \mathbb{E}_{\mathcal{X}_1,\mathcal{X}_2}[f(x_1,x_2)] = \int_{\mathcal{X}_1} \int_{\mathcal{X}_2} f(x_1,x_2) p(x_1,x_2) dx_1 dx_2 = \int_{\mathcal{X}_1\times\mathcal{X}_2} f(x_1,x_2) dP(x_1,x_2) $$