Probability in a nutshell

Random experiments

The possible outcomes $i$ of a random experiment determine the value of a random variable $x \in \Omega_{\mathcal X} = \{ a_1, a_2, \dots ,a_l\}$

Examples:

• die roll:
• $\Omega_{\mathcal X} = \{1,2,3,4,5,6\}$ or
• $\Omega_{\mathcal X} = \{even, odd\}$
• selection of a random character of a book:
• $\Omega_{\mathcal X} = \{ a,b,\dots , - \}$

Probability

Every outcome $a_i$ has a probability $P(x = a_i)$ (short: $p_i$)

with

• $0 \leq p_i \leq 1$
• $P(\Omega_{\mathcal X}) = \sum_{a_i \in \Omega_{\mathcal X}} p_i = 1$

Subsets $T$ of $\Omega_{\mathcal X}$

The probability of a subset $T$ of $\Omega_{\mathcal X}$ is given by the sum of the probabilities of all elements in the subset:

$$P(T) = \sum_{a_i \in T} p_i$$

Example:

Probability of a vowel $V$

• $V = \{ {\tt a},{\tt e},{\tt i},{\tt o},{\tt u } \}$
• $P(V) = P(x={\tt a}) + P(x={\tt e}) + P(x={\tt i}) + P(x={\tt o}) + P(x={\tt u})$
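As a rough illustration of the subset sum, the following sketch estimates letter probabilities from relative frequencies in a short sample string and then sums them over the vowels. The string (a stand-in for a book) and the names `sample_text`, `p`, and `prob_subset` are assumptions made for this example only.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical stand-in for "a book": any string would do.
sample_text = "probability in a nutshell"

# Relative frequencies serve as the probabilities p_i of each character.
counts = Counter(sample_text)
total = sum(counts.values())
p = {char: Fraction(n, total) for char, n in counts.items()}

def prob_subset(p, subset):
    """P(T): sum of p_i over all outcomes a_i in T (absent outcomes contribute 0)."""
    return sum((p.get(a, Fraction(0)) for a in subset), Fraction(0))

vowels = {"a", "e", "i", "o", "u"}
p_vowel = prob_subset(p, vowels)
print(p_vowel)
```

Using `Fraction` keeps the arithmetic exact, so the normalization $\sum_i p_i = 1$ can be checked without rounding issues.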

Joint Probabilities

The outcome of a random experiment can also be an ordered pair (or in general tuple) of random variables $x_1, x_2$ with

• $x_1 \in \Omega_{\mathcal X_1} = \{ a_1, a_2, \dots , a_l\}$
• $x_2 \in \Omega_{\mathcal X_2} = \{ b_1, b_2, \dots , b_k\}$

$P(x_1, x_2)$ is the joint probability of $x_1$ and $x_2$

The random variables $x_1$ and $x_2$ need not be independent (see below).

Example:

Throw of a die with $x \in \{even, odd\}$ and $y \in \{prime, notPrime\}$

• $P(even, prime) = 1/6$
• $P(odd, prime) = 2/6$
• $P(even, notPrime) = 2/6$
• $P(odd, notPrime) = 1/6$

The specification of the probabilities of all possible states (outcomes) determines a probability distribution.
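The four joint probabilities in the example are what one gets when each face of the die has probability $1/6$. A minimal sketch, assuming such a fair die (the names `faces`, `primes`, and `joint` are chosen here for illustration), builds the joint distribution by enumerating the faces:

```python
from collections import defaultdict
from fractions import Fraction

faces = [1, 2, 3, 4, 5, 6]
primes = {2, 3, 5}

# Assumption: a fair die, so every face has probability 1/6.
joint = defaultdict(Fraction)  # missing keys start at Fraction(0)
for face in faces:
    x = "even" if face % 2 == 0 else "odd"
    y = "prime" if face in primes else "notPrime"
    joint[(x, y)] += Fraction(1, 6)

for outcome, prob in sorted(joint.items()):
    print(outcome, prob)
```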

Marginalization

The marginal probability $P(x_1)$ can be obtained from the joint probabilities by summation:

$$P(x_1 = a_i) \equiv \sum_{x_2 \in \Omega_{\mathcal X_2}} P(x_1 = a_i,x_2) .$$

or analogously for the marginal distribution $P(x_2)$ (in shorter notation):

$$P(x_2) \equiv \sum_{x_1 \in \Omega_{\mathcal X_1}} P(x_1, x_2)$$

Example:

For the throw of the die (see above) $P(prime) = P(even, prime) + P(odd, prime) = 1/2$
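The summation can be sketched in a few lines. The joint table below repeats the die example; the helper name `marginal` and its `axis` parameter are assumptions of this sketch.

```python
from fractions import Fraction

# Joint distribution P(x, y) of the die example.
joint = {
    ("even", "prime"): Fraction(1, 6),
    ("odd", "prime"): Fraction(2, 6),
    ("even", "notPrime"): Fraction(2, 6),
    ("odd", "notPrime"): Fraction(1, 6),
}

def marginal(joint, axis):
    """Sum the joint over every variable except the one at position `axis`."""
    result = {}
    for outcome, prob in joint.items():
        key = outcome[axis]
        result[key] = result.get(key, Fraction(0)) + prob
    return result

p_x = marginal(joint, 0)  # P(even), P(odd)
p_y = marginal(joint, 1)  # P(prime), P(notPrime)
print(p_y["prime"])  # 1/2
```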

Conditional probability

The conditional probability is defined by: $$P(x_1 = a_i| x_2 = b_j) \equiv \frac{P(x_1 = a_i, x_2 = b_j)}{P(x_2 = b_j)}$$

• This definition requires $P(x_2 = b_j) \neq 0$.
• If $P(x_2 = b_j) = 0$, then $P(x_1 = a_i | x_2 = b_j)$ is not defined.

Example: For the throw of the die (see above) $$P(prime|even) = \frac{1/6}{1/6 + 2/6} = 1/3$$
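A small sketch of the same computation, including the undefined case. The function name `conditional_y_given_x` is hypothetical, and returning `None` for a zero denominator is one possible convention, not the only one.

```python
from fractions import Fraction

# Joint distribution P(x, y) of the die example.
joint = {
    ("even", "prime"): Fraction(1, 6),
    ("odd", "prime"): Fraction(2, 6),
    ("even", "notPrime"): Fraction(2, 6),
    ("odd", "notPrime"): Fraction(1, 6),
}

def conditional_y_given_x(joint, y, x):
    """P(y | x) = P(x, y) / P(x); returns None when P(x) = 0 (undefined)."""
    p_x = sum((p for (xi, _), p in joint.items() if xi == x), Fraction(0))
    if p_x == 0:
        return None
    return joint.get((x, y), Fraction(0)) / p_x

p_prime_given_even = conditional_y_given_x(joint, "prime", "even")
print(p_prime_given_even)  # 1/3
```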

Product rule

$$P(x_1,x_2) = P(x_1 | x_2 ) P( x_2) = P(x_2 | x_1) P( x_1)$$

Sum rule

$$P(x_1) = \sum_{x_2} P(x_1,x_2) = \sum_{x_2} P(x_1|x_2) P(x_2)$$
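As a quick check with the die example above: the joint table gives $P(even) = 1/6 + 2/6 = 1/2$ and $P(odd) = 1/2$, the conditional probability example gives $P(prime|even) = 1/3$, and the same calculation yields $P(prime|odd) = \frac{2/6}{2/6 + 1/6} = 2/3$. The sum rule then reproduces the marginal:

$$P(prime) = P(prime|even) P(even) + P(prime|odd) P(odd) = \frac{1}{3} \cdot \frac{1}{2} + \frac{2}{3} \cdot \frac{1}{2} = \frac{1}{2}$$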

Bayes rule

Bayes' rule follows from the product rule: $$P(x_2 | x_1) = \frac{ P(x_1 | x_2 ) P( x_2 ) } { P(x_1 ) } = \frac{ P(x_1 | x_2 ) P( x_2 ) } { \sum_{x_2'} P(x_1 | x_2' ) P( x_2') }$$
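To make the formula concrete, the sketch below inverts the die example: given that the outcome is prime, how probable is it that the outcome is even? The dictionaries `prior` and `likelihood` and the function `posterior` are names assumed for this illustration; their values follow from the die example.

```python
from fractions import Fraction

# P(x_2): parity of the die outcome.
prior = {"even": Fraction(1, 2), "odd": Fraction(1, 2)}
# P(x_1 = prime | x_2): probability of a prime given the parity.
likelihood = {"even": Fraction(1, 3), "odd": Fraction(2, 3)}

def posterior(prior, likelihood):
    """Bayes' rule: P(x_2 | x_1) = P(x_1 | x_2) P(x_2) / sum_x2' P(x_1 | x_2') P(x_2')."""
    evidence = sum(likelihood[h] * prior[h] for h in prior)
    return {h: likelihood[h] * prior[h] / evidence for h in prior}

post = posterior(prior, likelihood)
print(post["even"])  # 1/3
```

The result agrees with the direct computation $P(even|prime) = \frac{1/6}{1/2} = 1/3$ from the joint table.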

(Statistical) independence

Two random variables are statistically independent if and only if (iff): $$P(x_1, x_2) = P(x_1) P(x_2)$$

Notation: $x_1 \perp x_2$
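The factorization can be tested directly on a joint table. In this sketch the function `is_independent` and the two-coin table `coins` are assumptions added for illustration; the table `die` repeats the example above.

```python
from fractions import Fraction

def is_independent(joint):
    """True iff P(x1, x2) = P(x1) P(x2) holds for every pair of values."""
    p1, p2 = {}, {}
    for (a, b), p in joint.items():
        p1[a] = p1.get(a, Fraction(0)) + p
        p2[b] = p2.get(b, Fraction(0)) + p
    return all(
        joint.get((a, b), Fraction(0)) == p1[a] * p2[b]
        for a in p1
        for b in p2
    )

# Parity and primality of one die throw.
die = {
    ("even", "prime"): Fraction(1, 6),
    ("odd", "prime"): Fraction(2, 6),
    ("even", "notPrime"): Fraction(2, 6),
    ("odd", "notPrime"): Fraction(1, 6),
}
# Hypothetical second example: two separate fair coin flips.
coins = {(a, b): Fraction(1, 4) for a in "HT" for b in "HT"}

print(is_independent(die))    # False: 1/6 != 1/2 * 1/2
print(is_independent(coins))  # True
```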

Chain Rule

The joint probability of $n$ random variables $x_1, x_2, \dots, x_n$ can be decomposed with the chain rule:

$$P(x_1, x_2, \dots, x_n) = P(x_1) P(x_2|x_1) P(x_3| x_1, x_2)\dots P(x_n| x_1, x_2, \dots, x_{n-1})$$
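The decomposition can be verified numerically on a small example. The sketch below uses a made-up joint distribution over three binary variables (the weights are arbitrary illustrative values) and rebuilds every joint probability from the chain of conditionals; `prefix_prob` and `chain_product` are hypothetical helper names.

```python
from fractions import Fraction
from itertools import product

# Hypothetical joint over three binary variables: weights 1..8, normalized by 36.
states = list(product([0, 1], repeat=3))
joint = {s: Fraction(w, 36) for s, w in zip(states, range(1, 9))}

def prefix_prob(prefix):
    """P(x_1..x_k = prefix), obtained by summing out the remaining variables."""
    k = len(prefix)
    return sum((p for s, p in joint.items() if s[:k] == prefix), Fraction(0))

def chain_product(state):
    """Product of the conditionals P(x_k | x_1..x_{k-1}) for k = 1..n."""
    result = Fraction(1)
    for k in range(1, len(state) + 1):
        # P(x_k | x_1..x_{k-1}) = P(x_1..x_k) / P(x_1..x_{k-1})
        result *= prefix_prob(state[:k]) / prefix_prob(state[:k - 1])
    return result

print(all(chain_product(s) == joint[s] for s in states))  # True
```

Each conditional is a ratio of two prefix marginals, so the product telescopes back to the full joint probability.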

Conditional independence

Two random variables $x_1, x_2$ are conditionally independent given a third variable $x_3$ iff:

$$P(x_1, x_2| x_3) = P(x_1|x_3) P(x_2|x_3)$$
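Conditional independence does not imply (marginal) independence. The sketch below constructs a joint distribution as $P(x_3) P(x_1|x_3) P(x_2|x_3)$ from made-up tables, then checks both properties. All numbers and names (`p_x3`, `p_x1_given_x3`, `p_x2_given_x3`, `prob`) are assumptions for illustration.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical tables, indexed as table[x3][x].
p_x3 = {0: F(1, 2), 1: F(1, 2)}
p_x1_given_x3 = {0: {0: F(9, 10), 1: F(1, 10)}, 1: {0: F(2, 10), 1: F(8, 10)}}
p_x2_given_x3 = {0: {0: F(7, 10), 1: F(3, 10)}, 1: {0: F(1, 10), 1: F(9, 10)}}

joint = {
    (x1, x2, x3): p_x3[x3] * p_x1_given_x3[x3][x1] * p_x2_given_x3[x3][x2]
    for x1, x2, x3 in product([0, 1], repeat=3)
}

def prob(x1=None, x2=None, x3=None):
    """Probability of the given values; variables left as None are summed out."""
    wanted = (x1, x2, x3)
    return sum(
        (p for s, p in joint.items()
         if all(w is None or w == v for w, v in zip(wanted, s))),
        F(0),
    )

# P(x1, x2 | x3) == P(x1 | x3) P(x2 | x3) for all values?
cond_indep = all(
    prob(x1, x2, x3) / prob(x3=x3)
    == (prob(x1=x1, x3=x3) / prob(x3=x3)) * (prob(x2=x2, x3=x3) / prob(x3=x3))
    for x1, x2, x3 in joint
)
# P(x1, x2) == P(x1) P(x2) for all values?
marg_indep = all(
    prob(x1=x1, x2=x2) == prob(x1=x1) * prob(x2=x2)
    for x1, x2 in product([0, 1], repeat=2)
)
print(cond_indep, marg_indep)  # True False
```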

Interpretation of probabilities

• Bayesian interpretation: Probability is a degree of belief
• Frequentist interpretation: Probability is the long-run relative frequency of an outcome in repeated experiments

Continuous variables

Probability density

For continuous variables $p(x)$ is a probability density function (pdf): $$p(x) \geq 0, \int_{-\infty}^\infty p(x) dx = 1$$

The probability that the value of $x$ lies in the interval $[a,b]$ is:

$$P(a \leq x \leq b) = \int_a^b p(x) dx$$
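As an illustration, the sketch below integrates a standard normal density numerically. The choice of density, the trapezoidal rule, and the names `normal_pdf` and `integrate` are assumptions of this example; the interval probability is compared against the closed form $\mathrm{erf}(1/\sqrt{2})$ for $[-1, 1]$.

```python
import math

def normal_pdf(x):
    """Standard normal density (mean 0, variance 1)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def integrate(f, a, b, n=10_000):
    """Composite trapezoidal rule with n subintervals."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for k in range(1, n):
        total += f(a + k * h)
    return total * h

# Normalization: the tails beyond +-10 are negligible for this density.
total_mass = integrate(normal_pdf, -10.0, 10.0)
# P(-1 <= x <= 1), numerically and in closed form.
p_interval = integrate(normal_pdf, -1.0, 1.0)
exact = math.erf(1.0 / math.sqrt(2.0))
print(total_mass, p_interval, exact)
```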

Expectation value

The expectation value of a function $f(x)$ is:

$$\mathbb{E}_{\mathcal{X}}[f(x)] = \int_{-\infty}^\infty f(x) p(x) dx = \int_\mathcal{X} f(x) dp(x)$$
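A minimal numerical sketch of this integral, assuming a standard normal density and the functions $f(x) = x$ and $f(x) = x^2$ (both choices are illustrative). For this density the expected values are 0 and 1; the helper name `expectation` and the truncation of the integral to $[-10, 10]$ are assumptions.

```python
import math

def normal_pdf(x):
    """Standard normal density (mean 0, variance 1)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def expectation(f, pdf, a=-10.0, b=10.0, n=10_000):
    """Approximate E[f(x)] = integral of f(x) p(x) dx with the trapezoidal rule."""
    h = (b - a) / n
    g = lambda x: f(x) * pdf(x)
    total = 0.5 * (g(a) + g(b))
    for k in range(1, n):
        total += g(a + k * h)
    return total * h

mean = expectation(lambda x: x, normal_pdf)
second_moment = expectation(lambda x: x * x, normal_pdf)
print(mean, second_moment)
```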

Expectation value of a joint probability distribution

The expectation value of a function $f(x_1, x_2)$ is: $$\mathbb{E}_{\mathcal{X_1,X_2}}[f(x_1,x_2)] = \int_\mathcal{X_1} \int_\mathcal{X_2} f(x_1,x_2) p(x_1,x_2) dx_1 dx_2 = \int_{\mathcal{X_1}\times\mathcal{X_2}} f(x_1,x_2) dp(x_1,x_2)$$
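The double integral can be approximated on a grid. The sketch below assumes a hypothetical joint density $p(x_1, x_2) = x_1 + x_2$ on the unit square (which integrates to 1) and uses a midpoint rule; for $f(x_1, x_2) = x_1 x_2$ the exact value is $\int_0^1 \int_0^1 x_1 x_2 (x_1 + x_2) dx_1 dx_2 = 1/3$. The names `density` and `expectation_2d` are chosen for this example.

```python
def density(x1, x2):
    """Hypothetical joint pdf p(x1, x2) = x1 + x2 on the unit square."""
    return x1 + x2

def expectation_2d(f, p, n=200):
    """Midpoint-rule approximation of E[f] over [0, 1] x [0, 1]."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x1 = (i + 0.5) * h
        for j in range(n):
            x2 = (j + 0.5) * h
            total += f(x1, x2) * p(x1, x2)
    return total * h * h

mass = expectation_2d(lambda x1, x2: 1.0, density)        # normalization check
e_prod = expectation_2d(lambda x1, x2: x1 * x2, density)  # close to 1/3
print(mass, e_prod)
```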