$$ \mathcal D = \{ X_1, X_2, \dots, X_n\} $$

Here we consider the data as random variables (frequentist view).

The data are also assumed to be iid (independent and identically distributed):

  • independent: the outcome of one observation does not affect the outcome of another observation
  • identically distributed: all $X_i$ are drawn from the same probability distribution.

Definition of a statistic

A statistic is a random variable $S$ that is a function of the data $\mathcal D$, i.e. $$ S = f \left(\mathcal D \right) $$

An estimator is a statistic intended to approximate a parameter governing the distribution of the data $\mathcal D$.

Notation: $\hat \theta$ is an estimator of $\theta$
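
As a small illustration of these definitions, the sketch below treats a data set as a list of iid draws and computes two statistics from it. The distribution (a Gaussian with mean 2 and standard deviation 1), the seed and all function names are assumptions chosen for the example, not part of the definitions above.

```python
import random

def draw_dataset(rng, n, mu=2.0, sigma=1.0):
    """Draw one data set D = {X_1, ..., X_n} of iid samples (assumed Gaussian)."""
    return [rng.gauss(mu, sigma) for _ in range(n)]

def sample_mean(data):
    """A statistic: a function of the data only. Used here as an estimator of mu."""
    return sum(data) / len(data)

def sample_range(data):
    """Another statistic: a function of the data that is not intended to estimate mu."""
    return max(data) - min(data)

rng = random.Random(0)

# Every new data set yields a new value, so the statistic is itself a random variable.
estimates = [sample_mean(draw_dataset(rng, n=50)) for _ in range(5)]
print(estimates)
```

The five printed values differ from each other because each one is computed from a different data set.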

Bias of an estimator

The bias of an estimator $\hat \theta$ is $$ \text{bias}(\hat \theta) := \mathbb E\left[ \hat \theta \right] - \theta $$ i.e. the difference between the expected value of the estimator and the true value of $\theta$.

The expectation is taken over different data sets $\mathcal D$.

An estimator is unbiased if the bias is zero, i.e. $$\text{bias}(\hat \theta)= 0$$ or $$ \mathbb E\left[ \hat \theta \right] = \theta $$

Variance of an estimator

$$ var(\hat \theta) = \mathbb E \left[\left(\mathbb E [\hat \theta] - \hat \theta\right)^2\right] $$


\begin{align} var(\hat \theta) &= \mathbb E \left[\left(\mathbb E [\hat \theta] - \hat \theta\right)^2\right] \\ &= \mathbb E \left[(\mathbb E [\hat \theta])^2 -2\mathbb E [\hat \theta] \hat \theta+ \hat \theta^2\right] \\ &= (\mathbb E [\hat \theta])^2 - 2 \mathbb E [\hat \theta]\mathbb E [\hat \theta] + \mathbb E [\hat \theta^2] \\ &= \mathbb E [\hat \theta^2] - (\mathbb E [\hat \theta])^2 \end{align}
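
The identity can be checked numerically by replacing each expectation with a plain average over a fixed collection of estimator values. The sketch below uses hypothetical values for $\hat \theta$ (they do not come from any real data); for such empirical averages the two forms should agree up to floating-point rounding.

```python
import math

# Hypothetical values of an estimator obtained from six different data sets.
theta_hats = [1.8, 2.1, 2.4, 1.9, 2.2, 2.0]

def expectation(values):
    """Empirical stand-in for the expectation: a plain average."""
    return sum(values) / len(values)

mean_theta = expectation(theta_hats)

# Definition: E[(E[theta_hat] - theta_hat)^2]
var_definition = expectation([(mean_theta - t) ** 2 for t in theta_hats])

# Alternative form: E[theta_hat^2] - (E[theta_hat])^2
var_alternative = expectation([t ** 2 for t in theta_hats]) - mean_theta ** 2

print(var_definition, var_alternative)
print(math.isclose(var_definition, var_alternative, rel_tol=1e-9))
```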


As an example, assume the data-generating distribution is a univariate Gaussian:

$$ X_i \sim \mathcal N(\mu, \sigma^2) $$

The Gaussian has two parameters which we want to estimate from the data:

  • Mean: $\mu = \mathbb E \left[ X\right]$
  • Variance of the Gaussian $\sigma^2 = \mathbb E \left[ (X - \mu)^2 \right]$

Note that the technical term "variance" is used here for two different concepts:

  • variance of the estimator $var(\hat \theta)$ and
  • variance of the Gaussian $\sigma^2$.

We can use the following estimators of $\mu$ and $\sigma^2$:

  • Sample mean is an estimator of the mean $\mu$:
    $$\hat \mu = \frac{1}{n} \sum_{i=1}^n X_i =: \bar X $$

  • (Biased) sample variance is an estimator of the variance $\sigma^2$:
    $$\hat \sigma_b^2 = \frac{1}{n}\sum_{i=1}^n (X_i- \bar X)^2$$

  • (Unbiased) sample variance is an estimator of the variance $\sigma^2$:
    $$\hat \sigma_u^2 = \frac{1}{n-1} \sum_{i=1}^n (X_i- \bar X)^2$$
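
The three estimators translate directly into code. The following is a minimal sketch using only the Python standard library; the function names, the seed and the parameter values ($\mu = 5$, $\sigma = 2$, $n = 20$) are assumptions made for the illustration.

```python
import random
import statistics

def mu_hat(data):
    """Sample mean: (1/n) * sum of X_i."""
    return sum(data) / len(data)

def sigma2_biased(data):
    """Biased sample variance: squared deviations from the sample mean, divided by n."""
    xbar = mu_hat(data)
    return sum((x - xbar) ** 2 for x in data) / len(data)

def sigma2_unbiased(data):
    """Unbiased sample variance: the same sum, divided by n - 1."""
    xbar = mu_hat(data)
    return sum((x - xbar) ** 2 for x in data) / (len(data) - 1)

rng = random.Random(1)
data = [rng.gauss(5.0, 2.0) for _ in range(20)]  # assumed mu = 5, sigma = 2

print(mu_hat(data), sigma2_biased(data), sigma2_unbiased(data))

# Cross-check against the standard library: pvariance divides by n, variance by n - 1.
print(abs(sigma2_biased(data) - statistics.pvariance(data)) < 1e-9)
print(abs(sigma2_unbiased(data) - statistics.variance(data)) < 1e-9)
```

The two variance estimators differ only by the factor $(n-1)/n$, so the gap between them shrinks as $n$ grows.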

Estimator $\hat \mu$ is unbiased:

$$ \mathbb E \left[ \hat \mu \right] = \mathbb E \left[ \frac{1}{n} \sum_i X_i \right] = \frac{1}{n} \sum_i \mathbb E \left[ X_i\right] = \frac{1}{n} \sum_i \mu = \mu $$
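
The expectation over data sets can be approximated by simulation: draw many data sets, compute $\hat \mu$ on each, and average the results. The sketch below does this under assumed values for the true parameters, the sample size and the number of simulated data sets; it only approximates the expectation, so the estimated bias should be close to zero but not exactly zero.

```python
import random

MU, SIGMA = 3.0, 2.0      # assumed true parameters of the Gaussian
N = 10                    # observations per data set
NUM_DATASETS = 20_000     # number of simulated data sets

rng = random.Random(42)

def sample_mean(data):
    return sum(data) / len(data)

# One value of mu_hat per simulated data set.
mu_hats = [
    sample_mean([rng.gauss(MU, SIGMA) for _ in range(N)])
    for _ in range(NUM_DATASETS)
]

# Monte Carlo approximation of E[mu_hat] and of the bias E[mu_hat] - mu.
expected_mu_hat = sum(mu_hats) / NUM_DATASETS
bias_estimate = expected_mu_hat - MU
print(bias_estimate)
```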

Variance of the estimator $\hat \mu$

$$ \begin{align} var(\hat \mu) & := \mathbb E \left[ (\bar X - \mu)^2 \right] = \mathbb E \left[ \left(\frac{1}{n} \sum_i X_i - \mu\right)^2 \right] \\ & =\mathbb E \left[ \left(\frac{1}{n} \sum_i X_i\right)^2 - \frac{2\mu}{n} \sum_i X_i + \mu ^2 \right] \\ & = \mathbb E \left[ \bar X^2 \right] - \frac{2 \mu}{n} \sum_i \mathbb E \left[ X_i \right] + \mu^2\\ & = \mathbb E \left[ \bar X^2 \right] - 2 \mu \mu + \mu^2\\ & = \mathbb E \left[ \bar X^2 \right] - \mu^2 \end{align} $$
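
The first and last expressions of this derivation can be compared by simulation. The sketch below estimates both $\mathbb E[(\bar X - \mu)^2]$ and $\mathbb E[\bar X^2] - \mu^2$ from simulated data sets, under assumed parameter values. As a reference it also prints $\sigma^2/n$, the standard closed form for the variance of the mean of iid data, which is not derived in the text above.

```python
import random

MU, SIGMA = 1.0, 2.0      # assumed true parameters of the Gaussian
N = 10                    # observations per data set
NUM_DATASETS = 20_000     # number of simulated data sets

rng = random.Random(7)

# One sample mean per simulated data set.
xbars = [
    sum(rng.gauss(MU, SIGMA) for _ in range(N)) / N
    for _ in range(NUM_DATASETS)
]

# First expression: E[(X_bar - mu)^2], approximated by an average.
var_direct = sum((x - MU) ** 2 for x in xbars) / NUM_DATASETS

# Last expression: E[X_bar^2] - mu^2, approximated by an average.
var_from_formula = sum(x ** 2 for x in xbars) / NUM_DATASETS - MU ** 2

# Reference value sigma^2 / n (standard result, stated here as an assumption).
var_reference = SIGMA ** 2 / N

print(var_direct, var_from_formula, var_reference)
```

The two simulated values agree only approximately, because each replaces an exact expectation with a finite average.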

This derivation is analogous to the derivation of the alternative form of the variance above.

For a more in-depth discussion of the example and the bias and variance of the estimators $\hat \sigma_b^2$ and $\hat \sigma_u^2$, see

Mean squared error of an estimator

$$ mse(\hat \theta) = \mathbb E \left[ \left(\theta - \hat \theta \right)^2 \right] $$

Relation between mean squared error, bias and variance

$$\begin{align} mse(\hat \theta) &= \mathbb E \left[ \left(\theta - \hat \theta \right)^2 \right] \\ &= \mathbb E [ \theta^2] - 2 \mathbb E[\theta \hat \theta] + \mathbb E[\hat \theta^2]\\ &= \theta^2 - 2 \theta \mathbb E [\hat \theta] + \mathbb E[\hat \theta^2] + (\mathbb E [\hat \theta])^2 - (\mathbb E [\hat \theta])^2\\ &= \left(\theta^2 - 2 \theta \mathbb E [\hat \theta] + (\mathbb E [\hat \theta])^2 \right) + \left( \mathbb E[\hat \theta^2] - (\mathbb E [\hat \theta])^2 \right)\\ &=\left(\mathbb E\left[ \hat \theta \right] - \theta \right)^2 + var(\hat \theta) \\ &= \left(\text{bias}(\hat \theta)\right)^2 + var(\hat \theta) \end{align}$$
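
The decomposition can be illustrated with the biased sample variance $\hat \sigma_b^2$ as the estimator and $\theta = \sigma^2$ as the target. The sketch below replaces every expectation by an average over simulated data sets; the parameter values, sample size and seed are assumptions. For such empirical averages the identity should hold up to floating-point rounding, while the individual terms are only approximations of the true bias and variance.

```python
import random

MU, SIGMA2 = 0.0, 4.0     # assumed true mean and variance of the Gaussian
N = 5                     # observations per data set
NUM_DATASETS = 20_000     # number of simulated data sets

rng = random.Random(3)

def sigma2_biased(data):
    """Biased sample variance: squared deviations from the sample mean, divided by n."""
    xbar = sum(data) / len(data)
    return sum((x - xbar) ** 2 for x in data) / len(data)

def average(values):
    """Empirical stand-in for the expectation over data sets."""
    return sum(values) / len(values)

# One estimate of sigma^2 per simulated data set.
estimates = [
    sigma2_biased([rng.gauss(MU, SIGMA2 ** 0.5) for _ in range(N)])
    for _ in range(NUM_DATASETS)
]

mean_estimate = average(estimates)
mse = average([(SIGMA2 - e) ** 2 for e in estimates])
bias = mean_estimate - SIGMA2
var = average([(mean_estimate - e) ** 2 for e in estimates])

print(mse, bias ** 2 + var)   # the two numbers should agree up to rounding
print(bias)                   # expected to be negative for this estimator
```

With a small sample size such as the assumed $n = 5$, the squared bias makes up a visible part of the mean squared error.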

Sufficient Statistics

A statistic $S$ is sufficient if

$$ p(\hat \theta \mid \mathcal D) = p\left(\hat \theta \mid S(\mathcal D)\right) $$

For estimating the parameters, all relevant information in the data is "compressed" into the sufficient statistic.
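
For the Gaussian example, a standard result (stated here as an assumption, not derived in this text) is that the sample size together with $\sum_i X_i$ and $\sum_i X_i^2$ forms a sufficient statistic for $(\mu, \sigma^2)$. The sketch below illustrates the "compression": the estimators $\hat \mu$ and $\hat \sigma_b^2$ are computed once from the full data and once from the three summary numbers only. All function names and parameter values are hypothetical.

```python
import random

def summarize(data):
    """Compress the data set into three numbers: n, sum of x, sum of x^2."""
    return len(data), sum(data), sum(x * x for x in data)

def estimates_from_summary(n, sum_x, sum_x2):
    """Compute mu_hat and the biased sample variance from the summary alone."""
    mu_hat = sum_x / n
    sigma2_biased = sum_x2 / n - mu_hat ** 2
    return mu_hat, sigma2_biased

def estimates_from_data(data):
    """Compute the same two estimators directly from the full data set."""
    n = len(data)
    mu_hat = sum(data) / n
    sigma2_biased = sum((x - mu_hat) ** 2 for x in data) / n
    return mu_hat, sigma2_biased

rng = random.Random(11)
data = [rng.gauss(1.0, 2.0) for _ in range(100)]  # assumed mu = 1, sigma = 2

from_summary = estimates_from_summary(*summarize(data))
from_data = estimates_from_data(data)
print(from_summary)
print(from_data)
```

Both routes should give the same estimates up to rounding, although the summary holds three numbers instead of one hundred.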