Estimators

Data

$$ \mathcal D = \{ X_1, X_2, \dots X_n\} $$

Here we consider the data as random variables.

Data is iid (independent and identically distributed):

  • independent: the outcome of one observation does not affect the outcome of another observation
  • identically distributed: every observation follows the same distribution, i.e. $P(X_i \in A) = P(X_j \in A)$ for all events $A$ and all $i, j$

Definition of a statistic

A statistic is a random variable $S$ that is a function of the data $\mathcal D$, i.e. $$ S = f \left(\mathcal D \right) $$

An estimator is a statistic intended to approximate a parameter governing the distribution of the data $\mathcal D$.

Notation:

  1. $\hat \theta$ is an estimator of $\theta$

Example:

$$ X_1, X_2, \dots X_n \sim \mathcal N(\mu, \sigma^2) $$

with

  • Mean: $\mu = \mathbb E \left[ X\right]$
  • Variance: $\sigma^2 = \mathbb E \left[ (X - \mu)^2 \right]$

Estimators (see the code sketch after this list):

  • Sample mean is an estimator of the mean $\mu$:
    $$\hat \mu = \bar X = \frac{1}{n} \sum_{i=1}^n X_i$$

  • (Biased) sample variance is an estimator of the variance $\sigma^2$:
    $$\hat \sigma_b^2 = \frac{1}{n}\sum_{i=1}^n (X_i- \bar X)^2$$

  • (Unbiased) sample variance is an estimator of the variance $\sigma^2$:
    $$\hat \sigma_u^2 = \frac{1}{n-1} \sum_{i=1}^n (X_i- \bar X)^2$$
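
Below is a minimal numerical sketch of the three estimators above, assuming NumPy is available; the parameter values and variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n, mu, sigma = 100, 2.0, 3.0

# simulated iid data X_1, ..., X_n ~ N(mu, sigma^2)
X = rng.normal(loc=mu, scale=sigma, size=n)

# sample mean, an estimator of mu
mu_hat = X.mean()

# biased sample variance: divide by n (ddof=0)
sigma2_b = ((X - mu_hat) ** 2).sum() / n        # same as X.var(ddof=0)

# unbiased sample variance: divide by n - 1 (ddof=1)
sigma2_u = ((X - mu_hat) ** 2).sum() / (n - 1)  # same as X.var(ddof=1)

print(mu_hat, sigma2_b, sigma2_u)
```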

Bias of an estimator

The bias of an estimator $\hat \theta$ is $$ \text{bias}(\hat \theta) = \mathbb E\left[ \hat \theta \right] - \theta $$ i.e. the difference between the expected value of the estimator and the true value of the parameter $\theta$.

An estimator is unbiased if its bias is zero, $\text{bias}(\hat \theta) = 0$, or equivalently $$ \mathbb E\left[ \hat \theta \right] = \theta $$

Example: $\hat \mu$ is unbiased:

$$ \mathbb E \left[ \hat \mu \right] = \mathbb E \left[ \frac{1}{n} \sum_i X_i \right] = \frac{1}{n} \sum_i \mathbb E \left[ X_i\right] = \frac{1}{n} \sum_i \mu = \mu $$
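
A quick Monte Carlo check of this result (a sketch, assuming NumPy): averaging $\hat \mu$ over many simulated data sets should give a value close to $\mu$.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n, mu, sigma = 10, 2.0, 3.0
reps = 100_000

# one simulated data set of size n per row, one sample mean per row
X = rng.normal(mu, sigma, size=(reps, n))
mu_hat = X.mean(axis=1)

print(mu_hat.mean())   # close to mu = 2.0
```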

Example: $\hat \sigma_b^2$ is biased:

Note: $$ \begin{align} \hat \sigma_b^2 & = \frac{1}{n}\sum_{i=1}^n \left(X_i- \bar X\right)^2 = \frac{1}{n}\sum_{i=1}^n \left(X_i^2 - 2 X_i \bar X+ \bar X^2\right) = \frac{1}{n}\sum_{i=1}^n X_i^2 - \frac{1}{n}\sum_{i=1}^n \left( 2 X_i \bar X \right) + \bar X^2 \\ & = \frac{1}{n}\sum_{i=1}^n X_i^2 - 2 \bar X^2 + \bar X^2 = \frac{1}{n}\sum_{i=1}^n X_i^2 - \bar X^2 \end{align} $$

and the alternative definition of the variance: $$ Var(X) = \sigma^2 = \mathbb E \left[ X^2 \right] - \mu^2 $$

$$ \begin{align} Var(\bar X) & = \mathbb E \left[ (\bar X - \mu)^2 \right] = \mathbb E \left[ \left(1/n \sum_i X_i - \mu\right)^2 \right] \\ & =\mathbb E \left[ \left(1/n \sum_i X_i\right)^2 - 2\mu /n \sum_i X_i + \mu ^2 \right] \\ & = \mathbb E \left[ \bar X^2 \right] - 2 \mu/n \sum_i \mathbb E \left[ X_i \right] + \mu^2\\ & = \mathbb E \left[ \bar X^2 \right] - 2 \mu \mu + \mu^2\\ & = \mathbb E \left[ \bar X^2 \right] - \mu^2\\ \end{align} $$

and, using independence of the $X_i$, $$ Var(\bar X) = Var\left(\frac{1}{n} \sum_i X_i\right) = \frac{1}{n^2} \sum_i Var(X_i) = \frac{\sigma^2}{n} $$

$$ \begin{align} \mathbb E \left[ \hat \sigma_b^2 \right] & = \mathbb E \left[ \frac{1}{n}\sum_{i=1}^n X_i^2 - \bar X^2 \right] = \mathbb E \left[ \frac{1}{n}\sum_{i=1}^n X_i^2 \right] - \mathbb E \left[ \bar X^2 \right] \\ & = \frac{1}{n}\sum_{i=1}^n \left( \sigma^2 + \mu^2 \right) - \left( \frac{\sigma^2}{n} + \mu^2 \right) \\ & = \sigma^2 + \mu^2 - \frac{\sigma^2}{n} - \mu^2 = \frac{n-1}{n} \sigma^2 \end{align} $$

using $\mathbb E \left[ X_i^2 \right] = \sigma^2 + \mu^2$ and $\mathbb E \left[ \bar X^2 \right] = Var(\bar X) + \mu^2 = \frac{\sigma^2}{n} + \mu^2$ from above. Hence $$ \text{bias}(\hat \sigma_b^2) = \frac{n-1}{n}\sigma^2 - \sigma^2 = -\frac{\sigma^2}{n} \neq 0 $$ so $\hat \sigma_b^2$ is biased, while $\hat \sigma_u^2 = \frac{n}{n-1} \hat \sigma_b^2$ is unbiased.

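A Monte Carlo sanity check of this result (a sketch, assuming NumPy): the average of $\hat \sigma_b^2$ over many simulated data sets should be close to $\frac{n-1}{n}\sigma^2$, while the average of $\hat \sigma_u^2$ should be close to $\sigma^2$.

```python
import numpy as np

rng = np.random.default_rng(seed=2)
n, mu, sigma = 10, 0.0, 2.0
reps = 200_000

X = rng.normal(mu, sigma, size=(reps, n))
sigma2_b = X.var(axis=1, ddof=0)   # biased sample variance of each data set
sigma2_u = X.var(axis=1, ddof=1)   # unbiased sample variance of each data set

print(sigma2_b.mean(), (n - 1) / n * sigma**2)   # both ~ 3.6
print(sigma2_u.mean(), sigma**2)                 # both ~ 4.0
```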

Variance of an estimator

$$ var(\hat \theta) = \mathbb E \left[\left(\mathbb E [\hat \theta] - \hat \theta\right)^2\right] $$

or

$$\begin{align} var(\hat \theta) &= \mathbb E \left[\left(\mathbb E [\hat \theta] - \hat \theta\right)^2\right] \\ &= \mathbb E \left[(\mathbb E [\hat \theta])^2 -2\mathbb E [\hat \theta] \hat \theta+ \hat \theta^2\right] \\ &= (\mathbb E [\hat \theta])^2 - 2 \mathbb E [\hat \theta]\mathbb E [\hat \theta] + \mathbb E [\hat \theta^2] \\ &= \mathbb E [\hat \theta^2] - (\mathbb E [\hat \theta])^2 \end{align}$$
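
As a numerical illustration (a sketch, assuming NumPy), the variance of the sample mean computed across repeated data sets should match $Var(\bar X) = \sigma^2 / n$ from above.

```python
import numpy as np

rng = np.random.default_rng(seed=3)
n, mu, sigma = 25, 1.0, 2.0
reps = 200_000

X = rng.normal(mu, sigma, size=(reps, n))
mu_hat = X.mean(axis=1)                   # one estimate of mu per data set

print(mu_hat.var(ddof=0), sigma**2 / n)   # both ~ 0.16
```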

Mean squared error of an estimator

$$ mse(\hat \theta) = \mathbb E \left[ \left(\theta - \hat \theta \right)^2 \right] $$

Relation between mean squared error, bias and variance

$$\begin{align} mse(\hat \theta) &= \mathbb E \left[ \left(\theta - \hat \theta \right)^2 \right] \\ &= \mathbb E [ \theta^2] - 2 \mathbb E[\theta \hat \theta] + \mathbb E[\hat \theta^2]\\ &= \theta^2 - 2 \theta \mathbb E [\hat \theta] + \mathbb E[\hat \theta^2] + (\mathbb E [\hat \theta])^2 - (\mathbb E [\hat \theta])^2\\ &= \left(\theta^2 - 2 \theta \mathbb E [\hat \theta] + (\mathbb E [\hat \theta])^2 \right) + \left( \mathbb E[\hat \theta^2] - (\mathbb E [\hat \theta])^2 \right)\\ &=\left(\mathbb E\left[ \hat \theta \right] - \theta \right)^2 + var(\hat \theta) \\ &= \left(\text{bias}(\hat \theta)\right)^2 + var(\hat \theta) \end{align}$$
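
A numerical check of the decomposition (a sketch, assuming NumPy), using the biased sample variance $\hat \sigma_b^2$ as the estimator $\hat \theta$ of $\theta = \sigma^2$: the Monte Carlo estimate of the MSE agrees with $\text{bias}^2 + var$ up to floating-point error.

```python
import numpy as np

rng = np.random.default_rng(seed=4)
n, mu, sigma = 10, 0.0, 2.0
reps = 200_000
theta = sigma**2                       # true parameter: the variance

X = rng.normal(mu, sigma, size=(reps, n))
theta_hat = X.var(axis=1, ddof=0)      # biased sample variance of each data set

mse  = ((theta - theta_hat) ** 2).mean()
bias = theta_hat.mean() - theta
var  = theta_hat.var(ddof=0)

print(mse, bias**2 + var)              # the two values agree (up to floating point)
```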

Sufficient Statistics

A statistic $S$ is sufficient for $\theta$ if

$$ p(\theta \mid \mathcal D) = p\left(\theta \mid S(\mathcal D)\right) $$

For estimating the parameter, all of the information in the data $\mathcal D$ that is relevant for $\theta$ is "compressed" into the sufficient statistic.
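
As an example, for the normal model above the likelihood (and hence the posterior over $(\mu, \sigma^2)$) depends on the data only through $\sum_i X_i$ and $\sum_i X_i^2$:

$$ p(\mathcal D \mid \mu, \sigma^2) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(X_i - \mu)^2}{2\sigma^2} \right) = \left(2\pi\sigma^2\right)^{-n/2} \exp\left( -\frac{1}{2\sigma^2} \left( \sum_{i=1}^n X_i^2 - 2\mu \sum_{i=1}^n X_i + n\mu^2 \right) \right) $$

so $S(\mathcal D) = \left( \sum_i X_i, \sum_i X_i^2 \right)$, or equivalently $(\bar X, \hat \sigma_b^2)$, is a sufficient statistic for $(\mu, \sigma^2)$.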