## Estimators

#### Data

$$\mathcal D = \{ X_1, X_2, \dots X_n\}$$

Here we consider the data as random variables.

We assume the data are iid (independent and identically distributed):

• independent: the outcome of one observation does not affect the outcome of another observation
• identically distributed: every observation $X_i$ follows the same distribution

#### Definition of a statistic

A statistic is a random variable $S$ that is a function of the data $\mathcal D$, i.e. $$S = f \left(\mathcal D \right)$$

An estimator is a statistic intended to approximate a parameter governing the distribution of the data $\mathcal D$.

Notation:

1. $\hat \theta$ is an estimator of $\theta$

#### Example

$$X_1, X_2, \dots X_n \sim \mathcal N(\mu, \sigma^2)$$

with

• Mean: $\mu = \mathbb E \left[ X\right]$
• Variance: $\sigma^2 = \mathbb E \left[ (X - \mu)^2 \right]$

Estimators:

• Sample mean is an estimator of the mean $\mu$:
$$\hat \mu = \bar X = \frac{1}{n} \sum_{i=1}^n X_i$$

• (Biased) sample variance is an estimator of the variance $\sigma^2$:
$$\hat \sigma_b^2 = \frac{1}{n}\sum_{i=1}^n (X_i- \bar X)^2$$

• (Unbiased) sample variance is an estimator of the variance $\sigma^2$:
$$\hat \sigma_u^2 = \frac{1}{n-1} \sum_{i=1}^n (X_i- \bar X)^2$$
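The three estimators above can be computed directly; a minimal sketch in NumPy, where the sample size $n$ and the parameters $\mu$, $\sigma$ are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n, mu, sigma = 1000, 2.0, 3.0
x = rng.normal(mu, sigma, size=n)      # X_1, ..., X_n ~ N(mu, sigma^2)

mu_hat = x.mean()                                    # sample mean
var_biased = ((x - mu_hat) ** 2).sum() / n           # divides by n
var_unbiased = ((x - mu_hat) ** 2).sum() / (n - 1)   # divides by n - 1

# NumPy's ddof ("delta degrees of freedom") switches between the two:
assert np.isclose(var_biased, x.var(ddof=0))
assert np.isclose(var_unbiased, x.var(ddof=1))
```

Note that `ddof=0` (the NumPy default) gives the biased version, while `ddof=1` gives the unbiased one.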

#### Bias of an estimator

The bias of an estimator $\hat \theta$ is $$\text{bias}(\hat \theta) = \mathbb E\left[ \hat \theta \right] - \theta$$ i.e. the difference between the expected value of the estimator and the true value of $\theta$.

An estimator is unbiased if the bias is zero ($\text{bias}(\hat \theta) = 0$), or equivalently $$\mathbb E\left[ \hat \theta \right] = \theta$$

Example $\hat \mu$ is unbiased:

$$\mathbb E \left[ \hat \mu \right] = \mathbb E \left[ \frac{1}{n} \sum_i X_i \right] = \frac{1}{n} \sum_i \mathbb E \left[ X_i\right] = \frac{1}{n} \sum_i \mu = \mu$$
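This unbiasedness can also be checked with a small Monte Carlo simulation: averaging $\hat \mu$ over many independent datasets should approach $\mu$. All sizes and parameters below are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n, trials, mu, sigma = 20, 100_000, 2.0, 3.0

# One dataset of size n per row; one estimate mu_hat per dataset.
samples = rng.normal(mu, sigma, size=(trials, n))
mu_hats = samples.mean(axis=1)

# The average of the estimates approximates E[mu_hat], which is mu.
print(mu_hats.mean())
```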

Example: $\hat \sigma_b^2$ is biased:

Note: \begin{align} \hat \sigma_b^2 & = \frac{1}{n}\sum_{i=1}^n \left(X_i- \bar X\right)^2 = \frac{1}{n}\sum_{i=1}^n \left(X_i^2 - 2 X_i \bar X+ \bar X^2\right) = \frac{1}{n}\sum_{i=1}^n X_i^2 - \frac{1}{n}\sum_{i=1}^n \left( 2 X_i \bar X \right) + \bar X^2 \\ & = \frac{1}{n}\sum_{i=1}^n X_i^2 - 2 \bar X^2 + \bar X^2 = \frac{1}{n}\sum_{i=1}^n X_i^2 - \bar X^2 \end{align}

and the alternative definition of the variance: $$Var(X) = \sigma^2 = \mathbb E \left[ X^2 \right] - \mu^2$$

\begin{align} Var(\bar X) & = \mathbb E \left[ (\bar X - \mu)^2 \right] = \mathbb E \left[ \left(1/n \sum_i X_i - \mu\right)^2 \right] \\ & =\mathbb E \left[ \left(1/n \sum_i X_i\right)^2 - 2\mu /n \sum_i X_i + \mu ^2 \right] \\ & = \mathbb E \left[ \bar X^2 \right] - 2 \mu/n \sum_i \mathbb E \left[ X_i \right] + \mu^2\\ & = \mathbb E \left[ \bar X^2 \right] - 2 \mu \mu + \mu^2\\ & = \mathbb E \left[ \bar X^2 \right] - \mu^2\\ \end{align}

and, since the $X_i$ are independent with variance $\sigma^2$, $$Var(\bar X) = \frac{1}{n^2} \sum_i Var(X_i) = \frac{\sigma^2}{n}$$

\begin{align} \mathbb E \left[ \hat \sigma_b^2 \right] & = \mathbb E \left[ \frac{1}{n}\sum_{i=1}^n X_i^2 - \bar X^2 \right] = \frac{1}{n}\sum_{i=1}^n \mathbb E \left[ X_i^2 \right] - \mathbb E \left[ \bar X^2 \right] \\ & = \left( \sigma^2 + \mu^2 \right) - \left( \frac{\sigma^2}{n} + \mu^2 \right) = \frac{n-1}{n} \sigma^2 \end{align}

where we used $\mathbb E[X_i^2] = \sigma^2 + \mu^2$ and $\mathbb E[\bar X^2] = Var(\bar X) + \mu^2 = \sigma^2/n + \mu^2$ from above. Hence $$\text{bias}(\hat \sigma_b^2) = \frac{n-1}{n}\sigma^2 - \sigma^2 = -\frac{\sigma^2}{n} \neq 0$$ so $\hat \sigma_b^2$ is biased, while dividing by $n-1$ instead of $n$ makes $\hat \sigma_u^2$ unbiased.
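A Monte Carlo sketch of this bias: averaging $\hat \sigma_b^2$ over many datasets shows that it systematically underestimates $\sigma^2$ by the factor $(n-1)/n$. Sample size, number of trials, and parameters are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
n, trials, mu, sigma = 10, 200_000, 0.0, 2.0

samples = rng.normal(mu, sigma, size=(trials, n))
sb2 = samples.var(axis=1, ddof=0)      # biased sample variance per dataset

# E[sigma_b^2] = (n-1)/n * sigma^2, not sigma^2 itself.
expected = (n - 1) / n * sigma**2
print(sb2.mean(), expected)
```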

#### Variance of an estimator

$$var(\hat \theta) = \mathbb E \left[\left(\mathbb E [\hat \theta] - \hat \theta\right)^2\right]$$

or

\begin{align} var(\hat \theta) &= \mathbb E \left[\left(\mathbb E [\hat \theta] - \hat \theta\right)^2\right] \\ &= \mathbb E \left[(\mathbb E [\hat \theta])^2 -2\mathbb E [\hat \theta] \hat \theta+ \hat \theta^2\right] \\ &= (\mathbb E [\hat \theta])^2 - 2 \mathbb E [\hat \theta]\mathbb E [\hat \theta] + \mathbb E [\hat \theta^2] \\ &= \mathbb E [\hat \theta^2] - (\mathbb E [\hat \theta])^2 \end{align}
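As a concrete case, the variance of the sample mean is $Var(\hat \mu) = \sigma^2/n$ (derived above); a quick empirical check, with illustrative sizes and parameters:

```python
import numpy as np

rng = np.random.default_rng(3)
n, trials, mu, sigma = 25, 100_000, 1.0, 2.0

# One estimate mu_hat per simulated dataset.
mu_hats = rng.normal(mu, sigma, size=(trials, n)).mean(axis=1)

empirical_var = mu_hats.var(ddof=0)    # variance of the estimator across datasets
theoretical_var = sigma**2 / n
print(empirical_var, theoretical_var)
```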

#### Mean squared error of an estimator

$$mse(\hat \theta) = \mathbb E \left[ \left(\theta - \hat \theta \right)^2 \right]$$

#### Relation between mean squared error, bias and variance

\begin{align} mse(\hat \theta) &= \mathbb E \left[ \left(\theta - \hat \theta \right)^2 \right] \\ &= \mathbb E [ \theta^2] - 2 \mathbb E[\theta \hat \theta] + \mathbb E[\hat \theta^2]\\ &= \theta^2 - 2 \theta \mathbb E [\hat \theta] + \mathbb E[\hat \theta^2] + (\mathbb E [\hat \theta])^2 - (\mathbb E [\hat \theta])^2\\ &= \left(\theta^2 - 2 \theta \mathbb E [\hat \theta] + (\mathbb E [\hat \theta])^2 \right) + \left( \mathbb E[\hat \theta^2] - (\mathbb E [\hat \theta])^2 \right)\\ &=\left(\mathbb E\left[ \hat \theta \right] - \theta \right)^2 + var(\hat \theta) \\ &= \left(\text{bias}(\hat \theta)\right)^2 + var(\hat \theta) \end{align}
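The decomposition can be checked numerically; since it is an algebraic identity, it holds exactly for the empirical distribution of simulated estimates, so the two sides below agree up to floating point. The estimator used (the biased sample variance) and the parameters are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(4)
n, trials, sigma = 10, 200_000, 2.0

# Biased sample variance computed on many simulated datasets;
# the true parameter being estimated is sigma^2.
sb2 = rng.normal(0.0, sigma, size=(trials, n)).var(axis=1, ddof=0)

mse = ((sb2 - sigma**2) ** 2).mean()   # empirical mean squared error
bias = sb2.mean() - sigma**2           # empirical bias
var = sb2.var(ddof=0)                  # empirical variance of the estimator

# mse = bias^2 + var
print(mse, bias**2 + var)
```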

#### Sufficient Statistics

A statistic $S$ is sufficient for the parameter $\theta$ if

$$p(\theta \mid \mathcal D) = p\left(\theta \mid S(\mathcal D)\right)$$

For estimating the parameter, all information in the data is "compressed" into the sufficient statistic: given $S(\mathcal D)$, the remaining detail of $\mathcal D$ carries no additional information about $\theta$.
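A small illustration (not a proof), assuming $X_i \sim \mathcal N(\mu, \sigma^2)$ with $\sigma$ known: the log-likelihood ratio between two candidate values of $\mu$ depends on the data only through the sample mean $\bar X$, so $\bar X$ (together with $n$) plays the role of a sufficient statistic here. All numbers are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(5)
n, sigma = 50, 1.5
x = rng.normal(0.7, sigma, size=n)

def loglik(mu, data):
    # Gaussian log-likelihood with known sigma, up to a constant in mu.
    return -0.5 * ((data - mu) ** 2).sum() / sigma**2

mu1, mu2 = 0.0, 1.0

# Log-likelihood ratio computed from the full dataset ...
full = loglik(mu1, x) - loglik(mu2, x)

# ... and computed from (x_bar, n) alone, after expanding the squares:
x_bar = x.mean()
compressed = n * (x_bar * (mu1 - mu2) - (mu1**2 - mu2**2) / 2) / sigma**2

assert np.isclose(full, compressed)
```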