$$ \mathcal D = \{ X_1, X_2, \dots, X_n\} $$
Here we consider the data as random variables (frequentist view).
Also, the data are iid (independent and identically distributed).
A statistic is a random variable $S$ that is a function of the data $\mathcal D$, i.e. $$ S = f \left(\mathcal D \right) $$
An estimator is a statistic intended to approximate a parameter governing the distribution of the data $\mathcal D$.
Notation: $\hat \theta$ is an estimator of $\theta$
The bias of an estimator $\hat \theta$ is $$ \text{bias}(\hat \theta) := \mathbb E\left[ \hat \theta \right] - \theta $$ i.e. the difference between the mean of the estimator and the true value of $\theta$.
The expectation is taken over different data sets $\mathcal D$.
An estimator is unbiased if the bias is zero, i.e. $$\text{bias}(\hat \theta)= 0$$ or $$ \mathbb E\left[ \hat \theta \right] = \theta $$
The variance of an estimator $\hat \theta$ is $$ var(\hat \theta) = \mathbb E \left[\left(\mathbb E [\hat \theta] - \hat \theta\right)^2\right] $$
or
\begin{align} var(\hat \theta) &= \mathbb E \left[\left(\mathbb E [\hat \theta] - \hat \theta\right)^2\right] \\ &= \mathbb E \left[(\mathbb E [\hat \theta])^2 -2\mathbb E [\hat \theta] \hat \theta+ \hat \theta^2\right] \\ &= (\mathbb E [\hat \theta])^2 - 2 \mathbb E [\hat \theta]\mathbb E [\hat \theta] + \mathbb E [\hat \theta^2] \\ &= \mathbb E [\hat \theta^2] - (\mathbb E [\hat \theta])^2 \end{align}
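Both quantities can be approximated by Monte Carlo simulation: draw many data sets, apply the estimator to each, and compare the resulting estimates with the true parameter. A minimal sketch in Python (assuming NumPy); the functions `sample_data` and `estimator` and all parameter values are hypothetical placeholders, not part of the notes:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_data(n):
    # Hypothetical data-generating process (placeholder): standard normal samples.
    return rng.normal(loc=0.0, scale=1.0, size=n)

def estimator(data):
    # Hypothetical estimator (placeholder): the sample mean.
    return data.mean()

true_theta = 0.0            # true parameter of the placeholder process
n, n_datasets = 50, 10_000  # sample size and number of simulated data sets

# One estimate per simulated data set D.
estimates = np.array([estimator(sample_data(n)) for _ in range(n_datasets)])

bias = estimates.mean() - true_theta                     # E[theta_hat] - theta
variance = ((estimates.mean() - estimates) ** 2).mean()  # E[(E[theta_hat] - theta_hat)^2]
print(bias, variance)
```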
The data-generating distribution is a univariate Gaussian:
$$ X_i \sim \mathcal N(\mu, \sigma^2) $$
The Gaussian has two parameters, the mean $\mu$ and the variance $\sigma^2$, which we want to estimate from the data.
Note that the term variance is used here for two different concepts: the variance $\sigma^2$ of the data distribution and the variance of an estimator.
We can use the following estimators of $\mu$ and $\sigma^2$:
Sample mean is an estimator of the mean $\mu$:
$$\hat \mu = \frac{1}{n} \sum_{i=1}^n X_i =: \bar X $$
(Biased) sample variance is an estimator of the variance $\sigma^2$:
$$\hat \sigma_b^2 = \frac{1}{n}\sum_{i=1}^n (X_i- \bar X)^2$$
(Unbiased) sample variance is an estimator of the variance $\sigma^2$:
$$\hat \sigma_u^2 = \frac{1}{n-1} \sum_{i=1}^n (X_i- \bar X)^2$$
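As a concrete numerical sketch of the three estimators (assuming NumPy; the values of $\mu$, $\sigma$ and $n$ are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n = 2.0, 3.0, 100       # illustrative choices, not from the notes
X = rng.normal(mu, sigma, size=n)  # X_i ~ N(mu, sigma^2), iid

mu_hat = X.mean()                                    # sample mean, estimator of mu
var_biased = ((X - mu_hat) ** 2).sum() / n           # biased sample variance (factor 1/n)
var_unbiased = ((X - mu_hat) ** 2).sum() / (n - 1)   # unbiased sample variance (factor 1/(n-1))

# NumPy equivalents: np.var(X) uses 1/n, np.var(X, ddof=1) uses 1/(n-1).
print(mu_hat, var_biased, var_unbiased)
```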
The estimator $\hat \mu$ is unbiased:
$$ \mathbb E \left[ \hat \mu \right] = \mathbb E \left[ \frac{1}{n} \sum_i X_i \right] = \frac{1}{n} \sum_i \mathbb E \left[ X_i\right] = \frac{1}{n} \sum_i \mu = \mu $$
Variance of the estimator $\hat \mu$:
$$ \begin{align} var(\hat \mu) & := \mathbb E \left[ (\bar X - \mu)^2 \right] = \mathbb E \left[ \left(1/n \sum_i X_i - \mu\right)^2 \right] \\ & =\mathbb E \left[ \left(1/n \sum_i X_i\right)^2 - 2\mu /n \sum_i X_i + \mu ^2 \right] \\ & = \mathbb E \left[ \bar X^2 \right] - 2 \mu/n \sum_i \mathbb E \left[ X_i \right] + \mu^2\\ & = \mathbb E \left[ \bar X^2 \right] - 2 \mu \mu + \mu^2\\ & = \mathbb E \left[ \bar X^2 \right] - \mu^2\\ \end{align} $$
This derivation is analogous to the derivation of the alternative expression for the variance above. Continuing with $\mathbb E \left[ \bar X^2 \right] = \frac{1}{n^2} \sum_{i,j} \mathbb E \left[ X_i X_j \right] = \frac{\sigma^2}{n} + \mu^2$ (using the independence of the $X_i$) yields the standard result $var(\hat \mu) = \frac{\sigma^2}{n}$.
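A quick Monte Carlo check of both results for $\hat \mu$, unbiasedness and $var(\hat \mu) = \sigma^2 / n$ (again with arbitrary illustrative parameter values):

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma, n, n_datasets = 2.0, 3.0, 50, 20_000  # illustrative choices

# One sample mean per simulated data set.
mu_hats = rng.normal(mu, sigma, size=(n_datasets, n)).mean(axis=1)

print(mu_hats.mean() - mu)           # empirical bias, close to 0
print(mu_hats.var(), sigma**2 / n)   # empirical variance vs. sigma^2 / n
```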
For a more in-depth discussion of the example and of the bias and variance of the estimators $\hat \sigma_b^2$ and $\hat \sigma_u^2$, see
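As a brief numerical illustration, the following sketch compares the two variance estimators over many simulated data sets: the mean of $\hat \sigma_u^2$ is close to $\sigma^2$, while $\hat \sigma_b^2$ comes out systematically too small, with bias $-\sigma^2 / n$ (a standard result; the parameter values are again arbitrary illustrative choices).

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma, n, n_datasets = 2.0, 3.0, 10, 50_000  # illustrative choices

X = rng.normal(mu, sigma, size=(n_datasets, n))
var_b = X.var(axis=1)          # biased:   1/n     * sum_i (X_i - X_bar)^2
var_u = X.var(axis=1, ddof=1)  # unbiased: 1/(n-1) * sum_i (X_i - X_bar)^2

print(var_b.mean() - sigma**2, -sigma**2 / n)  # empirical bias vs. theoretical -sigma^2/n
print(var_u.mean() - sigma**2)                 # close to 0
```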
The mean squared error (mse) of an estimator $\hat \theta$ is $$ mse(\hat \theta) = \mathbb E \left[ \left(\theta - \hat \theta \right)^2 \right] $$
The mse decomposes into the squared bias and the variance of the estimator (here $\theta$ is treated as a fixed constant, so $\mathbb E[\theta^2] = \theta^2$): $$\begin{align} mse(\hat \theta) &= \mathbb E \left[ \left(\theta - \hat \theta \right)^2 \right] \\ &= \mathbb E [ \theta^2] - 2 \mathbb E[\theta \hat \theta] + \mathbb E[\hat \theta^2]\\ &= \theta^2 - 2 \theta \mathbb E [\hat \theta] + \mathbb E[\hat \theta^2] + (\mathbb E [\hat \theta])^2 - (\mathbb E [\hat \theta])^2\\ &= \left(\theta^2 - 2 \theta \mathbb E [\hat \theta] + (\mathbb E [\hat \theta])^2 \right) + \left( \mathbb E[\hat \theta^2] - (\mathbb E [\hat \theta])^2 \right)\\ &=\left(\mathbb E\left[ \hat \theta \right] - \theta \right)^2 + var(\hat \theta) \\ &= \left(\text{bias}(\hat \theta)\right)^2 + var(\hat \theta) \end{align}$$
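The decomposition can be checked numerically, here for the biased variance estimator $\hat \sigma_b^2$ as an illustrative choice (assuming NumPy; the parameter values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
mu, sigma, n, n_datasets = 2.0, 3.0, 10, 50_000  # illustrative choices

theta = sigma**2                                                      # true parameter: the variance
theta_hats = rng.normal(mu, sigma, size=(n_datasets, n)).var(axis=1)  # biased variance estimator per data set

mse = ((theta - theta_hats) ** 2).mean()
bias = theta_hats.mean() - theta
var = theta_hats.var()

print(mse, bias**2 + var)  # the two numbers agree up to Monte Carlo error
```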
A statistic $S$ is sufficient for the parameter $\theta$ if
$$ p(\theta \mid \mathcal D) = p\left(\theta \mid S(\mathcal D)\right) $$
For estimating the parameters, all relevant information in the data is "compressed" into the sufficient statistic.
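For the Gaussian example above, the pair $S(\mathcal D) = \left(\sum_i X_i, \sum_i X_i^2\right)$ is a sufficient statistic for $(\mu, \sigma^2)$: the estimators $\hat \mu$, $\hat \sigma_b^2$ and $\hat \sigma_u^2$ depend on the data only through these two sums.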