Open question:
Non-training data (out-of-sample) are used to approximate the out-of-sample error $E_{out}$.
$$ \mathbb{E}[loss(h({\bf x}), y)] = E_{out}(h) $$
Split the $m$ labeled data $\mathcal{D}$ into a training set $\mathcal{D}_{train}$ and a validation set $\mathcal{D}_{val}$ with $m_{val}$ examples.
So $m_{train} = m - m_{val}$ examples remain for training the parameters.
With the validation error $E_{val}(h)$ written as the mean of the losses on the individual validation examples:
no correlations between the losses on the validation data (the data are i.i.d.) $\rightarrow$
the covariances are all zero (with the Kronecker delta $\delta_{ij}$)
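A worked version of this step, writing $\ell_i := loss(h({\bf x}_i), y_i)$ for the loss on the $i$-th validation example and assuming a common variance $\sigma^2$ (both symbols introduced here for illustration):
$$ E_{val}(h) = \frac{1}{m_{val}} \sum_{i=1}^{m_{val}} \ell_i, \qquad \mathbb{E}[E_{val}(h)] = \frac{1}{m_{val}} \sum_{i=1}^{m_{val}} \mathbb{E}[\ell_i] = E_{out}(h) $$
$$ \mathrm{Var}[E_{val}(h)] = \frac{1}{m_{val}^2} \sum_{i,j} \mathrm{Cov}[\ell_i, \ell_j] = \frac{1}{m_{val}^2} \sum_{i,j} \delta_{ij}\,\sigma^2 = \frac{\sigma^2}{m_{val}} $$
So $E_{val}$ is an unbiased estimate of $E_{out}$ whose standard deviation shrinks as $1/\sqrt{m_{val}}$.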
Tradeoff in using the data $\mathcal{D}$ for validation versus training ($m = m_{train} + m_{val}$)
Rule of thumb: approx. 20% of the data for validation (if you have enough data).
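A minimal sketch of such a split (NumPy-based; the 80/20 default and all names are illustrative, not a prescribed interface):

```python
import numpy as np

def split_train_val(X, y, val_fraction=0.2, seed=0):
    """Randomly split m labeled examples into D_train and D_val."""
    m = len(y)
    m_val = int(round(val_fraction * m))   # rule of thumb: ~20% for validation
    rng = np.random.default_rng(seed)
    perm = rng.permutation(m)              # shuffle before splitting
    val_idx, train_idx = perm[:m_val], perm[m_val:]
    return X[train_idx], y[train_idx], X[val_idx], y[val_idx]
```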
Training and validation
Training with all $m$ data $\rightarrow h$
Training the $M$ different models on $\mathcal{D}_{train}$ yields:
different hypotheses $h_1^-, h_2^-, \dots, h_M^-$: $|\mathcal{H}_{val}| = M$
Selection of the hypothesis $h_{m^*}^-$ with the lowest value of $E_{val}$:
$$ E_{out}(h^-_{m^*}) \leq E_{val}(h^-_{m^*}) + \mathcal{O}\left(\sqrt{\frac{\ln M}{m_{val}}}\right) $$
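The $\mathcal{O}$ term comes from Hoeffding's inequality combined with a union bound over the $M$ candidate hypotheses (sketched here under the additional assumption of a loss bounded in $[0,1]$):
$$ P\Big(\max_{k \leq M} \big|E_{val}(h^-_k) - E_{out}(h^-_k)\big| > \epsilon\Big) \leq 2 M e^{-2\epsilon^2 m_{val}} $$
Fixing the right-hand side at a constant confidence level and solving for $\epsilon$ gives $\epsilon = \mathcal{O}\big(\sqrt{\ln M / m_{val}}\big)$.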
Retraining the selected model on all $m$ examples instead of only $m_{train}$ (see learning curves) gives: $$ E_{out}(h_{m^*}) \leq E_{out}(h_{m^*}^-) \leq E_{val}(h_{m^*}^-) + \mathcal{O}\left(\sqrt{\frac{\ln M}{m_{val}}}\right) $$
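A sketch of the full procedure, assuming each candidate model offers `fit`/`predict` methods and `loss` is an element-wise loss function (this interface is illustrative, not prescribed by the text):

```python
import numpy as np

def select_model(models, loss, X_train, y_train, X_val, y_val, X, y):
    """Train the M candidates on D_train, pick the hypothesis with the lowest
    E_val, then retrain the winner on all m examples."""
    e_val = []
    for model in models:
        model.fit(X_train, y_train)                                # h_k^-
        e_val.append(np.mean(loss(model.predict(X_val), y_val)))  # E_val(h_k^-)
    m_star = int(np.argmin(e_val))                                 # index of best hypothesis
    models[m_star].fit(X, y)                                       # retrain on all m data -> h_{m*}
    return models[m_star], e_val[m_star]
```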
Goal: Model selection
Instead of training a model $h^-$ with $\mathcal{D}_{train}$ and validating with $\mathcal{D}_{val}$ only once:
repeat this procedure $v$ times with different splits: $v$-fold cross-validation ($v$ iterations)
Extreme case leave-one-out: $v = m$, i.e. $m_{val} = 1$: each iteration validates with only a single example.
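A sketch of $v$-fold cross-validation under the same illustrative `fit`/`predict`/`loss` interface as above; setting $v = m$ gives the leave-one-out case:

```python
import numpy as np

def cross_validation_error(model, loss, X, y, v=5, seed=0):
    """Estimate E_out by averaging the validation error over v disjoint folds."""
    m = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(m), v)         # v disjoint validation index sets
    errors = []
    for val_idx in folds:
        train_idx = np.setdiff1d(np.arange(m), val_idx)   # remaining examples for training
        model.fit(X[train_idx], y[train_idx])
        errors.append(np.mean(loss(model.predict(X[val_idx]), y[val_idx])))
    return float(np.mean(errors))                          # cross-validation estimate of E_out
```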