Known:
Polynomial of degree 4: $$ h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3 + \theta_4 x^4 $$
The constraint $\theta_3 = \theta_4 = 0$ lowers the model complexity: the hypothesis reduces to a polynomial of degree 2.
Constraint: $|\theta_3| \leq \epsilon$ and $|\theta_4| \leq \epsilon$ with small $\epsilon$ $\rightarrow$ the model complexity should increase only slightly compared with the degree-2 model.
Cost function for a polynomial of degree 4 with the constraint of small values for $\theta_3$ and $\theta_4$:
$$ J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})^2 + \lambda \theta_3^2 + \lambda \theta_4^2 $$
with large hyperparameter $\lambda$.
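As a minimal sketch of this idea, the penalized cost above can be minimized in closed form by setting its gradient to zero. The function name `fit_poly4_penalized`, the synthetic quadratic data, and the value $\lambda = 1000$ are assumptions chosen for illustration only; they do not come from the lecture material.

```python
import numpy as np

def fit_poly4_penalized(x, y, lam):
    """Minimize 1/(2m) * sum((h(x) - y)^2) + lam*theta_3^2 + lam*theta_4^2."""
    m = len(x)
    # Design matrix with columns 1, x, x^2, x^3, x^4
    X = np.vander(x, N=5, increasing=True)
    # Only theta_3 and theta_4 are penalized
    D = np.diag([0.0, 0.0, 0.0, 1.0, 1.0])
    # Zero gradient: (X^T X / m + 2*lam*D) theta = X^T y / m
    A = X.T @ X / m + 2.0 * lam * D
    b = X.T @ y / m
    return np.linalg.solve(A, b)

# Hypothetical data: a noisy quadratic target
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 30)
y = 1.0 + 2.0 * x - 3.0 * x**2 + 0.1 * rng.standard_normal(x.size)

theta_free = fit_poly4_penalized(x, y, lam=0.0)     # unconstrained degree-4 fit
theta_pen = fit_poly4_penalized(x, y, lam=1000.0)   # theta_3, theta_4 pushed toward 0
print("lambda = 0:   ", np.round(theta_free, 4))
print("lambda = 1000:", np.round(theta_pen, 4))
```

With a large $\lambda$, the last two coefficients should come out close to zero, so the fitted model behaves almost like a degree-2 polynomial.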
In general, all parameters except $\theta_0$ are penalized: \begin{align*} J(\theta) = &\frac{1}{m} \left[ \sum_{i=1}^{m} loss(h_\theta(x^{(i)}),y^{(i)}) + \frac{\lambda}{2} \sum_{j=1}^n \theta_j^2 \right] \end{align*}
Instead of minimizing the training error $J_{train}$, we minimize an augmented error: $$ J_{aug}(\theta) = J_{train}(\theta) + \frac{\lambda}{m}\Omega(\theta) = J_{train}(\theta) + \text{overfit penalty} $$
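The decomposition into training error plus penalty can be sketched in a few lines. This example assumes a linear hypothesis $h_\theta(x) = \theta^T x$, the squared-error loss $\frac{1}{2}(h_\theta(x) - y)^2$, and $\Omega(\theta) = \frac{1}{2}\sum_{j=1}^n \theta_j^2$; the function names are hypothetical.

```python
import numpy as np

def training_error(theta, X, y):
    """J_train: mean of the assumed squared-error loss 1/2 * (h - y)^2."""
    residual = X @ theta - y
    return 0.5 * np.mean(residual**2)

def l2_penalty(theta):
    """Omega(theta) = 1/2 * sum_{j>=1} theta_j^2; theta_0 is not penalized."""
    return 0.5 * np.sum(theta[1:] ** 2)

def augmented_error(theta, X, y, lambda_reg):
    """J_aug = J_train + (lambda / m) * Omega(theta)."""
    m = X.shape[0]
    return training_error(theta, X, y) + lambda_reg / m * l2_penalty(theta)

# Tiny example: theta fits the data exactly, so only the penalty remains
X = np.array([[1.0, 1.0], [1.0, 2.0]])   # first column is x_0 = 1
y = np.array([1.0, 2.0])
theta = np.array([0.0, 1.0])
print(augmented_error(theta, X, y, lambda_reg=4.0))
```

Here the training error is zero, so the printed value is the penalty term alone: $\frac{4}{2} \cdot \frac{1}{2} \cdot 1^2 = 1$.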
Cost function $$ J(\theta) = \frac{1}{m} \left[ \sum_{i=1}^{m} loss(h_\theta(x^{(i)}), y^{(i)}) + \frac{\lambda}{2} \sum_{j=1}^n \theta_j^2 \right] $$
with the Update Rule
$$ \theta_j \leftarrow \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta) $$
For $j=0$ (unchanged, because $\theta_0$ is not penalized): $$ \theta_j \leftarrow \theta_j - \alpha \frac{1}{m} \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)}) x_0^{(i)} $$ For $j \neq 0$: $$ \theta_j \leftarrow \theta_j - \alpha \left[ \frac{1}{m} \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)}) x_j^{(i)} + \frac{\lambda}{m} \theta_j \right] $$
Rearranging the update rule for $j \neq 0$ gives $$ \theta_j \leftarrow \theta_j \left(1-\alpha \frac{\lambda}{m}\right) - \frac{\alpha}{m} \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)}) x_j^{(i)} $$
Compared with the update rule without regularization, $\theta_j$ is first multiplied by a weight decay factor, which is smaller than 1 for $\alpha, \lambda > 0$: $$ \left(1-\alpha \frac{\lambda}{m}\right) < 1 $$
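The equivalence of the two forms of the update can be checked numerically. The following sketch assumes a linear hypothesis with squared-error loss; the function names `gradient_step` and `weight_decay_step` and the random data are hypothetical and only serve the illustration.

```python
import numpy as np

def gradient_step(theta, X, y, alpha, lambda_reg):
    """Update written as gradient of the loss plus gradient of the penalty."""
    m = X.shape[0]
    grad = X.T @ (X @ theta - y) / m
    reg = lambda_reg / m * theta
    reg[0] = 0.0                      # theta_0 is not regularized
    return theta - alpha * (grad + reg)

def weight_decay_step(theta, X, y, alpha, lambda_reg):
    """Same update written with the weight decay factor (1 - alpha*lambda/m)."""
    m = X.shape[0]
    grad = X.T @ (X @ theta - y) / m
    decay = np.full(theta.shape, 1.0 - alpha * lambda_reg / m)
    decay[0] = 1.0                    # no decay for theta_0
    return decay * theta - alpha * grad

# Hypothetical data with x_0 = 1 in the first column
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(20), rng.standard_normal((20, 2))])
y = rng.standard_normal(20)
theta = np.array([0.5, -1.0, 2.0])

step_a = gradient_step(theta, X, y, alpha=0.1, lambda_reg=5.0)
step_b = weight_decay_step(theta, X, y, alpha=0.1, lambda_reg=5.0)
print(np.allclose(step_a, step_b))
```

Both functions should return the same parameters up to floating-point rounding, which is why L2 regularization is often described as weight decay.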
(from [Abu])
Performance of the uniform regularizer at different levels of stochastic noise $\sigma$. Both target and model are polynomials of order 15.
Extend your "linear and logistic regression" implementation with regularization.
get_cost_function(loss, lambda_reg)