$m$ observations $\{x^{(i)}\}$ with target values $y^{(i)} \in \mathbb R$
Goal: prediction of $y$ for a new $x$.
Linear Model
Note: There are two parameters: $\theta_0,\theta_1$ (parametric model)
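Written out, the hypothesis is a straight line in $x$ with these two parameters:
$$ h_\Theta(x) = \theta_0 + \theta_1 \cdot x $$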
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
###blood haemoglobin (Hb) levels and packed cell volumes (PCV) of 14 female blood bank donors
blood = np.array([
[15.5,0.450],
[13.6,0.420],
[13.5,0.440],
[13.0,0.395],
[13.3,0.395],
[12.4,0.370],
[11.1,0.390],
[13.1,0.400],
[16.1,0.445],
[16.4,0.470],
[13.4,0.390],
[13.2,0.400],
[14.3,0.420],
[16.1,0.450]])
x = blood[:,0]
y = blood[:,1]
from numpy import array, ones, linalg
A = array([x, ones(len(x))])
theta = linalg.lstsq(A.T, y, rcond=None)[0]  # least-squares parameters: theta[0] = slope, theta[1] = intercept
line = theta[0]*x + theta[1]  # regression line
#plt.plot(x,line,'r-',x,y,'o')
plt.scatter(x,y, label="Experimental Values")
plt.xlabel("blood haemoglobin(Hb) levels / g/dL")
plt.ylabel("packed cell volumes (PCV) / ?")
plt.legend()
plt.show()
Find a line $h_{\Theta}(x)$ that goes "as near as possible" through the data.
So we are looking for values of the parameters $\Theta = \{\theta_0, \theta_1\}$
A quantification for "as near as possible" is given later. Any ideas?
plt.scatter(x,y, label="Experimental Values")
plt.plot(x,line,'r-', label='Fit with $\Theta_0 = $' + str(round(theta[1], 3)) +
" $\Theta_1 = $" + str(round(theta[0],4)) )
plt.xlabel("blood haemoglobin(Hb) levels / g/dL")
plt.ylabel("packed cell volumes (PCV)")
plt.legend()
plt.show()
haemoglobin level ($x$) / g/dL | packed cell volume ($y$) / volume fraction |
---|---|
15.5 | 0.450 |
13.6 | 0.420 |
13.5 | 0.440 |
13.0 | 0.395 |
$\dots$ | $\dots$ |
from IPython.display import Image
Image("../univariateLinearRegression/pics/trainings-procedure.png")
Why the name "Linear regression with one variable"?
Prediction of a floating point number: Regression
(The hypothesis happens to be linear in the variable $x$, but that is not where the name comes from: "linear" refers to linearity in the parameters $\Theta$.)
Note:
Linear regression can be solved algebraically (with the pseudo-inverse). This is not considered further here.
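For completeness, a minimal sketch of this algebraic route, applied to the Hb/PCV data `x`, `y` from above (the variable names here are just illustrative; everything that follows uses the iterative approach instead):
# closed-form least-squares fit via the Moore-Penrose pseudo-inverse
A_design = np.vstack([np.ones(len(x)), x]).T        # design matrix, columns: [1, x]
theta_0_pinv, theta_1_pinv = np.linalg.pinv(A_design) @ y
print(theta_0_pinv, theta_1_pinv)                   # intercept and slope; should numerically match the lstsq fit above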
Remember: we are looking for a straight line that goes "as near as possible" through the data.
We give each possible straight line a value which quantifies its quality.
Typical cost function for regression (can be derived from the "Maximum Likelihood Principle".)
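Written out for the $m$ training examples, it is the (halved) mean of the squared errors:
$$ J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^m \left( h_\Theta(x^{(i)}) - y^{(i)} \right)^2 $$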
Note:
We can reflect this fact in the implementation:
def get_linear_hypothesis(theta_0, theta_1):
    # returns the hypothesis function h(x) = theta_0 + theta_1 * x
    return lambda x: theta_0 + theta_1 * x

def get_squared_error_cost_function(x, y, get_hypothesis):
    assert len(x) == len(y)
    m = len(x)
    # returns J(theta_0, theta_1) = 1/(2m) * sum_i (h(x^(i)) - y^(i))^2
    return lambda theta_0, theta_1: 1. / (2. * m) * ((get_hypothesis(theta_0, theta_1)(x) - y)**2).sum()
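For example, the cost of one hypothesis on the blood data from above can then be computed like this (the name `J` and the parameter values 0.1 and 0.02 are arbitrary, just to show the call pattern):
J = get_squared_error_cost_function(x, y, get_linear_hypothesis)
print(J(0.1, 0.02))   # cost of the hypothesis h(x) = 0.1 + 0.02 * x on the blood data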
no intercept ($\theta_0 = 0$), i.e. the hypothesis is $ h_{\theta_1}(x) = \theta_1 \cdot x $
training data: $\mathcal D = \{ (0, 0), (1, 2), (3, 4) \}$
x = np.array([0., 1., 3.])
y = np.array([0., 2., 4.])
thetas_1 = np.arange(0., 3., 0.1)   # candidate values for theta_1
costs = np.zeros(len(thetas_1))
cost_function = get_squared_error_cost_function(x, y, get_linear_hypothesis)
for i, theta_1 in enumerate(thetas_1):
costs[i] = cost_function(0., theta_1)
f, axarr = plt.subplots(1, 2, figsize=(8,4))
axarr[0].scatter(x,y); axarr[0].set_title('Data'); axarr[0].set_xlabel("x"); axarr[0].set_ylabel("y")
axarr[1].plot(thetas_1, costs); axarr[1].set_title('Cost');axarr[1].set_xlabel("$\Theta_1$"); axarr[1].set_ylabel("cost")
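For this data set the optimal $\theta_1$ can also be computed directly by setting the derivative of the cost to zero: $\theta_1^* = \frac{\sum_i x^{(i)} y^{(i)}}{\sum_i (x^{(i)})^2} = \frac{0 + 2 + 12}{0 + 1 + 9} = 1.4$, which is where the cost curve has its minimum.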
x = np.arange(-1.0, 1.0, 0.1)
y = 3. + 2. * x + np.random.randn(len(x)) * 0.2
thetas = [[2., 1.5, 'c'], [2.5, 3., 'k'], [3., 2., 'g']]  # example hypotheses: [theta_0, theta_1, plot color]
#computation of the cost-grid
w0s = np.arange(1.5,4.5,0.02)
w1s = np.arange(.5,3.5,0.02)
cost = np.zeros([len(w0s),len(w1s)])
cost_function = get_squared_error_cost_function(x, y, get_linear_hypothesis)
for i, theta_0 in enumerate(w0s):
for j, theta_1 in enumerate(w1s):
cost[i][j] = cost_function(theta_0, theta_1)
#contour-plot
X, Y = np.meshgrid(w0s, w1s)
plt.subplot(121);plt.plot(x,y,'+r');plt.xlabel('x');plt.ylabel('y');plt.title('Data and Hypotheses')
for t in thetas:
h = get_linear_hypothesis(t[0], t[1])(x)
plt.plot(x,h,'-'+t[2])
# cost is indexed as cost[theta_0, theta_1]; contour expects Z aligned with the (Y, X) meshgrid, hence cost.T
plt.subplot(122);plt.contour(X, Y, cost.T);plt.xlabel('$\Theta_0$');plt.ylabel('$\Theta_1$');plt.title('Contour Plot')
for t in thetas:
plt.plot(t[0],t[1],'o'+ t[2])
plt.tight_layout()
#3d-plot
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.cm as cm
vmin = cost.min()
vmax = cost.max()
fig = plt.figure(figsize=(10,6))
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(X, Y, cost.T, cmap=cm.jet, rstride=5, cstride=5, vmax=vmax, vmin=vmin, antialiased=True)  # cost.T to match the meshgrid orientation
ax.set_xlabel('$\Theta_0$');ax.set_ylabel('$\Theta_1$');ax.set_zlabel('cost');ax.set_title('3D-Plot')
Note:
The cost function of linear regression is convex, i.e. it has only one (global) minimum.
The gradient of a scalar function (e.g. the cost function) is:
$$ \begin{align*} \vec \nabla J(\theta_0, \theta_1) = \begin{pmatrix} \frac{\partial }{\partial \theta_0} \\ \frac{\partial }{\partial \theta_1} \end{pmatrix} J(\theta_0, \theta_1) = \begin{pmatrix} \frac{\partial J(\theta_0, \theta_1)}{\partial \theta_0} \\ \frac{\partial J(\theta_0, \theta_1)}{\partial \theta_1} \end{pmatrix} \end{align*} $$
with the nabla operator
$$ \begin{align*} \vec \nabla = \begin{pmatrix} \frac{\partial }{\partial \theta_0} \\ \frac{\partial }{\partial \theta_1} \end{pmatrix} \end{align*} $$
Recap: Problem formulation
Goal: $\text{minimize}_{\Theta} J(\Theta)$
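Gradient descent: starting from initial values, repeatedly update each parameter in the direction of the negative gradient, scaled by the learning rate $\alpha$:
$$ \theta_j \leftarrow \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1) \qquad (j = 0, 1) $$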
Note for the implementation: simultaneous update of all parameters
\begin{align*}
temp0 &\leftarrow \theta_0 - \alpha \frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1) \\
temp1 &\leftarrow \theta_1 - \alpha \frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1) \\
\theta_0 &\leftarrow temp0 \\
\theta_1 &\leftarrow temp1
\end{align*}
\begin{align}
\frac{\partial}{\partial \theta_1} J(\Theta) &= \frac{\partial}{\partial \theta_1} \frac{1}{2m} \sum_{i=1}^m \left( h_\Theta(x^{(i)}) - y^{(i)} \right)^2 \\
&= \frac{\partial}{\partial \theta_1} \frac{1}{2m} \sum_{i=1}^m \left( \theta_0 + \theta_1 \cdot x^{(i)} - y^{(i)} \right)^2 \\
&= \frac{1}{m} \sum_{i=1}^m \left( \theta_0 + \theta_1 \cdot x^{(i)} - y^{(i)} \right) \cdot x^{(i)}
\end{align}
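Analogously, for $\theta_0$ (the inner derivative is $1$ instead of $x^{(i)}$):
$$ \frac{\partial}{\partial \theta_0} J(\Theta) = \frac{1}{m} \sum_{i=1}^m \left( \theta_0 + \theta_1 \cdot x^{(i)} - y^{(i)} \right) $$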
The step size depends on two factors: the learning rate $\alpha$ and the magnitude of the gradient.
$\alpha$ must be chosen carefully.
With Python and NumPy
Use the cost function and the linear hypothesis (see above) to implement one gradient descent update step as a function that can be called as:
theta_0, theta_1 = compute_new_theta(x, y, theta_0, theta_1, alpha)
Write a Python function that, given start values for $\theta_0$ and $\theta_1$, iteratively applies the `compute_new_theta` function to find "good" $\Theta$ values.
Plot the best hypothesis (straight line) together with the data.
Try different values for the hyperparameter $\alpha$ and plot the progress (cost value over iterations) for each of them in one graph. (A possible sketch follows after the synthetic training data below.)
# generation of synthetic train data
x_min = -10.
x_max = 10.
m = 10
x = np.random.uniform(x_min, x_max, m)
a = 10.
b = 5.
y_noise_sigma = 4.
y = a + b * x + np.random.randn(m) * y_noise_sigma
plt.plot(x, y, "bo")
plt.xlabel("x")
plt.ylabel("y")