
## Feed Forward Neural Networks

#### with Theano

In [2]:
from IPython.display import Image
Image(filename='./pics/Deeplearning 2.png')

Out[2]:

For each layer:

Input: $\vec x^T = (x_1, x_2, \dots, x_n)$

Output of the j-th neuron: $$h_j = \sigma(\sum_{i=1}^n w_{ij} x_i + b_j)$$

In matrix form ($W$ is a matrix):

$$\vec h = \sigma(\vec x \cdot W + \vec b)$$
• element-wise application of the activation function $\sigma$.
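
As a minimal sketch of the layer formula above, the following plain-Python snippet (no Theano needed) computes $\vec h = \sigma(\vec x \cdot W + \vec b)$ for a single input vector. The name `layer_forward`, the choice of the logistic function for $\sigma$, and the example numbers are assumptions made for illustration; they are not taken from the notebook.

```python
import math

def sigma(a):
    """Logistic activation (assumed here as one possible choice of sigma)."""
    return 1.0 / (1.0 + math.exp(-a))

def layer_forward(x, W, b):
    """Compute h_j = sigma(sum_i w_ij * x_i + b_j) for every neuron j.

    x: list of n inputs
    W: n rows with m columns, W[i][j] = w_ij
    b: list of m biases
    """
    n, m = len(W), len(b)
    return [sigma(sum(W[i][j] * x[i] for i in range(n)) + b[j])
            for j in range(m)]

# hypothetical example: 2 inputs, 3 neurons
x = [1.0, -2.0]
W = [[0.5, 0.0, -1.0],
     [0.25, 0.0, 1.0]]
b = [0.0, 1.0, 3.0]
h = layer_forward(x, W, b)  # one activation per neuron
```

Each entry of `h` depends only on one column of `W` and one bias, which is exactly what the matrix form expresses.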

#### Training data: not linearly separable

In [8]:
plot_train_data(X_train[t_train==0], X_train[t_train==1])


#### Feature Transformation

$$\phi_1(\vec x) = x_1^2$$

$$\phi_2(\vec x) = x_2^2$$

In the new feature space the data is linearly separable:

In [10]:
phi_train = X_train**2
plot_train_transformed(phi_train[t_train==0], phi_train[t_train==1])
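
Why squaring helps can be sketched on toy data. The snippet below assumes, purely for illustration, that class 1 lies inside the unit circle and class 0 outside; the notebook's actual data set may look different. Under that assumption the circular boundary $x_1^2 + x_2^2 = 1$ becomes the straight line $\phi_1 + \phi_2 = 1$ in the transformed space.

```python
import random

def phi(x):
    """Feature transformation from the text: phi_1 = x_1^2, phi_2 = x_2^2."""
    return (x[0] ** 2, x[1] ** 2)

random.seed(0)

# hypothetical toy data: class 1 inside the unit circle, class 0 outside
points = [(random.uniform(-2, 2), random.uniform(-2, 2)) for _ in range(200)]
labels = [1 if x1 ** 2 + x2 ** 2 < 1.0 else 0 for x1, x2 in points]

# in phi-space a single linear threshold on phi_1 + phi_2 separates the classes
predicted = [1 if sum(phi(p)) < 1.0 else 0 for p in points]
```

No straight line in the original $(x_1, x_2)$ plane can reproduce `labels`, while in the $(\phi_1, \phi_2)$ plane one linear rule is enough.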


#### Learning the feature transformation with neural networks

• The activity vector of a hidden layer can be interpreted as a learned transformation of the input vector.
In [14]:
import theano.tensor as T

def logistic_function(x):
    # element-wise logistic (sigmoid) function
    return 1./(1. + T.exp(-x))

In [15]:
def relu(x):
    # rectified linear unit: 0 for negative inputs, identity otherwise
    return T.switch(x<0, 0, x)


#### Feed Forward Neural Network with a hidden layer

Activity vector of the first hidden layer:

$$\vec h^{(1)} = \sigma_1 \left(\vec x \cdot W^{(1)} + \vec b^{(1)} \right)$$

Activity of the output $\vec o$ (with only one output neuron, $o$ is a scalar):

$$\vec o = \vec h^{(2)}= \sigma_2 \left( \vec h^{(1)} \cdot W^{(2)} + \vec b^{(2)} \right)$$
In [16]:
# (first) hidden layer
a = T.dot(X, W_h) + b_h

# activation function: rectified linear unit (ReLU)
h = relu(a)

# output neuron:
y = logistic_function(T.dot(h, W_o) + b_o)
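
The same forward pass can be traced by hand in plain Python. This is only a sketch: the names `W_h`, `b_h`, `W_o`, `b_o` mirror the notebook, but here they are ordinary lists with hand-picked (not learned) values, and `affine` and `predict` are hypothetical helpers. The hidden layer is chosen so that the sum of its activities equals $|x_1| + |x_2|$, which gives a closed, diamond-shaped decision boundary.

```python
import math

def relu(a):
    """Element-wise rectified linear unit on a list."""
    return [max(0.0, v) for v in a]

def logistic(a):
    """Scalar logistic function."""
    return 1.0 / (1.0 + math.exp(-a))

def affine(x, W, b):
    """x . W + b for a row vector x; W[i][j] connects input i to neuron j."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) + b[j]
            for j in range(len(b))]

# hand-picked weights: the hidden units compute
# relu(x1), relu(-x1), relu(x2), relu(-x2)
W_h = [[1.0, -1.0, 0.0, 0.0],
       [0.0, 0.0, 1.0, -1.0]]
b_h = [0.0, 0.0, 0.0, 0.0]
W_o = [[-4.0], [-4.0], [-4.0], [-4.0]]
b_o = [4.0]

def predict(x):
    h = relu(affine(x, W_h, b_h))            # hidden activity h^(1)
    return logistic(affine(h, W_o, b_o)[0])  # scalar output o
```

With these weights `predict` returns more than 0.5 for points with $|x_1| + |x_2| < 1$ and less than 0.5 outside, a boundary that a network without a hidden layer could not represent.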


#### Cost function and L2 regularization

In [49]:
# TODO: T.sum is only used here to turn the one-element result vector into a scalar
cross_entropy = T.sum(-(T.dot(target, T.log(y)) + T.dot((1.-target), T.log(1.-y))))

l2_reg = T.mean(T.sqr(W_h)) + T.mean(T.sqr(W_o))

lambda_ = 0.02
cost = cross_entropy + lambda_ * l2_reg
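
Written out, the cell above computes $C = -\sum_k \left[ t_k \log y_k + (1 - t_k) \log(1 - y_k) \right] + \lambda \left( \mathrm{mean}(W_h^2) + \mathrm{mean}(W_o^2) \right)$. A plain-Python sketch of the same quantity, handy for checking small cases by hand, could look as follows; the function names are assumptions for this illustration and are not part of the notebook.

```python
import math

def cross_entropy(targets, outputs):
    """Summed binary cross-entropy over all training examples."""
    return -sum(t * math.log(y) + (1.0 - t) * math.log(1.0 - y)
                for t, y in zip(targets, outputs))

def mean_square(W):
    """Mean of the squared entries of a weight matrix given as a list of rows."""
    entries = [w for row in W for w in row]
    return sum(w * w for w in entries) / len(entries)

def total_cost(targets, outputs, weight_matrices, lambda_=0.02):
    """Cross-entropy plus lambda_ times the summed mean squared weights."""
    l2_reg = sum(mean_square(W) for W in weight_matrices)
    return cross_entropy(targets, outputs) + lambda_ * l2_reg
```

For example, two examples that are both predicted with $y = 0.5$ contribute $2 \log 2$ to the cross-entropy, regardless of their targets.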

In [52]:
cost_func = theano.function(inputs=[X, target], outputs=[cost])

In [54]:
cost_func(X_train, t_train)

Out[54]:
[array(44.90793435302534)]
In [55]:
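def get_train_functions(cost, v, target, learning_rate=0.01):
    # `params` is the list of shared variables to train; it is assumed
    # to be defined in an earlier cell (e.g. the weights and biases)

    # symbolic gradient of the cost with respect to each parameter
    gparams = []
    for param in params:
        gparam = T.grad(cost, param)
        gparams.append(gparam)

    # plain gradient descent update for each parameter
    updates = []
    for param, gparam in zip(params, gparams):
        updates.append((param, param - gparam * learning_rate))

    learn_fn = theano.function(inputs = [v, target],
                               outputs = cost,
                               updates = updates)
    return learn_fn

learn_fn = get_train_functions(cost, X, target)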
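
The update rule `param - gparam * learning_rate` is ordinary gradient descent. Stripped of Theano, the idea can be sketched on a one-dimensional toy problem; the function `gradient_descent` and the quadratic cost $(w - 3)^2$ are assumptions chosen for illustration only.

```python
def gradient_descent(grad, w0, learning_rate=0.01, nb_steps=1000):
    """Repeatedly step against the gradient, starting from w0."""
    w = w0
    for _ in range(nb_steps):
        w = w - grad(w) * learning_rate
    return w

# toy cost (w - 3)^2 with gradient 2 * (w - 3); its minimum is at w = 3
w_min = gradient_descent(lambda w: 2.0 * (w - 3.0),
                         w0=0.0, learning_rate=0.1, nb_steps=200)
```

In the notebook, Theano derives the gradient symbolically via `T.grad` and applies the same kind of step to every parameter each time `learn_fn` is called.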

In [56]:
import numpy as np

nb_epochs = 5000
train_errors = np.zeros(nb_epochs)

# one full-batch gradient descent step per epoch; store the cost
for epoch in range(nb_epochs):
    train_errors[epoch] = learn_fn(X_train, t_train)

train_errors

Out[56]:
array([ 44.90793435,  44.81909323,  44.74636265, ...,   2.94757535,
2.9474241 ,   2.94727294])
In [57]:
plt.plot(range(nb_epochs), train_errors, '-b')
plt.xlabel('Iterations')
plt.ylabel('Cost')


#### Decision Boundary

In [62]:
plot_contour(X_train[t_train==0], X_train[t_train==1], 'train data')

In [63]:
plot_contour(X_test[t_test==0], X_test[t_test==1], 'test data')