
Neuron types

Output units

  • The activation function of the output neurons should correspond to the error (loss) function.

  • The important point is to prevent saturation, which results in very slow learning.

Task dependence

The type of output unit depends on the task (see the sketch after this list):

  • for classification:
    • sigmoid neuron (two classes)
    • softmax (multiple exclusive classes)
  • for regression:
    • linear (identity)
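
As an illustration, here is a minimal numpy sketch of the three output types; the function names and the example vector are chosen for illustration and are not from the slides.

In [ ]:
import numpy as np

def sigmoid_output(z):
    # two classes: probability of class 1
    return 1. / (1. + np.exp(-z))

def softmax_output(z):
    # multiple exclusive classes: probabilities that sum to 1
    e = np.exp(z - np.max(z))   # shift for numerical stability
    return e / e.sum()

def linear_output(z):
    # regression: identity
    return z

z = np.array([2.0, -1.0, 0.5])
print(softmax_output(z))        # approx. [0.79, 0.04, 0.18]
print(softmax_output(z).sum())  # 1.0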

Example: Binary Classification

For a classification problem with two classes, the output $o$ of a sigmoid unit predicts the probability of class 1:

$$ o(\vec x) = p(y=1 \mid \vec x; \theta) $$

The appropriate loss function for training example $k$ is the cross entropy:

$$ J^{(k)} (\vec \theta) = - t^{(k)} \log o(\vec x^{(k)}) - (1-t^{(k)}) \log (1-o(\vec x^{(k)})) $$

The exponentiation in the logistic function and the logarithm in the loss function cancel out. The derivative of $J^{(k)}(\vec \theta)$ used for adapting the output weights $\vec \theta^{(l)}$ becomes:

$$ \frac{\partial J^{(k)}(\vec \theta)}{\partial \vec \theta^{(l)}} \propto (o(\vec x^{(k)}) - t^{(k)}) $$

"natural pairing of error function and output unit activation function, which gives rise to this simple form for the derivative." [Bis95, p. 232]

Hidden Units

The "classical" neuron types for hidden units is the tanh (see e.g. [Le98]).

Recently, other unit types have come into use, e.g.

  • Rectified Linear Units (ReLU) [Glo11]
  • Maxout [God13] (see the sketch below)
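
A maxout unit returns the maximum over $k$ affine functions of its input; a minimal sketch (the shapes and random values are only for illustration):

In [ ]:
import numpy as np

def maxout(x, W, b):
    # W has shape (k, d), b has shape (k,): maximum over k affine pieces
    return np.max(W @ x + b)

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))     # k = 3 pieces, d = 4 inputs
b = rng.normal(size=3)
x = rng.normal(size=4)
print(maxout(x, W, b))          # the largest of the 3 affine responses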

Sigmoid units
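
The helper plot_func used in the cells below is defined outside of these slides; a minimal sketch, assuming matplotlib and inferring the signature from the calls (function, title, y-limits), could look like this:

In [ ]:
import numpy as np
import matplotlib.pyplot as plt

def plot_func(f, title, ylim, zmin=-10., zmax=10.):
    # plot the activation function f on [zmin, zmax] with the given y-limits
    z = np.linspace(zmin, zmax, 500)
    plt.plot(z, f(z))
    plt.title(title)
    plt.ylim(ylim)
    plt.xlabel("z")
    plt.ylabel("a(z)")
    plt.show()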

In [20]:
# the sigmoid function:
def sigmoid(z):
    return 1./(1 + np.exp(-z))

plot_func(sigmoid, "Sigmoid", (-.1, 1.1))

For small or large input values the derivative is nearly zero (saturation). Therefore, learning with first-order methods is nearly impossible in this range.
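
A quick check, reusing sigmoid from the cell above: the derivative is $\sigma(z)(1-\sigma(z))$ and vanishes quickly as $|z|$ grows.

In [ ]:
# derivative of the sigmoid: sigma'(z) = sigma(z) * (1 - sigma(z))
def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1. - s)

for z in [0., 2., 5., 10.]:
    print(z, sigmoid_prime(z))   # 0.25, ~0.105, ~0.0066, ~4.5e-05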

Tanh

In [23]:
plot_func(np.tanh, "Tangens Hyperbolicus", (-1.1, 1.1))

Modification of the tanh which does not saturate

proposed by Yann LeCun [Le89]

In [29]:
def tanh_mod(z, a = 0.02):
    # scaled tanh recommended by LeCun, plus a small linear term a*z against saturation
    return 1.7159 * np.tanh(2./3 * z) + a * z

plot_func(tanh_mod, "mod. tanh", (-2.1, 2.1))

Rectified Linear Units

A neuron type which has no saturation for positive input is the rectified linear unit [Glo11].

In [36]:
def linear_rectified(z):
    return np.maximum(0, z)

plot_func(linear_rectified, "ReLU", (-.1, 10.))

Leaky Rectified Linear Units

In [37]:
def leaky_linear_rectified(z, a = .01):
    # like ReLU, but with a small slope a for negative inputs
    return np.maximum(0, z) + a * np.minimum(0, z)


plot_func(leaky_linear_rectified, "Leaky ReLU", (-.2, 10.))

Softplus

Softplus is a smooth variant of the rectified linear unit:

$$ a(z) = \log(1 + e^z) $$

In [35]:
def softplus(z):
    return np.log(1.+np.exp(z))

plot_func(softplus, "Softplus", (-.2, 10))
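
The derivative of softplus is the logistic sigmoid, $\frac{d}{dz} \log(1 + e^z) = \frac{1}{1 + e^{-z}}$, so the gradient only approaches zero for very negative inputs. A quick numerical check, reusing sigmoid and softplus from the cells above:

In [ ]:
# the derivative of softplus is the sigmoid: d/dz log(1 + e^z) = 1 / (1 + e^-z)
eps = 1e-6
for z in [-2., 0., 3.]:
    num_grad = (softplus(z + eps) - softplus(z - eps)) / (2 * eps)
    print(num_grad, sigmoid(z))   # both columns agree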

Literature

  • [Bis95] C. M. Bishop: Neural Networks for Pattern Recognition, Oxford University Press, 1995.
  • [Glo11] X. Glorot, A. Bordes and Y. Bengio: Deep Sparse Rectifier Neural Networks, AISTATS, 2011.
  • [God13] I. J. Goodfellow, D. Warde-Farley, M. Mirza, A. Courville and Y. Bengio: Maxout Networks, ICML, 2013.
  • [Le98] Y. LeCun, L. Bottou, G. Orr and K.-R. Müller: Efficient BackProp, in Orr, G. and Müller, K. (Eds.), Neural Networks: Tricks of the Trade, Springer, 1998.