
Neuron types

Output units

  • The activation function of the output neurons should be matched to the error (loss) function.

  • The key point is to avoid saturation, which leads to very slow learning.

Task dependence

The output unit type depends on the task:

  • for classification:
    • sigmoid neuron (two classes)
    • softmax (multiple exclusive classes)
  • for regression:
    • linear (identity)
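For the multi-class case, the softmax turns a score vector into a probability distribution over the exclusive classes. A minimal sketch (the helper name `softmax` and the example scores are illustrative, not from the slides):

```python
import numpy as np

def softmax(z):
    # subtract the max before exponentiating for numerical stability;
    # the result is unchanged because the constant cancels in the ratio
    e = np.exp(z - np.max(z))
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)
# probs is non-negative and sums to 1: a distribution over the classes
```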

Example: Binary Classification

For a classification problem with two classes, the output $o$ of a sigmoid unit predicts the probability of class 1:

$$ o(\vec x) = p(y=1 \mid \vec x; \theta) $$

The appropriate loss function for example $k$ is the cross entropy:

$$ J^{(k)} (\vec \theta) = - t^{(k)} \log o(\vec x^{(k)}) - (1-t^{(k)}) \log (1-o(\vec x^{(k)})) $$

The exponentiation in the logistic function and the logarithm in the loss function cancel out. The derivative of $J^{(k)} (\vec \theta)$ with respect to the output weights $\vec \theta^{(l)}$, used for adapting them, becomes:

$$ \frac{\partial J^{(k)}(\vec \theta)}{\partial \vec \theta^{(l)}} \propto (o(\vec x^{(k)}) - t^{(k)}) $$

"natural pairing of error function and output unit activation function, which gives rise to this simple form for the derivative." [Bis95, p. 232]

Hidden Units

The "classical" neuron types for hidden units is the tanh (see e.g. [Le98]).

Recently, other unit types have come into use, e.g.

  • Rectified Linear Units (relu) [Glo11]
  • Maxout [God13]
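A maxout hidden unit [God13] outputs the maximum over several linear pre-activations, so it learns a piecewise-linear activation function. A minimal sketch (the random weights are placeholders for illustration, not part of the original method description):

```python
import numpy as np

def maxout(x, W, b):
    # W has shape (k, n_in), b has shape (k,): k affine pieces.
    # The unit outputs the maximum of the k affine functions of x.
    return np.max(W @ x + b, axis=0)

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))   # 3 linear pieces over a 4-dimensional input
b = rng.normal(size=3)
x = rng.normal(size=4)
y = maxout(x, W, b)           # scalar output of one maxout unit
```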

Sigmoid units

In [20]:
# the sigmoid function:
def sigmoid(z):
    return 1./(1 + np.exp(-z))

plot_func(sigmoid, "Sigmoid", (-.1, 1.1))

For very small or very large input values the derivative is nearly zero (saturation). Learning with first-order methods is therefore nearly impossible in this range.
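The saturation is easy to see in the derivative $\sigma'(z) = \sigma(z)(1 - \sigma(z))$, which is largest at $z = 0$ and vanishes for large $|z|$ (a sketch):

```python
import numpy as np

def sigmoid(z):
    return 1. / (1. + np.exp(-z))

def sigmoid_prime(z):
    # derivative of the sigmoid: sigma(z) * (1 - sigma(z))
    s = sigmoid(z)
    return s * (1 - s)

# at z = 0 the gradient reaches its maximum of 0.25;
# far from zero it is practically gone
print(sigmoid_prime(0.))    # 0.25
print(sigmoid_prime(10.))   # ~4.5e-05
```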


In [23]:
plot_func(np.tanh, "Tangens Hyperbolicus", (-1.1, 1.1))

Modification of the tanh that does not saturate

proposed by Yann LeCun [Le98]

In [29]:
def tanh_mod(z, a = 0.02):
    return 1.7159 * np.tanh(2./3 * z) + a * z

plot_func(tanh_mod, "mod. tanh", (-2.1, 2.1))

Rectified Linear Units

A neuron type which has no saturation for positive input is the rectified linear unit [Glo11].

In [36]:
def linear_rectified(z):
    return np.maximum(0, z)

plot_func(linear_rectified, "ReLU", (-.1, 10.))

Leaky Rectified Linear Units

In [37]:
def leaky_linear_rectified(z, a = .01):
    return np.maximum(0, z) + a * np.minimum(0, z)

plot_func(leaky_linear_rectified, "Leaky ReLU", (-.2, 10.))


Softplus is a smooth variant of the rectified linear unit:

$$ a(z) = \log(1 + e^z) $$
In [35]:
def softplus(z):
    return np.log(1.+np.exp(z))

plot_func(softplus, "Softplus", (-.2, 10))


  • [Bis95] Christopher M. Bishop: Neural Networks for Pattern Recognition, Oxford University Press, 1995.
  • [Glo11] X. Glorot, A. Bordes, Y. Bengio: Deep Sparse Rectifier Neural Networks, AISTATS, 2011.
  • [God13] Ian J. Goodfellow, David Warde-Farley, Mehdi Mirza, Aaron Courville, Yoshua Bengio: Maxout Networks, ICML, 2013.
  • [Le98] Y. LeCun, L. Bottou, G. Orr and K.-R. Müller: Efficient BackProp, in Orr, G. and Müller, K. (Eds), Neural Networks: Tricks of the Trade, Springer, 1998.