Deep-Learning-Overview slides

Deep Learning

an overview

Christian Herta, HTW Berlin

Talk slides and teaching material for deep learning at

"Traditional" machine learning

Engineering of Features:

"Traditional" approach for image and speech

Deep learning is feature learning

learning of representations

depth is the number of transformation steps.

"Machine Perception"

Read (high dimensional) data and transform them into a "higher" representation to perform tasks / reach goals.

High dimensional data
  • Images / Videos
  • Sound (Voice / Music)
  • Natural Language
  • Time Series

What kinds of representations?

(typical) representation:

vector (or sequence of vectors)

$$ {\vec h} = (h_1, h_2, \dots h_n) $$

(distributed representations)

(Simple) Feed Forward Neural Network


  • Transforming the input vector through many layers.
  • The hidden state of each layer corresponds to a representation of the input.

Layer of a simple feed forward neural network

Affine transformation

$$ \vec z = \hat W \cdot \vec x + \vec b $$

followed by an element-wise application of a non-linear function $\sigma (\dots)$

$$ \vec h = \sigma ( \vec z ) $$
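As a minimal sketch of the two equations above, the following NumPy snippet computes one layer. The logistic sigmoid is only one possible choice for $\sigma$, and the names `sigmoid` and `dense_layer`, the layer sizes, and the random weights are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function, one common choice for sigma."""
    return 1.0 / (1.0 + np.exp(-z))

def dense_layer(x, W, b, activation=sigmoid):
    """Compute h = sigma(W x + b) for a single input vector x."""
    z = W @ x + b          # affine transformation
    return activation(z)   # element-wise non-linearity

rng = np.random.default_rng(0)
x = rng.normal(size=4)        # input vector with 4 components (arbitrary size)
W = rng.normal(size=(3, 4))   # weight matrix: 3 hidden units, 4 inputs
b = np.zeros(3)               # bias vector
h = dense_layer(x, W, b)      # hidden representation, shape (3,)
```

Stacking several such calls, each with its own `W` and `b`, gives the multi-layer transformation of the input vector.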

Prediction is easy if we have good representations

  • e.g. for classification, the representations in the last hidden layer are linearly separable
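
To illustrate linear separability, here is a small sketch using XOR, which is not linearly separable in the input space. The hidden-layer weights are hand-chosen for the example (not learned), and ReLU is assumed as the non-linearity.

```python
import numpy as np

# XOR inputs and labels: no single line separates the classes in input space.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])

# Hand-chosen hidden layer (an assumption for illustration, not learned):
# h1 = relu(x1 + x2), h2 = relu(x1 + x2 - 1)
W = np.array([[1.0, 1.0],
              [1.0, 1.0]])
b = np.array([0.0, -1.0])
H = np.maximum(0.0, X @ W.T + b)   # hidden representations, one row per input

# In the hidden space a single linear threshold separates the classes.
w_out = np.array([1.0, -2.0])
b_out = -0.5
pred = (H @ w_out + b_out > 0).astype(int)   # -> [0, 1, 1, 0]
```

In a trained network the hidden weights would be found by learning; the point here is only that a suitable representation turns the problem into a linear one.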

Word Embeddings

representations for words (learned from sentences)


  • "low" dimensional space ($\sim 10^2$)
  • Syntactic and semantic information is encoded in the space (directions)
  • With simple vector arithmetic we can answer questions like
    • Man is related to Woman like King to ?
    • Germany ($\vec G$) is related to Berlin ($\vec B$) like Ukraine ($\vec U$) to ?

The word nearest to $$ \vec U - \vec G + \vec B $$ is Kiev.
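
The analogy arithmetic can be sketched with a toy vocabulary. The five-dimensional vectors below are hand-constructed for illustration and are not learned embeddings; real word embeddings have on the order of $10^2$ dimensions. The helper `nearest_word` and the use of cosine similarity are assumptions of this sketch.

```python
import numpy as np

# Toy embeddings (hypothetical): dimensions are
# [is-country, is-capital, Germany-related, Ukraine-related, France-related].
embeddings = {
    "Germany": np.array([1.0, 0.0, 1.0, 0.0, 0.0]),
    "Berlin":  np.array([0.0, 1.0, 1.0, 0.0, 0.0]),
    "Ukraine": np.array([1.0, 0.0, 0.0, 1.0, 0.0]),
    "Kiev":    np.array([0.0, 1.0, 0.0, 1.0, 0.0]),
    "France":  np.array([1.0, 0.0, 0.0, 0.0, 1.0]),
    "Paris":   np.array([0.0, 1.0, 0.0, 0.0, 1.0]),
}

def nearest_word(query, embeddings, exclude=()):
    """Return the word whose vector has the highest cosine similarity to query."""
    best_word, best_sim = None, -np.inf
    for word, vec in embeddings.items():
        if word in exclude:
            continue
        sim = query @ vec / (np.linalg.norm(query) * np.linalg.norm(vec))
        if sim > best_sim:
            best_word, best_sim = word, sim
    return best_word

# U - G + B: remove the "Germany" direction, add the "Berlin" direction.
query = embeddings["Ukraine"] - embeddings["Germany"] + embeddings["Berlin"]
answer = nearest_word(query, embeddings, exclude=("Ukraine", "Germany", "Berlin"))
```

The words used to build the query are excluded from the search, which is the usual convention when evaluating such analogies.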

Feature Transformations

  • learning representations through many layers

Convolutional Neural Networks

typical neural network for image and video processing
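
The core operation of such a network is a small filter slid across the image. The sketch below implements it naively (no padding, stride 1) as a cross-correlation, which is what deep learning libraries usually call "convolution". The function name `conv2d_valid`, the tiny image, and the hand-chosen edge filter are assumptions for illustration; in a real network the filter values are learned.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide kernel over image (no padding, stride 1) and sum the products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# 4x4 image: dark left half, bright right half (a vertical edge).
image = np.zeros((4, 4))
image[:, 2:] = 1.0

# Hand-chosen filter that responds to a dark-to-bright step from left to right.
kernel = np.array([[-1.0, 1.0]])

feature_map = conv2d_valid(image, kernel)   # each row is [0, 1, 0]
```

The feature map is non-zero only where the edge lies, so the same small set of weights detects the pattern at every image position.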


Image Recognition

  • Classification of Images
  • ImageNet Dataset

Recurrent Neural Networks

  • for sequence data
  • have internal state which acts like a memory
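
A minimal sketch of the internal state, assuming a simple (Elman-style) recurrent cell with a tanh non-linearity: the new state depends on the current input and the previous state. The function name `rnn_step`, the sizes, and the random (untrained) weights are illustrative assumptions.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One time step: combine the current input with the previous state."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

rng = np.random.default_rng(0)
input_size, hidden_size, seq_len = 3, 5, 4   # arbitrary sizes
W_xh = rng.normal(scale=0.5, size=(hidden_size, input_size))
W_hh = rng.normal(scale=0.5, size=(hidden_size, hidden_size))
b_h = np.zeros(hidden_size)

sequence = rng.normal(size=(seq_len, input_size))   # one vector per time step
h = np.zeros(hidden_size)                           # initial state (empty memory)
states = []
for x_t in sequence:
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)   # the same weights are reused each step
    states.append(h)
states = np.stack(states)   # shape (seq_len, hidden_size)
```

Because `h` is fed back in at every step, the final state summarizes the whole sequence seen so far.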

e.g. Natural Language Processing:

  • RNN Language Models
    • represents sentences
    • can generate (new) unseen sentences
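
Generation can be sketched as repeated sampling: at each step the state is updated, a probability distribution over the vocabulary is computed, and the next token is drawn from it. The tiny vocabulary, the sizes, and the random untrained weights below are assumptions for illustration, so the output is arbitrary rather than meaningful text.

```python
import numpy as np

vocab = ["<s>", "the", "cat", "sat", "on", "mat", "</s>"]   # hypothetical toy vocabulary
V, H = len(vocab), 8
rng = np.random.default_rng(0)
E = rng.normal(scale=0.5, size=(V, H))      # input embedding per token
W_hh = rng.normal(scale=0.5, size=(H, H))   # state-to-state weights
W_hy = rng.normal(scale=0.5, size=(V, H))   # state-to-vocabulary weights

def softmax(z):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

def generate(max_len=10):
    """Sample tokens one at a time until "</s>" or max_len is reached."""
    h = np.zeros(H)
    token = 0                                  # index of the start symbol "<s>"
    words = []
    for _ in range(max_len):
        h = np.tanh(E[token] + W_hh @ h)       # update the internal state
        probs = softmax(W_hy @ h)              # distribution over the next token
        token = int(rng.choice(V, p=probs))    # sample the next token
        if vocab[token] == "</s>":
            break
        words.append(vocab[token])
    return words

sentence = generate()
```

With trained weights, the same loop produces sentences that follow the statistics of the training text.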

Language Model (RNN unrolled in time)


Generative Models

  • Generative Adversarial Networks
  • Variational Autoencoder

Encoder Decoder Models

Neural Machine Translation



(image from

Image Captions


  • Encoder: Transforming the image into a vector representation.
  • Decoder: Language model RNN transforms the vector representation into a sentence.

Image to Image Translation

by Conditional Generative Models:
e.g. [Conditional Adversarial Networks](
  • Input: The user draws a sketch
  • Output: A photorealistic picture is generated