ChristianHerta

Reinforcement Learning

Markov Decision Processes and Dynamic Programming, with an example implementation for Jack's Car Rental problem (explicit construction of the transition kernel)
Monte Carlo Model-Free Prediction and Control (example: Blackjack environment from OpenAI Gym)
Temporal Difference Learning: TD(0) Policy Evaluation, Sarsa, and Q-Learning on the Frozen Lake environment (OpenAI Gym)
TD($\lambda$), Eligibility Traces
Function Approximation (for value functions and policies):
- Black Box Policy Optimization
- Policy Gradient Introduction