Syllabus

Tentative Course Content

Introduction to RL - Multi-armed Bandits - Policy Gradient Methods - Contextual Bandits - Finite Markov Decision Processes - Dynamic Programming - Policy Iteration - Value Iteration - Monte Carlo Methods - Temporal Difference Learning - n-step Bootstrapping - Eligibility Traces - Model-based RL - Planning - On-policy Prediction with Function Approximation - On-policy Control with Function Approximation - Off-policy Control with Function Approximation - Deep Reinforcement Learning - Hierarchical RL - POMDPs - Inverse RL - Exploration in RL - Offline RL.

Reference Materials

Lecture notes will be available from the course web page. The course is based on the following references.

[SB] Rich Sutton and Andrew Barto. Reinforcement Learning: An Introduction. Available free online.
[Szepesvari] Csaba Szepesvári. Algorithms for Reinforcement Learning. Available free online.
[BT] Dimitri Bertsekas and John Tsitsiklis. Neuro-Dynamic Programming.