Syllabus
Tentative Course Content
Introduction to RL - Multi-armed Bandits - Policy Gradient Methods - Contextual Bandits - Finite Markov Decision Processes - Dynamic Programming - Policy Iteration - Value Iteration - Monte Carlo Methods - Temporal Difference Learning - n-step Bootstrapping - Eligibility Traces - Model-based RL - Planning - On-policy Prediction with Function Approximation - On-policy Control with Function Approximation - Off-policy Control with Function Approximation - Deep Reinforcement Learning - Hierarchical RL - POMDPs - Inverse RL - Exploration in RL - Offline RL.
Reference Materials
Lecture notes will be available from the course web page. The course is based on the following references.
- [SB] Rich Sutton and Andrew Barto. Reinforcement Learning: An Introduction. Available free online.
- [Szepesvari] Csaba Szepesvári. Algorithms for Reinforcement Learning. Available free online.
- [BT] Dimitri Bertsekas and John Tsitsiklis. Neuro-Dynamic Programming.