Tentative Course Content

Introduction to RL - Sequential Decision Problems - Supervised Learning - Online Learning - Function Approximation - Immediate RL - Multi-armed Bandits - Contextual Bandits - Monte-Carlo Methods - Markov Decision Processes - Dynamic Programming - Policy Iteration - Value Iteration - Temporal Difference Learning - Sarsa - Q-Learning - DQN - n-step Bootstrapping - Eligibility Traces - Policy Gradient Methods - REINFORCE - Actor-Critic - A3C - SAC - Deterministic Policy Gradient - DDPG - TD3 - Natural Policy Gradient - Trust Region Methods - Model-based RL - Hierarchical RL - Frontiers in RL.

Reference Materials

There is no primary textbook for this course. Here is a list of relevant references:

  1. [SB] Rich Sutton and Andrew Barto. Reinforcement Learning: An Introduction. Available free online.
  2. [Szepesvari] Csaba Szepesvári. Algorithms for Reinforcement Learning. Available free online.
  3. [BT] Dimitri Bertsekas and John Tsitsiklis. Neuro-Dynamic Programming.
  4. [LS] Tor Lattimore and Csaba Szepesvári. Bandit Algorithms. Available free online.
  5. [Powell] Warren Powell. Reinforcement Learning and Stochastic Optimization: A Unified Framework for Sequential Decisions.
  6. [Puterman] Martin Puterman. Markov Decision Processes.
  7. [Bertsekas] Dimitri Bertsekas. Dynamic Programming and Optimal Control, Vols. I and II.