Schedule
Tentative Class Schedule
All the lecture scribbles can be found here.
Lec | Date | Topic | Mandatory Readings | Optional Readings |
---|---|---|---|---|
1 | 28 Aug 2023 | Introduction, Sequential Decision Problems | Slides, SB Chapter 1, Faulty Reward Functions | Computing Machinery and Intelligence by Alan Turing, Probability Review, Linear Algebra Review |
2 | 11 Sep 2023 | Supervised Learning, Online Learning, Function Approximation | Relevant parts of chapters 1, 2, 7, 11, 12 from my ML lecture notes. | |
3 | 18 Sep 2023 | Immediate Reinforcement Learning: Multi-armed Bandits and Contextual Bandits | SB Chapter 2 | Algorithms for MAB problems and clinical trial application, A survey on practical applications of bandits |
4 | 25 Sep 2023 | Gradient Bandits, Contextual Bandits, Imitation Learning, Markov Decision Process | SB Chapter 2, 3. DAGGER | |
5 | 02 Oct 2023 | Markov Decision Process and Dynamic Programming | SB Chapter 3 and 4. | Reward is Enough, Reward is not Enough |
6 | 16 Oct 2023 | Monte-Carlo Methods, Monte-Carlo Prediction, Monte-Carlo Control, Off-policy Learning, Temporal Difference Learning | SB Chapter 5 and 6. | |
7 | 23 Oct 2023 | Temporal Difference Learning, Sarsa, Q-Learning, n-step methods, Divergence of off-policy TD with function approximation | SB Chapter 6, 7, and 11. | |
| | 27 Oct 2023 | Mid-Term Exam (12:45 pm to 2:15 pm) | Sample Questions | |
8 | 30 Oct 2023 | Deep Q-Learning, Rainbow, Average Reward RL | DQN, DQN Nature, Double DQN, Rainbow, SB Chapter 10. | |
9 | 06 Nov 2023 | Policy Gradient Methods - I (REINFORCE, Actor-Critic, A3C, A2C) | SB Chapter 13, A3C paper | REINFORCE paper |
10 | 13 Nov 2023 | Policy Gradient Methods - II (Deterministic Policy Gradient, DDPG, TD3, Natural Policy Gradient, Trust Region Methods) | DPG, DDPG, TD3, NPG Explained, TRPO, PPO | |
11 | 20 Nov 2023 | Model-Based RL: Background Planning, Decision-Time Planning, Dyna-Q, MPC, CEM, MCTS, AlphaGo Family, Dreamer Family | SB Chapter 8, AlphaGo, Dreamer | Dreamer v2, Dreamer v3, MuZero |
12 | 27 Nov 2023 | Multi-agent RL: Independent Learning, Self-Play, CTDE. Partial Observability: POMDP, Belief-state MDPs, RNNs for POMDPs | MARL, POMDP | |
13 | 04 Dec 2023 | Hierarchical RL: Options, SMDP Q-Learning, Intra-Option Q-Learning. Frontiers in RL | HRL, Continual RL | |
| | 19 Dec 2023 | Final Exam (1:30 pm to 4:00 pm) | Mid-term Paper | |
Tutorials
Lec | Date | Time | Topic | Lecture Videos | Lecture Materials |
---|---|---|---|---|---|
1 | 15 Sep 2023 | 1 pm - 2:30 pm | PyTorch | Recording | Notebook |