Course Info
Welcome to the Fall 2024 edition of the Reinforcement Learning course!
Designing autonomous decision-making systems is one of the longstanding goals of Artificial Intelligence. Such systems, if realized, could have a major impact on robotics, game playing, control, and health care, to name a few areas. This course introduces Reinforcement Learning as a general framework for designing autonomous decision-making systems. By the end of this course, you will have a solid understanding of the core challenges in designing RL systems and how to approach them.
This course will be offered in English. However, students may submit written work to be graded in either English or French.
Quebec university students from outside Polytechnique Montreal can register for the course via Inter-University Transfer Authorization.
Please note that I will also be teaching Machine Learning (ML) (INF8245AE) this Fall. You can take both courses (ML and RL) in parallel.
If you are a student at Poly, UdeM, HEC, McGill, or Mila, then you can request to audit this course by filling out this Google Form. Once you have filled out the form, you are welcome to attend the class in person.
General Information
When?
Mondays 12:45 pm to 3:45 pm (starting from 26 Aug 2024).
Where?
L-1710
L-1710 is in the Lassonde building of Poly. You can find the building on the map here.
About Labs
The lab slot for this course is every Friday from 11:30 am to 2:45 pm. The lab slot will be mainly used for online tutorials, in-person/online recitations, and TA office hours. You can use the rest of the lab time to work on the practical assignments by yourself. Any in-person lab activities will happen in M-1120.
People
Instructors
- Sarath Chandar
- Nishanth Anand
TAs
- Ali Rahimi-Kalahroudi
- Antoine Clavaud
- Artem Zholus
- Esther Derman
Office Hours
| Name | Day | Time | Location |
|---|---|---|---|
| Instructors | Monday | 3:45 PM to 4:45 PM | M-3406 |
Please note that TAs will hold their office hours during the lab slot.
Logistics
Prerequisites
Basic knowledge of Probability Theory/statistics (MTH2302 or equivalent), calculus, and linear algebra (MTH1007 or equivalent) is required.
You should already be familiar with the following sections of this book: Mathematics for Machine Learning.
- Section 2: Subsections 2.1 to 2.6 (inclusive)
- Section 3: All subsections
- Section 4: Subsections 4.1 to 4.5.1 (inclusive)
- Section 5: Subsections 5.1, 5.2, 5.3, 5.4, 5.5, 5.7
- Section 6: Subsections 6.1 to 6.5 (inclusive)
The course is intended for hard-working, technically skilled, highly motivated students. Participants will be expected to display initiative, creativity, scientific rigour, critical thinking, and good communication skills.
If you do not have the necessary prerequisites, you should expect to spend significantly more time on this course than is typical for a 4-credit course.
Useful Online Courses Covering the Prerequisites
While I do not expect you to know everything from the following courses, I recommend working through these video courses at some point if you are serious about pursuing Reinforcement Learning.
- Prof. Gilbert Strang’s video lectures on linear algebra.
- Prof. John Tsitsiklis’s video lectures on Applied Probability.
- Prof. Krishna Jagannathan’s video lectures on Probability Theory.
- Prof. Deepak Khemani’s video lectures on Artificial Intelligence.
- My video lectures on Machine Learning.
Video Recordings
The lectures and tutorials might be recorded and released to the public. By registering for the course, you consent to being recorded and to the public release of these recordings.
Programming Language
We will use Python 3 in all the assignments.
Evaluation Criteria
The class grade will be based on the following components:
- 4 Theory/Programming assignments (individual) - 30%
- Mid-term examination - 20%
- End-term examination - 30%
- Course project (team) - 20%
To obtain a passing grade in the course (D or better), a necessary but not sufficient condition is to score at least 50% across the mid-term and end-term exams combined.
We will use Gradescope for all the assignments. More detailed instructions on how to use Gradescope will be released at the beginning of the course.
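As an illustration only (not an official grading tool), the weighting above and the exam-threshold condition can be sketched as follows. The function names are mine, and the threshold check assumes "combined" means the two exams weighted by their contribution to the final grade (20% and 30%):

```python
def final_grade(assignments, midterm, endterm, project):
    """Weighted course grade; all inputs are percentages in [0, 100]."""
    return 0.30 * assignments + 0.20 * midterm + 0.30 * endterm + 0.20 * project

def meets_exam_threshold(midterm, endterm):
    """Necessary (but not sufficient) passing condition: at least 50%
    across the two exams, weighted by their grade contributions."""
    combined = (0.20 * midterm + 0.30 * endterm) / 0.50
    return combined >= 50.0
```

For example, a student with 40% on the mid-term can still meet the condition with a strong end-term, since the end-term carries more weight.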
Late Submissions
Assignments and project reports submitted after the deadline incur the following penalties:
- 5% penalty if your submission is within 24 hours (1 day) of the deadline.
- 10% penalty if your submission is between 24 and 48 hours (2 days) after the deadline.
- 20% penalty if your submission is between 48 and 72 hours (3 days) after the deadline.
- Submissions more than 72 hours after the deadline will not be accepted.
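For clarity, the penalty scheme above can be written as a small function (illustrative only; the function name and the `None` convention for rejected submissions are mine):

```python
def late_penalty(hours_late):
    """Penalty as a fraction of the score, by lateness in hours.
    Returns None if the submission is no longer accepted (> 72 hours)."""
    if hours_late <= 0:
        return 0.0   # on time
    if hours_late <= 24:
        return 0.05  # within 1 day
    if hours_late <= 48:
        return 0.10  # within 2 days
    if hours_late <= 72:
        return 0.20  # within 3 days
    return None      # not accepted
```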
Syllabus
Tentative Course Content
Introduction to RL - Sequential Decision Problems - Immediate RL - Multi-armed bandits - Contextual Bandits - Markov Decision Process - Dynamic Programming - Policy Iteration - Value Iteration - Monte-Carlo Methods - Temporal Difference Learning - Sarsa - Q-Learning - n-step bootstrapping - Eligibility Traces - Function Approximation - DQN - Rainbow - Policy Gradient Methods - Reinforce - Actor-Critic - A3C - SAC - Deterministic Policy Gradient - DDPG - TD3 - Natural Policy Gradient - Trust Region Methods - Model-based RL - Hierarchical RL - Frontiers in RL.
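As a small taste of the early topics (multi-armed bandits and incremental value estimates), here is a minimal epsilon-greedy agent for a Bernoulli bandit. This is an illustrative sketch, not course material; the function name and setup are my own:

```python
import random

def epsilon_greedy_bandit(arm_means, steps=10000, epsilon=0.1, seed=0):
    """Minimal epsilon-greedy agent for a Bernoulli multi-armed bandit.
    Keeps an incremental sample-average estimate Q[a] for each arm."""
    rng = random.Random(seed)
    k = len(arm_means)
    Q = [0.0] * k  # value estimates
    N = [0] * k    # pull counts
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(k)                    # explore
        else:
            a = max(range(k), key=lambda i: Q[i])   # exploit
        r = 1.0 if rng.random() < arm_means[a] else 0.0
        N[a] += 1
        Q[a] += (r - Q[a]) / N[a]  # incremental mean update
    return Q, N
```

With enough steps, the agent pulls the arm with the highest mean reward most often; the course develops why this works and how to do much better.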
Reference Materials
There is no primary textbook for this course. Here is a list of relevant references:
- [SB] Rich Sutton and Andrew Barto. Reinforcement Learning: An Introduction. Available free online.
- [Szepesvari] Csaba Szepesvari. Algorithms for Reinforcement Learning. Available free online.
- [BT] Dimitri Bertsekas and John Tsitsiklis. Neuro-Dynamic Programming.
- [LS] Tor Lattimore and Csaba Szepesvari. Bandit Algorithms. Available free online.
- [Powell] Warren Powell. Reinforcement Learning and Stochastic Optimization: A Unified Framework for Sequential Decisions.
- [Puterman] Martin Puterman. Markov Decision Processes.
- [Bertsekas] Dimitri Bertsekas. Dynamic Programming and Optimal Control - Vols I and II.
- [Sayed] Ali H. Sayed. Inference and Learning from Data - Vol II.