Welcome to the Fall 2024 edition of the Reinforcement Learning course!

Designing autonomous decision-making systems is one of the longstanding goals of Artificial Intelligence. Such systems, if realized, could have a major impact on machine learning applications in robotics, game playing, control, and health care, to name a few. This course introduces Reinforcement Learning (RL) as a general framework for designing such autonomous decision-making systems. By the end of this course, you will have solid knowledge of the core challenges in designing RL systems and of how to approach them.

This course will be offered in English. However, students may submit any graded written work in either English or French.

Quebec university students from outside Polytechnique Montreal can register for the course via Inter-University Transfer Authorization.

Please note that I will also be teaching Machine Learning (ML, INF8245E) this Fall. You can take both courses (ML and RL) in parallel.

If you are a student at Poly, UdeM, HEC, McGill, or Mila, you can request to audit this course by filling out this Google Form. Once you have filled out the form, you can attend the class in person directly.

General Information

When?
Mondays 12:45 pm to 3:45 pm (starting from 28 Aug 2024).

Where?
L-1710
L-1710 is in the Lassonde building of Poly. You can find the main building on the map here.
There will not be a remote option this year.

About Labs
The lab slot for this course is every Friday from 11:30 am to 2:45 pm. It will mainly be used for online tutorials, in-person/online recitations, and TA office hours. You can use the rest of the lab time to work on the practical assignments on your own. Any in-person lab activities will take place in M-1120.

People

Instructor

TAs

  • Hadi Nekoei
  • Darshan Patil
  • Prashant Govindarajan

Office Hours

Name      Day       Time                  Location
Sarath    Monday    3:45 pm to 4:45 pm    M-3406

Please note that TAs will hold their office hours during the lab slot.

Logistics

Prerequisites

Basic knowledge of probability theory and statistics (MTH2302 or equivalent), calculus, and linear algebra (MTH1007 or equivalent) is required.

You should already be familiar with the following sections of the book Mathematics for Machine Learning:

  • Section 2: Subsections 2.1 to 2.6 (inclusive)
  • Section 3: All subsections
  • Section 4: Subsections 4.1 to 4.5.1 (inclusive)
  • Section 5: Subsections 5.1, 5.2, 5.3, 5.4, 5.5, 5.7
  • Section 6: Subsections 6.1 to 6.5 (inclusive)

The course is intended for hard-working, technically skilled, highly motivated students. Participants will be expected to display initiative, creativity, scientific rigour, critical thinking, and good communication skills.

If you do not have the necessary prerequisites, you will need to spend considerably more time on this course than is normally required for a 4-credit course.

Useful Online Courses Covering the Prerequisites

While I do not expect you to know everything from the following courses, I recommend that you take these video courses at some point if you are serious about working in Reinforcement Learning.

Video Recordings

The lectures and tutorials might be recorded and released to the public. By registering for the course, you consent to the recording and public release of these videos.

Programming Language

We will use Python 3 in all the assignments.

Evaluation Criteria

The class grade will be based on the following components:

  • 4 Theory/Programming assignments (individual) - 30%
  • Mid-term examination - 15%
  • End-term examination - 30%
  • Course project (team) - 25%
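As a quick illustration of how the four weighted components combine into a single course grade, here is a small Python sketch. It assumes that each component is scored out of 100 and that the course grade is a plain weighted average; the helper name `course_grade` and the example scores are hypothetical and only meant to show the arithmetic.

```python
# Weights (in percent) from the evaluation breakdown above.
WEIGHTS = {
    "assignments": 30,  # 4 theory/programming assignments (individual)
    "midterm": 15,      # mid-term examination
    "final": 30,        # end-term examination
    "project": 25,      # course project (team)
}


def course_grade(scores: dict) -> float:
    """Return the weighted course grade out of 100.

    Hypothetical helper: assumes every component score is already out of 100.
    """
    assert sum(WEIGHTS.values()) == 100, "weights should add up to 100%"
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    # Integer weights divided by 100 keep the arithmetic exact for whole-number scores.
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical example scores, each out of 100.
example = {"assignments": 80, "midterm": 70, "final": 90, "project": 85}
print(course_grade(example))  # 82.75
```

Here 30 × 80 + 15 × 70 + 30 × 90 + 25 × 85 = 8275, which divided by 100 gives 82.75.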

We will use Gradescope for all the assignments. More detailed instructions on how to use Gradescope will be released at the beginning of the course.

Late Submissions

If you submit an assignment or project report after the deadline, the following penalty scheme applies:

  • You will be penalized 5% if you submit within 24 hours (1 day) of the deadline.
  • You will be penalized 10% if you submit more than 24 hours but within 48 hours (2 days) after the deadline.
  • You will be penalized 20% if you submit more than 48 hours but within 72 hours (3 days) after the deadline.
  • Assignments and reports submitted more than 72 hours after the deadline will not be accepted.
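The late-penalty rule amounts to a lookup on the number of hours elapsed since the deadline. The Python sketch below assumes that each window's upper bound is inclusive (for example, exactly 24 hours late still counts as "within 24 hours"). It returns only the penalty percentage, because the text does not specify whether the penalty is deducted from the earned mark or from the maximum mark. The function name `late_penalty_percent` is hypothetical.

```python
from typing import Optional


def late_penalty_percent(hours_late: float) -> Optional[int]:
    """Return the late penalty (in percent) for a submission.

    Returns 0 for on-time submissions and None when the submission is no
    longer accepted (more than 72 hours late). Assumes each window's upper
    bound is inclusive.
    """
    if hours_late <= 0:
        return 0   # submitted on time
    if hours_late <= 24:
        return 5   # within 1 day of the deadline
    if hours_late <= 48:
        return 10  # more than 1 day, within 2 days
    if hours_late <= 72:
        return 20  # more than 2 days, within 3 days
    return None    # more than 3 days late: not accepted


for h in (0, 3, 30, 60, 80):
    print(h, late_penalty_percent(h))
```

For the sample delays above, this prints penalties of 0, 5, 10, and 20 percent, and `None` for the 80-hour submission, which falls outside the 72-hour window.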

Syllabus

Tentative Course Content

Introduction to RL - Sequential Decision Problems - Supervised Learning - Online Learning - Function Approximation - Immediate RL - Multi-armed Bandits - Contextual Bandits - Monte Carlo Methods - Markov Decision Processes - Dynamic Programming - Policy Iteration - Value Iteration - Temporal Difference Learning - Sarsa - Q-Learning - DQN - n-step Bootstrapping - Eligibility Traces - Policy Gradient Methods - REINFORCE - Actor-Critic - A3C - SAC - Deterministic Policy Gradient - DDPG - TD3 - Natural Policy Gradient - Trust Region Methods - Model-based RL - Hierarchical RL - Frontiers in RL.

Reference Materials

There is no primary textbook for this course. Here is a list of relevant references:

  1. [SB] Rich Sutton and Andrew Barto. Reinforcement Learning: An Introduction. Available free online.
  2. [Szepesvari] Csaba Szepesvari. Algorithms for Reinforcement Learning. Available free online.
  3. [BT] Dimitri Bertsekas and John Tsitsiklis. Neuro-Dynamic Programming.
  4. [LS] Tor Lattimore and Csaba Szepesvari. Bandit Algorithms. Available free online.
  5. [Powell] Warren Powell. Reinforcement Learning and Stochastic Optimization: A Unified Framework for Sequential Decisions.
  6. [Puterman] Martin Puterman. Markov Decision Processes.
  7. [Bertsekas] Dimitri Bertsekas. Dynamic Programming and Optimal Control - Vols I and II.