Please fill out this Google form if you are planning to attend either in person or remotely.

Date: August 11, 2022

Time: 9am to 5pm (Eastern Time)

Address: 6650 Saint-Urbain, Montréal, QC H2S 3H1

Room: Mila Agora

Contact: goncalo-filipe.torcato-mordido@mila.quebec

Schedule:
9:00am - 9:15am
Speaker: Sarath Chandar
Topic: Opening remarks
Abstract: A welcome message with an overview of various research activities at CRL.
9:15am - 10:00am
Speaker: Louis Clouâtre
Topic: Language Understanding and the Ubiquity of Local Structure
Abstract: Recent research has shown that neural language models are surprisingly insensitive to text perturbations, such as shuffling the order of words. If the order of words is unnecessary to perform natural language understanding on many tasks, what is? We empirically demonstrate that neural language models consistently rely on local structure to build understanding, while global structure often goes unused. These results hold for over 400 different languages. We use this property of neural language models to automatically detect which of those 400 languages are not currently well understood by the current crop of pretrained cross-lingual models, thus providing visibility into where our efforts as a research community should go.
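As a concrete illustration of the perturbations studied in this talk, here is a minimal Python sketch of the difference between destroying global word order and only perturbing local structure. The function names and window size are our illustrative choices, not the authors' code.

```python
import random

def global_shuffle(sentence: str, seed: int = 0) -> str:
    """Destroy global word order by shuffling all words uniformly."""
    words = sentence.split()
    rng = random.Random(seed)
    rng.shuffle(words)
    return " ".join(words)

def local_shuffle(sentence: str, window: int = 2, seed: int = 0) -> str:
    """Shuffle words only within small non-overlapping windows,
    keeping most local co-occurrence structure intact."""
    words = sentence.split()
    rng = random.Random(seed)
    out = []
    for i in range(0, len(words), window):
        chunk = words[i:i + window]
        rng.shuffle(chunk)
        out.extend(chunk)
    return " ".join(out)

if __name__ == "__main__":
    s = "the quick brown fox jumps over the lazy dog"
    print(global_shuffle(s))  # global word order destroyed
    print(local_shuffle(s))   # only neighbouring words swapped
```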
10:00am - 10:45am
Speaker: Pranshu Malviya
Topic: TAG: Task-based Accumulated Gradients for Lifelong Learning
Abstract: When an agent encounters a continual stream of new tasks in the lifelong learning setting, it leverages the knowledge it gained from earlier tasks to learn the new tasks better. In such a scenario, identifying an efficient knowledge representation becomes a challenging problem. Most existing works propose to store a subset of examples from past tasks in a replay buffer, dedicate a separate set of parameters to each task, or penalize excessive parameter updates through a regularization term. While existing methods employ the general task-agnostic stochastic gradient descent update rule, we propose a task-aware optimizer that adapts the learning rate based on the relatedness among tasks. We capture the directions taken by the parameters during updates by additively accumulating the gradients specific to each task. These task-based accumulated gradients act as a knowledge base that is maintained and updated throughout the stream. We empirically show that our proposed adaptive learning rate not only mitigates catastrophic forgetting but also exhibits knowledge transfer. We also show that our method outperforms several state-of-the-art methods in lifelong learning on complex datasets. Moreover, our method can be combined with existing methods to achieve substantial improvements in performance.
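A rough PyTorch sketch of the core mechanism as described in the abstract, assuming (for illustration only) that task relatedness is measured by cosine similarity between accumulated gradient directions; this is our reading of the idea, not the paper's implementation, which operates at a finer granularity.

```python
import torch

class TaskAwareSGD:
    """Sketch of a task-aware optimizer: gradients are accumulated per
    task, and the step size is scaled by the relatedness between the
    current task and previously seen tasks."""

    def __init__(self, params, lr=0.01):
        self.params = list(params)
        self.lr = lr
        self.task_grads = {}  # task_id -> accumulated flat gradient

    def _flat_grad(self):
        # Assumes backward() has already populated p.grad for all params.
        return torch.cat([p.grad.detach().flatten() for p in self.params])

    def step(self, task_id):
        g = self._flat_grad()
        # Additively accumulate this task's gradient direction.
        if task_id not in self.task_grads:
            self.task_grads[task_id] = torch.zeros_like(g)
        self.task_grads[task_id] += g

        # Relatedness: mean cosine similarity to earlier tasks' directions.
        others = [v for k, v in self.task_grads.items() if k != task_id]
        if others:
            cur = self.task_grads[task_id]
            sims = torch.stack([
                torch.nn.functional.cosine_similarity(cur, o, dim=0)
                for o in others
            ])
            # Shrink the step when the current task conflicts with old ones.
            scale = torch.clamp(1.0 + sims.mean(), min=0.1).item()
        else:
            scale = 1.0

        with torch.no_grad():
            for p in self.params:
                p -= self.lr * scale * p.grad
```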
10:45am - 11:15am
Break
11:15am - 12:00pm
Speaker: Sarath Chandar
Topic: Memory-augmented Optimizers for Deep Learning
Abstract: In this talk, I will introduce the idea of adding external memory to standard optimization methods to improve their performance in deep learning. Specifically, I will introduce a new family of optimizers based on critical gradients. Such optimizers retain a limited view of their gradient history in their internal memory and scale well to large real-life datasets. Our experiments show that the proposed memory-augmented extensions of standard optimizers enjoy accelerated convergence and improved performance on a majority of the computer vision and language tasks we considered.
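A minimal sketch of the memory-augmentation idea, under our illustrative assumption that "critical" gradients are those with the largest norms seen so far and that the update blends the current gradient with the buffer average; the actual optimizers in the talk may differ.

```python
import heapq
import torch

class MemorySGD:
    """Sketch: SGD that keeps a small buffer of the largest-norm
    ('critical') gradients seen so far and mixes their average
    into each update."""

    def __init__(self, params, lr=0.01, capacity=5, beta=0.7):
        self.params = list(params)
        self.lr = lr
        self.capacity = capacity
        self.beta = beta   # weight on the current gradient
        self.buffer = []   # min-heap of (norm, step, flat gradient)
        self.t = 0         # step counter; also breaks ties in the heap

    def step(self):
        g = torch.cat([p.grad.detach().flatten() for p in self.params])
        self.t += 1
        # Keep only the `capacity` gradients with the largest norms.
        heapq.heappush(self.buffer, (g.norm().item(), self.t, g))
        if len(self.buffer) > self.capacity:
            heapq.heappop(self.buffer)

        mem = torch.stack([item[2] for item in self.buffer]).mean(dim=0)
        update = self.beta * g + (1.0 - self.beta) * mem

        offset = 0
        with torch.no_grad():
            for p in self.params:
                n = p.numel()
                p -= self.lr * update[offset:offset + n].view_as(p)
                offset += n
```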
12:00pm - 1:00pm
Lunch break
1:00pm - 1:45pm
Speaker: Ali Rahimi-Kalahroudi
Topic: Towards Evaluating Adaptivity of Model-Based Reinforcement Learning Methods
Abstract: In recent years, a growing number of deep model-based reinforcement learning (RL) methods have been introduced. The interest in deep model-based RL is not surprising, given its many potential benefits, such as higher sample efficiency and the potential for fast adaptation to changes in the environment. However, we demonstrate, using an improved version of the recently introduced Local Change Adaptation (LoCA) setup, that well-known model-based methods such as PlaNet and DreamerV2 perform poorly in their ability to adapt to local environmental changes. Combined with prior work that made a similar observation about another popular model-based method, MuZero, a trend emerges, suggesting that current deep model-based methods have serious limitations. We dive deeper into the causes of this poor performance by identifying elements that hurt adaptive behavior and linking these to underlying techniques frequently used in deep model-based RL. We empirically validate these insights in the case of linear function approximation by demonstrating that a modified version of linear Dyna achieves effective adaptation to local changes. Furthermore, we provide detailed insights into the challenges of building an adaptive nonlinear model-based method by experimenting with a nonlinear version of Dyna.
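For context, here is a compact sketch of a LoCA-style evaluation on a toy chain environment (our paraphrase of the setup, with illustrative rewards and sizes): train with one goal clearly better, change its reward while confining the agent to a small local region around it, then test whether the policy far from the change has adapted.

```python
import numpy as np

# Rough LoCA-style protocol on a 1-D chain with terminals at both ends.
# Phase 1: left terminal pays 4, right pays 2; train everywhere.
# Phase 2: left terminal now pays 1, but the agent only experiences a
#          small region near the left terminal.
# Test:    a method that adapts well should now prefer the right
#          terminal when started from the middle of the chain.

N = 11
ACTIONS = (-1, +1)

def q_learning(Q, r_left, r_right, episodes, start_states, max_state=N - 1,
               alpha=0.5, gamma=0.95, eps=0.1, seed=0):
    rng = np.random.default_rng(seed)
    for _ in range(episodes):
        s = int(rng.choice(list(start_states)))
        while 0 < s < N - 1:
            a = int(rng.integers(2)) if rng.random() < eps else int(Q[s].argmax())
            s2 = s + ACTIONS[a]
            r = r_left if s2 == 0 else (r_right if s2 == N - 1 else 0.0)
            terminal = not (0 < s2 < N - 1)
            Q[s, a] += alpha * (r + (0.0 if terminal else gamma * Q[s2].max()) - Q[s, a])
            if s2 > max_state:  # phase 2: never leave the local region
                break
            s = s2
    return Q

Q = np.zeros((N, 2))
q_learning(Q, 4.0, 2.0, 3000, start_states=range(1, N - 1))      # phase 1
q_learning(Q, 1.0, 2.0, 3000, start_states=[1, 2], max_state=3)  # phase 2
# A purely model-free learner typically still goes left from the middle:
# the reward change never propagated beyond the local region.
print("greedy action from the middle:",
      "right" if Q[N // 2].argmax() == 1 else "left")
```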
1:45pm - 2:30pm
Speaker: Hadi Nekoei
Topic: Dealing with Non-stationarity in Decentralized Multi-agent RL
Abstract: One of the key research challenges in decentralized multi-agent reinforcement learning (MARL) is the non-stationarity of the learning environment when the agents learn simultaneously. In this talk, we first formally prove that the independent iterative best response (IIBR) update scheme, widely used in the MARL literature due to its implementation efficiency, is not guaranteed to converge, while sequential iterative best response (SIBR) always converges to an agent-by-agent optimal solution. Even though SIBR completely alleviates the challenge of non-stationarity, it is slow. We propose a practical MARL algorithm inspired by two-timescale stochastic approximation (Borkar, 1997) that speeds up SIBR and outperforms state-of-the-art decentralized learning methods on almost all the tasks in the EPyMARL benchmark. This can be seen as a first step towards more decentralized MARL methods based on SIBR and multi-timescale learning.
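The convergence contrast between the two update schemes can be seen in a toy coordination game (our illustrative example, not from the talk): with simultaneous updates the two agents can chase each other's stale policies indefinitely, while sequential updates settle.

```python
import numpy as np

# 2-player coordination game: both agents pick action 0 or 1 and are
# rewarded only when they match (matching on action 1 pays more).
PAYOFF = np.array([[1.0, 0.0],
                   [0.0, 2.0]])

def best_response(opponent_policy):
    """Deterministic best response to the opponent's mixed policy."""
    expected = PAYOFF @ opponent_policy
    br = np.zeros(2)
    br[expected.argmax()] = 1.0
    return br

def iibr(p1, p2, iters=10):
    # Independent: both agents update at once against *old* policies.
    for _ in range(iters):
        p1, p2 = best_response(p2), best_response(p1)
    return p1, p2

def sibr(p1, p2, iters=10):
    # Sequential: agents update one at a time against *current* policies.
    for _ in range(iters):
        p1 = best_response(p2)
        p2 = best_response(p1)
    return p1, p2

# Starting from miscoordinated policies, IIBR cycles forever between the
# two miscoordinated joint policies, while SIBR converges immediately to
# an agent-by-agent optimal joint policy.
p1, p2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print("IIBR:", iibr(p1, p2))
print("SIBR:", sibr(p1, p2))
```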
2:30pm - 3:00pm
Speaker: Darshan Patil
Topic: RLHive: A Framework for Reinforcement Learning Research
Abstract: RLHive is a framework designed to facilitate research in reinforcement learning. It provides the components necessary to run a full RL experiment, for both single-agent and multi-agent environments. It is designed to be readable and easily extensible, allowing users to quickly run and experiment with their own ideas. We establish an intuitive and well-documented API, making it easy to adapt prior algorithms to novel use cases.
3:00pm - 3:30pm
Break
3:30pm - 4:15pm
Speaker: Daphné Lafleur
Topic: Combining Reinforcement Learning and Constraint Programming for Sequence-Generation Tasks with Hard Constraints
Abstract: While Machine Learning (ML) techniques are good at generating data similar to a dataset, they lack the capacity to enforce constraints. On the other hand, any solution to a Constraint Programming (CP) model satisfies its constraints but has no obligation to imitate a dataset. Yet, we sometimes need both. In this work, we borrow RL-Tuner, a Reinforcement Learning (RL) algorithm introduced to tune neural networks, as our enabling architecture to exploit the respective strengths of ML and CP. RL-Tuner maximizes the sum of a pretrained network's learned probabilities and of manually tuned penalties for each violated constraint. We replace the latter with the outputs of a CP model representing the marginal probabilities of each note and the number of constraint violations. As was the case for the original RL-Tuner, we apply our algorithm to music generation, since it is a highly constrained domain for which CP is especially suited. We show that combining ML and CP, as opposed to using them individually, allows the agent to reflect the pretrained network while taking constraints into account, leading to melodic lines that respect both the corpus's style and the music theory constraints.
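A minimal sketch of the reward structure described above; the function name, weights, and stand-in CP quantities are our illustrative assumptions, not the paper's implementation.

```python
import math

def rl_tuner_reward(log_p_model: float,
                    cp_marginal: float,
                    n_violations: int,
                    w_cp: float = 1.0,
                    w_violation: float = 1.0) -> float:
    """Reward for emitting one note.

    log_p_model:  log-probability of the note under the pretrained
                  network (the imitation term, as in RL-Tuner).
    cp_marginal:  marginal probability of the note according to the CP
                  model, replacing RL-Tuner's hand-tuned penalties.
    n_violations: number of constraints the note would violate.
    """
    cp_term = math.log(max(cp_marginal, 1e-9))  # avoid log(0)
    return log_p_model + w_cp * cp_term - w_violation * n_violations

# Example: a note the network likes (p=0.4) that the CP model deems
# unlikely (marginal=0.05) and that violates one constraint.
print(rl_tuner_reward(math.log(0.4), 0.05, 1))
```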
4:15pm - 5:00pm
Speaker: Simon Guiroy
Topic: Improving Meta-Learning Generalization with Activation-Based Early-Stopping
Abstract: Meta-learning algorithms for few-shot learning aim to train neural networks capable of generalizing to novel tasks using only a few examples. Early-stopping is crucial to achieving the desired generalization: it halts model training when the model reaches optimal generalization to unseen examples. Early-stopping mechanisms in meta-learning typically rely on measuring model performance on labeled examples from a meta-validation set drawn from the training dataset. However, this is problematic in few-shot transfer learning settings, where the meta-test set comes from a different target dataset and can have a large distributional shift relative to the meta-validation set. In this work, we propose an alternative to validation-based early-stopping for meta-learning. Specifically, we analyze how the neural activations at each hidden layer evolve during meta-training, on a small set of unlabelled support examples from a single task of the target task distribution, as this constitutes minimal and justifiably accessible information from the target problem. Our experiments show that simple, label-agnostic statistics on the activations offer an effective way to estimate how target generalization evolves over time. The activation distributions at each hidden layer are characterized by their first- and second-order moments, then further summarized along the feature dimensions, resulting in a compact yet intuitive characterization in a four-dimensional space. Detecting when, over the course of training, the target activation trajectory diverges from the source trajectory allows us to perform early-stopping and improve generalization in a large array of few-shot transfer learning settings.
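To make the "simple, label-agnostic statistics" concrete, here is an illustrative sketch (ours, not the paper's code) of summarizing a layer's activations by first- and second-order moments, reducing them along the feature dimension to a four-dimensional descriptor, and tracking its divergence from a source trajectory across checkpoints.

```python
import torch

def activation_summary(acts: torch.Tensor) -> torch.Tensor:
    """Summarize one hidden layer's activations on unlabeled examples.

    acts: (n_examples, n_features) activation matrix.
    Returns a 4-D descriptor: the mean and std (across features) of the
    per-feature first moments, and the mean and std (across features)
    of the per-feature second-order statistics.
    """
    mu = acts.mean(dim=0)      # first moment of each feature
    sigma = acts.std(dim=0)    # spread of each feature
    return torch.stack([mu.mean(), mu.std(), sigma.mean(), sigma.std()])

def trajectory_divergence(target_traj, source_traj):
    """Per-checkpoint distance between two trajectories of 4-D
    descriptors; a sustained rise suggests the target representation is
    drifting away from the source, signalling a point to stop
    meta-training."""
    return [torch.dist(t, s).item() for t, s in zip(target_traj, source_traj)]

# Fake checkpoints: the target activations drift over meta-training.
torch.manual_seed(0)
source_traj = [activation_summary(torch.randn(25, 64)) for _ in range(5)]
target_traj = [activation_summary(torch.randn(25, 64) * (1 + 0.3 * t))
               for t in range(5)]
print(trajectory_divergence(target_traj, source_traj))
```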