The CRL symposium is an annual event showcasing highlights of the research conducted in the Chandar Lab over the past year. The symposium also features invited talks from our academic and industrial collaborators, along with a keynote. This year's keynote will be given by Prof. Karthik Narasimhan (Princeton University).

Date: August 8 and 9, 2023

Time: 9 am to 5 pm EDT

Mode: Hybrid (both remote and in-person)

Address: 6650 Saint-Urbain, Montréal, QC H2S 3H1

Room: Mila Agora

How to register? Please fill out this Google form if you plan to attend, either in person or remotely.

Contact: goncalo-filipe.torcato-mordido@mila.quebec

Day 1 (August 8)

Time | Speaker | Topic | Abstract
9:00am - 9:15am
Sarath Chandar (CRL)
Opening remarks
A welcome message with an overview of various research activities at CRL.
9:15am - 9:45am
Gonçalo Mordido (CRL)
Lookbehind optimizer: k steps back, 1 step forward
The Lookahead optimizer improves the training stability of deep neural networks by having a set of fast weights that “look ahead” to guide the descent direction. Here, we combine this idea with sharpness-aware minimization (SAM) to stabilize its multi-step variant and improve the loss-sharpness trade-off. We propose Lookbehind, which computes k gradient ascent steps (“looking behind”) at each iteration and combines the gradients to bias the descent step toward flatter minima.
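For readers who want the mechanics, here is a minimal NumPy sketch of a Lookbehind-style update. The function name, hyperparameters, and the plain averaging of the ascent gradients are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def lookbehind_step(w, grad_fn, k=3, rho=0.05, lr=0.1, alpha=0.5):
    """One Lookbehind-style update (illustrative sketch).

    Takes k SAM-style gradient ascent steps from the current weights
    ("looking behind"), averages the gradients seen along the ascent
    trajectory, and uses that average for a single descent step biased
    toward flatter minima. grad_fn(w) returns the loss gradient at w.
    """
    w_adv = w.copy()
    grads = []
    for _ in range(k):
        g = grad_fn(w_adv)
        grads.append(g)
        # Normalized ascent step, as in SAM's worst-case perturbation.
        w_adv = w_adv + rho * g / (np.linalg.norm(g) + 1e-12)
    g_avg = np.mean(grads, axis=0)     # combine the k "behind" gradients
    w_fast = w - lr * g_avg            # fast-weight descent step
    return w + alpha * (w_fast - w)    # Lookahead-style slow-weight interpolation
```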
9:45am - 10:15am
Reza Babanezhad (Samsung)
Fast online node labeling for very large graphs
This talk considers the online node classification problem under a transductive learning setting. Current methods either invert a graph kernel matrix with O(n^3) runtime and O(n^2) space complexity or sample a large volume of random spanning trees, and thus are difficult to scale to large graphs. In this work, we propose an improvement based on the online relaxation technique introduced by a series of prior works. We first prove an effective regret of O(sqrt(n^(1+γ))) when suitably parameterized graph kernels are chosen, then propose an approximate algorithm, FastONL, enjoying O(k sqrt(n^(1+γ))) regret based on this relaxation. The key component of FastONL is a generalized local push method that effectively approximates inverse matrix columns and applies to a series of popular kernels. Furthermore, the per-prediction cost is O(vol(S) log(1/ε)), locally dependent on the graph, with linear memory cost. Experiments show that our scalable method enjoys a better trade-off between local and global consistency.
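As a rough illustration of what a local push method does, the sketch below approximates one column of a personalized-PageRank-style inverse by pushing residual mass only around the seed node. This is the classic approximate-push pattern standing in for FastONL's generalized routine; the parameter names and stopping rule are assumptions.

```python
from collections import defaultdict

def local_push(neighbors, degree, seed, alpha=0.15, eps=1e-4):
    """Approximate one column of a PageRank-style matrix inverse by
    pushing residual mass locally around `seed` (classic APPR sketch).

    neighbors[u] lists u's neighbors; degree[u] is u's degree. Only
    nodes near the seed are ever touched, so the cost stays local.
    """
    p = defaultdict(float)   # approximate solution column
    r = defaultdict(float)   # residual mass still to be pushed
    r[seed] = 1.0
    queue = [seed]
    while queue:
        u = queue.pop()
        if r[u] < eps * degree[u]:
            continue                    # residual too small: stop pushing here
        mass, r[u] = r[u], 0.0
        p[u] += alpha * mass            # keep a fraction of the mass locally
        share = (1 - alpha) * mass / degree[u]
        for v in neighbors[u]:          # spread the rest to the neighbors
            r[v] += share
            if r[v] >= eps * degree[v]:
                queue.append(v)
    return p
```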
10:15am - 10:30am
Break
10:30am - 11:00am
Pranshu Malviya (CRL)
Promoting exploration in memory-augmented Adam using critical momenta
Adaptive gradient-based optimizers such as Adam often generalize worse than non-adaptive methods. Recent studies have tied this performance gap to flat-minima selection: adaptive methods tend to find solutions in sharper basins of the loss landscape, which in turn hurts generalization. To overcome this issue, we propose a new memory-augmented version of Adam that promotes exploration towards flatter minima by using a buffer of critical momentum terms during training. We empirically show that our method improves the performance of several variants of Adam on standard supervised language modelling and image classification tasks.
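A hand-wavy sketch of the buffer idea follows: standard Adam, plus a small buffer of past momentum terms folded into each update. The admission rule (keep the largest-norm momenta) and the aggregation are illustrative assumptions, not the paper's criterion for "critical" momenta.

```python
import numpy as np

def adam_with_momentum_buffer(w, grad_fn, steps=1000, lr=1e-3, beta1=0.9,
                              beta2=0.999, eps=1e-8, buf_size=5, decay=0.5):
    """Adam augmented with a buffer of past momentum terms (toy sketch)."""
    m, v = np.zeros_like(w), np.zeros_like(w)
    buffer = []                               # stored "critical" momenta
    for t in range(1, steps + 1):
        g = grad_fn(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        # Fold the buffered momenta into the update to keep the optimizer
        # exploring along previously promising directions.
        m_agg = m_hat
        if buffer:
            m_agg = m_hat + decay * sum(buffer) / len(buffer)
        # Toy admission rule: retain the largest-norm momenta seen so far.
        buffer.append(m_hat.copy())
        buffer.sort(key=np.linalg.norm, reverse=True)
        del buffer[buf_size:]
        w = w - lr * m_agg / (np.sqrt(v_hat) + eps)
    return w
```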
11:00am - 11:30am
Naga Karthik (CRL)
Continual learning with episodic memories for medical image segmentation
Current deep learning models for medical image segmentation are highly sensitive to distributional shifts due to variations in scanners and acquisition parameters, which hurts their performance when evaluated on different domains. Continual learning (CL) offers a promising alternative for addressing this challenge by training a model on sequentially arriving data while adapting to the distributional shifts of new tasks. In this work, we propose a framework for medical image segmentation based on CL with episodic memories. We empirically analyze the role of memory buffer sizes and show that performance similar to or better than multi-task training can be achieved by storing only up to five samples per domain, irrespective of the sequence of domains. We validate our approach on two multi-site datasets covering brain multiple sclerosis lesion and spinal cord gray matter segmentation tasks.
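The buffering idea is simple enough to sketch. Below, a per-domain episodic memory keeps at most five samples per domain and mixes them into training batches; the class and its reservoir-style overwrite are illustrative assumptions, not the paper's exact policy.

```python
import random

class EpisodicMemory:
    """Tiny per-domain buffer for continual segmentation (sketch)."""

    def __init__(self, per_domain=5):
        self.per_domain = per_domain
        self.store = {}                  # domain id -> list of (image, mask)

    def add(self, domain, sample):
        buf = self.store.setdefault(domain, [])
        if len(buf) < self.per_domain:
            buf.append(sample)           # store up to 5 samples per domain
        else:
            # Reservoir-style overwrite keeps the buffer size fixed.
            buf[random.randrange(self.per_domain)] = sample

    def replay_batch(self, k):
        """Sample k stored pairs from past domains to mix into a batch."""
        past = [s for buf in self.store.values() for s in buf]
        return random.sample(past, min(k, len(past)))
```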
11:30am - 12:00pm
Simon Guiroy (CRL)
Improving the generalization of vision foundation models to target domains
In this work, we propose and examine an approach to foundation model selection based on just a few unlabeled examples from the target task. Our approach (ABE++) measures a form of neural coherence between the target and source activation distributions and allows one to pick a model just before neural coherence starts to break down. We provide experiments in which foundation models are pretrained on ImageNet1K, and we examine target domains consisting of the Food101, PlantNet-300K, and iNaturalist evaluations. Our approach significantly improves generalization across these target domains compared to prior work on ABE and other baselines.
12:00pm - 1:00pm
Lunch break
1:00pm - 1:30pm
Darshan Patil (CRL)
There and back again: Forward-backward as a strong performer for reset-free RL
In the real world, the strong episode-resetting mechanisms needed to train agents in simulation are unavailable. This resetting assumption limits the potential of reinforcement learning in the real world, as providing resets to an agent usually requires additional handcrafted mechanisms or human interventions. Recent work aims to train a (forward) agent with learned resets by constructing a second (backward) agent that returns the forward agent to the initial state. We find that the termination and timing of the transitions between these two agents are crucial for algorithm success. With this in mind, we create a new algorithm, Early Switching Forward Backward (ESFB), which intelligently switches between the two agents based on the active agent’s confidence in achieving its current goal. Our new method achieves state-of-the-art performance on several challenging reset-free RL environments.
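To make the switching idea concrete, here is a schematic control loop. `env`, `agent.act`, and `agent.goal_confidence` are hypothetical interfaces, and the single confidence threshold is an illustrative stand-in for ESFB's actual switching rule.

```python
def run_esfb_episode(env, forward, backward, conf_threshold=0.9, max_steps=1000):
    """Alternate between forward and backward agents, switching early
    once the active agent is confident of reaching its goal (sketch)."""
    agent, obs = forward, env.reset()
    for _ in range(max_steps):
        obs, reward, done, info = env.step(agent.act(obs))
        # Switch early on high goal confidence instead of waiting for
        # the current agent's task to fully terminate.
        if done or agent.goal_confidence(obs) > conf_threshold:
            agent = backward if agent is forward else forward
    return obs
```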
1:30pm - 2:00pm
Janarthanan Rajendran (CRL)
Reinforcement learning with poor reward signals
Reinforcement learning (RL) provides a computational framework for goal-directed learning from interaction. In particular, it defines the problem of an agent interacting with its environment to achieve a goal represented by a numerical reward signal. The agent is not told which actions to take, but instead must discover which actions lead to higher rewards through interaction. For many real-world RL tasks, the quality of the reward signal is often poor (e.g., sparse, noisy, or changing over time), making learning the task slow and difficult for current RL methods. In this talk, I will present an outline of my research program, which aims to develop RL methods that can learn effectively with poor reward signals.
2:00pm - 2:30pm
Miao Liu (IBM)
Learning for advising, adaptation and explanation in multiagent domains
Learning in multiagent domains is fundamentally difficult because an agent interacts with other simultaneously learning agents in a shared environment, resulting in a large learning space to explore. Meanwhile, the joint learning of agents induces non-stationary environment dynamics from the perspective of each agent, requiring an agent to adapt its behavior with respect to potentially unknown changes in the policies of other agents. Moreover, current multiagent reinforcement learning (MARL) solutions are dominated by deep RL methods, most of which have large model sizes, resulting in response times that are too slow for real-time control, and lack human interpretability in their decisions. In this talk, I will introduce 1) a novel framework enabling multiple agents to advise each other in cooperative settings to improve the sample complexity of team-wide learning, 2) a few recent ways to address non-stationarity when multiple agents are learning simultaneously, and 3) several new approaches, including strategic state extraction and logical neural networks, towards interpretable MARL grounded in real-world applications.
2:30pm - 3:00pm
Break
3:00pm - 3:30pm
Ali Rahimi-Kalahroudi (CRL)
Replay buffer with local forgetting for adapting to local environment changes in deep MBRL
One of the key behavioral characteristics used in neuroscience to determine whether the subject of study, be it a rodent or a human, exhibits model-based learning is effective adaptation to local changes in the environment. In reinforcement learning, however, recent work has shown that modern deep model-based reinforcement learning (MBRL) methods adapt poorly to local environment changes. In this work, we reiterate the design choices that preclude effective adaptation and show that a conceptually simple variation of the traditional replay buffer can overcome these limitations. By removing from the buffer only those samples that lie in the local neighbourhood of newly observed samples, we can build deep world models that maintain their accuracy across the state space while also adapting effectively to local changes in the reward function.
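The variation is easy to sketch: when a new sample arrives, delete only stored samples whose states fall inside a small neighbourhood of it. The L2-ball test below is an illustrative stand-in for the paper's notion of local neighbourhood.

```python
import numpy as np

class LocalForgettingBuffer:
    """Replay buffer that forgets only locally (sketch)."""

    def __init__(self, radius=0.5):
        self.radius = radius
        self.states, self.payloads = [], []

    def add(self, state, payload):
        state = np.asarray(state, dtype=float)
        # Remove only stored samples near the new state, so the world
        # model keeps its accuracy everywhere else in the state space.
        keep = [i for i, s in enumerate(self.states)
                if np.linalg.norm(s - state) > self.radius]
        self.states = [self.states[i] for i in keep]
        self.payloads = [self.payloads[i] for i in keep]
        self.states.append(state)
        self.payloads.append(payload)
```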
3:30pm - 4:00pm
Xutong Zhao (CRL)
Conditionally optimistic exploration for cooperative deep multi-agent reinforcement learning
This work proposes a multi-agent exploration method that effectively encourages cooperative exploration based on a sequential action-computation scheme. The high-level intuition is that, to perform optimism-based exploration, agents will explore cooperative strategies if each agent’s optimism estimate captures a structured dependency relationship with other agents. Our method, Conditionally Optimistic Exploration (COE), is compatible with any value-decomposition method for centralized training with decentralized execution. Experiments across various cooperative MARL benchmarks show that COE outperforms current state-of-the-art exploration methods on hard-exploration tasks.
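A toy version of the sequential scheme: agents pick actions one after another, and each agent's optimism bonus is conditioned on the actions already chosen by the agents before it. `agent.q_values` and the count-based bonus are illustrative assumptions, not COE's actual estimator.

```python
import math

def conditionally_optimistic_joint_action(agents, obs, visit_counts, c=1.0):
    """Sequential, conditionally optimistic action selection (toy sketch).

    visit_counts maps (prefix_of_actions, action) -> visit count, so each
    agent's optimism bonus depends on its predecessors' choices.
    """
    joint = []
    for agent in agents:
        q = agent.q_values(obs, tuple(joint))   # condition on earlier agents
        best_a, best_val = None, -math.inf
        for a, qa in enumerate(q):
            n = visit_counts.get((tuple(joint), a), 0)
            val = qa + c / math.sqrt(n + 1)     # optimism given the prefix
            if val > best_val:
                best_a, best_val = a, val
        joint.append(best_a)
    return joint
```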
4:00pm - 4:30pm
Hadi Nekoei (CRL)
Towards few-shot coordination: Revisiting ad-hoc teamplay challenge in the game of Hanabi
Cooperative multi-agent reinforcement learning (MARL) algorithms with zero-shot coordination (ZSC) have gained significant attention in recent years. While ZSC is crucial for cooperative MARL agents, it might not be achievable in complex tasks and changing environments; agents also need to adapt and improve their performance with minimal interaction with other agents. In this work, we show empirically that state-of-the-art ZSC algorithms perform poorly when paired with agents trained with different methods and require millions of samples to adapt to these new partners. To investigate this issue, we formally define a framework based on the popular cooperative multi-agent game Hanabi to evaluate the adaptability of MARL methods. After evaluating several SOTA algorithms using our framework, our experiments reveal that naive independent Q-learning (IQL) agents in most cases adapt as quickly as the SOTA ZSC algorithm Off-Belief Learning (OBL). This finding raises an interesting research question: how can we design MARL algorithms with high ZSC performance that also adapt quickly to unseen partners? As a first step, we study the role of different hyperparameters and design choices in the adaptability of current MARL algorithms. We hope this initial analysis will inspire more work on designing both general and adaptive MARL algorithms.
4:30pm - 5:00pm
Rached Bouchoucha (Polytechnique Montreal)
Toward debugging deep reinforcement learning programs with RLExplorer
We present RLExplorer, a fault-diagnosis approach for DRL-based systems. RLExplorer automatically runs verification routines based on properties of the learning dynamics to detect the occurrence of DRL-specific faults. Our approach monitors training traces and conducts diagnoses that capture the learning dynamics. It then logs the results of these diagnoses as warnings that explain the root cause of the error and potential solutions for it.
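In the same spirit (though not RLExplorer's actual routines), a diagnosis pass over a training trace might look like the sketch below; the metric names and thresholds are illustrative assumptions.

```python
def diagnose_training_trace(trace):
    """Run toy fault checks over a DRL training trace (sketch).

    `trace` is assumed to map metric names to per-update value lists.
    Each failed check yields a warning naming a likely root cause.
    """
    warnings = []
    returns = trace.get("episode_return", [])
    entropy = trace.get("policy_entropy", [])
    value_loss = trace.get("value_loss", [])
    if len(returns) >= 20 and max(returns[-10:]) <= max(returns[:10]):
        warnings.append("Returns are not improving: check reward scaling "
                        "and the exploration schedule.")
    if entropy and entropy[-1] < 0.01:
        warnings.append("Policy entropy collapsed: the policy may have "
                        "converged prematurely; consider an entropy bonus.")
    if len(value_loss) >= 2 and value_loss[-1] > 10 * value_loss[0]:
        warnings.append("Value loss is diverging: try a lower learning "
                        "rate or target clipping.")
    return warnings
```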

Day 2 (August 9)

Time | Speaker | Topic | Abstract
9:00am - 9:30am
Megh Thakkar (Google)
Self-influence guided data reweighting for language model pre-training
Language models (LMs) pre-trained with self-supervision on large text corpora have become the default starting point for developing models for various NLP tasks. Once the pre-training corpus has been assembled, all data samples in the corpus are treated with equal importance during LM pre-training. However, due to varying levels of relevance and quality across the data, assigning equal importance to all samples may not be the optimal choice. While data reweighting has been explored in the context of task-specific supervised learning and LM fine-tuning, model-driven reweighting of pre-training data has not. We fill this important gap and propose PReSence, a two-phase learning method that leverages self-influence (SI) scores as an indicator of sample importance, as a first step in the research direction of data-sample reweighting for language model pre-training.
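As a rough illustration of using self-influence for reweighting (the scoring proxy and sign convention below are assumptions; PReSence's two-phase schedule is not reproduced), one can score each sample by the squared gradient norm of its own loss and down-weight high-SI samples:

```python
import torch

def self_influence(model, loss_fn, batch):
    """Score each (x, y) by the squared gradient norm of its own loss,
    a common proxy for self-influence (sketch)."""
    params = [p for p in model.parameters() if p.requires_grad]
    scores = []
    for x, y in batch:
        loss = loss_fn(model(x), y)
        grads = torch.autograd.grad(loss, params)
        scores.append(sum(g.pow(2).sum() for g in grads).item())
    return torch.tensor(scores)

def reweighted_loss(model, loss_fn, batch, temperature=1.0):
    """Down-weight high self-influence samples (often noisy or atypical)
    with a softmax over negated scores; the sign is an assumption."""
    si = self_influence(model, loss_fn, batch)
    weights = torch.softmax(-si / temperature, dim=0)
    losses = torch.stack([loss_fn(model(x), y) for x, y in batch])
    return (weights.detach() * losses).sum()
```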
9:30am - 10:00am
Nirav Pravinbhai Bhatt (IIT Madras)
Functional groups are all you need: Chemically interpretable molecular representation for property prediction
Molecular property prediction using a molecule’s structure is a crucial step in drug and novel material discovery, as computational screening approaches rely on predicted properties to refine the existing design of molecules. Although the problem has existed for decades, it has recently gained attention due to the advent of big data and deep learning. On average, one FDA-approved drug emerges for every 250 compounds entering the preclinical research stage, which requires screening chemical libraries containing more than 20,000 compounds. In-silico property prediction approaches using learnable representations increase the pace of development and reduce the cost of discovery. We propose developing molecular representations using functional groups in chemistry to address the problem of deciphering the relationship between a molecule’s structure and its properties. Functional groups are substructures in a molecule with distinctive chemical properties that influence its chemical characteristics. These substructures are found by (i) curating functional groups annotated by chemists and (ii) mining a large corpus of molecules to extract frequent substructures using a pattern-mining algorithm. We show that the Functional Group Representation (FGR) framework beats state-of-the-art models on several benchmark datasets while ensuring explainability of the relationship between the predicted property and molecular structure to experimentalists. This is joint work with Roshan M S B, Joe B., and Guna Sekhar, G.
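The representation itself is easy to picture: one bit per functional-group substructure. The RDKit sketch below uses a tiny hand-picked SMARTS set; the real FGR vocabulary (curated plus mined) is far larger.

```python
from rdkit import Chem

# A tiny illustrative subset of functional groups as SMARTS patterns.
FUNCTIONAL_GROUPS = {
    "hydroxyl":        "[OX2H]",
    "carboxylic_acid": "C(=O)[OX2H1]",
    "primary_amine":   "[NX3;H2]",
    "nitro":           "[N+](=O)[O-]",
}

def fgr_vector(smiles):
    """Binary functional-group representation of a molecule (sketch):
    bit i is set iff functional group i occurs as a substructure."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles}")
    return [int(mol.HasSubstructMatch(Chem.MolFromSmarts(patt)))
            for patt in FUNCTIONAL_GROUPS.values()]

# Acetic acid: the hydroxyl and carboxylic-acid bits are set.
print(fgr_vector("CC(=O)O"))  # [1, 1, 0, 0]
```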
10:00am - 10:30am
Gabriele Prato (CRL)
Evaluation of language models as epistemic models
In the era of artificial intelligence, the role of large language models (LLMs) is becoming increasingly pivotal. Despite their extensive usage, their ability to integrate and consolidate knowledge, a key aspect of effective inference, remains under-studied. To address this gap, we introduce EpiK-Eval, a unique question-answering benchmark designed to probe LLMs’ ability to formulate a coherent and consistent knowledge representation from segmented narratives. Evaluating various LLMs, we find that they demonstrate substantial shortcomings in this task. We argue that these shortcomings stem from the intrinsic nature of existing training objectives. Consequently, we advocate for refining the approach to knowledge consolidation, as it harbors the potential to dramatically improve overall effectiveness and performance. The insights gained through this research have implications for the future development of more robust and reliable LLMs.
10:30am - 11:00am
Break
11:00am - 12:00pm
Karthik Narasimhan (Princeton University)
Keynote: Towards general-purpose language-enabled agents: Machines that can read, think and act
Large language models (LLMs) have ushered in exciting capabilities in language understanding and text generation, with systems like ChatGPT holding fluent dialogs with users and being almost indistinguishable from a human. While this has obviously raised conversational systems and chatbots to a new level, it also presents exciting new opportunities for building artificial agents with improved decision-making capabilities. In other words, the ability to reason with language can allow us to build agents that can 1) perform complex action sequences to effect change in the world, 2) learn new skills by ‘reading’ in addition to ‘doing’, and 3) allow for easier personalization and control over their behavior. In this talk, I will demonstrate how we can build and benchmark language-enabled agents that exhibit the above traits in various use cases such as web interaction, robotic tool manipulation and puzzle solving.
12:00pm - 1:00pm
Lunch break
1:00pm - 1:30pm
Prasanna Parthasarathi (Huawei)
Towards furthering the reasoning mechanisms in LLMs
We, as a community, have witnessed remarkable progress in recent months towards enhancing the reasoning capabilities of large language models. Such systems play a crucial role in diverse cross-domain applications, emphasizing the importance of tackling new challenges to foster efficient mechanisms of reasoning. In this talk, I will provide a concise overview of the significant strides made in advancing reasoning mechanisms in LLMs. Moreover, I will discuss underexplored directions that hold potential for furthering these mechanisms and unlocking new frontiers in the development of artificial intelligence.
1:30pm - 2:00pm
Amirhossein Kazemnejad (Mila/McGill)
Measuring the knowledge acquisition-utilization gap in pretrained language models
We propose a framework to measure the gap between acquired and utilized knowledge in pretrained language models. We find that while larger models acquire more factual knowledge, their ability to utilize this knowledge robustly in downstream tasks remains limited. Our study provides insights into models’ capabilities beyond their acquired knowledge, highlighting an important gap between knowledge acquisition and utilization.
2:00pm - 2:30pm
Ioana Baldini (IBM)
Auditing language models for bias
The current AI landscape is dominated by larger and larger pre-trained language models (PLMs) applied to different domains without a proper understanding of how to evaluate their capabilities and, more importantly, how to audit and test their fairness behavior. Auditing PLMs for unwanted social bias is challenging, not only due to their opaque behavior, but also due to the multidisciplinary nature of the work. In the context of PLM-based Natural Language Processing (NLP) systems, this problem is exacerbated by the fact that no single entity controls all the components of an NLP pipeline. As such, it is crucial to assess the fairness of a PLM by inspecting only its outputs. In this talk, we will go over one such example of PLM bias auditing, detailing the challenges and complexities that emerge in the process.
2:30pm - 3:00pm
Break
3:00pm - 3:30pm
Abdelrahman Zayed (CRL)
Should we attend more or less? Modulating attention for fairness
In this presentation, we investigate the role of attention in the propagation of social biases. Specifically, we study the relationship between the entropy of the attention distribution and the model’s performance and fairness. We then propose a novel method for modulating attention weights to improve model fairness after training.
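One simple way to modulate the entropy of an attention distribution (a sketch of the general mechanism, not the paper's fairness-driven procedure) is temperature scaling of the attention scores:

```python
import torch

def modulated_attention(scores, temperature):
    """Temperature-scaled attention weights: temperature > 1 flattens the
    distribution (higher entropy, "attend less" to any single token);
    temperature < 1 sharpens it (lower entropy, "attend more")."""
    return torch.softmax(scores / temperature, dim=-1)

scores = torch.tensor([2.0, 1.0, 0.1])
for t in (0.5, 1.0, 2.0):
    p = modulated_attention(scores, t)
    entropy = -(p * p.log()).sum()
    print(f"temperature={t}: "
          f"weights={[round(x, 3) for x in p.tolist()]}, "
          f"entropy={entropy:.3f}")
```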
3:30pm - 4:00pm
Andreas Madsen (CRL)
Faithfulness measurable masked language models
This talk presents a new approach to the faithfulness-measurability challenge in interpretability: training a model such that measuring the faithfulness of importance measures is easy. The approach works on masked language models in general and solves long-standing challenges in interpretability regarding measuring faithfulness. Additionally, because measuring faithfulness is easy, we are also able to optimize explanations directly and find new synergies between our approach and occlusion-based importance measures, resulting in explanations of unprecedented faithfulness.
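Schematically, the faithfulness test masks the tokens an importance measure ranks highest and checks how much the prediction degrades; because the model is trained to accept masked inputs, the test stays in-distribution. `model.predict` and the top-k rule below are hypothetical interfaces, not the paper's code.

```python
def faithfulness_score(model, tokens, importance, mask_token="[MASK]", k=3):
    """Occlusion-style faithfulness check (sketch): mask the k tokens an
    importance measure ranks highest and measure the prediction drop.
    A faithful importance measure should cause a large drop."""
    base = model.predict(tokens)
    ranked = sorted(range(len(tokens)), key=lambda i: importance[i],
                    reverse=True)
    masked = list(tokens)
    for i in ranked[:k]:
        masked[i] = mask_token
    return base - model.predict(masked)
```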
4:00pm - 5:00pm
Closing remarks and snacks