Chandar Research Lab (CRL) Annual Symposium 2024
We welcome you to the fifth annual CRL symposium!
The CRL symposium is an annual event showcasing highlights of the research conducted in the Chandar Lab over the past year. The symposium also features invited talks from our academic and industrial collaborators and a keynote. This year's keynote will be given by Dr. Ahmad Beirami (Google DeepMind).
Date: August 19 and 20, 2024
Time: 9 am to 5 pm EDT
Mode: Hybrid (both remote and in-person)
Address: 6650 Saint-Urbain, Montréal, QC H2S 3H1
Room: Mila Agora
How to register? Please register on Eventbrite (it takes one minute) whether you plan to attend in person or remotely.
Contact: mathieu.reymond@mila.quebec
Day 1 (August 19)
Time | Speaker | Topic | Abstract |
---|---|---|---|
9:00am - 9:15am | Sarath Chandar (CRL) | Opening remarks | A welcome message with an overview of various research activities at CRL. |
9:15am - 9:45am | Pranshu Malviya (CRL) | Predicting the Impact of Model Expansion through the Minima Manifold: A Loss Landscape Perspective | The optimal model for a given task is often challenging to determine, as it requires training multiple models from scratch, which becomes prohibitive as dataset and model sizes grow. A more efficient alternative is to reuse smaller pre-trained models by expanding them; however, this approach is not widely adopted because its impact on training dynamics remains poorly understood. While prior works have introduced statistics to measure these effects, they remain flawed. To rectify this, we offer a new approach for understanding and quantifying the impact of expansion through the lens of the loss landscape, which has been shown to contain a manifold of linearly connected minima. Building on this perspective, we propose a metric to study the impact of expansion by estimating the size of the manifold. Experimental results show a clear relationship between gains in performance and manifold size, enabling the comparison of candidate models and presenting a first step towards expanding models more reliably based on geometric properties of the loss landscape. |
9:45am - 10:15am | Razvan Pascanu (Google DeepMind) | A look at In-Context Learning | In this talk I will present in-context learning (ICL) as a form of meta-learning behaviour that naturally emerges in architectures that have memory. I will present one perspective on the underlying mechanism, namely that it is a form of gradient descent, and mention the alternative view that it behaves like k-nearest neighbours. With these in mind, I will move on to discuss ICL in the context of reinforcement learning. In particular, I will show certain limitations of ICL behaviour in RL and point to open problems in this context. If time permits, I will end the talk with a discussion of the interaction between ICL and continual learning, and motivate why the intersection of these topics deserves more attention. |
10:15am - 10:45am | Coffee break | | |
10:45am - 11:15am | Ross Goroshin (Google DeepMind) | Learning Representations of Non-Linear Dynamical Systems | Koopman representations aim to learn features of nonlinear dynamical systems (NLDS) which lead to linear dynamics in the latent space. Theoretically, such features can be used to simplify many problems in modeling and control of NLDS. In this work we study autoencoder formulations of this problem, and different ways they can be used to model dynamics, specifically for future state prediction over long horizons. We discover several limitations of predicting future states in the latent space and propose an inference-time mechanism, which we refer to as Periodic Reencoding, for faithfully capturing long term dynamics. We justify this method both analytically and empirically via experiments in low and high dimensional NLDS. |
11:15am - 11:45am | Prashant Govindarajan (CRL) | Reinforcement Learning for Material Discovery | Solid-state materials, which are made up of periodic 3D crystal structures, are particularly useful for a variety of real-world applications such as batteries, fuel cells, and catalytic materials. Designing solid-state materials, especially in a robust and automated fashion, remains an ongoing challenge. Navigating through the exponentially large chemical space to search for desirable materials is an extremely challenging task in material discovery. Recent developments in generative and geometric deep learning have shown promising results in molecule and material discovery but often lack evaluation with high-accuracy computational methods. This work aims to design novel and stable crystalline materials conditioned on a desired band gap. To achieve conditional generation, we: (1) formulate crystal design as a sequential decision-making problem and use conservative Q-learning to learn a conditional policy. To do so, we formulate a reward function that incorporates constraints for energetic and electronic properties obtained directly from density functional theory (DFT) calculations; (2) evaluate the generated materials from the policy using DFT calculations for both energy and band gap; (3) compare our results to relevant baselines. Our experiments show that conditioned policies achieve targeted crystal design and demonstrate the capability to perform crystal discovery evaluated with accurate and computationally expensive DFT calculations. |
11:45am - 12:15pm | Sai Krishna Gottipati (AI Redefined) | Grounded Language Instruction through DEmonstration in RL (GLIDE-RL) | We introduce a teacher-instructor-student curriculum learning framework for training an RL agent that can follow natural language instructions and generalize to previously unseen instructions. In this multi-agent framework, the teacher and student agents learn simultaneously based on the student's current skill level. We further demonstrate the necessity of training the student agent with not just one, but multiple teacher agents. I will also show some ongoing experiments on the effectiveness of pre-training language models for faster learning by the student. |
12:15pm - 1:30pm | Lunch break | | |
1:30pm - 2:30pm | Ahmad Beirami (Google DeepMind) | Keynote: Language Model Alignment: Theory & Algorithms | The goal of the language model alignment (post-training) process is to draw samples from an aligned distribution that improves a reward (e.g., makes the generation safer) but does not deviate much from the base model. A simple baseline for this task is best-of-N, where N responses are drawn from the base model, ranked based on a reward, and the highest-ranking one is selected (an illustrative best-of-N sketch appears below the Day 1 schedule). More sophisticated techniques generally solve a KL-regularized reinforcement learning (RL) problem with the goal of maximizing expected reward subject to a KL divergence constraint between the aligned model and the base model. In this talk, we give an overview of language model alignment and build an understanding of key results in this space through simplified examples. We also present a new modular alignment technique, called controlled decoding, which solves the KL-regularized RL problem while keeping the base model frozen by learning a prefix scorer, offering inference-time configurability. Finally, we shed light on the remarkable performance of best-of-N in achieving competitive or even better reward-KL tradeoffs compared to state-of-the-art alignment baselines. |
2:30pm - 3:00pm | Coffee break | | |
3:00pm - 4:00pm | Sarath Chandar (CRL) | Continual Learning, Lifelong Learning, and All that | TBA |
4:00pm - 4:30pm | Mohammad Reza Samsami (CRL) | Mastering Memory Tasks with World Models | Current model-based reinforcement learning (MBRL) agents struggle with long-term dependencies. This limits their ability to effectively solve tasks involving extended time gaps between actions and outcomes, or tasks demanding recall of distant observations to inform current actions. To improve temporal coherence, we integrate a new family of state space models (SSMs) into the world models of MBRL agents, yielding a new method, Recall to Imagine (R2I). This integration aims to enhance both long-term memory and long-horizon credit assignment. Through a diverse set of illustrative tasks, we systematically demonstrate that R2I not only establishes a new state of the art on challenging memory and credit-assignment RL tasks, such as BSuite and POPGym, but also shows superhuman performance in the complex memory domain of Memory Maze. At the same time, it upholds comparable performance on classic RL tasks, such as Atari and DMC, suggesting the generality of our method. We also show that R2I is faster than the state-of-the-art MBRL method, DreamerV3, resulting in faster wall-time convergence. |
4:30pm - 5:00pm | Maryam Hashemzadeh (CRL) | Sub-goal Distillation: A Method to Improve Small Language Agents | Large Language Models (LLMs) are limited by high computational costs in long-horizon tasks. To mitigate this, we propose transferring LLM performance to a smaller model (770M parameters) using a hierarchical agent with planning and execution modules. The planning module learns through knowledge distillation from an LLM to generate sub-goals, and the execution module learns to accomplish these sub-goals using elementary actions. This approach reduces reliance on real-time LLM access and cuts costs. We show that, in ScienceWorld, a challenging multi-task interactive text environment, our method surpasses standard imitation learning by 16.7% (absolute). Our analysis highlights the efficiency of our approach compared to other LLM-based methods. |
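
To make the best-of-N baseline described in the keynote abstract above concrete, here is a minimal, illustrative sketch. It is not code from the talk; `sample_fn` and `reward_fn` are hypothetical stand-ins for a base language model's sampler and a reward model.

```python
from typing import Callable, List

def best_of_n(prompt: str,
              sample_fn: Callable[[str], str],
              reward_fn: Callable[[str, str], float],
              n: int = 16) -> str:
    """Draw n candidate responses from a frozen base model and return the
    one the reward model scores highest; the base model is never updated."""
    candidates: List[str] = [sample_fn(prompt) for _ in range(n)]
    rewards: List[float] = [reward_fn(prompt, c) for c in candidates]
    best_index = max(range(n), key=lambda i: rewards[i])
    return candidates[best_index]
```

The KL-regularized alternatives mentioned in the abstract instead optimize expected reward minus a penalty proportional to the KL divergence from the base model, rather than filtering samples at inference time.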
Day 2 (August 20)
Time | Speaker | Topic | Abstract |
---|---|---|---|
9:00am - 9:30am | Abdelrahman Zayed (CRL) | Why Don’t Prompt-Based Fairness Metrics Correlate? | The widespread use of large language models has brought up essential questions about the potential biases these models might learn. This led to the development of several metrics aimed at evaluating and mitigating these biases. In this paper, we first demonstrate that prompt-based fairness metrics exhibit poor agreement, as measured by correlation, raising important questions about the reliability of fairness assessment using prompts. Then, we outline six relevant reasons why such a low correlation is observed across existing metrics. Based on these insights, we propose a method called Correlated Fairness Output (CAIRO) to enhance the correlation between fairness metrics. CAIRO augments the original prompts of a given fairness metric by using several pre-trained language models and then selects the combination of the augmented prompts that achieves the highest correlation across metrics. We show a significant improvement in Pearson correlation from 0.3 and 0.18 to 0.90 and 0.98 across metrics for gender and religion biases, respectively. |
9:30am - 10:00am | Gabriele Prato (CRL) | On the Wholesome Understanding of Training Datasets | One of the essential functions of a Large Language Model (LLM) is to develop a deep understanding of the world. However, the process of training these models on individual text documents raises important questions about how effectively they can achieve this understanding. In this talk, we will explore the challenges that standard pre-training methods present to an LLM’s ability to comprehensively grasp the entirety of its training corpus. Additionally, we will propose a potential approach to address and mitigate these challenges, aiming to enhance the model’s overall comprehension. |
10:00am - 10:30am | Jerry Huang (CRL) | Context-Aware Assistant Selection for Improved Inference Acceleration with Large Language Models | Despite their widespread adoption, large language models (LLMs) remain prohibitively expensive to use under resource constraints, one reason being the high latency associated with auto-regressive generation. Assisted decoding, where a small draft model guides a larger target model’s generation, has helped alleviate this, but remains dependent on alignment between the two models. Thus, if the draft model is insufficiently capable in some domain relative to the target model, performance can degrade. Alternatively, one can leverage multiple draft models to better cover the expertise of the target, but when multiple black-box draft models are available, selecting an assistant without details about its construction can be difficult. We frame this problem as a contextual bandit, where a policy must choose a draft model based on a context, and show that, even without prior knowledge of the draft models, training a policy over the alignment of their outputs can accelerate generation across multiple domains provided the candidates are effective. Further results show this to hold in various settings with multiple assisted-decoding candidates, highlighting its flexibility and the advantageous role that such decision making can play. |
10:30am - 11:00am | Coffee break | | |
11:00am - 11:30am | Pin-Yu Chen (IBM) | Exploring and Mitigating Safety Risks in Large Language Models and Generative AI | Large language models (LLMs) and Generative AI (GenAI) are at the forefront of current AI research and technology. With their rapidly increasing popularity and availability, challenges and concerns about their misuse and safety risks are becoming more prominent than ever. In this talk, I will provide new tools and insights to explore and mitigate the safety and robustness risks associated with state-of-the-art LLMs and GenAI models. In particular, I will cover (i) safety risks in fine-tuning LLMs, (ii) LLM jailbreak mitigation, (iii) prompt engineering for safety debugging, and (iv) robust detection of AI-generated text from LLMs. |
11:30am - 12:00pm | Kamran Chitsaz (CRL) | Exploring Quantization for Efficient Pre-Training of Transformer Language Models | The increasing scale of Transformer models has led to an increase in their pre-training computational requirements. While quantization has proven effective after pre-training and during fine-tuning, applying quantization to Transformers during pre-training has remained largely unexplored at scale for language modeling. This study explores the impact of quantization on efficient pre-training of Transformers, with a focus on linear layer components. By systematically applying straightforward linear quantization to weights, activations, gradients, and optimizer states (an illustrative linear-quantization sketch appears below the Day 2 schedule), we assess its effects on model efficiency, stability, and performance during training. By offering a comprehensive recipe of effective quantization strategies to apply during the pre-training of Transformers, we promote high training efficiency from scratch while retaining language modeling ability. |
12:00pm - 1:30pm | Lunch break | | |
1:30pm - 2:00pm | Megh Thakkar (CRL) | A Deep Dive into the Trade-Offs of Parameter-Efficient Preference Alignment Techniques | Large language models are first pre-trained on trillions of tokens and then instruction-tuned or aligned to specific preferences. While pre-training remains out of reach for most researchers due to the compute required, fine-tuning has become affordable thanks to parameter-efficient methods such as LoRA and QLoRA. Alignment is known to be sensitive to the many factors involved, including the quantity and quality of data, the alignment method, and the adapter rank. However, there has not yet been an extensive study of their effect on downstream performance. To address this gap, we conduct an in-depth investigation of the impact of popular choices for three crucial axes: (i) the alignment dataset (HH-RLHF and BeaverTails), (ii) the alignment technique (SFT and DPO), and (iii) the model (LLaMA-1, Vicuna-v1.3, Mistral-7b, and Mistral-7b-Instruct). Our extensive setup spanning over 300 experiments reveals consistent trends and unexpected findings. We observe how more informative data helps with preference alignment, cases where supervised fine-tuning outperforms preference optimization, and how aligning to a distinct preference boosts performance on downstream tasks. Through our in-depth analyses, we put forward key guidelines to help researchers perform more effective parameter-efficient LLM alignment. |
2:00pm - 2:30pm | Andreas Madsen (CRL) | Are self-explanations from Large Language Models faithful? | Large language models are increasingly being used by the public in the form of chat models. These chat systems often provide detailed and highly convincing explanations for their answers, even when not explicitly prompted to do so, which makes users more confident in these models. However, it is unclear whether these explanations reflect the model’s actual behavior. We measure the truthfulness (i.e., interpretability-faithfulness) of the explanations that LLMs provide, so-called self-explanations, by holding the models accountable to their own explanations using self-consistency checks. We find that truthfulness is highly dependent on the model and the specific task, suggesting we should not place general confidence in their explanations. |
2:30pm - 3:00pm | Coffee break | | |
3:00pm - 3:30pm | Quentin Fournier | Protein Language Models: Curation Challenges Scale | Protein language models (pLMs) learn the complex fitness landscape explored by natural selection through the records left behind in protein databases, enabling tasks like property prediction and protein design. Following the scaling trend seen in natural language processing, larger pLMs have been pre-trained on larger datasets. However, the premise that scale leads to better performance is caveated in pLMs by the risk of deviating from the consensus found in homologs. More crucially, the assumption that databases are representative of the fitness landscape is likely false. By developing an efficient codebase, designing a modern architecture, and addressing the sample bias, we introduce AMPLIFY, a best-in-class pLM that is 17x less expensive to train and up to 2,000x faster at inference than ESM2 15B, its closest competitor. AMPLIFY also exhibits the emergent ability to perform zero-shot classification between non-proteins and disordered proteins. |
3:30pm - 4:00pm | Artem Zholus | BindGPT: A Scalable Framework for 3D Molecular Design via Language Modeling and Reinforcement Learning | BindGPT is a new framework for building drug discovery models that leverages compute-efficient pre-training, supervised fine-tuning, prompting, reinforcement learning, and tool use of LMs. This allows BindGPT to build a single pre-trained model that achieves state-of-the-art performance in 3D Molecule Generation, 3D Conformer Generation, and Pocket-Conditioned 3D Molecule Generation, posing them as downstream tasks for a pre-trained model, whereas previous methods build task-specialized models without task-transfer abilities. At the same time, thanks to fast transformer inference technology, BindGPT is two orders of magnitude (100x) faster than previous methods at generation. |
4:00pm - 5:00pm | Closing remarks and snacks | | |
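
As a companion to the quantization talk above, here is a minimal sketch of the kind of straightforward linear quantization its abstract refers to, applied as symmetric per-tensor fake quantization of a weight matrix in PyTorch. This is a generic illustration under assumed details (8-bit symmetric quantization, per-tensor scale), not the speakers' actual recipe; real quantized pre-training typically also handles gradients, e.g., through a straight-through estimator.

```python
import torch

def fake_quantize(x: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    """Symmetric per-tensor linear quantization followed by dequantization.
    Values are scaled onto the integer grid [-(2^(b-1)-1), 2^(b-1)-1],
    rounded, and mapped back to floats so downstream code still sees floats."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = x.detach().abs().max().clamp(min=1e-8) / qmax
    return torch.clamp(torch.round(x / scale), -qmax, qmax) * scale

# Example: simulate 8-bit weights for one linear layer (hypothetical sizes).
layer = torch.nn.Linear(512, 512)
with torch.no_grad():
    layer.weight.copy_(fake_quantize(layer.weight, num_bits=8))
```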