Welcome to the sixth annual CRL symposium!

The CRL symposium is an annual event that showcases highlights of the research done in the Chandar Lab over the past year. This year's keynote talk will be given by Prof. Mengye Ren (New York University).

Date: July 23-24

Time: 9 am to 5 pm EDT

Mode: Hybrid (both remote and in-person)

Address: 6650 Saint-Urbain, Montréal, QC H2S 3H1

Room: Mila Agora

How to register? Please register on Eventbrite (it takes 1 minute) whether you plan to attend in person or remotely.

Contact: ekaterina.lobacheva@mila.quebec

Day 1 (July 23)

Time | Speaker | Topic | Abstract
9:00am - 9:15am | Sarath Chandar | Opening remarks | A welcome message with an overview of various research activities at CRL.
9:15am - 9:45am | Davide Baldelli | What Hangman Reveals About Language Agents: Private State and Probabilistic Calibration | To host a game of Hangman, a language model must privately commit to a secret word and choose it with genuine randomness, two things that current systems quietly fail at. In the first part of this talk, I formalize the former as Private State Interactive Tasks, prove that agents conditioned only on public history cannot guarantee both secrecy and consistency, and show that a private working memory restores this guarantee. In the second part, I turn to randomness, showing that probabilistic calibration is a trainable capability whose gains transfer to open-ended stochastic generation.
9:45am - 10:15am | Darshan Patil | CoPeP: Benchmarking Continual Pretraining for Protein Language Models | Protein language models (pLMs) are trained on large protein databases that are continuously updated by the biology community, motivating continual learning both to keep up with ever-growing data and to take advantage of the temporal meta-information created during this process. We introduce the Continual Pretraining of Protein Language Models (CoPeP) benchmark for evaluating continual learning approaches on pLMs at scale in an impactful real-world application.
10:15am - 10:30am | Coffee break
10:30am - 11:00am | Artem Zholus | TAPNext++: What’s Next for Tracking Any Point (TAP)? | Tracking-Any-Point (TAP) models aim to track any point through a video, a crucial task in AR/XR and robotics applications. The recently introduced TAPNext approach proposes an end-to-end recurrent transformer architecture that tracks points frame by frame in a purely online fashion, demonstrating competitive performance at minimal latency. However, we show that TAPNext struggles with longer video sequences and frequently fails to re-detect query points that reappear after being occluded or leaving the frame. In this work, we present TAPNext++, a model that tracks points in sequences that are orders of magnitude longer while preserving the low memory and compute footprint of the original architecture.
11:00am - 11:30am | Jerry Huang | On the Uncertainty Calibration of Large Language Models | Trusting deep neural networks requires consistency in how models quantify their uncertainty. However, large language models complicate this through multiple training stages, as well as gaps between training objectives and downstream evaluation. One major stage where issues arise is instruction tuning. We first show how label smoothing can reduce calibration error, but at the cost of limited trainability in later stages. To address this, we propose an instance-specific smoothing adjustment to the loss regularization, yielding models that remain trainable and well calibrated throughout.
11:30am - 12:00pm | Maryam Hashemzadeh | Dialectics of Alignment: Harnessing Unsafe Knowledge for Dynamic Safety Routing | Current alignment paradigms, reliant on data erasure and blanket refusals, often sacrifice epistemological depth and utility. To overcome this limitation, we introduce SafeMoE, a Mixture-of-Experts framework that treats “unsafe” data as a valuable knowledge source rather than noise. By isolating harmful corpora into domain-specific LoRA experts and using a router trained on safe-informative primitives, SafeMoE synthesizes deep domain insights while strictly enforcing safety constraints. Our results show a >20% relative improvement in safe response rates along with superior informativeness, demonstrating that robust safety is best achieved through the controlled integration, rather than the erasure, of unsafe knowledge.
12:00pm - 1:15pm | Lunch break
1:15pm - 2:15pm | Mengye Ren (New York University) | Keynote: The Always-Learning Machine | Today’s AI models acquire most of their knowledge through offline, i.i.d. learning. In-context learning offers some capacity for online adaptation, but a crucial question remains: can models keep learning at deployment, or even learn from scratch, through continuous streams of experience? In this talk, I will present several recent efforts toward building always-learning machines for perception and planning. Starting with experiential video streams, I will show that event segmentation (clustering event concepts in lifelong video) enables effective visual representation learning and event recognition from scratch. In JEPA world models, always-learning can yield rapid test-time learning and generalization for planning. Finally, I will discuss my recent work on creative exploration and on linking always-learning and world modeling to the self.
2:15pm - 2:45pm | TBD
2:45pm - 3:00pm | Coffee break
3:00pm - 3:30pm | Istabrak Abbes | What Does Layer-Importance Reveal About Transformers and State-Space Models? | We study how well layer-importance analysis developed for transformers transfers to state-space models (SSMs). We decompose layer importance into Necessity (how much the pretrained model depends on a layer, as measured by the loss incurred when bypassing it) and Plasticity (where fine-tuning concentrates task-specific weight updates). The two families diverge: in residual transformers, Necessity and Plasticity anti-align across depth, whereas in Mamba-style SSMs they overlap. The sign of this alignment predicts adaptation behavior: transformers update their most plastic layers, increasing catastrophic forgetting, while this tier-dependent effect disappears in SSMs.
3:30pm - 4:00pm | Saurav Jha (with Nilaksh and Artem Zholus) | Reconstruction or Semantics? What Makes a Latent Space Useful for Robotic World Models? | World-model-based policy evaluation is a practical proxy for testing real-world robot control. As these models increasingly adopt latent diffusion modeling (LDM), choosing the right latent space becomes critical. While the status quo relies on autoencoder latent spaces, such as those of VAEs, that are trained primarily for pixel reconstruction, recent work suggests benefits from pretrained encoders with representation-aligned semantic latent spaces. We systematically evaluate these latent spaces for action-conditioned LDM by training world-model variants on six reconstruction and semantic encoders under a fixed protocol on the BridgeV2 dataset, and we show that world models can be trained effectively in high-dimensional representation spaces.

Day 2 (July 24)

Time | Speaker | Topic | Abstract
9:00am - 9:30am | Nilaksh | Squeezing More from the Stream: Learning Representations Online for Streaming Reinforcement Learning | In streaming Reinforcement Learning (RL), agents learn from each observation once and immediately discard it. This saves memory for on-device applications but makes learning highly inefficient, as it is difficult to extract deep patterns from fleeting data. To get the most out of every observation, we adapt Self-Predictive Representations (SPR) for streaming RL. Because streaming data arrives in a highly correlated sequence, simply adding SPR causes training instability. We solve this by adjusting the network’s learning updates to prevent conflicting signals. Tested across the Atari, MinAtar, and Octax benchmarks, our approach consistently outperforms existing streaming methods.
9:30am - 10:00am | Diego Cerda Mardini | Consistent but Miscalibrated: Evaluating LLM Limitations for Risk Communication in Natural Language | Whether LLMs are reliable explainers of probabilistic information in natural language remains unclear. Reliable explanation requires consistent descriptors for identical inputs and descriptors that reflect the underlying magnitudes. We evaluated nine LLMs on selecting verbal descriptors for simulated probabilistic predictions across six domains and multiple inference settings. Models were consistent but miscalibrated, performing worse when describing uncertainty than when describing likelihood. Providing precomputed summary statistics did not improve calibration, which locates the performance bottleneck in the verbalization layer. In short, current LLMs are not yet reliable zero-shot explainers of probabilistic outputs.
10:00am - 10:15am | Coffee break
10:15am - 10:45am | Alex Aselstyne | A systematic analysis of machine learning pipelines for robust antimicrobial resistance prediction | Antimicrobial resistance (AMR), the ability of bacteria to survive antibiotic exposure, poses an increasing public health risk. Predicting resistance from whole-genome sequencing data using machine learning (ML) models has emerged as a promising direction, yet the influence of representation and model design on predictive performance remains understudied. Our systematic evaluation of ML pipelines for AMR prediction shows that tuned XGBoost models with k-mer representations are a robust and interpretable option, supporting the utility and biological validity of ML for AMR prediction.
10:45am - 11:15am | Aidan Li | Sparse Koopman Autoencoders Identify Local Dynamical Regimes in Multibasin Systems | Koopman autoencoders (KAEs) forecast nonlinear dynamics by learning higher-dimensional latent representations with linear evolution. However, multibasin dynamical systems generally lack a single finite-dimensional global Koopman embedding. In this talk, we discuss how, in Sparse Koopman Autoencoders (SKAEs), sparsity-inducing encoders can make latent supports serve as inspectable basin variables, even though no basin labels or regime annotations are provided during training.
11:15am - 11:30am | Coffee break
11:30am - 12:00pm | Kamran Chitsaz | The Markovian Thinker | RL for reasoning LLMs uses a trivial underlying RL environment (MDP) in which the state is the whole prompt plus all past thinking tokens. Because this state keeps growing, the compute cost grows quadratically with the number of thinking tokens. We propose the Markovian Thinking paradigm, in which the state size stays bounded. By design, and regardless of the policy architecture, this makes the compute cost linear in the number of thinking tokens while memory stays constant.
12:00pm - 12:30pm | Behnoush Khavari | The Expressive Limits of Diagonal SSMs for State-Tracking | We study the expressivity of input-Dependent Complex-valued Diagonal (DCD) SSMs, such as Mamba, Mamba-2, and Mamba-3, on sequential state-tracking tasks. We show that single-layer DCD SSMs cannot express state-tracking of any non-Abelian group at finite precision, and that k-layer DCD SSMs can express state-tracking of a group if and only if that group has a subnormal series of length k with Abelian factors. Empirically, we find that multi-layer models often fail to learn state-tracking for non-Abelian groups, highlighting a gap between expressivity and learnability.
12:30pm - 1:45pm | Lunch break
1:45pm - 2:15pm | Hadi Nekoei | Shielded Controller Units for RL with Operational Constraints Applied to Remote Microgrids | Remote microgrids require coordinating renewable generation, batteries, and fuel generators under uncertainty while respecting strict operational constraints. We introduce Shielded Controller Units (SCUs), an interpretable framework that uses system knowledge to enforce safety and regulatory constraints during RL control. SCUs decompose the environment into a hierarchy, with each unit responsible for a specific subset of constraints. On a real-world microgrid task, SCUs enable RL to reduce fuel consumption by 24% without increasing battery degradation, outperforming industry heuristics and constrained RL baselines while maintaining full constraint satisfaction.
2:15pm - 2:45pm | Darshan Patil | Loss Smoothing for Stable Adaptation Under Distribution Shift | Neural networks are often adapted under distribution shift. Standard adaptation methods typically optimize the target objective directly, inducing an abrupt change from the source training objective. We propose loss smoothing, a simple approach that interpolates between the source and target training objectives at the start of adaptation. Across adaptation regimes such as offline and online RL and LLM fine-tuning, we find that loss smoothing consistently improves performance, suggesting that smoother objective transitions are a broadly useful tool for model adaptation.
2:45pm - 5:00pm | Closing remarks and social