Publications | Deep Reinforcement Learning

Preprints

Mem-π: Adaptive Memory through Learning When and What to Generate
Xiaoqiang Wang, Chao Wang, Hadi Nekoei, Christopher Pal, Alexandre Lacoste, Spandana Gella, Bang Liu and Perouz Taslakian
In arXiv, 2026.
#RL, #NLP
[arXiv]
Reconstruction or Semantics? What Makes a Latent Space Useful for Robotic World Models
Nilaksh*, Saurav Jha*, Artem Zholus* and Sarath Chandar
In arXiv, 2026.
#DL, #RL
[arXiv]
Hierarchical Planning with Latent World Models
Wancong Zhang, Basile Terver, Artem Zholus, Soham Chitnis, Harsh Sutaria, Mido Assran, Randall Balestriero, Amir Bar, Adrien Bardes, Yann LeCun and Nicolas Ballas
In arXiv, 2026.
#DL, #RL
[arXiv], [website], [code]
Just-in-time Episodic Feedback Hinter: Leveraging Offline Knowledge to Improve LLM Agents Adaptation
Hadi Nekoei, Aman Jaiswal, Patrice Bechard, Oleh Shliazhko, Orlando Marquez Ayala, Mathieu Reymond, Massimo Caccia, Alexandre Drouin, Sarath Chandar and Alexandre Lacoste
In arXiv, 2025.
#NLP, #RL
[arXiv]
GRPO-λ: Credit Assignment improves LLM Reasoning
Prasanna Parthasarathi*, Mathieu Reymond*, Boxing Chen, Yufei Cui and Sarath Chandar
In arXiv, 2025.
#NLP, #RL
[arXiv]
CrystalGym: A New Benchmark for Materials Discovery Using Reinforcement Learning
Prashant Govindarajan, Mathieu Reymond, Antoine Clavaud, Mariano Phielipp, Santiago Miret and Sarath Chandar
In arXiv, 2025.
#RL, #Other
[arXiv], [code]
Balancing Profit and Fairness in Risk-Based Pricing Markets
Jesse Thibodeau, Hadi Nekoei, Afaf Taïk, Janarthanan Rajendran and Golnoosh Farnadi
In arXiv, 2025.
#RL
[arXiv]
Maximum Reward Formulation In Reinforcement Learning
Sai Krishna Gottipati, Yashaswi Pathak, Rohan Nuttall, Raviteja Chunduru, Ahmed Touati, Sriram Ganapathi Subramanian, Matthew E Taylor and Sarath Chandar
In arXiv, 2020.
#RL
[arXiv]

Conference and Journal Articles

2026

Squeezing More from the Stream: Learning Representation Online for Streaming Reinforcement Learning
Nilaksh*, Antoine Clavaud*, Mathieu Reymond, François Rivest and Sarath Chandar
International Conference on Machine Learning (ICML), 2026.
#RL, #DL
[openreview], [arXiv], [code]
Unraveling the Complexity of Memory in RL Agents: An Approach for Classification and Evaluation
Egor Cherepanov, Nikita Kachaev, Artem Zholus, Alexey K. Kovalev and Aleksandr I. Panov
International Conference on Learning Representations (ICLR), 2026.
#RL
[openreview], [arXiv]
The Markovian Thinker: Architecture-Agnostic Linear Scaling of Reasoning
Milad Aghajohari, Kamran Chitsaz, Amirhossein Kazemnejad, Sarath Chandar, Alessandro Sordoni, Aaron Courville and Siva Reddy
International Conference on Learning Representations (ICLR), 2026.
#RL, #NLP
[openreview], [arXiv], [code]

2025

How to Train Your LLM Web Agent: A Statistical Diagnosis
Dheeraj Vattikonda, Santhoshi Ravichandran, Emiliano Penaloza, Hadi Nekoei, Megh Thakkar, Thibault Le Sellier de Chezelles, Nicolas Gontier, Miguel Muñoz-Mármol, Sahar Omidi Shayegan, Stefania Raimondo, Xue Liu, Alexandre Drouin, Laurent Charlin, Alexandre Piché, Alexandre Lacoste and Massimo Caccia
Conference on Neural Information Processing Systems (NeurIPS), 2025.
#NLP, #RL
[openreview], [arXiv]
Rendering-Aware Reinforcement Learning for Vector Graphics Generation
Juan A. Rodriguez, Haotian Zhang, Abhay Puri, Aarash Feizi, Rishav Pramanik, Pascal Wichmann, Arnab Mondal, Mohammad Reza Samsami, Rabiul Awal, Perouz Taslakian, Spandana Gella, Sai Rajeswar, David Vazquez, Christopher Pal and Marco Pedersoli
Conference on Neural Information Processing Systems (NeurIPS), 2025.
#NLP, #RL
[openreview], [arXiv]
Shielded Controller Units for RL with Operational Constraints Applied to Remote Microgrids
Hadi Nekoei, Alexandre Blondin Massé, Rachid Hassani, Sarath Chandar and Vincent Mai
Reinforcement Learning Conference (RLC), 2025.
#RL
[arXiv], [code]
Boosting LLM Reasoning via Spontaneous Self-Correction
Xutong Zhao, Tengyu Xu, Xuewei Wang, Zhengxing Chen, Di Jin, Liang Tan, Yen-Ting, Zishun Yu, Zhuokai Zhao, Yun He, Sinong Wang, Han Fang, Sarath Chandar and Chen Zhu
Conference on Language Modeling (COLM), 2025.
#NLP, #RL
[openreview], [arXiv]
A Generalist Hanabi Agent
Arjun Vaithilingam Sudhakar*, Hadi Nekoei*, Mathieu Reymond, Miao Liu, Janarthanan Rajendran and Sarath Chandar
International Conference on Learning Representations (ICLR), 2025.
#RL
[website], [openreview], [arXiv], [code]
BindGPT: A Scalable Framework for 3D Molecular Design via Language Modeling and Reinforcement Learning
Artem Zholus, Maksim Kuznetsov, Roman Schutski, Rim Shayakhmetov, Daniil Polykovskiy, Sarath Chandar and Alex Zhavoronkov
AAAI Conference on Artificial Intelligence (AAAI), 2025. [Best poster award]
#DL, #RL
[website], [aaai], [arXiv], [code], [YouTube]

2024

Balancing Context Length and Mixing Times for Reinforcement Learning at Scale
Matthew Riemer, Khimya Khetarpal, Janarthanan Rajendran and Sarath Chandar
Conference on Neural Information Processing Systems (NeurIPS), 2024.
#RL
[neurips], [openreview]
Toward Debugging Deep Reinforcement Learning Programs with RLExplorer
Rached Bouchoucha, Ahmed Haj Yahmed, Darshan Patil, Janarthanan Rajendran, Amin Nikanjam, Sarath Chandar and Foutse Khomh
International Conference on Software Maintenance and Evolution (ICSME), 2024.
#RL
[arXiv]
Sub-goal Distillation: A Method to Improve Small Language Agents
Maryam Hashemzadeh, Elias Stengel-Eskin, Sarath Chandar and Marc-Alexandre Cote
Conference on Lifelong Learning Agents (CoLLAs), 2024. [Oral presentation]
#RL, #NLP
[arXiv], [code]
Partial Models for Building Adaptive Model-Based Reinforcement Learning Agents
Safa Alver, Ali Rahimi-Kalahroudi and Doina Precup
Conference on Lifelong Learning Agents (CoLLAs), 2024.
#RL
[pmlr], [arXiv]
Mastering Memory Tasks with World Models
Mohammad Reza Samsami*, Artem Zholus*, Janarthanan Rajendran and Sarath Chandar
International Conference on Learning Representations (ICLR), 2024. [Oral presentation]
#RL, #DL
[openreview], [arXiv], [code]
Intelligent Switching for Reset-Free RL
Darshan Patil, Janarthanan Rajendran, Glen Berseth and Sarath Chandar
International Conference on Learning Representations (ICLR), 2024.
#RL
[openreview], [arXiv], [code]
Learning Conditional Policies for Crystal Design Using Offline Reinforcement Learning
Prashant Govindarajan, Santiago Miret, Jarrid Rector-Brooks, Mariano Phielipp, Janarthanan Rajendran and Sarath Chandar
Digital Discovery Journal, 2024.
#RL
[paper]

2023

Multi-Agent Reinforcement Learning for Fast-Timescale Demand Response of Residential Loads
Vincent Mai, Philippe Maisonneuve, Tianyu Zhang, Hadi Nekoei, Liam Paull and Antoine Lesage-Landry
Machine Learning, 2023.
[International Conference on Autonomous Agents and Multiagent Systems (AAMAS) Extended Abstracts, 2023]
#RL
[springer], [acm], [arXiv]
Replay Buffer with Local Forgetting for Adapting to Local Environment Changes in Deep Model-Based Reinforcement Learning
Ali Rahimi-Kalahroudi, Janarthanan Rajendran, Ida Momennejad, Harm van Seijen and Sarath Chandar
Conference on Lifelong Learning Agents (CoLLAs), 2023.
[Deep Reinforcement Learning Workshop @ NeurIPS, 2022]
#RL
[pmlr], [arXiv]
Towards Few-shot Coordination: Revisiting Ad-hoc Teamplay Challenge In the Game of Hanabi
Hadi Nekoei, Xutong Zhao, Janarthanan Rajendran, Miao Liu and Sarath Chandar
Conference on Lifelong Learning Agents (CoLLAs), 2023.
#RL
[pmlr], [arXiv]
Dealing With Non-stationarity in Decentralized Cooperative Multi-Agent Deep Reinforcement Learning via Multi-Timescale Learning
Hadi Nekoei, Akilesh Badrinaaraayanan, Amit Sinha, Mohammad Amini, Janarthanan Rajendran, Aditya Mahajan and Sarath Chandar
Conference on Lifelong Learning Agents (CoLLAs), 2023.
#RL
[pmlr], [arXiv]
Conditionally Optimistic Exploration for Cooperative Deep Multi-Agent Reinforcement Learning
Xutong Zhao, Yangchen Pan, Chenjun Xiao, Sarath Chandar and Janarthanan Rajendran
Conference on Uncertainty in Artificial Intelligence (UAI), 2023.
#RL
[pmlr], [arXiv]

2022

Combining Reinforcement Learning and Constraint Programming for Sequence-Generation Tasks with Hard Constraints
Daphné Lafleur, Sarath Chandar and Gilles Pesant
International Conference on Principles and Practice of Constraint Programming (CP), 2022.
#RL
[paper]
Towards Evaluating Adaptivity of Model-Based Reinforcement Learning Methods
Yi Wan*, Ali Rahimi-Kalahroudi*, Janarthanan Rajendran, Ida Momennejad, Sarath Chandar and Harm van Seijen
International Conference on Machine Learning (ICML), 2022.
#RL
[pmlr], [arXiv], [code]

2021

Continuous Coordination As a Realistic Scenario for Lifelong Learning
Hadi Nekoei, Akilesh Badrinaaraayanan, Aaron Courville and Sarath Chandar
International Conference on Machine Learning (ICML), 2021.
#RL
[pmlr], [arXiv], [code]
Towered Actor Critic for Handling Multiple Action Types in Reinforcement Learning For Drug Discovery
Sai Krishna Gottipati, Yashaswi Pathak, Boris Sattarov, Sahir, Rohan Nuttall, Mohammad Amini, Matthew E. Taylor and Sarath Chandar
AAAI Conference on Artificial Intelligence (AAAI), 2021.
#RL, #Other
[aaai]

2020

The LoCA Regret: A Consistent Metric to Evaluate Model-Based Behavior in Reinforcement Learning
Harm van Seijen, Hadi Nekoei, Evan Racah and Sarath Chandar
Conference on Neural Information Processing Systems (NeurIPS), 2020.
#RL
[neurips], [arXiv], [code]
Learning To Navigate The Synthetically Accessible Chemical Space Using Reinforcement Learning
Sai Krishna Gottipati*, Boris Sattarov*, Sufeng Niu, Yashaswi Pathak, Haoran Wei, Shengchao Liu, Karam MJ Thomas, Simon Blackburn, Connor W Coley, Jian Tang, Sarath Chandar and Yoshua Bengio
International Conference on Machine Learning (ICML), 2020.
#RL
[pmlr], [arXiv]