Mathieu Reymond

CrystalGym: A New Benchmark for Materials Discovery Using Reinforcement Learning

CrystalGym pioneers RL for materials design by embedding expensive DFT calculations for property prediction directly into training, avoiding the approximations used in generative methods. While current algorithms struggle with all tasks (e.g., band gap optimization), the environment establishes a benchmark for RL in noisy, slow-reward domains, focusing on problem aspects crucial for molecular generation.

Prashant Govindarajan, Mathieu Reymond, Antoine Clavaud, Mariano Phielipp, Santiago Miret, Sarath Chandar
AI for Accelerated Materials Design Workshop @ ICLR 2025 (to appear)

Page

A Generalist Hanabi Agent

Traditional MARL agents struggle to generalize or adapt to new partners. We propose R3D2, a generalist Hanabi agent that reformulates the task using text for better transfer and employs a distributed MARL algorithm to handle dynamic observation and action spaces. R3D2 plays all game settings concurrently and collaborates successfully with diverse algorithmic agents, a first in the field.

Arjun V Sudhakar, Hadi Nekoei, Mathieu Reymond, Miao Liu, Janarthanan Rajendran, Sarath Chandar
ICLR 2025 (to appear)

Page arXiv Code Video

Divide and Conquer: Provably Unveiling the Pareto Front with Multi-Objective Reinforcement Learning

We propose Iterated Pareto Referent Optimisation (IPRO), a method that decomposes multi-objective RL into constrained single-objective problems. IPRO guarantees convergence and provides bounds on solution quality, matching or outperforming existing methods. Its flexibility also extends to domains like planning and pathfinding.

Willem Röpke, Mathieu Reymond, Patrick Mannion, Diederik M. Roijers, Ann Nowé, Roxana Radulescu
AAMAS 2025 (to appear)

arXiv

Interactively learning the user's utility for best-arm identification in multi-objective multi-armed bandits

We propose MCBUL, a Monte-Carlo planning method for interactive multi-objective bandits that strategically optimizes when to query the decision-maker (to refine the utility model) versus when to explore (to evaluate policies). MCBUL outperforms baseline approaches with fixed query intervals, significantly improving the likelihood of identifying the true optimal policy.

Mathieu Reymond, Eugenio Bargiacchi, Diederik M. Roijers, Ann Nowé
AAMAS 2024

Page

Exploring the Pareto front of multi-objective COVID-19 mitigation policies using reinforcement learning

We apply multi-objective RL with Pareto Conditioned Networks to COVID-19 mitigation, developing policies that balance infection control and societal burden. Using Belgium’s first wave as a case study, we demonstrate how this approach automatically learns adaptive deconfinement strategies that reduce restrictions when hospitalizations are low, while maintaining epidemic control.

Mathieu Reymond, Conor F. Hayes, Lander Willem, Roxana Radulescu, Steven Abrams, Diederik M. Roijers, Enda Howley, Patrick Mannion, Niel Hens, Ann Nowé, Pieter Libin
Expert Systems with Applications 2024

Page Code

Local Advantage Networks for Multi-Agent Reinforcement Learning in Dec-POMDPs

Modern MARL methods have focused on finding factorized value functions. LAN takes a different approach by combining independent Q-learners with a transient centralized critic, enabling decentralized policies via advantage functions. It outperforms QPLEX on SMAC (80% wins on super-hard maps) while using fewer parameters, showing that scalability need not require complex factorization.

Raphaël Avalos, Mathieu Reymond, Ann Nowé, Diederik M. Roijers
TMLR 2023

Page arXiv

WAE-PCN: Wasserstein-autoencoded Pareto Conditioned Networks

We combine Pareto Conditioned Networks (PCN) and Wasserstein auto-encoded MDPs (WAE-MDPs) to efficiently learn Pareto-optimal policies with formal safety and performance guarantees. This approach mitigates risks during exploration while ensuring robust trade-offs in multi-objective decision-making.

Mathieu Reymond*, Florent Delgrange*, Ann Nowé, Guillermo A. Pérez
Adaptive and Learning Agents Workshop @ AAMAS 2023

Page

Actor-critic multi-objective reinforcement learning for non-linear utility functions

We introduce a novel multi-objective RL algorithm that handles non-linear utility functions by learning a multi-variate return distribution, enabling direct optimization of complex utility functions without requiring a full Pareto front. Our method outperforms existing approaches on benchmark tasks, successfully solving problems where linear utility assumptions fail.

Mathieu Reymond, Conor F. Hayes, Diederik M. Roijers, Denis Steckelmacher, Ann Nowé
Autonomous Agents and Multi-Agent Systems 2023

Page arXiv Code

Monte Carlo tree search algorithms for risk-aware and multi-objective reinforcement learning

We introduce DMCTS, a risk-aware RL algorithm that optimizes policies by modeling posterior distributions over returns and planning with Thompson sampling. DMCTS explicitly accounts for outcome variability, which is crucial for safety-critical applications like medical decision-making. It outperforms SOTA approaches in multi-objective and risk-sensitive settings by better handling nonlinear utilities and uncertainty.

Conor F. Hayes, Mathieu Reymond, Diederik M. Roijers, Enda Howley, Patrick Mannion
Autonomous Agents and Multi-Agent Systems 2023

Page arXiv

Near On-Policy Experience Sampling in Multi-Objective Reinforcement Learning

Multi-objective RL faces convergence challenges when preference weights shift. We propose a novel experience sampling strategy that selects transitions based on weight and state similarity, keeping updates close to on-policy. Experiments on benchmark problems demonstrate significant performance improvements.

Shang Wang, Mathieu Reymond, Athirai A. Irissappane, Diederik M. Roijers
AAMAS 2022

Page
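
As a rough illustration of the sampling idea summarized above, the sketch below ranks stored transitions by how close their preference weights and states are to the current ones and keeps the closest. The helper name `near_on_policy_sample`, the Euclidean distance, and the additive score are assumptions made for illustration; they are not taken from the paper.

```python
import math

def near_on_policy_sample(buffer, weights, state, k):
    """Return the k stored transitions closest to the current weights and state.

    Hypothetical helper: each buffer entry is a dict with 'weights' and 'state'
    (sequences of floats). The score is the sum of two Euclidean distances,
    which is an assumed similarity measure, not the paper's.
    """
    def score(transition):
        return (math.dist(transition["weights"], weights)
                + math.dist(transition["state"], state))
    # Smaller score means more similar, so keep the first k after sorting.
    return sorted(buffer, key=score)[:k]

# Toy replay buffer with made-up entries.
buffer = [
    {"weights": (0.9, 0.1), "state": (0.0, 0.0), "id": "a"},
    {"weights": (0.5, 0.5), "state": (1.0, 1.0), "id": "b"},
    {"weights": (0.1, 0.9), "state": (5.0, 5.0), "id": "c"},
]
batch = near_on_policy_sample(buffer, weights=(0.5, 0.5), state=(1.0, 0.0), k=2)
```

Here the transition collected under identical weights in a nearby state ranks first, and the one from a distant state with very different weights is left out of the batch.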

Pareto Conditioned Networks

We present Pareto Conditioned Networks (PCN), a scalable approach to multi-objective RL that encodes all Pareto-optimal policies in a single neural network. By framing policy learning as a supervised classification problem (conditioned on desired returns), PCN avoids exhaustive exploration while maintaining stability and making minimal assumptions about the Pareto front shape.

Mathieu Reymond, Eugenio Bargiacchi, Ann Nowé
AAMAS 2022

Page arXiv Code
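
For readers less familiar with the terminology, the minimal sketch below shows what "Pareto-optimal" means for a set of return vectors, assuming every objective is maximized. It is not PCN itself, which trains a return-conditioned network; `dominates` and `pareto_front` are illustrative names.

```python
def dominates(a, b):
    """True if return vector a Pareto-dominates b (all objectives maximized)."""
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better

def pareto_front(returns):
    """Keep only the return vectors that no other vector dominates."""
    return [r for r in returns if not any(dominates(o, r) for o in returns)]

# Made-up two-objective returns for five candidate policies.
returns = [(1.0, 5.0), (3.0, 3.0), (2.0, 2.0), (5.0, 1.0), (1.0, 1.0)]
front = pareto_front(returns)
```

In this toy set, (2.0, 2.0) and (1.0, 1.0) are dropped because (3.0, 3.0) is better on both objectives; the three remaining vectors are mutually incomparable trade-offs.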

A Practical Guide to Multi-Objective Reinforcement Learning and Planning

Real-world decisions require balancing conflicting objectives, yet RL often relies on oversimplified single-objective approaches. This guide enables researchers and practitioners to implement multi-objective methods. It identifies the factors that may influence the nature of the desired solution, and illustrates by example how these influence the design of multi-objective decision-making systems for complex problems.

Conor F. Hayes*, Roxana Radulescu*, Eugenio Bargiacchi, Johan Källström, Matthew Macfarlane, Mathieu Reymond, Timothy Verstraeten, Luisa Zintgraf, Richard Dazeley, Fredrik Heintz, Enda Howley, Athirai A. Irissappane, Patrick Mannion, Ann Nowé, Gabriel De Oliveira Ramos, Marcello Restelli, Peter Vamplew, Diederik M. Roijers
Autonomous Agents and Multi-Agent Systems

Page arXiv

Interactive Multi-Objective Reinforcement Learning in Multi-Armed Bandits with Gaussian Process Utility Models

We present Gaussian-process Utility Thompson Sampling (GUTS) for interactive multi-objective bandits. GUTS handles arbitrary non-linear preferences via parameterless Bayesian learning, incorporates monotonicity constraints, and minimizes user queries while maintaining statistical significance. Experiments show GUTS achieves sub-linear regret and query complexity while learning complex preferences.

Diederik M. Roijers, Luisa M. Zintgraf, Pieter Libin, Mathieu Reymond, Eugenio Bargiacchi, Ann Nowé
ECML-PKDD 2020

Page

Pareto-DQN: Approximating the Pareto front in complex multi-objective decision problems

We propose Pareto-DQN, a deep extension of the Pareto Q-learning algorithm for estimating Pareto fronts in high-dimensional multi-objective RL. We cope with the dynamic size of the Pareto-optimal set by conditioning the network and sampling on the return space. We demonstrate its effectiveness on the Deep Sea Treasure benchmark, and extend it to traffic control.

Mathieu Reymond, Ann Nowé
Adaptive and Learning Agents Workshop @ AAMAS 2019

Page

Reinforcement Learning for Demand Response of Domestic Household Appliances

We demonstrate that independent RL agents controlling multiple devices with difference rewards outperform centralized approaches in demand response systems, reducing (though not eliminating) grid constraint violations while optimizing electricity costs. Our approach extends prior work by incorporating household-level consumption limits to prevent grid overload during low-price periods.

Mathieu Reymond*, Christophe Patyn*, Roxana Radulescu, Ann Nowé, Geert Deconinck
Adaptive and Learning Agents Workshop @ AAMAS 2018

Page
