# Registered Data

## [00961] Reinforcement Learning for Financial Modeling

**Session Date & Time**: 5D (Aug. 25, 15:30-17:10)

**Type**: Proposal of Minisymposium

**Abstract**: This minisymposium, sponsored by the SIAM Activity Group on Financial Mathematics and Engineering, focuses on the development of novel reinforcement learning paradigms for solving problems in financial mathematics. The RL paradigm aims to approximate solutions to stochastic control problems in discrete time in a manner that is agnostic to the dynamics of the environment and its response to agents' actions. The collection of talks covers the incorporation of time-consistent risk measures into RL, provides explicit error bounds on exploratory control, and develops a new approach to eliciting agents' risk preferences in a novel inverse RL framework.

**Organizer(s)**: Sebastian Jaimungal

**Sponsor**: This session is sponsored by the SIAM Activity Group on Financial Mathematics and Engineering.

**Classification**: 93E20, 91G80, 49N45

**Speakers Info**:
- Lukasz Szpruch (University of Edinburgh)
- Mathieu Lauriere (NYU Shanghai )
- Ziteng Cheng (University of Toronto)
- Sebastian Jaimungal (University of Toronto)

**Talks in Minisymposium**:

**[03063] Reinforcement learning for mean field games and mean field control problems, with applications to finance**
**Author(s)**:
- Mathieu Lauriere (NYU Shanghai)

**Abstract**: Mean field games have been introduced to study Nash equilibria in large populations of strategic agents, while mean field control problems aim at modeling social optima in large groups of cooperative agents. These frameworks have found a wide range of applications, from economics and finance to the social sciences and biology. In the past few years, the question of learning equilibria and social optima in a mean field setting has attracted growing interest. In this talk, I will discuss several model-free methods based on reinforcement learning. Numerical experiments on stylized examples of financial models will be presented.
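
One family of model-free methods in this setting combines a standard RL best-response step with a fictitious-play update of the population distribution. The sketch below is purely illustrative and not taken from the talk: a toy two-state "crowding" game with hypothetical dynamics and parameters, solved by alternating tabular Q-learning against a frozen mean field with an averaged mean-field update.

```python
# Illustrative fictitious-play sketch for a toy mean field game (all
# dynamics and parameters are hypothetical): agents on two states choose
# to "stay" or "move", and the reward penalises crowding, i.e. it
# depends on the population distribution mu.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 2, 2
gamma = 0.9

def step(s, a, mu):
    """Toy dynamics: action 1 moves to the other state; the reward is
    higher in the less crowded state."""
    s_next = 1 - s if a == 1 else s
    reward = 1.0 - mu[s_next]          # crowding penalty
    return s_next, reward

def best_response_q(mu, episodes=2000, alpha=0.1, eps=0.1):
    """Tabular Q-learning best response against a frozen mean field mu."""
    Q = np.zeros((n_states, n_actions))
    s = 0
    for _ in range(episodes):
        a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
        s_next, r = step(s, a, mu)
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next
    return Q

# Fictitious play: alternate best responses and mean-field updates.
mu = np.array([0.5, 0.5])
for k in range(20):
    Q = best_response_q(mu)
    policy = Q.argmax(axis=1)                 # greedy policy per state
    new_mu = np.zeros(n_states)               # distribution induced by the policy
    for s in range(n_states):
        s_next = 1 - s if policy[s] == 1 else s
        new_mu[s_next] += mu[s]
    mu = (k * mu + new_mu) / (k + 1)          # fictitious-play averaging
```

Fixing the mean field turns the best-response step into an ordinary (single-agent) RL problem, which is what makes model-free methods applicable here.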

**[03255] Learning Risk Aversion with Inverse Reinforcement Learning via Interactive Questioning**
**Author(s)**:
- Ziteng Cheng (University of Toronto)
- Anthony Coache (University of Toronto)
- Sebastian Jaimungal (University of Toronto)

**Abstract**: This paper proposes a novel framework for identifying an agent's risk aversion through interactive questioning. We assume that the agent's risk aversion is characterized by a spectral risk measure chosen from a finite set of candidates. We show that asking the agent to choose from a finite set of random costs, which may depend on their previous answers, is an effective means of identifying the agent's risk aversion. Specifically, we prove that the agent's risk aversion can be identified as the number of questions tends to infinity when the questions are randomly designed. We also develop an algorithm for designing optimal questions and provide empirical evidence that, in a simulated environment, our method learns risk aversion significantly faster than randomly designed questions. Our framework has important applications in robo-advising and provides a new approach for identifying an agent's risk preferences.
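
The elimination logic behind random questioning can be sketched as follows. This is an illustrative toy, not the paper's algorithm: the candidate spectral risk measures are taken to be CVaR at a few levels, the agent answers each pairwise question according to its true (unknown) level, and every candidate that disagrees with an observed answer is discarded.

```python
# Toy candidate-elimination sketch (assumptions, not the paper's method):
# candidates are CVaR levels; the agent picks whichever of two random
# costs has lower CVaR under its true level, and we keep only the
# candidates consistent with every observed answer.
import numpy as np

rng = np.random.default_rng(1)

def cvar(samples, alpha):
    """Empirical CVaR_alpha of a cost sample: mean of the worst alpha tail."""
    q = np.quantile(samples, 1 - alpha)
    return samples[samples >= q].mean()

candidates = [0.05, 0.25, 0.5]      # candidate CVaR levels (hypothetical)
true_alpha = 0.25                   # the agent's risk aversion, unknown to us
viable = set(candidates)

for _ in range(200):                # randomly designed questions
    cost_a = rng.normal(0, 1, 5000) + rng.exponential(rng.uniform(0.1, 2), 5000)
    cost_b = rng.normal(0, 1, 5000) + rng.exponential(rng.uniform(0.1, 2), 5000)
    answer = cvar(cost_a, true_alpha) <= cvar(cost_b, true_alpha)
    viable = {a for a in viable
              if (cvar(cost_a, a) <= cvar(cost_b, a)) == answer}
```

By construction the true level is never eliminated, since it always agrees with its own answers; identification amounts to the remaining candidates shrinking as questions accumulate.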

**[04804] Fisher-Rao Gradient Descent for Stochastic Control Problems**
**Author(s)**:
- Lukasz Szpruch (University of Edinburgh / The Alan Turing Institute)
- David Siska (University of Edinburgh)
- Bekzhan Kerimkulov (University of Edinburgh)

**Abstract**: We study the convergence of gradient and mirror descent schemes for approximating solutions to stochastic control problems with measure-valued controls in continuous time. By exploiting the Pontryagin Optimality Principle, these schemes rely on solving forward and backward (adjoint) equations and on static optimisation problems regularised with a Bregman divergence, and they can be interpreted as implicit and explicit discretisations of the Fisher-Rao gradient flow. In the general (non-convex) case, we show that the objective function decreases along the gradient step. Moreover, in the (strongly) convex case, when the Pontryagin Optimality Principle provides a sufficient condition for optimality, we prove that the objective converges to its optimal value at a linear (respectively, exponential) rate. The main technical difficulty is to show that the stochastic control problem admits suitable relative smoothness and convexity properties. These are obtained by utilising the theory of Bounded Mean Oscillation (BMO) martingales, which is required for estimates on the adjoint Backward Stochastic Differential Equation (BSDE).
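
As a hedged illustration of the kind of update involved (notation assumed, not taken from the talk): when the Bregman divergence is chosen as relative entropy, an explicit mirror-descent step on a measure-valued control takes a multiplicative, exponential-weights form driven by the Hamiltonian.

```latex
% One explicit mirror-descent (Fisher-Rao) step with entropic Bregman
% divergence: H is the Hamiltonian, (Y^k, Z^k) solves the adjoint BSDE
% under the current control pi_k, and tau > 0 is the step size.
\pi_{k+1}(a \mid t, x) \;\propto\;
  \pi_k(a \mid t, x)\,
  \exp\!\bigl(-\tau\, H\bigl(t, x, a, Y_t^{k}, Z_t^{k}\bigr)\bigr)
```

The implicit variant replaces the frozen adjoint pair $(Y^k, Z^k)$ by the one associated with the updated control, which is where the BMO-martingale estimates on the adjoint BSDE enter.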

**[05398] Risk Budgeting Allocation for Dynamic Risk Measures**
**Author(s)**:
- Sebastian Jaimungal (University of Toronto)
- Silvana Manuela Pesenti (University of Toronto)
- Yuri Saporito (FGV)
- Rodrigo Targino (FGV)

**Abstract**: We develop an approach for risk budgeting allocation -- a risk diversification portfolio strategy -- in which risk is measured using time-consistent dynamic risk measures. To this end, we introduce a notion of dynamic risk contributions that generalises the classical Euler contributions and allows us to compute risk contributions recursively. Moreover, we prove that, for the class of dynamic coherent distortion risk measures, the risk allocation problem may be recast as a sequence of convex optimisation problems and, leveraging the elicitability of dynamic risk measures, we develop an actor-critic approach that solves for the risk budgeting strategy using deep learning.
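
The classical Euler contributions that the talk generalises can be illustrated in a static setting. The sketch below is a minimal, assumption-laden toy (simulated Gaussian losses, hypothetical weights, expected shortfall as the coherent risk measure): each asset's contribution is its weighted expected loss on the tail event, and the contributions sum to the portfolio risk (the full-allocation property).

```python
# Static Euler risk contributions under expected shortfall (ES),
# estimated by the tail-conditional-expectation formula; the simulated
# losses and portfolio weights are hypothetical, for illustration only.
import numpy as np

rng = np.random.default_rng(2)
alpha = 0.05                                   # tail level
n, d = 200_000, 3

losses = rng.standard_normal((n, d)) @ np.diag([1.0, 0.5, 2.0])  # per-asset losses
w = np.array([0.4, 0.4, 0.2])                                    # portfolio weights
port = losses @ w                                                # portfolio loss

var = np.quantile(port, 1 - alpha)             # Value-at-Risk threshold
tail = port >= var                             # tail event {portfolio loss >= VaR}
es_total = port[tail].mean()                   # portfolio expected shortfall
contrib = w * losses[tail].mean(axis=0)        # Euler contributions E[w_i L_i | tail]
```

The dynamic contributions in the talk recover this kind of decomposition recursively through time rather than from a single static tail event.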