Reinforcement learning in financial companies: reconciling performance and ethics

Reinforcement Learning (RL) is a branch of Machine Learning (ML) that is still little known to the general public. In this approach, an agent repeats actions and, depending on the result of these actions, receives a reward or punishment in the form of a score. Focussed on maximising the score, the agent adapts the next action (reinforcing what works, abandoning what fails) and thus achieves the programmed objective.

Take, for example, a robot that we want to teach to walk using Reinforcement Learning: if anything other than the robot’s feet touches the ground, it is punished; if it moves forward, it is rewarded. By making it repeat these attempts in a controlled environment, it learns from its mistakes and successes thanks to an unambiguous system of rewards and punishments.
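The reward-and-punishment loop described above can be sketched in a few lines of code. The toy task, reward values and learning parameters below are invented for illustration (a minimal Q-learning agent on a one-dimensional "walk forward" task, not a robotics simulator):

```python
import random

# Toy illustration of the reinforcement loop: the agent stands on
# positions 0..4, reaching position 4 is rewarded, stepping off the
# left edge is punished. All values here are invented for illustration.

random.seed(0)

N_STATES = 5           # positions 0..4; reaching 4 ends the episode
ACTIONS = [-1, +1]     # step left / step right
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.2

# Q[state][action]: the agent's running estimate of each action's score
Q = [[0.0, 0.0] for _ in range(N_STATES)]

for episode in range(200):
    s = 0
    while s != N_STATES - 1:
        # explore occasionally, otherwise reinforce what has worked so far
        a = random.randrange(2) if random.random() < EPS \
            else max((0, 1), key=lambda i: Q[s][i])
        s2 = s + ACTIONS[a]
        if s2 < 0:                   # "fell off": punished, back to the start
            r, s2 = -1.0, 0
        elif s2 == N_STATES - 1:     # reached the goal: rewarded
            r = 1.0
        else:
            r = 0.0
        # nudge the estimate toward reward + discounted future score
        Q[s][a] += ALPHA * (r + GAMMA * max(Q[s2]) - Q[s][a])
        s = s2

# After training, the learned policy is "always step right"
policy = [max((0, 1), key=lambda i: Q[s][i]) for s in range(N_STATES - 1)]
print(policy)  # [1, 1, 1, 1] -> step right everywhere
```

The unambiguous scoring does all the work: the agent is never told how to walk, only that falling costs points and progressing earns them.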

The recent RL experiments revealed to the public are certainly making headlines. Back in 2017, AlphaGo managed to beat the world's top player at the game of Go. That announcement alone illustrates the disruptive potential of a technology that is profoundly changing how models are designed and understood.

Reinforcement Learning: a stunning arrival in the world of finance

Recently, RL has entered the world of finance with the promise of strong performance across a variety of use cases. It has made its mark in two areas in particular: trading algorithms and portfolio management. In both cases, the model is rewarded or punished according to predefined performance indicators and learns, by repeated trial and error, how to invest to maximise its score.

For example, Deep Reinforcement Learning for Portfolio Management on Environmental, Social, and Governance (DRLPMESG) uses RL to manage a portfolio based on ESG criteria, and has produced some interesting results:

Portfolio with 5 best-performing stocks:

  • Annualised return: 46.58%
  • Sharpe ratio: 2.44
  • Cumulative return: 17%

This model was presented in a study entitled Deep Reinforcement Learning for ESG financial portfolio management in June 2023. The results are promising, outperforming the Dow Jones Industrial Average (DJIA), which the study uses as its benchmark. The harmonisation of ESG reporting introduced by the CSRD should act as a further performance lever for the DRLPMESG model.
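For readers unfamiliar with the two headline metrics, this is how an annualised return and a Sharpe ratio are conventionally computed from a daily return series. The figures below are synthetic and do not reproduce the study's data; the sketch assumes 252 trading days per year and a zero risk-free rate:

```python
import math

# Conventional definitions of the two headline metrics,
# applied to a synthetic daily return series (not the study's data).

def annualised_return(daily_returns):
    """Geometric annualised return, assuming 252 trading days per year."""
    total = 1.0
    for r in daily_returns:
        total *= (1.0 + r)
    years = len(daily_returns) / 252
    return total ** (1 / years) - 1

def sharpe_ratio(daily_returns, risk_free=0.0):
    """Annualised mean excess return divided by annualised volatility."""
    n = len(daily_returns)
    mean = sum(daily_returns) / n
    var = sum((r - mean) ** 2 for r in daily_returns) / (n - 1)
    return (mean - risk_free) * 252 / (math.sqrt(var) * math.sqrt(252))

# A flat synthetic series of +0.1% per day:
rets = [0.001] * 252
print(round(annualised_return(rets), 4))  # ~0.2864, i.e. about 28.6% a year

# A series alternating +0.2% and 0% has the same mean but non-zero volatility:
print(round(sharpe_ratio([0.002, 0.0] * 126), 2))
```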

By way of comparison, the ESG funds administered by Amundi (the world leader in asset management) achieved a return of 16.1% over the first half of 2023.

Use cases

Although Reinforcement Learning models are mainly used for trading algorithms in financial services, at Mazars we are exploring other application opportunities:

Use case and type of benefit:

  • Trading algorithms and portfolio management, as seen above. Benefit: added value.
  • Recommendation systems that learn to suggest products or services based on user preferences; each time a recommendation is made, the model is rewarded in proportion to the engagement it generates (click, purchase, etc.). Benefit: added value.
  • Virtual assistants that learn to interact with users more naturally and effectively, rewarded or punished by the users themselves according to their satisfaction with the responses provided. Benefit: added value.
  • Fraud detection agents (banking or otherwise), rewarded for justified alerts and punished for false ones. The rewards will have to be balanced so that the model continues to make predictions. Benefit: risk reduction.
  • Agents that estimate market sentiment for a product launch or identify customer opinions across several platforms, rewarded for accurate analyses and punished for inaccurate ones. Benefit: added value.
  • Cyber-attack detection agents. RL might enable detection of cyber-attacks that are not yet catalogued, making detection more robust overall. The agent is strongly rewarded for correct detections and punished for incorrect ones; here too, the reward system needs to be balanced so that the agent continues to make predictions. Benefit: risk reduction.
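The reward-balancing caveat in the fraud and cyber-detection use cases can be made concrete: if false alerts are punished too heavily relative to missed incidents, a score-maximising agent learns that silence pays best. All the rates and reward values below are invented for illustration:

```python
# Hypothetical illustration of the reward-balancing caveat:
# under a badly balanced scheme, the score-maximising policy
# is to raise no alerts at all. All numbers are made up.

FRAUD_RATE = 0.01  # assume 1% of transactions are fraudulent

def expected_reward(alert_rate_on_fraud, alert_rate_on_legit,
                    r_true_alert, r_false_alert, r_missed_fraud):
    """Average reward per transaction for a given detection policy."""
    return (FRAUD_RATE * (alert_rate_on_fraud * r_true_alert
                          + (1 - alert_rate_on_fraud) * r_missed_fraud)
            + (1 - FRAUD_RATE) * alert_rate_on_legit * r_false_alert)

# Unbalanced scheme: a false alert costs as much as a true alert earns.
silent   = expected_reward(0.0, 0.0,  r_true_alert=1, r_false_alert=-1,
                           r_missed_fraud=-1)
alerting = expected_reward(0.9, 0.05, r_true_alert=1, r_false_alert=-1,
                           r_missed_fraud=-1)
print(silent > alerting)    # True: staying silent scores better

# Rebalanced scheme: missed fraud is punished far more than a false alert.
silent2   = expected_reward(0.0, 0.0,  r_true_alert=1, r_false_alert=-0.05,
                            r_missed_fraud=-20)
alerting2 = expected_reward(0.9, 0.05, r_true_alert=1, r_false_alert=-0.05,
                            r_missed_fraud=-20)
print(alerting2 > silent2)  # True: alerting now scores better
```

The arithmetic shows why the balance matters: the agent has no notion of fraud, only of its score, so the scheme itself must make vigilance the winning strategy.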

Reinforcement learning: performance at the price of ethics

A study published in August 2023, entitled 'Insurance pricing on price comparison websites via reinforcement learning', examines how insurance premiums vary after the introduction of RL-based solutions. At each contract renewal, the model sets the new premium using information gathered about customers and the net balance of current contracts, adjusted for policyholders' price elasticity. By repeating the process, the model learns how to set prices that maximise the insurer's margin.
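The pricing loop just described can be sketched as follows. The demand curve, elasticity and cost figures are invented, and a simple trial of candidate premiums stands in for the model's trial-and-error learning:

```python
import math

# Hypothetical sketch of the pricing loop: at each renewal the model
# proposes a premium, the policyholder renews with a probability that
# falls as the price rises (price elasticity), and the reward is the
# expected margin. Elasticity and cost figures are invented.

EXPECTED_CLAIMS = 400.0   # assumed annual claims cost per policy
ELASTICITY = 0.01         # assumed sensitivity of renewal to price

def renewal_probability(premium):
    # logistic demand curve centred on an assumed 500 reference premium
    return 1.0 / (1.0 + math.exp(ELASTICITY * (premium - 500.0)))

def expected_margin(premium):
    # the quantity the RL agent is rewarded on
    return renewal_probability(premium) * (premium - EXPECTED_CLAIMS)

# Trying out candidate premiums stands in for the learning process:
# the agent converges on the margin-maximising price.
candidates = range(400, 801, 10)
best = max(candidates, key=expected_margin)
print(best, round(expected_margin(best), 2))
```

The ethical concern follows directly from this structure: the reward contains only the margin, so any pricing pattern that raises it, however it treats individual policyholders, looks like success to the agent.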

The market deployment of such a model remains hypothetical but nevertheless poses an ethical problem. The sole aim of an RL model is to increase its reward or reduce its punishment, and it could therefore prove discriminatory in pursuit of that objective. What's more, most models learn continuously, so biases could appear several years after deployment. Once the system leaves its training environment, its behaviour becomes increasingly hard to predict, as does identifying which inputs drive changes in the model.

The AI Act, adopted in May 2024 and in force since August 2024, regards many uses of RL in financial services organisations as presenting a high level of risk. If a model exploits vulnerabilities linked to age, disability or socio-economic status, the risk becomes unacceptable: such systems have been prohibited since February 2025, with penalties ranging from 1% to 7% of the annual global turnover of the provider or 'deployer'. RL systems that qualify as general-purpose AI must also be accompanied by technical documentation, a risk management system and human oversight, under obligations phased in from August 2025.

It is already possible for players in the financial sector to prepare for the adoption of this technology in order to enjoy a considerable head start in terms of performance and compliance. Firstly, it is advisable to understand how this technology works, so as to identify the key areas of the business where RL can add significant value. Next, it is a good idea to work with specialist organisations to integrate these solutions into existing systems and ensure that the RL applications implemented run smoothly. Finally, it is advisable to establish clear ethical guidelines from the outset to ensure transparency, fairness, regulatory compliance, and respect for privacy by the RL systems deployed.

As AI technologies, and RL in particular, spread at breakneck speed and touch countless aspects of our daily lives, financial services players that position themselves quickly enough can make a real difference with RL systems. The models will, however, have to be developed responsibly and in compliance with the legislation that already applies.

RL systems will always find a more ingenious solution to achieve what they are programmed to do (score maximisation); regulators, consumers and companies will have to heighten their ingenuity and proceed with caution in order to understand and control the learning processes adopted and the effects they produce on their environment.


Alexandre Di Lorenzo

Consultant - CIO Advisory, Financial Services - Paris, France