Multi-Agent Deep Reinforcement Learning and GAN-Based Market Simulation for Derivatives Pricing and Dynamic Hedging

Massachusetts Institute of Technology


Advancements in computing capabilities have enabled machine learning algorithms to learn directly from large amounts of data. Deep reinforcement learning is a particularly powerful method that uses agents to learn by interacting with an environment of data. Although many traders and investment managers rely on traditional statistical and stochastic methods to price assets and develop trading and hedging strategies, deep reinforcement learning has proven to be an effective method to learn optimal policies for pricing and hedging. Machine learning removes the need for various parametric assumptions about underlying market dynamics by learning directly from data. This research examines the use of machine learning methods to develop a data-driven method of derivatives pricing and dynamic hedging. Nevertheless, machine learning methods like reinforcement learning require an abundance of data to learn. We explore the implementation of a generative adversarial network-based approach to generate realistic market data from past historical data. This data is used to train the reinforcement learning framework and evaluate its robustness. The results demonstrate the efficacy of deep reinforcement learning methods to price derivatives and hedge positions in the proposed systematic GAN-based market simulation framework.


Our model is composed of the following components:

  • SignNet Positional Encoder
    • We chose SignNet to compute the positional encoding for each graph. We pre-computed the top 50 eigenvalues and corresponding eigenvectors of the Laplacian matrix L of each graph, then used the 50 computed eigenvectors (an N×50 matrix, where N is the number of nodes) as the input to SignNet.
  • Spectral clustering
    • We trained a GNN model with PyTorch Geometric, using the sum of the minCUT loss and the orthogonality loss as the objective, to compute the cluster assignment for each graph. Each graph was trained for about 50 iterations, with early stopping once the decrease in loss fell below 0.001. The trained model yields the cluster assignments, which the next step uses to create virtual nodes and construct the heterogeneous graph dataset.
  • Heterogeneous graph
    • After training the model and computing the clusters, we add one new virtual node to the original graph for each cluster. For K clusters, where K is an arbitrary number, we add K new virtual nodes. Each virtual node is connected to every node in its cluster by M new edges, where M is the number of nodes in that cluster. Once the virtual nodes are connected to the local nodes, we fully connect the virtual nodes to one another. Because local-to-local, local-to-virtual, and virtual-to-virtual edges have different properties, we follow the heterogeneous graph neural network approach and use a different message-passing convolution layer for each edge type.
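
The virtual-node construction described above can be sketched as follows. This is a minimal illustration in plain Python, not the project's actual code: the function name `build_hetero_edges`, the edge-type keys, and the toy graph are assumptions introduced for the example, and each undirected edge is stored once.

```python
from itertools import combinations


def build_hetero_edges(local_edges, cluster_of, num_clusters):
    """Build the three edge types of the heterogeneous graph.

    local_edges  : list of (u, v) pairs between local nodes 0..N-1
    cluster_of   : cluster_of[i] is the cluster index of local node i
    num_clusters : K, the number of clusters (and of virtual nodes)

    Virtual nodes live in their own index space 0..K-1 (an assumption
    made for this sketch).
    """
    # Local -> virtual: every local node is linked to the virtual node
    # of its cluster, so a cluster with M nodes contributes M edges.
    local_to_virtual = [(node, c) for node, c in enumerate(cluster_of)]

    # Virtual <-> virtual: the K virtual nodes are fully connected.
    virtual_to_virtual = list(combinations(range(num_clusters), 2))

    # Keys are hypothetical (source type, relation, target type) triples,
    # one per edge type, so each can get its own convolution layer.
    return {
        ("local", "to", "local"): list(local_edges),
        ("local", "to", "virtual"): local_to_virtual,
        ("virtual", "to", "virtual"): virtual_to_virtual,
    }


# Toy graph: 5 local nodes, clusters {0, 1, 2} and {3, 4}.
local_edges = [(0, 1), (1, 2), (3, 4)]
cluster_of = [0, 0, 0, 1, 1]
edges = build_hetero_edges(local_edges, cluster_of, num_clusters=2)
```

Keeping one edge list per edge type is what allows a separate message-passing layer to be applied to each relation. In a real pipeline the cluster assignments would come from the trained clustering model rather than being written by hand.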

Peptides Dataset Results.


Because of limited time and computational resources, we could not run experiments on all the datasets. We therefore chose the Peptides-func and Peptides-struct datasets and a re-sampled citation dataset from Cora. For each dataset we used the corresponding metric (Average Precision, Accuracy, or Mean Absolute Error), as suggested by the author. Our baseline models are GCN, GAT, GIN, and SAN.

The models are tested on the following datasets:

  • Citation Dataset
  • Peptides Dataset

They are benchmarked against the following models:

  • GCN
  • GAT
  • GIN
  • SAN (SAN is not used in the citation dataset due to la)

The results of our model are shown below. As the figures indicate, our model performs similarly to SAN on the Peptides datasets and better than the other common message-passing GNNs, while being considerably more efficient than SAN.

Citation Dataset Results. Peptides Dataset Results.


Because our model does not perform as well on the re-sampled citation dataset, we propose the following potential causes:

  • The original Cora dataset might not contain important long-range interactions, so it is unclear whether the long-range interactions present in the re-sampled graphs are strong enough to have much influence on the predictions.
  • The Cora dataset contains a single graph for node-level tasks. While our model works well for graph-level tasks, the clustering method with fully connected virtual nodes might lead to a different form of over-smoothing. Although we used a heterogeneous GNN to try to address this issue, it might still affect node-level prediction tasks.


  author    = {Qian, Samson},
  title     = {Multi-Agent Deep Reinforcement Learning and GAN-Based Market Simulation for Derivatives Pricing and Dynamic Hedging},
  journal   = {MIT},
  year      = {2023},