Machine learning tree trimming for faster Markov reward game solutions

Burhaneddin İzgi*, Murat Özkaya, Nazım Kemal Üre, Matjaž Perc

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

Existing methodologies for solving Markov reward games mostly rely on state–action frameworks and iterative algorithms. These approaches often impose significant computational burdens, particularly for large-scale games, due to their inherent complexity and the extensive iterative calculations they require. In this paper, we propose a new neural network architecture for solving Markov reward games represented as decision trees with relatively large state and action sets, such as 2-actions-3-stages, 3-actions-3-stages, and 4-actions-3-stages, by trimming the decision tree. In this context, we generate datasets of Markov reward games with sizes ranging from (Formula presented) to (Formula presented) using the holistic matrix norm-based solution method and obtain the components needed to train the neural network, such as the payoff matrices and the corresponding solutions of the games. We then propose a vectorization process that prepares the outcomes of the matrix norm-based solution method and adapts them for training the proposed neural network. The network takes the vectorized payoff and transition matrices as input, and the prediction system generates the optimal strategy set as output. In the model, we approach the problem as a classification task by labeling the optimal and non-optimal branches of the decision tree with ones and zeros, respectively, to identify the most rewarding paths of each game. As a result, we obtain a novel neural network architecture that solves Markov reward games in real time, enhancing its practicality for real-world applications. The results reveal that the system efficiently predicts the optimal paths for each decision tree, with F1-scores slightly greater than 0.99, 0.99, and 0.97 for Markov reward games with 2-actions-3-stages, 3-actions-3-stages, and 4-actions-3-stages, respectively.
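The labeling scheme described above — flattening game matrices into feature vectors and marking the most rewarding root-to-leaf path of the decision tree with ones and all other branches with zeros — can be sketched for a toy 2-actions-3-stages game. The reward values and helper names below are illustrative assumptions, not the paper's matrix norm-based construction:

```python
from itertools import product

# Hypothetical toy payoffs for a 2-actions-3-stages game:
# rewards[stage][action] is an assumed illustrative value.
rewards = [
    [1.0, 2.0],   # stage 0
    [0.5, 1.5],   # stage 1
    [3.0, 0.0],   # stage 2
]

def vectorize(matrix):
    """Flatten a payoff matrix row by row into one feature vector,
    mimicking the vectorization step that feeds the network."""
    return [value for row in matrix for value in row]

# Enumerate all 2^3 = 8 root-to-leaf paths of the decision tree.
paths = list(product(range(2), repeat=3))
path_rewards = [sum(rewards[s][a] for s, a in enumerate(p)) for p in paths]

# Classification targets: label the most rewarding path 1, all others 0.
best = max(range(len(paths)), key=lambda i: path_rewards[i])
labels = [1 if i == best else 0 for i in range(len(paths))]
```

With these toy payoffs, the highest cumulative reward (6.5) lies on the path taking action 1 in the first two stages and action 0 in the last; a trained classifier would be asked to reproduce such labels from the vectorized matrices alone.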

Original language: English
Article number: 102726
Journal: Journal of Computational Science
Volume: 92
DOIs
Publication status: Published - Dec 2025

Bibliographical note

Publisher Copyright:
© 2025 Elsevier B.V. All rights are reserved, including those for text and data mining, AI training, and similar technologies.

Keywords

  • Convolutional neural networks
  • Machine learning
  • Markov decision process
  • Markov reward games
  • Matrix norm-based method
