Abstract
Reinforcement learning tree-based planning methods have been gaining popularity in the last few years due to their success in single-agent domains where a perfect simulator model is available, for example, the strategic board games Go and chess. This paper aims to extend tree search algorithms to the multiagent setting in a decentralized structure, dealing with scalability issues and exponential growth of computational resources. The n-step dynamic tree search combines forward planning and direct temporal-difference updates, markedly outperforming conventional tabular algorithms such as Q-learning and state-action-reward-state-action (SARSA). Future state transitions and rewards are predicted with a model built and learned from real interactions between agents and the environment. This paper analyzes the developed algorithm in the hunter–pursuit cooperative game against stochastic and intelligent evaders. The n-step dynamic tree search aims to adapt single-agent tree search learning methods to the multiagent boundaries and is demonstrated to be a remarkable advance over conventional temporal-difference techniques.
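For orientation, the two tabular baselines named in the abstract can be sketched as one-step temporal-difference updates. This is a minimal illustration of standard Q-learning and SARSA only, not the paper's n-step dynamic tree search. The function names, the state and action labels, and the values of `alpha` and `gamma` are assumptions chosen for the example.

```python
from collections import defaultdict


def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Off-policy TD update: bootstrap from the greedy action in s_next."""
    best_next = max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy TD update: bootstrap from the action actually taken next."""
    Q[(s, a)] += alpha * (r + gamma * Q[(s_next, a_next)] - Q[(s, a)])


# Hypothetical two-action problem; unseen state-action pairs default to 0.0.
actions = ["left", "right"]

Q_ql = defaultdict(float)
Q_ql[("s1", "right")] = 1.0
q_learning_update(Q_ql, "s0", "left", 0.0, "s1", actions)

Q_sarsa = defaultdict(float)
Q_sarsa[("s1", "right")] = 1.0
# The agent is assumed to take "left" in s1, so SARSA bootstraps from that value.
sarsa_update(Q_sarsa, "s0", "left", 0.0, "s1", "left")

q_learning_value = Q_ql[("s0", "left")]
sarsa_value = Q_sarsa[("s0", "left")]
```

The two updates differ only in the bootstrap target: Q-learning uses the best estimated action value in the next state, while SARSA uses the value of the action the agent actually selects there.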
| Original language | English |
|---|---|
| Pages (from-to) | 418-425 |
| Number of pages | 8 |
| Journal | Journal of Aerospace Information Systems |
| Volume | 20 |
| Issue number | 7 |
| Publication status | Published - Jul 2023 |
| Externally published | Yes |
Bibliographical note
Publisher Copyright: © 2023 by the American Institute of Aeronautics and Astronautics, Inc. All rights reserved.
Funding
This research is sponsored by the Engineering and Physical Sciences Research Council and BAE Systems under project reference number 2454254.
| Funders | Funder number |
|---|---|
| BAE Systems | 2454254 |
| Engineering and Physical Sciences Research Council | |