Abstract
Uncertainty and partial or unknown information about environment dynamics have led reward-based methods to play a key role in single-agent and multi-agent learning problems. Tree-based planning approaches such as the Monte Carlo Tree Search algorithm have been a striking success in single-agent domains where a perfect simulator model is available, e.g., strategic board games such as Go and chess. This paper presents a decentralized tree-based planning scheme that combines forward planning with direct reinforcement learning temporal-difference updates, applied to the multi-agent setting. Forward planning requires an engine model, which is learned from experience and represented via function approximation. Evaluation and validation are carried out in the Hunter-Prey Pursuit cooperative environment, and performance is compared with state-of-the-art RL techniques. N-Step Dynamic Tree Search (NSDTS) aims to adapt the most successful single-agent learning methods to the multi-agent setting in a decentralized system structure, addressing the scalability issues and exponential growth of computational resources suffered by centralized systems. NSDTS proves to be a remarkable advance over the conventional Q-Learning temporal-difference method.
Original language | English |
---|---|
Title of host publication | AIAA SciTech Forum 2022 |
Publisher | American Institute of Aeronautics and Astronautics Inc, AIAA |
ISBN (Print) | 9781624106316 |
Publication status | Published - 2022 |
Externally published | Yes |
Event | AIAA Science and Technology Forum and Exposition, AIAA SciTech Forum 2022 - San Diego, United States. Duration: 3 Jan 2022 → 7 Jan 2022 |
Publication series
Name | AIAA Science and Technology Forum and Exposition, AIAA SciTech Forum 2022 |
---|
Conference
Conference | AIAA Science and Technology Forum and Exposition, AIAA SciTech Forum 2022 |
---|---|
Country/Territory | United States |
City | San Diego |
Period | 3/01/22 → 7/01/22 |
Bibliographical note
Publisher Copyright: © 2022, American Institute of Aeronautics and Astronautics Inc. All rights reserved.
Funding
This work is sponsored by the Engineering and Physical Sciences Research Council (EPSRC) and BAE Systems under project reference no. 2454254.
Funders | Funder number |
---|---|
BAE Systems | 2454254 |
Engineering and Physical Sciences Research Council | |