Abstract
Tree-based planning methods in reinforcement learning have gained popularity in recent years owing to their success in single-agent domains where a perfect simulator model is available, such as the strategic board games Go and chess. This paper aims to extend tree search algorithms to the multi-agent setting in a decentralized structure, while addressing scalability issues and the exponential growth of computational requirements. The N-Step Dynamic Tree Search combines forward planning with direct temporal-difference updates, markedly outperforming state-of-the-art algorithms such as Q-Learning and SARSA. Future state transitions and rewards are predicted with a model that is built and learned from real interactions between the agents and the environment. As an extension of previous work, this paper analyses the developed algorithm in the Hunter-Pursuit cooperative game against intelligent evaders. The N-Step Dynamic Tree Search aims to adapt the most successful single-agent learning methods to the multi-agent domain and proves to be a remarkable advance over conventional temporal-difference techniques.
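To make the combination of forward planning and temporal-difference updates more concrete, the following is a minimal, generic sketch and not the paper's N-Step Dynamic Tree Search. It assumes a single agent, a tabular deterministic model learned from observed transitions, and a toy four-state chain environment; all names (`LearnedModel`, `lookahead_value`, `plan`, `td_update`) and parameter values are hypothetical. In a decentralized multi-agent setting, each agent would presumably keep its own model and value table.

```python
from collections import defaultdict


class LearnedModel:
    """Hypothetical tabular model: stores the last observed
    (reward, next_state) for each (state, action) pair."""

    def __init__(self):
        self.table = {}

    def update(self, s, a, r, s_next):
        self.table[(s, a)] = (r, s_next)

    def predict(self, s, a):
        # Returns None if (s, a) has never been observed.
        return self.table.get((s, a))


def lookahead_value(model, q, s, a, depth, actions, gamma):
    """Depth-limited tree search: expand the learned model `depth`
    steps ahead and bootstrap from the Q-table at the leaves."""
    pred = model.predict(s, a)
    if depth == 0 or pred is None:
        return q[(s, a)]
    r, s_next = pred
    best_next = max(
        lookahead_value(model, q, s_next, b, depth - 1, actions, gamma)
        for b in actions
    )
    return r + gamma * best_next


def plan(model, q, s, depth, actions, gamma):
    """Pick the action with the highest lookahead value."""
    return max(
        actions,
        key=lambda a: lookahead_value(model, q, s, a, depth, actions, gamma),
    )


def td_update(q, s, a, r, s_next, actions, alpha, gamma):
    """Standard one-step Q-Learning temporal-difference update."""
    target = r + gamma * max(q[(s_next, b)] for b in actions)
    q[(s, a)] += alpha * (target - q[(s, a)])


# Toy chain (an assumption for illustration): states 0..3,
# actions -1/+1, reward 1.0 on reaching state 3.
actions = (-1, 1)
gamma = 0.9
model = LearnedModel()
q = defaultdict(float)

# "Real" interactions: record every transition once in the model.
for s in range(3):
    for a in actions:
        s_next = min(max(s + a, 0), 3)
        r = 1.0 if s_next == 3 else 0.0
        model.update(s, a, r, s_next)

# With an all-zero Q-table, a 3-step lookahead already finds the goal.
best = plan(model, q, 0, 3, actions, gamma)
value = lookahead_value(model, q, 0, 1, 3, actions, gamma)

# A direct TD update from one real transition into the goal state.
td_update(q, 2, 1, 1.0, 3, actions, alpha=0.5, gamma=gamma)
```

In this sketch the lookahead lets the agent act sensibly before the Q-table has converged, while the TD update keeps improving the leaf estimates from real experience. The tree grows exponentially with `depth`, which is the scalability concern the abstract mentions.
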
Original language | English |
---|---|
Title of host publication | 2022 American Control Conference, ACC 2022 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 761-766 |
Number of pages | 6 |
ISBN (Electronic) | 9781665451963 |
Publication status | Published - 2022 |
Externally published | Yes |
Event | 2022 American Control Conference, ACC 2022 - Atlanta, United States; Duration: 8 Jun 2022 → 10 Jun 2022 |
Publication series
Name | Proceedings of the American Control Conference |
---|---|
Volume | 2022-June |
ISSN (Print) | 0743-1619 |
Conference
Conference | 2022 American Control Conference, ACC 2022 |
---|---|
Country/Territory | United States |
City | Atlanta |
Period | 8/06/22 → 10/06/22 |
Bibliographical note
Publisher Copyright: © 2022 American Automatic Control Council.