Abstract
Tree-based planning methods in reinforcement learning have gained popularity in recent years owing to their success in single-agent domains where a perfect simulator model is available, such as the strategic board games Go and chess. This paper aims to extend tree search algorithms to the multi-agent setting in a decentralized structure, addressing scalability issues and the exponential growth of computational resources. The N-Step Dynamic Tree Search combines forward planning with direct temporal-difference updates and markedly outperforms state-of-the-art algorithms such as Q-Learning and SARSA. Future state transitions and rewards are predicted with a model built and learned from real interactions between the agents and the environment. As an extension of previous work, this paper analyses the developed algorithm in the Hunter-Pursuit cooperative game against intelligent evaders. The N-Step Dynamic Tree Search aims to adapt the most successful single-agent learning methods to the multi-agent setting and proves to be a remarkable advance over conventional temporal-difference techniques.
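For readers unfamiliar with the temporal-difference baselines named in the abstract, the following is a minimal tabular sketch of the Q-Learning and SARSA updates, together with a depth-limited lookahead over a model learned from observed transitions. It is not the N-Step Dynamic Tree Search described in the paper. The function names, the values of `GAMMA` and `ALPHA`, and the deterministic dictionary model are illustrative assumptions.

```python
from collections import defaultdict

GAMMA = 0.9   # discount factor (assumed value)
ALPHA = 0.5   # learning rate (assumed value)


def q_learning_update(q, s, a, r, s_next, actions):
    """Off-policy TD update: bootstrap from the greedy next action."""
    target = r + GAMMA * max(q[(s_next, b)] for b in actions)
    q[(s, a)] += ALPHA * (target - q[(s, a)])


def sarsa_update(q, s, a, r, s_next, a_next):
    """On-policy TD update: bootstrap from the action actually taken next."""
    target = r + GAMMA * q[(s_next, a_next)]
    q[(s, a)] += ALPHA * (target - q[(s, a)])


def lookahead_value(model, q, s, actions, depth):
    """Depth-limited search over a learned deterministic model (hypothetical).

    `model` maps (state, action) -> (reward, next_state), filled in from
    observed transitions. Unseen pairs fall back to the current Q estimate.
    """
    if depth == 0:
        return max(q[(s, a)] for a in actions)
    best = float("-inf")
    for a in actions:
        if (s, a) in model:
            r, s_next = model[(s, a)]
            value = r + GAMMA * lookahead_value(model, q, s_next, actions, depth - 1)
        else:
            value = q[(s, a)]
        best = max(best, value)
    return best


# Usage: one Q-Learning step on an empty table.
q = defaultdict(float)  # unseen state-action pairs default to 0.0
q_learning_update(q, s=0, a="stay", r=1.0, s_next=1, actions=["stay", "move"])
```

The two updates differ only in the bootstrap term: Q-Learning uses the best estimated next action, while SARSA uses the action the agent actually takes. The lookahead shows, in the simplest possible form, how a learned model can supply predicted rewards and transitions for forward planning.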
| Original language | English |
|---|---|
| Title of host publication | 2022 American Control Conference, ACC 2022 |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| Pages | 761-766 |
| Number of pages | 6 |
| ISBN (Electronic) | 9781665451963 |
| Publication status | Published - 2022 |
| Externally published | Yes |
| Event | 2022 American Control Conference, ACC 2022 - Atlanta, United States; Duration: 8 Jun 2022 → 10 Jun 2022 |
Publication series
| Name | Proceedings of the American Control Conference |
|---|---|
| Volume | 2022-June |
| ISSN (Print) | 0743-1619 |
Conference
| Conference | 2022 American Control Conference, ACC 2022 |
|---|---|
| Country/Territory | United States |
| City | Atlanta |
| Period | 8/06/22 → 10/06/22 |
Bibliographical note
Publisher Copyright: © 2022 American Automatic Control Council.
Funding
Research supported by the Engineering and Physical Sciences Research Council (EPSRC) and BAE Systems under project reference no. 2454254. Marc Espinós Longa is a PhD Researcher in the School of Aerospace, Transport & Manufacturing at Cranfield University, Bedfordshire, MK43 0AL, United Kingdom. Antonios Tsourdos is an AIAA Senior Member, Head of Center and Director of Research in the School of Aerospace, Transport & Manufacturing
| Funders | Funder number |
|---|---|
| BAE Systems | 2454254 |
| Engineering and Physical Sciences Research Council | |
Fingerprint
Dive into the research topics of 'Swarm Intelligence in Cooperative Environments: N-Step Dynamic Tree Search Algorithm Extended Analysis'. Together they form a unique fingerprint.