Abstract
Deep reinforcement learning (DRL) algorithms learn by interacting with the environment rather than from labeled data, evolving their policies in high-dimensional spaces to maximize the rewards they collect. They have applications in fields such as search and rescue, reconnaissance, military operations, firefighting, and autonomous vehicles. However, there are situations these algorithms struggle to handle. Simulation environments assume that the exact values of the observations are received correctly, yet a neural network that encounters inputs different from those seen during training cannot make accurate predictions in these new situations. This leaves the policy vulnerable to the corrupted state data that may be encountered in real-world applications. In this study, the State-Adversarial Markov Decision Process (SA-MDP) is investigated to increase robustness, and a state-perturbation adversarial attack model is integrated into the DRL algorithm. To enable appropriate decisions under perturbation, a guide actor, which is used only in the training phase and makes decisions from clean observations, guides the control actor, which makes decisions from the outputs of the perturbation model. The proposed guided approach was applied to both the multi-agent soft actor-critic (MA-SAC) and multi-agent twin delayed deep deterministic policy gradient (MA-TD3) algorithms and evaluated on a target-encirclement task with 3, 5, and 7 agents in multi-agent simulation environments built with the Pyglet library. The results show that our approach achieves performance close to that of MA-SAC and MA-TD3 trained in noise-free environments.
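The following is a minimal sketch, not the authors' code, of the guided-actor idea described in the abstract: during training, a guide actor sees clean observations while the control actor sees perturbed observations, and a guidance term pulls the control actor's actions toward the guide's. The Gaussian perturbation, the network sizes, and the MSE guidance loss are illustrative assumptions; in the full algorithm this term would be combined with the MA-SAC or MA-TD3 actor and critic losses.

```python
# Illustrative sketch of guide-actor / control-actor training under state perturbation.
# All hyperparameters, the noise model, and the guidance loss are assumptions.
import torch
import torch.nn as nn


class Actor(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim), nn.Tanh(),  # continuous actions in [-1, 1]
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)


def perturb(obs: torch.Tensor, eps: float = 0.1) -> torch.Tensor:
    """Stand-in for the state-perturbation attack: bounded additive noise."""
    return obs + eps * torch.randn_like(obs)


obs_dim, act_dim = 8, 2
guide_actor = Actor(obs_dim, act_dim)    # used only during training, sees clean states
control_actor = Actor(obs_dim, act_dim)  # deployed actor, sees perturbed states
opt = torch.optim.Adam(control_actor.parameters(), lr=3e-4)

for step in range(1000):
    clean_obs = torch.randn(32, obs_dim)   # placeholder batch of observations
    noisy_obs = perturb(clean_obs)         # perturbation-model output

    with torch.no_grad():
        guide_action = guide_actor(clean_obs)      # decision from healthy observation

    control_action = control_actor(noisy_obs)      # decision under perturbation

    # Guidance loss: keep the control actor close to the guide actor's behavior.
    loss = nn.functional.mse_loss(control_action, guide_action)
    opt.zero_grad()
    loss.backward()
    opt.step()
```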
Original language | English |
---|---|
Pages (from-to) | 156146-156159 |
Number of pages | 14 |
Journal | IEEE Access |
Volume | 12 |
DOIs | |
Publication status | Published - 2024 |
Bibliographical note
Publisher Copyright: © 2013 IEEE.
Keywords
- Adversarial attack
- encirclement
- guided policy search
- multi-agent reinforcement learning