TY - JOUR
T1 - Guided Soft Actor Critic
T2 - A Guided Deep Reinforcement Learning Approach for Partially Observable Markov Decision Processes
AU - Haklidir, Mehmet
AU - Temeltas, Hakan
N1 - Publisher Copyright:
© 2013 IEEE.
PY - 2021
AB - Most real-world problems are essentially partially observable, and the model of the environment is unknown. There is therefore a significant need for reinforcement learning approaches that can solve such problems, in which the agent perceives the state of the environment only partially and noisily. Guided reinforcement learning methods address this issue by providing additional state knowledge to the learning algorithm during training, allowing it to solve a partially observable Markov decision process (POMDP) more effectively. However, such guided approaches are relatively rare in the literature, and most existing ones are model-based, meaning that they must first learn an appropriate model of the environment. In this paper, we propose a novel model-free approach that combines the soft actor-critic method with supervised learning to solve real-world problems formulated as POMDPs. In experiments performed on OpenAI Gym, an open-source simulation platform, our guided soft actor-critic approach outperformed the baseline algorithms, achieving 720% higher maximum average return on five partially observable tasks constructed from continuous control problems and simulated in MuJoCo.
KW - Deep reinforcement learning
KW - guided policy search
KW - POMDP
UR - http://www.scopus.com/inward/record.url?scp=85120580620&partnerID=8YFLogxK
DO - 10.1109/ACCESS.2021.3131772
M3 - Article
AN - SCOPUS:85120580620
SN - 2169-3536
VL - 9
SP - 159672
EP - 159683
JO - IEEE Access
JF - IEEE Access
ER -