TY - JOUR
T1 - Ship course-keeping in waves using sample-efficient reinforcement learning
AU - Greep, Justin
AU - Bayezit, Afşin Baran
AU - Mak, Bart
AU - Rijpkema, Douwe
AU - Kınacı, Ömer Kemal
AU - Düz, Bülent
N1 - Publisher Copyright: © 2024 Elsevier Ltd
PY - 2025/2/1
Y1 - 2025/2/1
AB - Maintaining a steady course in waves is important for ships for various reasons such as safety, fuel efficiency, and comfort. This has commonly been addressed by adopting conventional control algorithms. Reinforcement learning (RL) methods, on the other hand, have demonstrated successful performance in a wide range of control problems. In this work, the performance of two RL agents (model-free and model-based) in comparison to a linear-quadratic regulator (LQR) is investigated in a numerical environment. The model-free RL agent performed better than the LQR with respect to keeping its course and minimizing the rudder usage. By applying model-based RL, the low sample efficiency and consequent long training times that typically complicate model-free RL were mitigated. As a result, the training time of the course-keeping agent was reduced by more than an order of magnitude. Moreover, the model-based agent learned to exclusively react to the low-frequency yaw motion while ignoring the first-order wave disturbances, thereby reducing the rudder usage considerably.
KW - Control
KW - Course-keeping of ships in waves
KW - Linear-quadratic regulator
KW - Model-based reinforcement learning
KW - Model-free reinforcement learning
KW - Numerical simulation
KW - Reinforcement learning
KW - Sample efficiency
UR - http://www.scopus.com/inward/record.url?scp=85212038583&partnerID=8YFLogxK
U2 - 10.1016/j.engappai.2024.109848
DO - 10.1016/j.engappai.2024.109848
M3 - Article
AN - SCOPUS:85212038583
SN - 0952-1976
VL - 141
JO - Engineering Applications of Artificial Intelligence
JF - Engineering Applications of Artificial Intelligence
M1 - 109848
ER -