Abstract
This study investigates camera-based obstacle detection and ship classification to support safe navigation of Unmanned Surface Vehicles (USVs). A two-stage approach was employed. The first stage focused on detecting ships, humans, and other obstacles in maritime environments. Models based on the You Only Look Once (YOLO) architecture, namely YOLOv5 and its variant TPH-YOLOv5, which is specialized for small-object detection, were optimized on the MODS dataset. This dataset contains labeled images of dynamic obstacles, such as ships and humans, and static obstacles, such as buoys. TPH-YOLOv5 performed well in detecting small objects, a capability crucial for collision avoidance in USVs. The second stage addressed ship classification using the MARVEL dataset, which contains over two million images across 26 ship subtypes. Convolutional Neural Networks (CNNs) and Vision Transformer (ViT)-based models were compared. Among these, the Data-efficient Image Transformer (DeiT) achieved the highest classification accuracy, 92.87%, surpassing the previously reported state-of-the-art result. To further analyze the classification results, this study introduced a generic method for generating attention heatmaps in ViT-based models. Unlike related work, the method applies not only to the original Vision Transformer but also to its variants. Additionally, pruning techniques were explored to improve the computational efficiency of the DeiT model. Pruning reduced inference time and brought the model closer to the speed required for real-time applications, although CNNs remain faster for such tasks.
Original language | English |
---|---|
Pages (from-to) | 113-122 |
Number of pages | 10 |
Journal | Proceedings of the International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications |
Volume | 3 |
Publication status | Published - 2025 |
Event | 20th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, VISIGRAPP 2025, Porto, Portugal, 26 Feb 2025 → 28 Feb 2025 |
Bibliographical note
Publisher Copyright: © 2025 by SCITEPRESS - Science and Technology Publications, Lda.
Keywords
- Maritime
- Obstacle Detection
- Ship Classification
- Vision Transformers