TY - JOUR
T1 - A Hybrid Deep Learning and Optical Flow Framework for Monocular Capsule Endoscopy Localization
AU - Yakar, İrem
AU - Kuçak, Ramazan Alper
AU - Bilgi, Serdar
AU - Ferhanoglu, Onur
AU - Akinci, Tahir Cetin
N1 - Publisher Copyright:
© 2025 by the authors.
PY - 2025/9
Y1 - 2025/9
N2 - Pose estimation and localization within the gastrointestinal tract, particularly the small bowel, are crucial for invasive medical procedures. However, the task is challenging due to the complex anatomy, homogeneous textures, and limited distinguishable features. This study proposes a hybrid deep learning (DL) method combining Convolutional Neural Network (CNN)-based pose estimation and optical flow to address these challenges in a simulated small bowel environment. Initial pose estimation was used to assess the performance of simultaneous localization and mapping (SLAM) in such complex settings, using a custom endoscope prototype with a laser, micromotor, and miniaturized camera. The results showed limited feature detection and unreliable matches due to repetitive textures. To address this issue, a hybrid CNN-based approach enhanced with Farneback optical flow was applied. Using consecutive images, three models were compared: a hybrid ResNet-50 with Farneback optical flow, ResNet-50, and NASNetLarge pretrained on ImageNet. The analysis showed that the hybrid model achieved the lowest RMSE of 0.03 cm, outperforming both ResNet-50 (0.39 cm) and NASNetLarge (1.46 cm), while feature-based SLAM failed to provide reliable results. The hybrid model also achieved a competitive inference speed of 241.84 ms per frame, outperforming ResNet-50 (316.57 ms) and NASNetLarge (529.66 ms). To assess the impact of the optical flow choice, Lucas–Kanade was also implemented within the same framework and compared with the Farneback-based results. These results demonstrate that combining optical flow with ResNet-50 enhances pose estimation accuracy and stability, especially in textureless environments where traditional methods struggle. The proposed method offers a robust, real-time alternative to SLAM, with potential applications in clinical capsule endoscopy. The results are positioned as a proof-of-concept that highlights the feasibility and clinical potential of the proposed framework. Future work will extend the framework to real patient data and optimize it for real-time hardware.
AB - Pose estimation and localization within the gastrointestinal tract, particularly the small bowel, are crucial for invasive medical procedures. However, the task is challenging due to the complex anatomy, homogeneous textures, and limited distinguishable features. This study proposes a hybrid deep learning (DL) method combining Convolutional Neural Network (CNN)-based pose estimation and optical flow to address these challenges in a simulated small bowel environment. Initial pose estimation was used to assess the performance of simultaneous localization and mapping (SLAM) in such complex settings, using a custom endoscope prototype with a laser, micromotor, and miniaturized camera. The results showed limited feature detection and unreliable matches due to repetitive textures. To address this issue, a hybrid CNN-based approach enhanced with Farneback optical flow was applied. Using consecutive images, three models were compared: a hybrid ResNet-50 with Farneback optical flow, ResNet-50, and NASNetLarge pretrained on ImageNet. The analysis showed that the hybrid model achieved the lowest RMSE of 0.03 cm, outperforming both ResNet-50 (0.39 cm) and NASNetLarge (1.46 cm), while feature-based SLAM failed to provide reliable results. The hybrid model also achieved a competitive inference speed of 241.84 ms per frame, outperforming ResNet-50 (316.57 ms) and NASNetLarge (529.66 ms). To assess the impact of the optical flow choice, Lucas–Kanade was also implemented within the same framework and compared with the Farneback-based results. These results demonstrate that combining optical flow with ResNet-50 enhances pose estimation accuracy and stability, especially in textureless environments where traditional methods struggle. The proposed method offers a robust, real-time alternative to SLAM, with potential applications in clinical capsule endoscopy. The results are positioned as a proof-of-concept that highlights the feasibility and clinical potential of the proposed framework. Future work will extend the framework to real patient data and optimize it for real-time hardware.
KW - SLAM
KW - capsule endoscopy
KW - deep learning
KW - localization
KW - optical flow
UR - https://www.scopus.com/pages/publications/105017429643
U2 - 10.3390/electronics14183722
DO - 10.3390/electronics14183722
M3 - Article
AN - SCOPUS:105017429643
SN - 2079-9292
VL - 14
JO - Electronics (Switzerland)
JF - Electronics (Switzerland)
IS - 18
M1 - 3722
ER -