A Hybrid Deep Learning and Optical Flow Framework for Monocular Capsule Endoscopy Localization

İrem Yakar*, Ramazan Alper Kuçak, Serdar Bilgi, Onur Ferhanoglu, Tahir Cetin Akinci*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Pose estimation and localization within the gastrointestinal tract, particularly the small bowel, are crucial for invasive medical procedures. However, the task is challenging because of the complex anatomy, homogeneous textures, and scarcity of distinguishable features. This study proposes a hybrid deep learning (DL) method that combines convolutional neural network (CNN)-based pose estimation with optical flow to address these challenges in a simulated small bowel environment. Initial pose estimation was used to assess the performance of simultaneous localization and mapping (SLAM) in such complex settings, using a custom endoscope prototype with a laser, micromotor, and miniaturized camera. The results showed limited feature detection and unreliable matches due to repetitive textures. To address this issue, a hybrid CNN-based approach enhanced with Farneback optical flow was applied. Using consecutive images, three models were compared: a hybrid ResNet-50 with Farneback optical flow, ResNet-50, and NASNetLarge pretrained on ImageNet. The hybrid model achieved the lowest root mean square error (RMSE) of 0.03 cm, outperforming both ResNet-50 (0.39 cm) and NASNetLarge (1.46 cm), while feature-based SLAM failed to provide reliable results. The hybrid model also achieved a competitive inference time of 241.84 ms per frame, faster than ResNet-50 (316.57 ms) and NASNetLarge (529.66 ms). To assess the impact of the optical flow choice, Lucas–Kanade was also implemented within the same framework and compared with the Farneback-based results. These results demonstrate that combining optical flow with ResNet-50 enhances pose estimation accuracy and stability, especially in textureless environments where traditional methods struggle. The proposed method offers a robust, real-time alternative to SLAM, with potential applications in clinical capsule endoscopy. The results are positioned as a proof of concept that highlights the feasibility and clinical potential of the proposed framework. Future work will extend the framework to real patient data and optimize it for real-time hardware.
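
The abstract reports localization accuracy as an RMSE in centimetres but does not state how it is computed. As a minimal sketch, assuming the error is the root mean square of per-frame Euclidean distances between predicted and reference positions, the metric could be evaluated as follows. The function name `position_rmse` and the sample trajectories are hypothetical and are not taken from the paper.

```python
import math


def position_rmse(predicted, reference):
    """Root mean square of per-frame Euclidean position errors.

    Assumption: both trajectories are sequences of (x, y, z) tuples in the
    same unit (here centimetres) and are aligned frame by frame.
    """
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("trajectories must be non-empty and of equal length")
    # Squared Euclidean distance between prediction and reference per frame.
    squared_errors = [
        sum((p - r) ** 2 for p, r in zip(pred, ref))
        for pred, ref in zip(predicted, reference)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical three-frame trajectories in centimetres (illustrative only).
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
predicted = [(0.0, 0.0, 0.0), (1.0, 0.3, 0.4), (2.0, 0.0, 0.0)]
rmse_cm = position_rmse(predicted, reference)
```

Under this definition a single frame that is off by 0.5 cm dominates the score for a short sequence, which is why RMSE is sensitive to occasional large localization errors.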

Original language: English
Article number: 3722
Journal: Electronics (Switzerland)
Volume: 14
Issue number: 18
DOIs
Publication status: Published - Sept 2025

Bibliographical note

Publisher Copyright:
© 2025 by the authors.

Keywords

  • SLAM
  • capsule endoscopy
  • deep learning
  • localization
  • optical flow
