Diagnostic Accuracy of a Machine Learning-Derived Appendicitis Score in Children: A Multicenter Validation Study

Emrah Aydın*, Taha Eren Sarnıç, İnan Utku Türkmen, Narmina Khanmammadova, Ufuk Ateş, Mustafa Onur Öztan, Tamer Sekmenli, Necip Fazıl Aras, Tülin Öztaş, Ali Yalçınkaya, Murat Özbek, Deniz Gökçe, Hatice Sonay Yalçın Cömert, Osman Uzunlu, Aliye Kandırıcı, Nazile Ertürk, Alev Süzen, Fatih Akova, Mehmet Paşaoğlu, Egemen Eroğlu, Gülnur Göllü Bahadır, Ahmet Murat Çakmak, Salim Bilici, Ramazan Karabulut, Mustafa İmamoğlu, Haluk Sarıhan, Süleyman Cüneyt Karakuş

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Background: Accurate diagnosis of acute appendicitis in children remains challenging due to variable presentations and limitations of existing clinical scoring systems. While machine learning (ML) offers a promising approach to enhance diagnostic precision, most prior studies have been limited by small sample sizes, single-center data, or a lack of external validation.

Methods: This prospective, multicenter study included 8586 pediatric patients to develop a machine learning-based diagnostic model using routinely available clinical and hematological parameters. A separate, prospectively collected external validation cohort of 3000 patients was used to assess model performance. The Random Forest algorithm was selected based on its superior performance during model comparison. Diagnostic accuracy, sensitivity, specificity, Area Under Curve (AUC), and calibration metrics were evaluated and compared with traditional scoring systems such as Pediatric Appendicitis Score (PAS), Alvarado, and Appendicitis Inflammatory Response Score (AIRS).

Results: The ML model outperformed traditional clinical scores in both development and validation cohorts. In the external validation set, the Random Forest model achieved an AUC of 0.996, accuracy of 0.992, sensitivity of 0.998, and specificity of 0.993. Feature-importance analysis identified white blood cell count, red blood cell count, and mean platelet volume as key predictors.

Conclusions: This large, prospectively validated study demonstrates that a machine learning-based scoring system using commonly accessible data can significantly improve the diagnosis of pediatric appendicitis. The model offers high accuracy and clinical interpretability and has the potential to reduce diagnostic delays and unnecessary imaging.
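
For readers less familiar with the metrics named in the abstract, the following is a minimal sketch of how sensitivity, specificity, accuracy, and AUC are conventionally computed from binary labels and model scores. It is not the authors' code: the function names, the 0.5 decision threshold, and the eight toy labels and scores are all hypothetical and chosen purely for illustration, and none of the values are study data.

```python
# Illustrative only: toy labels/scores are invented, not taken from the study.

def diagnostic_metrics(y_true, y_pred):
    """Sensitivity, specificity, and accuracy from binary labels (1 = appendicitis)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),          # true positive rate
        "specificity": tn / (tn + fp),          # true negative rate
        "accuracy": (tp + tn) / len(y_true),
    }


def auc(y_true, scores):
    """AUC as the probability that a random positive case outscores a random
    negative case (Mann-Whitney formulation); ties count as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy cohort: 4 positive and 4 negative cases.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]

# Assumed decision threshold of 0.5 (the abstract does not state one).
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = diagnostic_metrics(y_true, y_pred)
metrics["auc"] = auc(y_true, scores)
print(metrics)
```

Sensitivity, specificity, and accuracy depend on the chosen threshold, whereas AUC summarizes how well the scores rank cases across all thresholds, which is why studies often report both.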

Original language: English
Article number: 937
Journal: Children
Volume: 12
Issue number: 7
DOIs
Publication status: Published - Jul 2025
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 2025 by the authors.

Keywords

  • appendicitis
  • clinical decision support
  • diagnosis
  • machine learning
  • pediatrics
  • random forest
