Abstract
Pancreatic ductal adenocarcinoma (PDAC) is among the deadliest cancer types, and early detection is critical to improving survival rates. However, developing effective detection models is challenging because they require high-quality, class-balanced datasets. Generative models have recently gained attention as a way to address this issue. In this study, we compare three tabular generative models, Conditional Tabular Generative Adversarial Network (CTGAN), Tabular Variational Autoencoder (TVAE), and Gaussian Copula (GC), on PDAC gene expression data. We first constructed an integrated dataset by curating six PDAC studies and applied an ensemble-based feature selection approach combining differentially expressed gene (DEG) analysis, ANOVA, Lasso, and Mutual Information. The synthetic data were evaluated statistically (using the Correlation Discrepancy (CD), Kolmogorov-Smirnov (KS), and Statistical Similarity (SS) metrics), biologically (via PDAC marker genes), and visually in 2D PCA space. The GC model produced the most realistic synthetic data, with metric values of 0.1482 (CD), 0.8120 (KS), and 0.9529 (SS), expression levels of PDAC marker genes similar to those in the real data, and a distribution that overlapped uniformly with the real data. TVAE ranked second. Based on these findings, we propose an ensemble model combining GC- and TVAE-generated samples. Classification experiments using Random Forest (RF) and Support Vector Machine (SVM) showed that the ensemble generative model did not achieve the highest performance with SVM (0.8541 precision, 0.8570 recall, 0.8533 F1-measure, and 0.9236 AUC) but did with RF (0.8549 precision, 0.8623 recall, 0.8568 F1-measure, and 0.9246 AUC), which makes it a promising model for future applications.
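The abstract names the CD and KS metrics without defining them. The following is a minimal sketch of how such metrics are commonly computed, under stated assumptions: the KS score is taken as the mean of one minus the two-sample Kolmogorov-Smirnov statistic per gene (so higher is better), and CD as the mean absolute difference between pairwise Pearson correlations in the real and synthetic data (so lower is better). These definitions, the function names, and the toy matrices are illustrative assumptions, not the authors' implementation.

```python
from bisect import bisect_right
from math import sqrt


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    # The supremum of |F_a - F_b| is attained at one of the observed values.
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def ks_score(real, synth):
    """Assumed definition: mean of (1 - KS statistic) over columns (genes)."""
    cols_r, cols_s = list(zip(*real)), list(zip(*synth))
    scores = [1.0 - ks_statistic(r, s) for r, s in zip(cols_r, cols_s)]
    return sum(scores) / len(scores)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    vx = sum((xi - mx) ** 2 for xi in x)
    vy = sum((yi - my) ** 2 for yi in y)
    return cov / sqrt(vx * vy)


def correlation_discrepancy(real, synth):
    """Assumed definition: mean |corr_real - corr_synth| over all gene pairs."""
    cols_r, cols_s = list(zip(*real)), list(zip(*synth))
    n = len(cols_r)
    diffs = [
        abs(pearson(cols_r[i], cols_r[j]) - pearson(cols_s[i], cols_s[j]))
        for i in range(n)
        for j in range(i + 1, n)
    ]
    return sum(diffs) / len(diffs)


# Hypothetical toy matrices: rows are samples, columns are genes.
real = [[1.0, 2.0, 0.5], [2.0, 4.1, 0.4], [3.0, 5.9, 0.9], [4.0, 8.2, 0.1]]
synth = [[1.2, 2.3, 0.6], [2.1, 3.9, 0.3], [2.8, 6.1, 0.8], [4.3, 7.9, 0.2]]

print("KS score:", ks_score(real, synth))
print("Correlation discrepancy:", correlation_discrepancy(real, synth))
```

Under these assumed definitions, comparing a dataset with itself gives a KS score of 1.0 and a correlation discrepancy of 0.0, which matches the direction of the values reported in the abstract (a high KS and a low CD for the GC model).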
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 7th International Conference on Statistics |
| Subtitle of host publication | Theory and Application, ICSTA 2025 |
| Editors | Noelle Samia, Dirk Husmeier |
| Publisher | Avestia Publishing |
| ISBN (Print) | 9781990800597 |
| DOIs | |
| Publication status | Published - 2025 |
| Event | 7th International Conference on Statistics: Theory and Applications, ICSTA 2025 - Paris, France. Duration: 17 Aug 2025 → 19 Aug 2025 |
Publication series
| Name | Proceedings of the International Conference on Statistics |
|---|---|
| ISSN (Electronic) | 2562-7767 |
Conference
| Conference | 7th International Conference on Statistics: Theory and Applications, ICSTA 2025 |
|---|---|
| Country/Region | France |
| City | Paris |
| Period | 17/08/25 → 19/08/25 |
Bibliographical note
Publisher Copyright: © 2025, Avestia Publishing. All rights reserved.
UN SDGs
This output contributes to the following Sustainable Development Goal(s):
- SDG 3 Good Health and Well-being