Abstract
This study explores the impact of synthetic data, both physically based and generatively created, on deep learning analytics for earth observation (EO), focusing on the detection of photovoltaic panels. A YOLOv8 object detection model was trained using a publicly available, multi-resolution, very-high-resolution (VHR) EO dataset (0.8 m, 0.3 m, and 0.1 m) comprising 3716 images from various locations in Jiangsu Province, China. Three benchmarks were established using only real EO data. Subsequent experiments evaluated how the inclusion of synthetic data, in varying types and quantities, influenced the model’s ability to detect photovoltaic panels in VHR imagery. Physically based synthetic images were generated using the Unity engine, which produced a wide range of realistic scenes by automatically varying scene parameters. This approach yielded not only realistic RGB images but also semantic segmentation maps and pixel-accurate masks identifying photovoltaic panel locations. Generative synthetic data were created using diffusion-based models (DALL·E 3 and Stable Diffusion XL), guided by prompts to simulate satellite-like imagery containing solar panels. All synthetic images were manually reviewed, and their annotations were checked for consistency with the real dataset. Integrating synthetic with real data generally improved model performance, with the best results achieved when both data types were combined. Performance gains depended on data distribution and volume, with the most significant improvements observed when synthetic data were used to meet the YOLOv8-recommended minimum of 1500 images per class. In this setting, combining real data with both physically based and generative synthetic data yielded improvements of 1.7% in precision, 3.9% in recall, 2.3% in mAP@50, and 3.3% in mAP@95 compared to training with real data alone.
The study also emphasizes the importance of carefully managing the inclusion of synthetic data in training and validation phases to avoid overfitting to synthetic features, with the goal of enhancing generalization to real-world data. Additionally, a pre-training experiment using only synthetic data, followed by fine-tuning with real images, demonstrated improved early-stage training performance, particularly during the first five epochs, highlighting potential benefits in computationally constrained environments.
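The data-mixing idea in the abstract can be illustrated with a minimal sketch. It assumes a hypothetical setup in which images are referenced by file name, the validation set is kept real-only (one conservative way to avoid rewarding synthetic-specific features, not necessarily the protocol used in the study), and the synthetic top-up is split evenly between physically based and generative images. The function names, the 20% validation fraction, and the even split are illustrative assumptions; only the 1500-image minimum comes from the abstract.

```python
import random

# Minimum number of images per class cited in the abstract.
MIN_IMAGES_PER_CLASS = 1500


def synthetic_top_up(n_real, minimum=MIN_IMAGES_PER_CLASS):
    """Return how many synthetic images are needed to reach the minimum."""
    return max(0, minimum - n_real)


def build_splits(real, physical, generative, val_fraction=0.2, seed=0):
    """Hold out a real-only validation set, then top up training data.

    Assumption (not from the study): the shortfall is filled half with
    physically based and half with generative synthetic images.
    """
    rng = random.Random(seed)
    real = list(real)
    rng.shuffle(real)

    # Validation uses real images only, so scores reflect real-world data.
    n_val = int(len(real) * val_fraction)
    val, train_real = real[:n_val], real[n_val:]

    # Fill the gap to the minimum with synthetic images, if any is needed.
    need = synthetic_top_up(len(train_real))
    n_phys = min(len(physical), (need + 1) // 2)
    n_gen = min(len(generative), need - n_phys)

    train = (
        train_real
        + rng.sample(list(physical), n_phys)
        + rng.sample(list(generative), n_gen)
    )
    rng.shuffle(train)
    return train, val
```

With 1000 real images and 500 images from each synthetic source, this sketch holds out 200 real images for validation and adds 350 images of each synthetic type to the remaining 800, reaching the 1500-image target. When the real training set already meets the minimum, no synthetic images are added.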
| Original language | English |
|---|---|
| Article number | 481 |
| Journal | ISPRS International Journal of Geo-Information |
| Volume | 14 |
| Issue number | 12 |
| Publication status | Published - Dec 2025 |
Bibliographical note
Publisher Copyright: © 2025 by the authors.
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- SDG 7: Affordable and Clean Energy
Keywords
- AI-generated data
- deep learning
- diffusion model
- earth observation
- physically-based simulation
- synthetic data
Title
Impact of Synthetic Data on Deep Learning Models for Earth Observation: Photovoltaic Panel Detection Case Study