Impact of Synthetic Data on Deep Learning Models for Earth Observation: Photovoltaic Panel Detection Case Study

  • Enes Hisam
  • Jesus Gimeno
  • David Miraut
  • Manolo Pérez-Aixendri*
  • Marcos Fernández
  • Rossana Gini
  • Raúl Rodríguez
  • Gabriele Meoni
  • Dursun Zafer Seker

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

This study explores the impact of synthetic data, both physically based and generatively created, on deep learning analytics for Earth observation (EO), focusing on the detection of photovoltaic panels. A YOLOv8 object detection model was trained using a publicly available, multi-resolution very high resolution (VHR) EO dataset (0.8 m, 0.3 m, and 0.1 m), comprising 3716 images from various locations in Jiangsu Province, China. Three benchmarks were established using only real EO data. Subsequent experiments evaluated how the inclusion of synthetic data, in varying types and quantities, influenced the model’s ability to detect photovoltaic panels in VHR imagery. Physically based synthetic images were generated using the Unity engine, which allowed a wide range of realistic scenes to be produced by varying scene parameters automatically. This approach produced not only realistic RGB images but also semantic segmentation maps and pixel-accurate masks identifying photovoltaic panel locations. Generative synthetic data were created using diffusion-based models (DALL·E 3 and Stable Diffusion XL), guided by prompts to simulate satellite-like imagery containing solar panels. All synthetic images were manually reviewed, and their annotations were checked for consistency with the real dataset. Integrating synthetic with real data generally improved model performance, with the best results achieved when both data types were combined. Performance gains depended on data distribution and volume, with the most significant improvements observed when synthetic data were used to meet the YOLOv8-recommended minimum of 1500 images per class. In this setting, combining real data with both physically based and generative synthetic data yielded improvements of 1.7% in precision, 3.9% in recall, 2.3% in mAP@50, and 3.3% in mAP@95 compared to training with real data alone.
The study also emphasizes the importance of carefully managing the inclusion of synthetic data in training and validation phases to avoid overfitting to synthetic features, with the goal of enhancing generalization to real-world data. Additionally, a pre-training experiment using only synthetic data, followed by fine-tuning with real images, demonstrated improved early-stage training performance, particularly during the first five epochs, highlighting potential benefits in computationally constrained environments.

Original language: English
Article number: 481
Journal: ISPRS International Journal of Geo-Information
Volume: 14
Issue number: 12
DOIs
Publication status: Published - Dec 2025

Bibliographical note

Publisher Copyright:
© 2025 by the authors.

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 7 - Affordable and Clean Energy

Keywords

  • AI-generated data
  • deep learning
  • diffusion model
  • earth observation
  • physically-based simulation
  • synthetic data
