Synthetic Data Generation for Aerial Tree Detection and Classification
- Sponsored by: OCELL GmbH
- Project lead: Dr. Ricardo Acevedo Cabra
- Scientific lead: Beyrem Kaddech
- TUM Co-Mentor: Jona Klemenc
- Term: Summer semester 2025
- Application deadline: Sunday 19.01.2025
Apply to this project here

Motivation:
OCELL’s mission is to unlock the full potential of our forests to protect our climate. We use aerial photography and AI to offer companies local climate projects that are transparent, measurable and effective. Our models can detect and classify individual trees in aerial images.
To further improve the accuracy of these models and help them generalize across different forest scenarios, we want to augment the training datasets with synthetically generated data.
Goals:
The aim of this project is to create a synthetic dataset generation and training pipeline to improve machine learning models used for aerial tree detection and classification.
We first invite you to leverage Unreal Engine’s procedural content generation capabilities to design large-scale forest environments that mimic real-world scenarios with a variety of tree species, vegetation, terrain and weather conditions. For this, you will be able to use realistic 3D models of trees and environmental elements (Quixel Megascans). In this first phase of the project, we expect you to creatively model virtual forests that will serve as the basis for synthetic data generation.
In the next step, you will extract aerial images that simulate different ground sampling distances and environmental conditions. Through an iterative process of comparison with real aerial images, you will produce photorealistic aerial images in which the position and crown size of every tree are known. The images, together with the bounding boxes of the individual trees, form the synthetic dataset.
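As an illustration of how a known tree position and crown size can become a bounding-box label, here is a minimal sketch. It assumes a nadir (straight-down) orthographic view, tree coordinates in meters measured from the image's top-left corner, and a ground resolution given in meters per pixel. The `Tree` class and the `tree_to_bbox` function are hypothetical names chosen for this example; they are not part of OCELL's pipeline or of Unreal Engine.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Tree:
    """Hypothetical ground-truth record exported from the virtual forest."""
    x_m: float               # distance east of the image's top-left corner, in meters
    y_m: float               # distance south of the image's top-left corner, in meters
    crown_diameter_m: float  # crown diameter in meters


def tree_to_bbox(
    tree: Tree,
    gsd_m_per_px: float,
    image_size_px: Tuple[int, int],
) -> Optional[Tuple[float, float, float, float]]:
    """Return (x_min, y_min, x_max, y_max) in pixels, clipped to the image.

    Returns None when the crown lies completely outside the image.
    Assumes a nadir orthographic camera, so meters map linearly to pixels.
    """
    if gsd_m_per_px <= 0:
        raise ValueError("gsd_m_per_px must be positive")
    width, height = image_size_px

    # Convert the crown center and radius from meters to pixels.
    cx = tree.x_m / gsd_m_per_px
    cy = tree.y_m / gsd_m_per_px
    radius = 0.5 * tree.crown_diameter_m / gsd_m_per_px

    # Clip the square around the crown to the image bounds.
    x_min, y_min = max(0.0, cx - radius), max(0.0, cy - radius)
    x_max, y_max = min(float(width), cx + radius), min(float(height), cy + radius)
    if x_min >= x_max or y_min >= y_max:
        return None
    return (x_min, y_min, x_max, y_max)


if __name__ == "__main__":
    tree = Tree(x_m=10.0, y_m=20.0, crown_diameter_m=4.0)
    # The same tree covers more pixels when each pixel spans less ground.
    for gsd in (0.5, 0.25):
        print(gsd, tree_to_bbox(tree, gsd, (1024, 1024)))
```

Rendering the same scene at several ground resolutions only changes the `gsd_m_per_px` argument, so the labels stay consistent with the images without any manual annotation. A perspective camera or sloped terrain would need a proper projection in place of the linear scaling used here.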
You will then train our models on different combinations of synthetic and real data, benchmark their performance, and write a final analysis that evaluates how effectively this approach improves real-world predictions. By the end of the project, you will have created a synthetic data generation and training pipeline that can be built upon and integrated into the rest of our stack.
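One way to organize such an experiment is to build training sets with a controlled share of synthetic samples and compare the resulting models on held-out real images only. The sketch below shows the mixing step under that assumption. `mix_datasets` is a hypothetical helper, and the samples are placeholder tuples rather than real images or OCELL data.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def mix_datasets(
    real: Sequence[T],
    synthetic: Sequence[T],
    synthetic_fraction: float,
    total: int,
    seed: int = 0,
) -> List[T]:
    """Draw `total` samples, of which `synthetic_fraction` come from `synthetic`.

    Sampling is without replacement and reproducible for a given seed.
    """
    if not 0.0 <= synthetic_fraction <= 1.0:
        raise ValueError("synthetic_fraction must be in [0, 1]")
    n_synthetic = round(total * synthetic_fraction)
    n_real = total - n_synthetic
    if n_synthetic > len(synthetic) or n_real > len(real):
        raise ValueError("not enough samples for the requested mix")

    rng = random.Random(seed)
    mixed = rng.sample(list(real), n_real) + rng.sample(list(synthetic), n_synthetic)
    rng.shuffle(mixed)  # avoid blocks of real-only or synthetic-only samples
    return mixed


if __name__ == "__main__":
    # Placeholder samples; in practice these would be image and label pairs.
    real = [("real", i) for i in range(100)]
    synthetic = [("synthetic", i) for i in range(100)]
    for fraction in (0.0, 0.25, 0.5, 1.0):
        train = mix_datasets(real, synthetic, fraction, total=40)
        n_syn = sum(1 for source, _ in train if source == "synthetic")
        print(f"fraction={fraction}: {n_syn} synthetic, {len(train) - n_syn} real")
```

Keeping the total size fixed while varying the fraction separates the effect of the data source from the effect of simply having more data. A complementary run that adds synthetic samples on top of the full real dataset would answer the augmentation question more directly.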
What we offer:
- 50,000 labeled trees from real aerial images.
- A well-established bounding box detection and tree classification model.
- Necessary GPU capabilities to generate, train and predict.
- An openness to new initiatives and creative approaches.