Semantic Planar Splatting for 3D Structured Reconstruction
- Sponsored by: TUM Computer Vision Group
- Project lead: Dr. Ricardo Acevedo Cabra
- Scientific lead: Dr. Yan Xia, Prof. Dr. Olaf Wysocki
- Term: Summer semester 2026
- Application deadline: Sunday 25.01.2026
Apply to this project here
Motivation
Recent advances in Gaussian Splatting (GS) have demonstrated its effectiveness for high-fidelity rendering and 3D reconstruction. A parallel research direction is 3D structured object reconstruction, where 3D planes are the most common representation owing to their simplicity and completeness in describing surfaces. The defining trait of these models is that they comprise not only geometric primitives but also semantics, distilling sensor observations into a minimum-viable explicit representation. Such structured semantic 3D models are pivotal for creating urban digital twins, covering both indoor and outdoor scenarios.
Traditionally, planar 3D reconstruction has been treated as a model-fitting problem in which known scene geometry (e.g., point clouds) is fitted with 3D planes or library elements [2,4]. Recent image-based methods simplify this process by reconstructing 3D planes and other primitive shapes directly from single- or multi-view images, typically by detecting, matching, and merging primitives across views [1,3]. Most recently, PlanarSplatting [1] proposed plane-based indoor reconstruction that directly optimizes 3D plane primitives through a differentiable splatting process projecting them into depth and normal maps.
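To make the projection step concrete, the following is a minimal, non-differentiable toy sketch of how a single 3D plane can be rendered into per-pixel depth and normal maps via ray-plane intersection under a pinhole camera at the origin. It is an illustration of the general idea only, not the PlanarSplatting implementation; all names and the camera setup are our own assumptions.

```python
import numpy as np

def plane_depth_normal(n, d, K, H, W):
    """Render depth and normal maps for the plane n . x = d, seen from a
    pinhole camera with intrinsics K placed at the origin (toy sketch,
    not the paper's differentiable splatting pipeline)."""
    # Back-project every pixel (u, v) to a ray direction r = K^{-1} [u, v, 1]^T.
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(float)   # (H, W, 3)
    rays = pix @ np.linalg.inv(K).T                                  # (H, W, 3)
    # Ray-plane intersection: t * (n . r) = d  =>  t = d / (n . r).
    denom = rays @ n                                                 # (H, W)
    depth = np.where(np.abs(denom) > 1e-8, d / denom, np.inf)
    depth = np.where(depth > 0, depth, np.inf)   # keep only hits in front
    # The normal map of a single plane is constant wherever the plane is hit.
    normal = np.broadcast_to(n, (H, W, 3))
    return depth, normal
```

For a fronto-parallel plane n = (0, 0, 1), d = 2, every pixel receives depth 2, since the back-projected rays have unit z-component; a differentiable variant of this map is what gradient-based plane optimization would backpropagate through.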
However, these methods still lack joint semantic and geometric reconstruction and do not produce watertight models. We observe that semantics and geometry are inseparably correlated; hence we propose SemanticPlanarSplatting, in which joint optimization of inferred image semantics and geometry shall yield object-oriented plane reconstruction. This setup shall enable a second step, in which the optimized planes yield compact and watertight models. Another shortcoming of recent works is their sole focus on indoor reconstruction. In our project, we will perform scene-agnostic tests on indoor as well as more challenging outdoor scenarios.
Objectives:
Specific tasks of the team are:
- Develop a method, SemanticPlanarSplatting, that simultaneously optimizes image semantics and geometry for object-oriented plane reconstruction. (M1 - Milestone 1)
- Ensure reconstructed planes form compact and watertight models suitable for downstream applications such as CAD and large-scale semantic 3D city model reconstruction. (M2)
- Extend reconstruction beyond indoor scenes by conducting comprehensive tests on both indoor and challenging outdoor environments (see Data). (M3)
Data
Experiments shall be conducted on open benchmark datasets from the computer vision community providing 3D geometry and images, such as Swiss-EPFL and TUM2TWIN (outdoor), ScanNet and ScanNet++ (indoor).
Requirements
- A background in computer vision, machine learning, data science, photogrammetry, or a similar field is required.
- Self-motivated and strongly interested in publishing at top venues such as CVPR, ECCV, and NeurIPS (previous publications are a plus).
- Familiarity with a deep learning framework such as PyTorch or TensorFlow.
References
[1] Tan, B., Yu, R., Shen, Y. and Xue, N., 2025. PlanarSplatting: Accurate Planar Surface Reconstruction in 3 Minutes. CVPR25.
[2] Chen, Z., Wang, Y., Nan, L. and Zhu, X.X., 2025. Parametric Point Cloud Completion for Polygonal Surface Reconstruction. CVPR25.
[3] Fedele, E., Sun, B., Guibas, L., Pollefeys, M. and Engelmann, F., 2025. Superdec: 3D scene decomposition with superquadric primitives. arXiv preprint arXiv:2504.00992.
[4] Wysocki, O., Xia, Y., Wysocki, M., Grilli, E., Hoegner, L., Cremers, D. and Stilla, U., 2023. Scan2LoD3: Reconstructing semantic 3D building models at LoD3 using ray casting and Bayesian networks. CVPR23.
