Object-Centric 3D Reconstruction and Decomposition


Introduction
Acknowledging the individual nature of instances in our physical world, there is an emerging need to build an object-centric 3D virtual reality in which users can freely interact with and manipulate individual objects. Unlike most existing research, our goal is to capture the 3D geometry and appearance not only of the whole scene but also of each individual object within it. Imagine repositioning a chair or animating a flying broomstick within a 3D virtual room: such capabilities would give users the power to dynamically edit real-world scenarios at their fingertips, offering new perspectives on object visualization. Although existing works such as ObjectNeRF [8] and our ObjectSDF [6] address similar 3D scene decomposition, they require multi-view instance segmentation masks as ground truth, which limits their applicability. Likewise, approaches such as GIRAFFE [5] and DisCoScene [7] employ compositional NeRFs but are primarily tailored to generation tasks and cannot effectively decompose intricate scenes. In contrast, this project aims to markedly improve the accuracy of decomposition and interpolation of 3D data by exploring the integration of state-of-the-art diffusion models with implicit neural representation techniques.
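To make the object-compositional representation concrete, the sketch below illustrates the core idea behind ObjectSDF [6]: each object is modeled by its own signed distance field (SDF), and the scene SDF at any point is the pointwise minimum over the object SDFs. This is a minimal, illustrative PyTorch sketch, not the authors' implementation; the class names, network sizes, and hyperparameters are placeholders.

```python
# Minimal sketch of an object-compositional implicit surface representation.
# Each object has its own SDF network; the scene SDF is the pointwise
# minimum over all object SDFs, and the argmin gives a per-point object label.
import torch
import torch.nn as nn


class ObjectSDFNet(nn.Module):
    """Small MLP mapping 3D points to signed distances for one object."""

    def __init__(self, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.Softplus(beta=100),
            nn.Linear(hidden, hidden), nn.Softplus(beta=100),
            nn.Linear(hidden, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)  # (N, 1) signed distances


class CompositionalScene(nn.Module):
    """Scene SDF composed as the minimum over K per-object SDFs."""

    def __init__(self, num_objects: int):
        super().__init__()
        self.objects = nn.ModuleList([ObjectSDFNet() for _ in range(num_objects)])

    def forward(self, x: torch.Tensor):
        sdfs = torch.cat([obj(x) for obj in self.objects], dim=-1)  # (N, K)
        scene_sdf, obj_id = sdfs.min(dim=-1)  # composition + per-point labels
        return scene_sdf, obj_id


points = torch.rand(1024, 3) * 2 - 1  # query points in [-1, 1]^3
scene = CompositionalScene(num_objects=4)
sdf, labels = scene(points)  # scene geometry plus per-point object ids
```

Composing with a pointwise minimum yields a single scene surface for rendering while keeping every object's geometry independently addressable, which is what makes per-object extraction and editing possible after reconstruction.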

Objectives
• To review and analyze existing methods of object-centric 3D reconstruction and decomposition.
• To develop frameworks for object-centric 3D reconstruction and decomposition, both optimized per scene and trained on large-scale datasets.
• To evaluate the efficiency and accuracy of the proposed frameworks using benchmark datasets.

Data Availability
Per-Scene Optimized Object-Centric 3D Reconstruction.
• The DTU dataset [4] is a real dataset containing 88 forward-facing scenes with 3D ground truth.
• The ScanNet dataset [1] is a real indoor dataset captured with RGB-D cameras; it provides 2D instance masks.

Large-Scale Object-Centric 3D Generation.
• The Objaverse-XL dataset [2] is by far the largest 3D dataset to date, comprising over 10 million 3D objects.
• The 3D-FRONT dataset [3] is a large-scale synthetic dataset containing 18,797 rooms diversely furnished with 3D objects.

Requirements for the students
The students should fulfill the following requirements:
• A good background in mathematics and excellent grades;
• Self-motivation and a strong interest in publishing at top venues such as CVPR and NeurIPS;
• Practical Python or C++ programming skills;
• Familiarity with the PyTorch or TensorFlow deep learning framework;
• Attendance of at least one related computer vision course or seminar.


References
[1] Angela Dai, Angel X. Chang, Manolis Savva, Maciej Halber, Thomas Funkhouser, and Matthias Nießner. ScanNet: Richly-annotated 3D reconstructions of indoor scenes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5828–5839, 2017.
[2] Matt Deitke, Ruoshi Liu, Matthew Wallingford, Huong Ngo, Oscar Michel, Aditya Kusupati, Alan Fan, Christian Laforte, Vikram Voleti, Samir Yitzhak Gadre, et al. Objaverse-XL: A universe of 10M+ 3D objects. arXiv preprint arXiv:2307.05663, 2023.
[3] Huan Fu, Bowen Cai, Lin Gao, Ling-Xiao Zhang, Jiaming Wang, Cao Li, Qixun Zeng, Chengyue Sun, Rongfei Jia, Binqiang Zhao, et al. 3D-FRONT: 3D furnished rooms with layouts and semantics. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 10933–10942, 2021.
[4] Rasmus Jensen, Anders Dahl, George Vogiatzis, Engin Tola, and Henrik Aanæs. Large scale multi-view stereopsis evaluation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 406–413, 2014.
[5] Michael Niemeyer and Andreas Geiger. GIRAFFE: Representing scenes as compositional generative neural feature fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11453–11464, 2021.
[6] Qianyi Wu, Xian Liu, Yuedong Chen, Kejie Li, Chuanxia Zheng, Jianfei Cai, and Jianmin Zheng. Object-compositional neural implicit surfaces. In European Conference on Computer Vision, pages 197–213. Springer, 2022.
[7] Yinghao Xu, Menglei Chai, Zifan Shi, Sida Peng, Ivan Skorokhodov, Aliaksandr Siarohin, Ceyuan Yang, Yujun Shen, Hsin-Ying Lee, Bolei Zhou, et al. DisCoScene: Spatially disentangled generative radiance fields for controllable 3D-aware scene synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4402–4412, 2023.
[8] Bangbang Yang, Yinda Zhang, Yinghao Xu, Yijin Li, Han Zhou, Hujun Bao, Guofeng Zhang, and Zhaopeng Cui. Learning object-compositional neural radiance field for editable scene rendering. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 13779–13788, 2021.