Minimum-viable 3D Representation of Dynamic Models for Non-rigid Objects in the Open World
- Sponsored by: TUM Photogrammetry and Remote Sensing & Computer Vision for Digital Twins (University of Cambridge)
- Project lead: Dr. Ricardo Acevedo Cabra
- Scientific lead: Prof. Dr. Benjamin Busam, Dr. Guangming Wang
- Term: Summer semester 2026
- Application deadline: Sunday 25.01.2026
Apply to this project here
Motivation
Collaboration between computer vision and robotics can be a key enabler of efficient vision. For instance, even though current foundation models in computer vision have achieved impressive performance, combining several of them to build a general-purpose robot remains difficult, mainly due to efficiency and computational cost constraints. Therefore, to enable scalable robotic applications, we aim to discover the minimum-viable structural representation for object manipulation. Such minimum-viable representations will allow reasoning about object deformation with minimal computation, achieving an optimal balance between accuracy, efficiency, and computational resources for robots.
Objectives
This project aims to infer the minimum-viable representation of dynamic models for a non-rigid object from video observations for object manipulation. By analyzing object deformations during manipulation [1], the model should capture, with a suitable structural representation, how an object behaves under external forces (e.g., robotic arm interactions) [2], balancing representation accuracy, computational efficiency, and resource consumption for completing the object manipulation task [3,4].
Tasks and Opportunities
- Develop algorithms that extract compact structural representations of object deformation from videos of interactions with the object (a minimal sketch of one possible baseline follows this list). The goal is to significantly reduce computation and resource usage while maintaining accurate predictions for the manipulation task.
- Propose suitable evaluation metrics that jointly assess prediction accuracy, computational efficiency, and resource usage for the task, to systematically validate the effectiveness of minimum-viable object representations.
- Test the proposed framework and conduct comprehensive evaluations to demonstrate its effectiveness.
- Further opportunities: Motivated and capable students will have the opportunity to continue this research, e.g., extending it to long-horizon robotic tasks, which may favor different minimum-viable structural representations, and to more challenging robot tasks.
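For illustration only, the sketch below shows one naive baseline under strong assumptions: dense 3D point trajectories tracked from an interaction video are compressed to K control points via farthest point sampling, the dense motion is reconstructed from the control points alone with inverse-distance skinning, and a toy score trades reconstruction error against representation size. The function names, the skinning scheme, and the weight `lam` are hypothetical choices made for this sketch, not a prescribed method or any dataset's API.

```python
# Minimal, illustrative sketch (not the project's prescribed method): a naive
# "minimum-viable" deformation representation via control-point subsampling.
import numpy as np

def farthest_point_sampling(points, k, seed=0):
    """Pick k well-spread control points from an (N, 3) point set."""
    rng = np.random.default_rng(seed)
    chosen = [int(rng.integers(points.shape[0]))]
    dist = np.linalg.norm(points - points[chosen[0]], axis=1)
    for _ in range(k - 1):
        idx = int(np.argmax(dist))  # farthest from all points chosen so far
        chosen.append(idx)
        dist = np.minimum(dist, np.linalg.norm(points - points[idx], axis=1))
    return np.array(chosen)

def reconstruct_dense_motion(traj, ctrl_idx, p=2.0, eps=1e-8):
    """Predict dense trajectories (T, N, 3) from control points alone,
    using inverse-distance-weighted skinning in the first frame."""
    rest = traj[0]                                                      # (N, 3)
    d = np.linalg.norm(rest[:, None] - rest[ctrl_idx][None], axis=-1)   # (N, K)
    w = 1.0 / (d ** p + eps)
    w /= w.sum(axis=1, keepdims=True)                 # (N, K) skinning weights
    ctrl_disp = traj[:, ctrl_idx] - rest[ctrl_idx]    # (T, K, 3) control motion
    return rest[None] + np.einsum('nk,tkc->tnc', w, ctrl_disp)

def minimum_viable_score(traj, ctrl_idx, lam=0.05):
    """Toy joint metric: reconstruction error plus a cost growing with the
    representation size (a stand-in for compute / memory budget)."""
    err = np.linalg.norm(reconstruct_dense_motion(traj, ctrl_idx) - traj,
                         axis=-1).mean()
    return err + lam * len(ctrl_idx) / traj.shape[1]

# Usage on synthetic data: sweep K and compare accuracy-vs-cost trade-offs.
T, N = 30, 500
base = np.random.default_rng(1).uniform(-1, 1, (N, 3))
traj = base[None] + 0.1 * np.sin(np.linspace(0, np.pi, T))[:, None, None] * base[None]
for k in (4, 16, 64):
    idx = farthest_point_sampling(traj[0], k)
    print(k, round(minimum_viable_score(traj, idx), 4))
```

Sweeping K and plotting reconstruction error against representation cost is one simple way to instantiate the joint accuracy/efficiency evaluation called for above.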
Data
Experiments shall be conducted on open benchmark datasets from the computer vision community that provide videos of manipulated, moving 3D objects, such as the data from PhysTwin [1].
Requirements
- A background in computer vision, robotics, machine learning, data science, computer graphics, photogrammetry, or a similar field is required; familiarity with the PyTorch deep learning framework is expected.
- Self-motivated and strongly interested in publishing at top venues such as CVPR, ECCV, NeurIPS, T-PAMI, TRO, IJRR, ICRA, IROS, RSS, and similar (previous publications are a plus).
Apply to this project here
References
[1] Jiang, H., Hsu, H. Y., Zhang, K., Yu, H. N., Wang, S., Li, Y. PhysTwin: Physics-informed reconstruction and simulation of deformable objects from videos. ICCV, 2025.
[2] Wu, Y., Pan, L., Wu, W., Wang, G., Wang, H. RL-GSBridge: 3D Gaussian splatting based real2sim2real method for robotic manipulation learning. ICRA, 2025.
[3] Huang, J., Peter, K. T., Navab, N., Busam, B. TTAPose: Test-time Adaptation for Unseen Object Pose Estimation. IEEE Robotics and Automation Letters, 2025.
[4] Matsuki, H., Murai, R., Kelly, P. H., Davison, A. J. Gaussian Splatting SLAM. CVPR, 2024.
