Joint Detection Tracking and Mapping (JDTAM)

The visual SLAM (Simultaneous Localization And Mapping) problem concerns incrementally reconstructing the world while simultaneously localizing the sensing device by means of visual cues only. Camera tracking usually neither provides nor handles any semantic interpretation of the environment, so the reconstruction and detection processes are decoupled. However, even a partial reconstruction could improve object detection, while detecting an object from different points of view adds constraints to the camera pose estimation problem.
In this project we aim to jointly solve the detection and mapping problems by means of a novel Semantic Bundle Adjustment framework [1]. Object presence is inferred from feature matching and coherence with the incremental reconstruction, while object poses are estimated together with camera poses in a unified optimization problem.
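To give a feel for what "estimated together in a unified optimization problem" means, here is a minimal sketch under strong simplifying assumptions. It is not the formulation of [1]: poses are reduced to 2D translations, there is a single object, and plain gradient descent stands in for a real bundle adjustment solver. The function name `solve_joint` and all measurement values are hypothetical.

```python
# Toy joint optimization: camera positions and one object position are
# estimated together from two kinds of measurements.
# Assumption: translation-only 2D "poses"; this is an illustration only.

def solve_joint(odometry, object_obs, n_cams, iters=2000, step=0.1):
    """Minimize the sum of squared residuals by gradient descent.

    odometry:   list of (i, j, (dx, dy)), meaning cam_j - cam_i ~ (dx, dy)
    object_obs: list of (i, (dx, dy)),    meaning obj - cam_i ~ (dx, dy)
    Camera 0 stays fixed at the origin to remove the gauge freedom.
    """
    cams = [[0.0, 0.0] for _ in range(n_cams)]
    obj = [0.0, 0.0]
    for _ in range(iters):
        g_cams = [[0.0, 0.0] for _ in range(n_cams)]
        g_obj = [0.0, 0.0]
        # Camera-to-camera residuals (tracking constraints).
        for i, j, z in odometry:
            for k in range(2):
                r = cams[j][k] - cams[i][k] - z[k]
                g_cams[j][k] += r
                g_cams[i][k] -= r
        # Camera-to-object residuals (detection constraints).
        for i, y in object_obs:
            for k in range(2):
                r = obj[k] - cams[i][k] - y[k]
                g_obj[k] += r
                g_cams[i][k] -= r
        # Update every unknown except the fixed camera 0.
        for i in range(1, n_cams):
            for k in range(2):
                cams[i][k] -= step * g_cams[i][k]
        for k in range(2):
            obj[k] -= step * g_obj[k]
    return cams, obj
```

For example, with true camera positions (0,0), (1,0), (2,0), an object at (1,2), and a second odometry step that has drifted to 1.3 instead of 1.0, odometry alone would place the third camera at x = 2.3. Adding the three object observations pulls that estimate back toward the true value, which is the sense in which detections constrain camera poses.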

   

 

Semantic KinectFusion

We applied the Semantic Bundle Adjustment technique to the KinectFusion [2] tracking system to obtain effective semantic tracking and mapping. Since our method is based on keyframe optimization, we extended KinectFusion with keyframe detection and global pose optimization, as well as smart volume moving and loop closure. Preliminary results can be found in [3].
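The text does not say how keyframes are detected. As an illustration only, the sketch below uses a common heuristic: a frame becomes a keyframe once the camera has moved or rotated beyond a threshold since the last keyframe. The pose layout `(x, y, z, yaw_deg)`, the thresholds, and the function names are all assumptions, not the criterion used in [3].

```python
import math

def is_new_keyframe(last_kf, current, max_trans=0.3, max_rot_deg=15.0):
    """Return True if `current` is far enough from the last keyframe.

    Poses are (x, y, z, yaw_deg) tuples, a simplification of full 6-DOF
    poses. Thresholds are hypothetical (metres and degrees).
    """
    dist = math.dist(last_kf[:3], current[:3])
    # Wrap the yaw difference into [-180, 180) before taking its magnitude.
    dyaw = abs((current[3] - last_kf[3] + 180.0) % 360.0 - 180.0)
    return dist > max_trans or dyaw > max_rot_deg

def select_keyframes(poses, **thresholds):
    """Return indices of keyframes; the first frame is always a keyframe."""
    keyframes = [0]
    for idx in range(1, len(poses)):
        if is_new_keyframe(poses[keyframes[-1]], poses[idx], **thresholds):
            keyframes.append(idx)
    return keyframes
```

Only the selected keyframes would then enter the global pose optimization, which keeps the problem size bounded while the dense tracker runs on every frame.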

 

[Figures: our novel TSDF-based matching; system flowchart]

 

Coming soon...

We are working on an exciting new mobile SLAM application for the Android platform!
See the algorithm running on an off-the-shelf desktop CPU [AVI]


References

[1] N. Fioraio and L. Di Stefano, "Joint Detection, Tracking and Mapping by Semantic Bundle Adjustment", Computer Vision and Pattern Recognition (CVPR), 2013. [PDF, AVI, BIBTEX]

[2] R. A. Newcombe, S. Izadi, O. Hilliges, D. Molyneaux, D. Kim, A. J. Davison, P. Kohli, J. Shotton, S. Hodges and A. Fitzgibbon, "KinectFusion: Real-Time Dense Surface Mapping and Tracking", International Symposium on Mixed and Augmented Reality (ISMAR), 2011.

[3] N. Fioraio, G. Cerri and L. Di Stefano, "Towards Semantic KinectFusion", International Conference on Image Analysis and Processing (ICIAP), 2013. [PDF, BIBTEX]