Real-time stereo vision based on the uniqueness constraint:
experimental results and applications

December 2012

We have completed the design of our embedded (stereo and mono) camera with highly efficient onboard FPGA processing. In stereo mode, the whole processing pipeline fits into entry-level FPGA devices without additional hardware, delivering accurate and dense depth maps in real time. The imaging sensors, connected to the FPGA board through a standard interface, provide color and monochrome images at up to 60 fps.
The embedded camera has a software API for:

    - Windows 32 and 64 bit 
    - Linux 32 and 64 bit
    - Linux ARM 
    - Mac  
    - Android 

Further details and videos will be available soon.

If you are interested in this project for your applications, feel free to contact me:





The research activity on stereo reported below is quite outdated. For an updated overview of my research activity on stereo, follow this link.

Moreover, if you are interested in stereo vision, you might find this seminar on "Stereo vision: algorithms and applications" interesting.



This page provides experimental results and applications of the Single Matching Phase (SMP) stereo algorithm [1]. Although several approaches for computing very accurate depth maps have been proposed recently (see for example [8], [9]), most of them are not currently suitable for real-time applications (see [11] for a performance evaluation of the cost aggregation strategies proposed for stereo matching). Conversely, SMP is a fast and reliable algorithm for computing dense stereo correspondence in real time. The SMP algorithm uses the uniqueness constraint as one of the main cues for detecting unreliable measurements. In [1] we report a quantitative comparison between the SMP approach and a known algorithm [3] based on bidirectional matching (BM), carried out on a large set of standard stereo pairs with ground truth (namely "Tsukuba", "Map", "Sawtooth", "Venus", "Barn1", "Barn2", "Bull" and "Poster", available at Scharstein and Szeliski's web site [4] and used in their paper [5]). Bidirectional matching is also often referred to as left-right consistency check or left-right constraint. In [1] and [2] we also provide experimental results on rectified stereo sequences acquired in our laboratory with a digital stereo camera, as well as preliminary results of a 3D Tracking application and a 3D People Counting application. The SMP algorithm has been implemented in C, exploiting the SIMD parallel capabilities (e.g. MMX and SSE technologies) available in recent Intel, AMD and many other microprocessors. A detailed description of the SIMD mapping of the SMP algorithm is available in [2]. A more recent (near) real-time stereo matching algorithm was proposed in [12] (experimental results here, evaluation on the Middlebury dataset here).
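To illustrate how the uniqueness constraint can flag unreliable matches within a single left-to-right matching phase, the following Python sketch processes one rectified scanline. It is a simplified illustration of the idea, not the SMP implementation described in [1] and [2]: it assumes a precomputed cost table `costs[x][d]` (lower is better, e.g. a window-based SAD), and all names are hypothetical. When a new left pixel claims a right pixel that is already matched, the match with the higher cost is rejected.

```python
INVALID = -1  # marker for disparities rejected as unreliable


def match_scanline_with_uniqueness(costs):
    """Left-to-right matching of one rectified scanline with a uniqueness check.

    costs[x][d] is the (hypothetical) matching cost between left pixel x and
    right pixel x - d; lower is better. Each right pixel may be claimed by at
    most one left pixel: when a second left pixel claims it, the match with
    the higher cost is marked INVALID.
    """
    disparity = [INVALID] * len(costs)
    owner = {}  # right pixel -> (left pixel, cost) of its current match
    for x, row in enumerate(costs):
        # only disparities that keep the right pixel inside the image
        candidates = [d for d in range(len(row)) if x - d >= 0]
        if not candidates:
            continue
        d_best = min(candidates, key=lambda d: row[d])
        c_best = row[d_best]
        r = x - d_best
        if r in owner:
            prev_x, prev_cost = owner[r]
            if c_best < prev_cost:
                disparity[prev_x] = INVALID  # earlier match was less reliable
            else:
                continue  # new match is not better: left pixel x stays INVALID
        owner[r] = (x, c_best)
        disparity[x] = d_best
    return disparity
```

For example, with `costs = [[1, 9], [5, 0], [3, 2], [4, 7]]` the sketch returns `[-1, 1, 1, 0]`: left pixels 0 and 1 both prefer right pixel 0, so the earlier and costlier match is discarded without a second (right-to-left) matching pass.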

November 2010: The SMP algorithm has been implemented on a Texas Instruments DaVinci DSP (300 MHz CPU + 600 MHz DSP) by Anouar Manders at SenseIT. This implementation runs at 5-6 fps with 640x480 stereo pairs, 15x15 windows, a disparity range of 64 pixels and 1/8 subpixel disparity interpolation (detection of unreliable disparities is not implemented yet).
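This page does not state which subpixel interpolation is used. A common choice, assumed here, is to fit a parabola through the matching costs at d-1, d and d+1 and take the position of its minimum. The sketch below (hypothetical function name) rounds the refined disparity to 1/8 pixel, matching the 1/8 subpixel setting mentioned above.

```python
def subpixel_disparity(c_prev, c_best, c_next, d, steps=8):
    """Refine integer disparity d using the costs at d-1, d and d+1.

    Fits a parabola through the three costs and returns the position of its
    minimum, rounded to the nearest 1/steps pixel. Assumes c_best is the
    smallest of the three costs (d is the winning integer disparity).
    """
    curvature = c_prev - 2 * c_best + c_next
    if curvature <= 0:
        return float(d)  # flat or degenerate cost profile: keep the integer value
    offset = 0.5 * (c_prev - c_next) / curvature  # lies in [-0.5, 0.5]
    return d + round(offset * steps) / steps
```

For instance, costs of 10, 4 and 6 around d = 20 give an offset of 0.25, i.e. a refined disparity of 20.25 (two 1/8 pixel steps).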



If you are interested in the SMP algorithm or in its applications, feel free to contact me at:




Overview of a stereo vision system



This page provides detailed experimental results and videos of the 3D Tracking, 3D People Counting and 3D Change/Intrusion Detection applications (described in [10]), which rely on the SMP algorithm [1] for real-time dense depth measurements.







OpenGL based real-time 3D visualization of the depth map provided by the SMP algorithm [1]




Experimental results with stereo pairs with ground truth:
comparison between SMP and BM algorithms


This section provides experimental results obtained with SMP [1] and BM [3] on a standard set of stereo pairs with available ground truth (namely "Tsukuba", "Map", "Sawtooth", "Venus", "Barn1", "Barn2", "Bull" and "Poster"). The stereo pairs and the ground truth are available at Scharstein and Szeliski's web site [4]. Disparity values are encoded with 256 gray levels, with brighter levels representing points closer to the camera and unmatched points represented in white.
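For reference, the bidirectional matching (BM) check against which SMP is compared can be sketched as follows. This is the generic left-right consistency test on one scanline, assuming both a left-to-right and a right-to-left disparity map are already available; it is not necessarily the exact implementation of [3], and the names are hypothetical. Note that the test needs the second, right-to-left map, i.e. an additional matching phase.

```python
INVALID = -1  # marker for unmatched / rejected pixels


def left_right_check(disp_left, disp_right, tolerance=0):
    """Keep disp_left[x] only if the right-to-left match points back to x.

    disp_left[x]: disparity of left pixel x (its match is right pixel x - d).
    disp_right[r]: disparity of right pixel r (its match is left pixel r + d).
    Disparities that disagree by more than `tolerance` become INVALID.
    """
    checked = []
    for x, d in enumerate(disp_left):
        r = x - d
        if d == INVALID or not (0 <= r < len(disp_right)) or disp_right[r] == INVALID:
            checked.append(INVALID)
        elif abs(disp_right[r] - d) <= tolerance:
            checked.append(d)  # both directions agree: consistent match
        else:
            checked.append(INVALID)
    return checked
```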

Results obtained by the SMP and BM algorithms are available for each stereo pair: Tsukuba, Map, Venus, Sawtooth, Barn1, Barn2, Bull and Poster.




Figure 1 reports the execution times obtained on a Pentium III 800 MHz processor running the two algorithms on 320x240, 640x480, 800x600 and 1024x768 pixel images, with disparity ranges of 16, 32, 48, 64 and 80 pixels. The graph shows that with small disparity ranges and small image sizes the BM algorithm is slightly faster. However, as the disparity range and/or the image size increase, SMP becomes faster, and it turns out to be significantly faster with large disparity ranges and/or image sizes.


Figure 1: Performance, in msec per frame, of SMP and BM on a Pentium III 800 MHz processor


For example, on a Pentium III processor at 800 MHz with 800x600 stereo pairs and a disparity range of 16, our algorithm runs at 5.56 fps while BM runs at 6.96 fps. With the same image size and a disparity range of 80, SMP is nearly twice as fast as BM (2.89 fps for SMP versus 1.51 fps for BM).
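The frame rates quoted above can be converted into milliseconds per frame, the unit used in Figure 1, with simple arithmetic. The small script below only reuses the numbers reported on this page.

```python
def ms_per_frame(fps):
    """Convert a frame rate (frames per second) to milliseconds per frame."""
    return 1000.0 / fps


# 800x600 stereo pairs on a Pentium III 800 MHz (figures quoted above)
smp_16, bm_16 = 5.56, 6.96   # disparity range 16
smp_80, bm_80 = 2.89, 1.51   # disparity range 80

print(f"d=16: SMP {ms_per_frame(smp_16):.1f} ms, BM {ms_per_frame(bm_16):.1f} ms")
print(f"d=80: SMP {ms_per_frame(smp_80):.1f} ms, BM {ms_per_frame(bm_80):.1f} ms")
print(f"speed-up of SMP over BM at d=80: {smp_80 / bm_80:.2f}x")
```

This prints about 179.9 ms (SMP) versus 143.7 ms (BM) at d=16, and 346.0 ms versus 662.3 ms at d=80, a speed-up of 1.91x for SMP.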




Experimental results with real stereo sequences
and applications of the SMP algorithm


This section presents experimental results obtained on stereo sequences acquired in our laboratory with a monochrome MEGA-D digital stereo head (by Videre Design) equipped with a pair of 4.8 mm lenses.



Calibration of the stereo camera: dataset and results

The MEGA-D stereo head uses an IEEE 1394 (FireWire) interface and has a fixed baseline of about 9 cm. The original stereo pairs were rectified using the intrinsic and extrinsic camera parameters estimated with the MATLAB Camera Calibration Toolbox (available here). The image size is 640x480, and the rectified sequences were processed using a 15x15 correlation window, a disparity search range of 64 pixels and a subpixel accuracy of 1/8.
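Once the pair is rectified, depth follows from triangulation as Z = f * B / d, with f the focal length in pixels, B the baseline and d the disparity in pixels. The sketch below assumes disparities stored as integer counts of 1/8 pixel and also back-projects a pixel to 3D with the pinhole model; the focal length, principal point and example values are hypothetical placeholders for the actual calibration results, and the function names are illustrative.

```python
def depth_from_disparity(d_units, focal_px, baseline_m, subpixel=8):
    """Depth Z = f * B / d (metres) for a rectified stereo pair.

    d_units is the disparity as an integer number of 1/subpixel pixel steps.
    Returns None when the disparity is zero or negative (no valid match).
    """
    if d_units <= 0:
        return None
    d_px = d_units / subpixel  # disparity in pixels
    return focal_px * baseline_m / d_px


def reproject(u, v, d_units, focal_px, baseline_m, cx, cy, subpixel=8):
    """Back-project pixel (u, v) with disparity d_units to camera coordinates.

    (cx, cy) is the principal point in pixels (hypothetical calibration values).
    Returns (X, Y, Z) in metres, or None for an invalid disparity.
    """
    z = depth_from_disparity(d_units, focal_px, baseline_m, subpixel)
    if z is None:
        return None
    return ((u - cx) * z / focal_px, (v - cy) * z / focal_px, z)


# Hypothetical focal length of 600 px and the ~9 cm baseline of the MEGA-D head:
# 216 steps of 1/8 px = 27 px of disparity, i.e. a depth of about 2 m.
print(depth_from_disparity(216, 600.0, 0.09))
```

Because depth is inversely proportional to disparity, a fixed 1/8 pixel step corresponds to a coarser depth step for distant points than for nearby ones.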
  
            • The stereo pairs used for the calibration of the stereo camera are available here.
            • The calibration results (estimated intrinsic and extrinsic parameters) are available here.



Application I: "3D tracking"

In this section we show experimental results obtained with SMP [1] on two stereo sequences acquired in our laboratory, referred to as "Outdoor" and "Indoor". We are currently using these sequences within a research activity aimed at developing a real-time 3D People Tracking application. The tracking approach first merges the disparity maps computed by the SMP algorithm with the output of a change-detection algorithm in order to build a suitable plan-view representation [6], [7], which enables real-time tracking of moving objects in 3D space.
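A plan-view representation in the spirit of [6] and [7] can be built by projecting the 3D points of the foreground pixels (those flagged by change detection) onto a grid on the floor. The minimal sketch below accumulates an occupancy map (point count per cell) and a height map (maximum height per cell). It assumes the points are already expressed in a floor-aligned frame with y as height above the floor; the 5 cm cell size and the function name are hypothetical, and [6], [7] describe more refined statistics.

```python
from collections import defaultdict


def plan_view_maps(points, cell_size=0.05):
    """Bin foreground 3D points into plan-view occupancy and height maps.

    points: iterable of (x, y, z) in metres in a floor-aligned frame, where
    x and z span the floor plane and y is the height above the floor (assumed).
    Returns two dicts keyed by (i, j) grid cell: point count and max height.
    """
    occupancy = defaultdict(int)
    height = {}
    for x, y, z in points:
        cell = (int(x // cell_size), int(z // cell_size))  # floor-plane cell
        occupancy[cell] += 1
        height[cell] = max(height.get(cell, y), y)
    return dict(occupancy), height
```

Tracking then operates on these 2D maps rather than on the image, so people who partially occlude each other in the camera view tend to fall into separate cells on the floor.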



Stereo sequence: "Lab_1"

The videos are provided in DivX format.



"Lab_1" stereo sequence:
3D tracking  (video available here)

Sequence acquired with a Videre Design color stereo camera at 640x480. Rectified stereo pairs, the output of the SMP algorithm and other details will be provided soon.






Stereo sequence: "Lab_2"

The videos are provided in DivX format.



"Lab_2" stereo sequence: 3D tracking (video available here)

Sequence acquired with a Videre Design color stereo camera at 640x480. Rectified stereo pairs, the output of the SMP algorithm and other details will be provided soon.


"Cortile" stereo sequence


Background




Moving people


At this link you can find information on the stereo sequence "Cortile". We provide the rectified stereo sequences (320x240 and 640x480), the disparity maps computed by the SMP stereo algorithm for five settings (1, 2, 4, 8 and 16) of the subpixel parameter, and the parameters for obtaining the 3D depth measurements. Disparity maps are encoded as RGB images (saved with OpenCV), as described in the README file.



"Outdoor" stereo sequence

For best quality, the videos are provided in zipped AVI format.
The videos are also provided in DivX format (the DivX codec is available at www.divx.com).

Outdoor sequence

Frame 0050 of the "Outdoor" stereo sequence
(Top Left) Original Left Image, (Top Right) Original Right Image,
(Bottom Left) Rectified Left Image, (Bottom Right) Rectified Right Image.
The entire video is available in DivX format here (size 2.8 MB) and in zipped AVI format here (size 32.1 MB).



Outdoor sequence
Results on frame 0050 of the "Outdoor" stereo sequence
(Top Left) Disparity map with threshold set to 0, (Top Right) Disparity map with threshold set to 1,
(Bottom Left) Disparity map with threshold set to 2, (Bottom Right) Disparity map with threshold set to 3.
The entire video is available in DivX format here (size 2.8 MB) and in zipped AVI format here (size 5.12 MB).



Outdoor sequence
Results on frame 0050 of the "Outdoor" stereo sequence
(Top Left) Original Left image, (Top Right) Rectified Left image,
(Bottom Left) Disparity map with threshold set to 0, (Bottom Right) Disparity map with threshold set to 3.
The entire video is available in DivX format here (size 2.8 MB) and in zipped AVI format here (size 51.1 MB).




3D Tracking

Preliminary results of the real-time 3D tracking application on frame 0219 of the "Outdoor" stereo sequence
(Top Left) Rectified Left image, (Top Right) Disparity map with threshold set to 0,
(Bottom Left) Output of the change detection merged with the disparity map, (Bottom Right) Detected 3D position of the moving people/objects in the field of view of the cameras.
The entire video is available in DivX format here (size 4.4 MB) and in zipped AVI format here (size 29.6 MB).




"Indoor" stereo sequence

For best quality, the videos are provided in zipped AVI format.
Some videos are also provided in DivX format (the DivX codec is available at www.divx.com).

Indoor sequence

Frame 0103 of the "Indoor" stereo sequence
(Top Left) Original Left Image, (Top Right) Original Right Image,
(Bottom Left) Rectified Left Image, (Bottom Right) Rectified Right Image.
The entire video is available in DivX format here (size 2.8 MB) and in zipped AVI format here (size 46.9 MB).



Indoor sequence
Results on frame 0103 of the "Indoor" stereo sequence
(Top Left) Disparity map with threshold set to 0, (Top Right) Disparity map with threshold set to 1,
(Bottom Left) Disparity map with threshold set to 2, (Bottom Right) Disparity map with threshold set to 3.
The entire video is available in DivX format here (size 12.8 MB) and in zipped AVI format here (size 15.6 MB).




Indoor sequence
Results on frame 0103 of the "Indoor" stereo sequence
(Top Left) Original Left image, (Top Right) Rectified Left image,
(Bottom Left) Disparity map with threshold set to 0, (Bottom Right) Disparity map with threshold set to 3.
The entire video is available in DivX format here (size 10 MB) and in zipped AVI format here (size 46.9 MB).





3D Tracking

Preliminary results of the real-time 3D tracking application on frame 0120 of the "Indoor" stereo sequence
(Top Left) Rectified Left image, (Top Right) Disparity map with threshold set to 0,
(Bottom Left) Output of the change detection merged with the disparity map, (Bottom Right) Detected 3D position of the moving people/objects in the field of view of the cameras.
The entire video is available in DivX format here (size 8.3 MB) and in zipped AVI format here (size 27.2 MB).







Application II: "3D people counting"


This section shows preliminary results of another application, aimed at counting in real time the people moving in the field of view of a stereo camera. The 3D People Counting application measures the flow of people crossing a virtual gate in 3D space. The green line on the floor, in the first shot of the "Count" stereo sequence, shows the 3D position of the virtual gate between regions A and B. The application relies on the 3D depth measurements provided by the SMP algorithm [1] for tracking and counting people in real time using a plan-view representation [6], [7]. It counts people crossing from region A to region B (red in the plan-view map on the right) and people crossing from region B to region A (green in the plan-view map on the right). A video containing the entire sequence is available here.
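The counting step can be sketched by checking, for each tracked person, on which side of the virtual gate consecutive plan-view positions fall. This hypothetical sketch treats the gate as a line through two floor points and assumes that region A is the side where the 2D cross product is positive; it ignores the finite extent of the gate and the tracking itself, which the application performs on the plan-view representation.

```python
def side(gate_a, gate_b, p):
    """Sign of the 2D cross product: > 0 on one side of the gate line, < 0 on the other."""
    (ax, az), (bx, bz), (px, pz) = gate_a, gate_b, p
    return (bx - ax) * (pz - az) - (bz - az) * (px - ax)


def count_crossings(tracks, gate_a, gate_b):
    """Count A->B and B->A crossings of a virtual gate line in the plan view.

    tracks: list of trajectories, each a list of (x, z) floor positions over time.
    Region A is assumed to be the side where side(...) > 0 (hypothetical convention).
    """
    a_to_b = b_to_a = 0
    for track in tracks:
        prev = None  # side of the last position not lying exactly on the line
        for p in track:
            s = side(gate_a, gate_b, p)
            if s == 0:
                continue  # on the line: wait until the person leaves it
            if prev is not None and (prev > 0) != (s > 0):
                if prev > 0:
                    a_to_b += 1
                else:
                    b_to_a += 1
            prev = s
    return a_to_b, b_to_a
```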


People Counting: frames 0053, 0100, 0116, 0165 and 0301 of the "Count" stereo sequence

Preliminary results of the real-time 3D People Counting application: 
(Left) Original Left image of the "Count" stereo sequence
(Right) Detected 3D position of the tracked people in the field of view of the cameras and statistics about the crossing in the two directions (A->B and B->A).
The entire video, in DivX format, is available here (size 11.7 MB)





Application III: "3D Change/Intrusion Detection"


This section provides experimental results of a robust real-time Change/Intrusion Detection approach, described in [10], which jointly exploits depth information coming from a 3D device and 2D brightness information. Information on scene changes is recovered by means of two different strategies. The former, referred to as 3D Output, mainly relies on depth information and aims at being robust to camouflage, shadows and sudden illumination changes. The latter, referred to as 2D Output, aims at robustness to sudden illumination changes as well as accuracy in the foreground segmentation. The final change masks determined by the 2D Output and the 3D Output are referred to as C2D and C3D, respectively. As depicted in the following figure, the proposed approach, using a stereo vision system as the 3D device, can be outlined as a 4-stage algorithm. The overall system relies on the SMP algorithm [1] for real-time dense depth measurements.
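As a much simplified illustration of why depth helps against shadows and illumination changes, the sketch below flags a pixel as changed when its current disparity exceeds a background disparity by more than a threshold: a person in front of the background is closer to the camera and therefore has a larger disparity, while a shadow or a light being switched on leaves the disparity unchanged. This is not the 3D Output algorithm of [10]; the flat-list layout, the invalid-value marker, the threshold and the function name are assumptions.

```python
INVALID = -1  # marker for pixels without a reliable disparity


def change_mask_3d(disparity, background_disparity, threshold=2):
    """Flag pixels whose disparity exceeds the background disparity by > threshold.

    Both maps are flat lists of equal length (hypothetical layout). A larger
    disparity means closer to the camera, so an object standing in front of
    the background raises it. Pixels invalid in either map are not flagged.
    """
    mask = []
    for d, b in zip(disparity, background_disparity):
        valid = d != INVALID and b != INVALID
        mask.append(valid and d - b > threshold)
    return mask
```

Pixels left unflagged because of missing disparities (for instance in textureless or occluded areas) are one reason why a complementary 2D, brightness-based output is useful.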



Flow-diagram

Flow diagram of the 3D Change/Intrusion Detection application.
A detailed description of the overall approach can be found in [10]



The following figure shows preliminary experimental results obtained by processing a challenging stereo sequence, referred to as "Office", with the 3D Change/Intrusion Detection application. In this indoor sequence, acquired with a rectified color stereo camera, strong photometric distortions (clearly visible when comparing the 9 frames shown in the following figure) are induced by switching lights on and off. Moreover, the same sequence is also affected by severe shadow and camouflage problems. The overall 3D Change/Intrusion Detection application, including the disparity map generation step, runs in real time on a standard personal computer.



Office: experimental results


Preliminary experimental results on 9 out of 195 frames of the Office stereo sequence:
(First column) - Reference image F of the stereo pair, (Second column) - Background model B2D registered according to the histogram specification given by frame F,
(Third column) - Disparity map D computed by the SMP algorithm, (Fourth column) - Change mask C2D provided by the 2D Output approach,
(Fifth column) - Change mask C3D provided by the 3D Output approach.
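The second column refers to a background model registered to the current frame through histogram specification. A textbook CDF-based histogram specification (histogram matching) is sketched below as one possible way to perform such a registration; whether [10] uses exactly this mapping is an assumption, and the function name is hypothetical. Matching the background histogram to that of F compensates for global brightness changes, such as lights being switched on and off, before the 2D comparison.

```python
def histogram_match(source, reference, levels=256):
    """Remap the gray levels of `source` so its histogram approximates that of `reference`.

    Both are flat lists of ints in [0, levels). Each source level is mapped to
    the smallest reference level whose cumulative frequency reaches the
    cumulative frequency of the source level.
    """
    def cdf(pixels):
        hist = [0] * levels
        for v in pixels:
            hist[v] += 1
        total, acc, out = len(pixels), 0, []
        for h in hist:
            acc += h
            out.append(acc / total)  # normalised cumulative histogram
        return out

    src_cdf, ref_cdf = cdf(source), cdf(reference)
    lut, j = [], 0
    for g in range(levels):
        # both CDFs are non-decreasing, so j never needs to move backwards
        while j < levels - 1 and ref_cdf[j] < src_cdf[g]:
            j += 1
        lut.append(j)
    return [lut[v] for v in source]
```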




NOTE

If you use the "Indoor", "Outdoor" or "Office" datasets, or the dataset used for the calibration of the stereo head, please cite this website:

www.vision.deis.unibo.it/smatt/stereo.htm


If you use the disparity maps computed with the SMP algorithm available on this site, please cite paper [1]:

L. Di Stefano, M. Marchionni, S. Mattoccia

A fast area-based stereo matching algorithm
 
Image and Vision Computing 22(12), pp 983-1005, October 2004






References

[1] L. Di Stefano, M. Marchionni, S. Mattoccia
"A fast area-based stereo matching algorithm"
Image and Vision Computing 22(12), pp. 983-1005, October 2004
[Abstract] [Pdf] [Bibtex]

[2] L. Di Stefano, M. Marchionni, S. Mattoccia
"A PC-based real-time stereo vision system"
Machine GRAPHICS & VISION 13(4), pp. 197-220, January 2004
[Abstract] [Pdf] [Bibtex]

[3] K. Konolige
"Small Vision Systems: hardware and implementation"
Eighth International Symposium on Robotics Research
Hayama, Japan, October 1997

[4] D. Scharstein and R. Szeliski
Middlebury Stereo Vision Page: http://vision.middlebury.edu/stereo/

[5] D. Scharstein and R. Szeliski
"A taxonomy and evaluation of dense two-frame stereo correspondence algorithms"
IJCV 47(1/2/3):7-42, April-June 2002
Microsoft Research Technical Report MSR-TR-2001-81, November 2001

[6] T. Darrell, D. Demirdjian, N. Checka, P. Felzenszwalb
"Plan-view trajectory estimation with dense stereo background models"
International Conference on Computer Vision (ICCV 2001), 2001

[7] M. Harville
"Stereo person tracking with adaptive plan-view templates of height and occupancy statistics"
Image and Vision Computing 22(2), pp. 127-142, February 2004

[8] F. Tombari, S. Mattoccia, L. Di Stefano
"Segmentation-based adaptive support for accurate stereo correspondence"
IEEE Pacific-Rim Symposium on Image and Video Technology (PSIVT 2007)
December 17-19, 2007, Santiago, Chile
[Abstract] [Pdf] [Bibtex]

[9] S. Mattoccia, F. Tombari, L. Di Stefano
"Stereo vision enabling precise border localization within a scanline optimization framework"
8th Asian Conference on Computer Vision (ACCV 2007)
November 18-22, 2007, Tokyo, Japan
[Abstract] [Pdf] [Bibtex]

[10] F. Tombari, S. Mattoccia, L. Di Stefano, F. Tonelli
"Detecting motion by means of 2D and 3D information"
ACCV'07 Workshop on Multi-dimensional and Multi-view Image Processing (ACCV 2007 WS)
November 19, 2007, Tokyo, Japan
[Abstract] [Pdf] [Bibtex]

[11] F. Tombari, S. Mattoccia, L. Di Stefano, E. Addimanda
"Classification and evaluation of cost aggregation methods for stereo correspondence"
IEEE International Conference on Computer Vision and Pattern Recognition (CVPR 2008)
June 24-26, 2008, Anchorage, Alaska
[Abstract] [Pdf] [Bibtex] [Accompanying page]

[12] F. Tombari, S. Mattoccia, L. Di Stefano, E. Addimanda
"Near real-time stereo based on effective cost aggregation"
International Conference on Pattern Recognition (ICPR 2008)
December 8-11, 2008, Tampa, Florida, USA
[Abstract] [Pdf] [Bibtex] [Evaluation]

COPYRIGHT NOTICE: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.  





 
 
  

last update on: December 9, 2012