Online 3D Segmentation
RGB-D Semantic Segmentation Dataset

This page provides a RGB-D Segmentation dataset with groundtruth acquired with a Microsoft Kinect device. The dataset includes 5 (+ 1 for the background) categories of common grocery products such as packets of biscuits, juice bottles, coffee cans and boxes of salt, of different brands and colors. The training set includes 3 model views for each category, while the testing scenes are 16, including a high degree of clutter and occlusions. Thanks to the deployed device, this dataset includes both color and depth. It also includes ground-truth, i.e. the correct label to be assigned to each point of the test set.

The dataset has been proposed in [1]. Please cite appropriately this page and [1] if you plan to use the dataset for any kind of scientific work or publication. For any question feel free to write at:

federico (DOT) tombari (AT) unibo (DOT) it.


The dataset is structured into two main folders:

Training set:

Each model of each of the 6 categories includes 3 files:
  • .yaml: this file stores 4 arrays in the OpenCV "CvMat" format and represents the 640x480 RGB-D range map. The first 3 float arrays (named "X", "Y", "Z") are reserved for the depth (x,y,z values), while the 4th array (named "T") is a 3-channel IplImage that stores the RGB texture information. We suggest to load it by means of the cvLoad  function included in the OpenCV library.
  • .ply: the 3D mesh; can be viewed by means of , e.g., MeshLab
  • .png: the RGB image

Test set:

Each of the 16 test scenes includes the following files (XX is a number from 0 to 15):
  • SceneXX.yaml: the RGB-D range map - see above.
  • SceneXX.ply: the 3D mesh
  • SceneXX_2D.png: the RGB image
  • SceneXX_GT2D.XYZ_LABEL_CONF: the groundtruth labels, referred to each point of the 2D arrays of the range map
  • SceneXX_GT.XYZ_LABEL_CONF: the groundtruth labels, referred to each (x,y,z) value of the 3D mesh.
  • SceneXX_GT3D.ply: the scene 3D mesh where the original texture has been substituted by the groundtruth labels. Useful to visualize the groundtruth.


The dataset can be downloaded as two separate zip files (training set and test set):

Dataset (training set) (zip format - 43.7 MB)

Dataset (test set) (zip format -  310 MB)


Scene (2D) Scene (3D) GroundTruth (3D)

The Figure shows an example of a scene of the dataset, reporting the RGB image, a snapshot of the 3D mesh and a snapshot of the 3D groundtruth labels.


[1] F. Tombari, L. Di Stefano, S. Giardino, "Online Learning for Automatic Segmentation of 3D Data", IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS '11), 2011