CORSMAL Containers Manipulation

The dataset consists of 1,140 audio-visual recordings with 12 human subjects manipulating 15 containers, split into 5 cups, 5 drinking glasses, and 5 food boxes. These containers are made of different materials, such as plastic, glass and paper. Each container can be empty or filled with water, rice or pasta at two different levels of fullness: 50% and 90% with respect to the capacity of the container. The combination of containers and fillings results in a total of 95 configurations acquired for three scenarios with an increasing level of difficulty, caused by occlusions or subject motions:

  • Scenario 1. The subject sits in front of the robot while a container rests on a table. The subject pours the filling into the container while trying not to touch it, or shakes an already filled food box, and then initiates the handover of the container to the robot.

  • Scenario 2. The subject sits in front of the robot while holding a container. The subject pours the filling from a jar into a glass/cup, or shakes an already filled food box, and then initiates the handover of the container to the robot.

  • Scenario 3. The subject stands to the side of the robot while holding a container, and is potentially visible from only one third-person view camera. The subject pours the filling from a jar into a glass/cup, or shakes an already filled food box, takes a few steps to reach the front of the robot, and then initiates the handover of the container to the robot.

Each scenario is recorded with two different backgrounds and under two different lighting conditions. The first background condition involves a plain tabletop and the subject wearing a texture-less t-shirt, while the second involves a table covered with a graphics-printed tablecloth and the subject wearing a patterned shirt. The lighting conditions are ceiling room lights and controlled lights. The 95 configurations are executed by a different subject for each scenario and each background/lighting condition, so the 12 combinations (3 scenarios × 2 backgrounds × 2 lighting conditions) correspond to the 12 subjects and yield 12 × 95 = 1,140 recordings.
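The counts above can be checked with a short Python sketch. It assumes (the text does not state this explicitly) that cups and glasses can be empty or hold water, rice or pasta, whereas food boxes hold only rice or pasta; under that assumption both the 95 configurations and the 1,140 recordings follow. The background and lighting labels in the code are hypothetical names.

```python
from itertools import product

# Filling levels as a percentage of the container capacity.
LEVELS = (50, 90)

def configurations_per_container(fillings):
    """Count the empty container plus every (filling type, filling level) pair."""
    return 1 + len(list(product(fillings, LEVELS)))

# ASSUMPTION: cups and glasses take water, rice or pasta; food boxes take rice or pasta only.
cup_or_glass = configurations_per_container(("water", "rice", "pasta"))  # 1 + 3 * 2 = 7
food_box = configurations_per_container(("rice", "pasta"))               # 1 + 2 * 2 = 5
configurations = 10 * cup_or_glass + 5 * food_box  # 5 cups + 5 glasses, 5 food boxes

# 3 scenarios x 2 backgrounds x 2 lighting conditions (labels are hypothetical).
conditions = list(product((1, 2, 3), ("plain", "patterned"), ("ceiling", "controlled")))
recordings = configurations * len(conditions)

print(configurations, len(conditions), recordings)  # 95 12 1140
```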


Example recordings (Camera 1, Camera 2, Camera 3, Camera 4 and audio):
  • Scenario 1: red cup (empty)
  • Scenario 2: wine glass (50% rice)
  • Scenario 3: tea box (90% pasta)

CCM setup

The dataset was acquired with 4 Intel RealSense D435i devices and one microphone array. Each D435i device consists of 3 cameras and provides spatially aligned RGB, narrow-baseline stereo infrared and depth images at 30 Hz with a resolution of 1280x720 pixels. One D435i is mounted on a robot arm that does not move during the acquisition and provides a more realistic view of the operating area from the robot's perspective. Another D435i is worn by the subject at chest level to provide a first-person view, while the remaining two devices are placed at the sides of the robot arm as third-person views of the operating area. The microphone array is placed on a table and consists of 8 Boya BY-M1 omnidirectional lavalier microphones arranged in a circle with a radius of 15 cm. Audio signals are sampled synchronously at 44.1 kHz with a multi-channel audio recorder. All signals are software-synchronized at a rate of 30 Hz. The calibration information (intrinsic and extrinsic parameters) of each D435i and the inertial measurements of the body-worn D435i are also provided.
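Because audio is sampled at 44.1 kHz and the video streams run at 30 Hz, each video frame spans exactly 44100 / 30 = 1470 audio samples. The Python sketch below maps between frame and sample indices. It assumes that the first video frame and the first audio sample are aligned at time zero; the exact alignment convention of the software synchronization is not described here, and the function names are hypothetical.

```python
AUDIO_RATE_HZ = 44_100  # microphone array sampling rate
VIDEO_RATE_HZ = 30      # RealSense frame rate and synchronization rate

# 44100 / 30 = 1470 audio samples per video frame (an exact integer).
SAMPLES_PER_FRAME = AUDIO_RATE_HZ // VIDEO_RATE_HZ

def audio_span_for_frame(frame_idx: int) -> tuple[int, int]:
    """Half-open [start, end) range of audio samples covering one video frame.

    ASSUMPTION: video frame 0 and audio sample 0 start at the same instant.
    """
    start = frame_idx * SAMPLES_PER_FRAME
    return start, start + SAMPLES_PER_FRAME

def frame_for_audio_sample(sample_idx: int) -> int:
    """Index of the video frame during which an audio sample was recorded."""
    return sample_idx // SAMPLES_PER_FRAME

print(SAMPLES_PER_FRAME)             # 1470
print(audio_span_for_frame(2))       # (2940, 4410)
print(frame_for_audio_sample(4409))  # 2
```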

Microphone locations
README
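The intrinsic parameters provided for each D435i can be used to back-project a depth image into 3D points with the standard pinhole camera model, X = (u - cx) * Z / fx and Y = (v - cy) * Z / fy. The NumPy sketch below assumes that depth values are integers in millimetres (a depth scale of 0.001 m, the usual RealSense default, which should be checked against the data) and that fx, fy, cx and cy have already been read from the calibration files for the stream the depth images correspond to. The layout of those files is not described here, so parsing them is left out, and the function name is hypothetical.

```python
import numpy as np

def backproject_depth(depth_raw, fx, fy, cx, cy, depth_scale=0.001):
    """Convert a depth image into an (H, W, 3) array of camera-frame points in metres.

    depth_raw   : (H, W) integer depth image (0 means no measurement)
    fx, fy      : focal lengths in pixels (from the intrinsic calibration)
    cx, cy      : principal point in pixels
    depth_scale : metres per depth unit; 0.001 ASSUMES millimetre units
    """
    h, w = depth_raw.shape
    z = depth_raw.astype(np.float64) * depth_scale
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel column and row grids, shape (H, W)
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack((x, y, z), axis=-1)  # pixels without depth map to (0, 0, 0)
```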

Data organisation

The dataset is split into a training set (9 containers, 684 configurations), a public test set (3 unseen containers, 228 configurations) and a private test set (3 unseen containers, 228 configurations). The containers of each set are evenly distributed among the three categories. The dataset is annotated with the capacity of the container, the filling type, the filling level, the mass of the container, the mass of the filling, and the maximum width and height (and depth, for boxes) of each object.
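In this split, "configurations" counts individual recordings (684 + 228 + 228 = 1,140). Under the assumption that each cup and glass has 7 configurations (empty, plus water, rice or pasta at 50% and 90%) and each food box has 5 (empty, plus rice or pasta at two levels), each recorded under the 12 scenario/background/lighting combinations, an even distribution of containers across categories reproduces the stated sizes. The helper name is hypothetical.

```python
CONDITIONS = 12  # 3 scenarios x 2 backgrounds x 2 lighting conditions

# ASSUMPTION: configurations per container, by category (not stated explicitly in the text).
CONFIGS = {"cup": 7, "glass": 7, "box": 5}

def recordings_in_split(containers_per_category: int) -> int:
    """Recordings in a split with the same number of containers in each category."""
    return sum(containers_per_category * n * CONDITIONS for n in CONFIGS.values())

train = recordings_in_split(3)         # 3 cups, 3 glasses, 3 food boxes
public_test = recordings_in_split(1)   # 1 cup, 1 glass, 1 food box
private_test = recordings_in_split(1)  # 1 cup, 1 glass, 1 food box

print(train, public_test, private_test, train + public_test + private_test)
# 684 228 228 1140
```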

To facilitate the download, we provide ZIP archives for each Intel RealSense D435i (view 1, view 2, view 3, view 4) and each type of data (RGB, depth, infrared, IMU), as well as for the audio data. Depth data is provided as raw images, while RGB and infrared data are provided as MP4 video files. Configurations are already shuffled and named with incremental numerical filenames. ZIP files are split into chunks of at most 4.3 GB.
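A split archive can only be extracted once every chunk (.z01, .z02, ..., and the final .zip) is in the same folder. The helper below is a sketch with hypothetical function names and a hypothetical download folder: it lists the parts of an archive in order and reports those still missing. The number of chunks for each archive can be read from the file lists on this page.

```python
from pathlib import Path

def archive_parts(folder, base, n_chunks):
    """Return the parts of a split ZIP archive in order: base.z01 ... base.zNN, base.zip."""
    folder = Path(folder)
    parts = [folder / f"{base}.z{i:02d}" for i in range(1, n_chunks + 1)]
    parts.append(folder / f"{base}.zip")  # the .zip file is always the last part
    return parts

def missing_parts(folder, base, n_chunks):
    """Return the file names of the parts that are not yet present in `folder`."""
    return [p.name for p in archive_parts(folder, base, n_chunks) if not p.exists()]

# Example: ccm_train_view4_rgb is listed with two chunks (.z01, .z02) plus the .zip.
print(missing_parts("downloads", "ccm_train_view4_rgb", 2))
```

Once all parts are present, one common option is Info-ZIP: `zip -s 0 ccm_train_view4_rgb.zip --out ccm_train_view4_rgb_full.zip` merges the chunks into a single archive that `unzip` can then extract. Python's own zipfile module does not handle multi-disk ZIP archives, so it cannot open the split parts directly.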

Because of the CORSMAL Challenges, annotations are provided only for the training set. The public testing set is currently password protected; please email us to request the password. The private testing set remains private for validating the participants' solutions to the CORSMAL Challenge.

Important notes
  • We aim to release both the public and private testing sets as open access in 2025 via QMRO.
  • The organisation of the files and filenames in the dataset has changed compared to the original release for the first CORSMAL Challenge, to facilitate downloading, parsing and loading the dataset. For the training set, we provide a mapping of the filenames and data structure to the previous version of the dataset: ccm_annotations_train_set_mapping
  • The audio file 00377.wav of the training set is longer than it should be, because the audio recording was not stopped at the end of the configuration. The configuration lasts 13.10 s, so the audio file can be trimmed at that time instant.
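Trimming 00377.wav can be done with Python's standard wave module. This sketch assumes a PCM WAV file that wave can read; multi-channel files that use a WAVE_FORMAT_EXTENSIBLE header are only supported by wave from Python 3.12 onwards. The function name and the output file name in the usage line are illustrative.

```python
import wave

def trim_wav(src_path, dst_path, duration_s):
    """Copy the first `duration_s` seconds of a PCM WAV file to a new file."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        # One frame holds one sample per channel; never ask for more frames than exist.
        n_keep = min(round(duration_s * src.getframerate()), src.getnframes())
        frames = src.readframes(n_keep)
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)  # the frame count in the header is corrected on close
        dst.writeframes(frames)

# Usage (file names are illustrative):
# trim_wav("00377.wav", "00377_trimmed.wav", 13.10)
```

At a 44.1 kHz sampling rate, trimming at 13.10 s keeps 13.10 × 44100 = 577,710 frames.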

Training set
Containers: red cup, small white cup, small transparent cup, green glass, wine glass, champagne flute glass, cereal box, biscuit box, tea box.

Please download only the data you need. The full training set requires about 230 GB of storage.

RGB
  • View 1: ccm_train_view1_rgb.z01 (~4.3GB), ccm_train_view1_rgb.zip (~2.5GB)
  • View 2: ccm_train_view2_rgb.z01 (~4.3GB), ccm_train_view2_rgb.zip (~2.4GB)
  • View 3: ccm_train_view3_rgb.z01 (~4.3GB), ccm_train_view3_rgb.zip (~3.3GB)
  • View 4: ccm_train_view4_rgb.z01 and ccm_train_view4_rgb.z02 (~4.3GB each), ccm_train_view4_rgb.zip (~0.3GB)
Depth
  • View 1: ccm_train_view1_depth.z01 to ccm_train_view1_depth.z14 (~4.3GB each), ccm_train_view1_depth.zip (~1.5GB)
  • View 2: ccm_train_view2_depth.z01 to ccm_train_view2_depth.z13 (~4.3GB each), ccm_train_view2_depth.zip (~2.7GB)
  • View 3: ccm_train_view3_depth.z01 to ccm_train_view3_depth.z13 (~4.3GB each), ccm_train_view3_depth.zip (~1.2GB)
  • View 4: ccm_train_view4_depth.z01 to ccm_train_view4_depth.z09 (~4.3GB each), ccm_train_view4_depth.zip (~0.2GB)
Infrared
  • View 1: ccm_train_view1_ir1.zip (~2.2GB), ccm_train_view1_ir2.zip (~2.2GB)
  • View 2: ccm_train_view2_ir1.zip (~2.2GB), ccm_train_view2_ir2.zip (~2.2GB)
  • View 3: ccm_train_view3_ir1.zip (~2.8GB), ccm_train_view3_ir2.zip (~2.8GB)
  • View 4: ccm_train_view4_ir1.z01 (~4.3GB), ccm_train_view4_ir1.zip (~3.3GB), ccm_train_view4_ir2.z01 (~4.3GB), ccm_train_view4_ir2.zip (~3.5GB)
Calibration
  • View 1: ccm_train_view1_calib.zip (~0.9MB)
  • View 2: ccm_train_view2_calib.zip (~0.9MB)
  • View 3: ccm_train_view3_calib.zip (~0.9MB)
  • View 4: ccm_train_view4_calib.zip (~0.3MB)
IMU
  • View 4 only: ccm_train_view4_imu.zip (~0.05GB)

Audio: ccm_train_audio.zip (~2.8GB)
Annotation
List of containers and their physical properties

Script to download the training set (Unix, Windows) (~300GB)
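To download only the data you need, the approximate archive sizes listed for the training set can be totalled for a chosen subset of data types and views. The dictionary below is transcribed from the training-set file list on this page (calibration archives, a few MB each, are omitted); the function name is hypothetical, and the result is only as precise as the listed approximate sizes.

```python
# Approximate training-set archive sizes in GB, transcribed from the file list:
# view -> (number of ~4.3 GB .zNN chunks, size of the final .zip).
CHUNK_GB = 4.3
TRAIN_SIZES = {
    "rgb":   {1: (1, 2.5), 2: (1, 2.4), 3: (1, 3.3), 4: (2, 0.3)},
    "depth": {1: (14, 1.5), 2: (13, 2.7), 3: (13, 1.2), 4: (9, 0.2)},
    "ir1":   {1: (0, 2.2), 2: (0, 2.2), 3: (0, 2.8), 4: (1, 3.3)},
    "ir2":   {1: (0, 2.2), 2: (0, 2.2), 3: (0, 2.8), 4: (1, 3.5)},
    "imu":   {4: (0, 0.05)},  # the IMU archive is listed for view 4 only
}
AUDIO_GB = 2.8

def download_size_gb(modalities, views, audio=False):
    """Approximate download size in GB for the chosen data types and views."""
    total = AUDIO_GB if audio else 0.0
    for modality in modalities:
        for view in views:
            # Views without an archive for this data type contribute nothing.
            chunks, last = TRAIN_SIZES[modality].get(view, (0, 0.0))
            total += chunks * CHUNK_GB + last
    return round(total, 2)

print(download_size_gb(["rgb"], [1, 2, 3, 4], audio=True))  # 32.8
```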

Public testing set
Containers: beer cup, cocktail glass, fusilli pasta box.
RGB
  • View 1: ccm_test_pub_view1_rgb.zip (~2.3GB)
  • View 2: ccm_test_pub_view2_rgb.zip (~2.3GB)
  • View 3: ccm_test_pub_view3_rgb.zip (~2.5GB)
  • View 4: ccm_test_pub_view4_rgb.zip (~3.3GB)
Depth
  • View 1: ccm_test_pub_view1_depth.z01 to ccm_test_pub_view1_depth.z04 (~4.3GB each), ccm_test_pub_view1_depth.zip (~4.0GB)
  • View 2: ccm_test_pub_view2_depth.z01 to ccm_test_pub_view2_depth.z04 (~4.3GB each), ccm_test_pub_view2_depth.zip (~2.5GB)
  • View 3: ccm_test_pub_view3_depth.z01 to ccm_test_pub_view3_depth.z04 (~4.3GB each), ccm_test_pub_view3_depth.zip (~2.3GB)
  • View 4: ccm_test_pub_view4_depth.z01 to ccm_test_pub_view4_depth.z03 (~4.3GB each), ccm_test_pub_view4_depth.zip (~0.5GB)
Infrared
  • View 1: ccm_test_pub_view1_ir1.zip (~0.8GB), ccm_test_pub_view1_ir2.zip (~0.8GB)
  • View 2: ccm_test_pub_view2_ir1.zip (~0.8GB), ccm_test_pub_view2_ir2.zip (~0.8GB)
  • View 3: ccm_test_pub_view3_ir1.zip (~1.0GB), ccm_test_pub_view3_ir2.zip (~1.0GB)
  • View 4: ccm_test_pub_view4_ir1.zip (~2.5GB), ccm_test_pub_view4_ir2.zip (~2.6GB)
Calibration
  • View 1: ccm_test_pub_view1_calib.zip (~0.9MB)
  • View 2: ccm_test_pub_view2_calib.zip (~0.9MB)
  • View 3: ccm_test_pub_view3_calib.zip (~0.9MB)
  • View 4: ccm_test_pub_view4_calib.zip (~0.3MB)
IMU
  • View 4 only: ccm_test_pub_view4_imu.zip (~0.05GB)

Audio: ccm_test_pub_audio.zip (~2.8GB)

Script to download the public testing set (Unix, Windows) (~95GB).
Annotation of the container ID: ccm_test_pub_annotation_container_id.csv


Info and queries

For any enquiries, questions, concerns and general feedback, please contact us.


License

This work is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc/4.0/.


Acknowledgments

If you use the dataset, please use this citation:
CORSMAL Containers Manipulation (1.0) [Data set]
A. Xompero, R. Sanchez-Matilla, R. Mazzon, and A. Cavallaro
Queen Mary University of London. https://doi.org/10.17636/101CORSMAL1

You can also refer to the following publication:
The CORSMAL benchmark for the prediction of the properties of containers
A. Xompero, S. Donaher, V. Iashin, F. Palermo, G. Solak, C. Coppola, R. Ishikawa, Y. Nagao, R. Hachiuma, Q. Liu, F. Feng, C. Lan, R. H. M. Chan, G. Christmann, J. Song, G. Neeharika, C. K. T. Reddy, D. Jain, B. U. Rehman, A. Cavallaro
IEEE Access, vol. 10, April 2022. https://doi.org/10.1109/ACCESS.2022.3166906