# The CORSMAL Containers Manipulation dataset

*Authors*:

* Alessio Xompero
* Ricardo Sanchez-Matilla
* Apostolos Modas
* Riccardo Mazzon
* Andrea Cavallaro
* Pascal Frossard

Info: corsmal-challenge@qmul.ac.uk

Created date: 2019/12/18

Modified date: 2020/07/20

Version: 0.2

Resource type: Videos, images, audio and inertial data

[CORSMAL Containers Manipulation webpage](http://corsmal.eecs.qmul.ac.uk/containers_manip.html)

## Description

The training set is composed of:

* 12 subjects
* 9 objects (3 drinking cups, 3 drinking glasses and 3 food boxes)
* 3 filling types for cups and glasses (rice, pasta, water)
* 2 filling types for food boxes (rice, pasta)
* 3 filling levels (empty, 50%, 90%)
* 2 background conditions:
  * plain: clear table and subject wearing a uniform shirt
  * textured: tabletop covered by a textured tablecloth and subject wearing a coloured shirt with a repetitive pattern
* 2 illumination conditions:
  * ceiling light
  * controlled light

## Data organisation

The data is structured per object and by data type (RGB, infrared, depth, audio, IMU, calibration). The file names indicate the subject (sA), filling type (fiB), filling level (fuC), background (bD), lights (lE) and the camera id (cF).

### RGB

Each file contains the RGB video: `/sA_fiB_fuC_bD_lE_cF.mp4`

Videos are encoded with the x264 codec and visually lossless parameters.

### Infrared

Each file contains the stereo infrared video: `/sA_fiB_fuC_bD_lE_cF_ir1.mp4` (for the left infrared camera), and `/sA_fiB_fuC_bD_lE_cF_ir2.mp4` (for the right infrared camera).

Videos are encoded with the x264 codec and visually lossless parameters.

### Depth

Each directory contains the depth frames: `/cF/sA_fiB_fuC_bD_lE/.png`

Images are 16-bit and encode the estimated distance at each pixel. To obtain the distance in millimeters, divide the value by 1000.

### Audio

Each file contains the audio recording: `/sA_fiB_fuC_bD_lE_audio.mp4`

Audio is recorded at 44.1 kHz with an 8-element circular microphone array of radius 15 cm.
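The file naming convention described under Data organisation, as used by the RGB, infrared and audio files, can be decoded with a small regular expression. The following is a minimal sketch, not part of the dataset tooling. It assumes that the placeholders A to F stand for non-negative integers, which this README does not state explicitly. The function name `parse_recording_name` and the dictionary keys are hypothetical. IMU files place the camera id after the sensor name and are not handled here.

```python
import re
from typing import Dict, Optional

# Assumption: A-F in sA_fiB_fuC_bD_lE_cF are non-negative integers.
_NAME_PATTERN = re.compile(
    r"^s(?P<subject>\d+)_fi(?P<filling_type>\d+)_fu(?P<filling_level>\d+)"
    r"_b(?P<background>\d+)_l(?P<lights>\d+)"
    r"(?:_c(?P<camera>\d+))?"              # camera id is absent for audio files
    r"(?:_(?P<stream>ir1|ir2|audio))?$"    # optional stream suffix
)


def parse_recording_name(filename: str) -> Optional[Dict[str, object]]:
    """Return the fields encoded in a file name, or None if it does not match."""
    stem = filename.rsplit("/", 1)[-1]   # drop any directory part
    stem = stem.rsplit(".", 1)[0]        # drop the extension, if any
    match = _NAME_PATTERN.match(stem)
    if match is None:
        return None
    fields: Dict[str, object] = {}
    for key, value in match.groupdict().items():
        if key == "stream" or value is None:
            fields[key] = value          # keep the suffix as text, missing as None
        else:
            fields[key] = int(value)
    return fields


if __name__ == "__main__":
    # Hypothetical file names that follow the convention.
    print(parse_recording_name("s0_fi1_fu2_b0_l1_c3.mp4"))
    print(parse_recording_name("s0_fi1_fu2_b0_l1_audio.mp4"))
```

Returning `None` for names that do not match lets a caller skip unrelated files, such as calibration files, when walking a directory.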
Microphone locations were manually measured with an uncertainty of up to 2 cm in each direction. Recordings were performed in a university room at different times of the day over a period of 8 days. Therefore, audio signals are affected by background noise, such as office noise and outside noise (e.g. busy street and wind).

### IMU

Each file contains either the accelerometer data: `/sA_fiB_fuC_bD_liE_accel_cF.csv`; or the gyroscope data: `/sA_fiB_fuC_bD_liE_gyro_cF.csv`

Accelerometer files contain 4 columns that indicate the timestamp and the X, Y and Z components.

Gyroscope files contain 4 columns that indicate the timestamp and the X, Y and Z components.

### Calibration

Each calibration file provides the intrinsic (focal length, fx and fy; principal point, cx and cy) and extrinsic (3x3 rotation matrix, Rx; and 3x1 translation vector, Tx) parameters for each camera.

Sample Python code for reading the calibration parameters:

```
import pickle

# Path to a calibration file
calibration_path = ""

with open(calibration_path, "rb") as f:
    calibration = pickle.load(f)

intrinsic = calibration[0]['rgb']
extrinsic_rotation = calibration[1]['rgb']['rvec']
extrinsic_translation = calibration[1]['rgb']['tvec']
```

Intrinsic parameters are structured as:

    fx  0   cx
    0   fy  cy
    0   0   1

Rotation extrinsic parameters are structured as:

    R11 R12 R13
    R21 R22 R23
    R31 R32 R33

Translation extrinsic parameters are structured as:

    T1,T2,T3

Camera 4 does not contain rotation and translation information.

mic_loc.txt contains the location of each of the 8 microphones with respect to the camera reference system, as well as the centre of the circular microphone array (referred to as origin).

### Annotation

The file _annotations.csv_ contains the list of 57 configurations. For each configuration, we annotate:

* Capacity of the container [mL]
* Mass of the container [g]
* Mass of the filling [g]
* Filling type (empty, rice or pasta)
* Filling level [%]

## Licence

This work is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License.
To view a copy of this license, visit http://creativecommons.org/licenses/by-nc/4.0/ or send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.