Benchmarking: from single modality to multi-modal perception for robotics

Benchmarking is fundamental to advancing research. Here we present the benchmarks used in perception (video and audio) and robotics; see our previous post for more on Open Research Data & Data Sharing.

The vision community has developed multiple evaluation platforms. The KITTI Vision Benchmark Suite consists of a dataset captured by a set of sensors mounted on a car and provides raw data, benchmarks and evaluation metrics for tasks such as stereo-camera processing, optical flow, visual odometry, 3D object detection and 3D tracking. The Multiple Object Tracking (MOT) benchmark allows a fair evaluation of multi-person tracking algorithms by providing a dataset, person detections, a common evaluation protocol and several specific challenges. The Middlebury Stereo Vision campaign evaluates stereo-vision algorithms by distributing several stereo datasets with ground-truth disparities and by offering an online submission system that automatically evaluates new algorithms. Audio-focused campaigns include the Signal Separation Evaluation Campaign (SiSEC), which is community-based and allows audio source separation systems to be compared on the same data and metrics, and the CHiME Speech Separation and Recognition Challenge, which aims to standardise data and evaluation metrics for conversational speech recognition.

In robotics, the YCB benchmark facilitates benchmarking for robotic manipulation by providing a set of objects with different shapes, sizes, textures, weight and rigidity, together with their mesh models and high-resolution RGB-D scans, as well as widely used manipulation tests; the models are easy to incorporate into manipulation and planning software platforms. The Amazon Picking Challenge is designed to evaluate solutions for robotic pick-and-place in tasks that range from picking packages in a logistics centre to bin-picking in a manufacturing plant, and from unloading groceries at home to clearing debris after a disaster. The ACRV picking benchmark contains 42 commonly available shelf objects, a set of stencils and standardised task setups to replicate real-world conditions. The Surreal Robotics Suite is a toolkit and simulation benchmark designed to enable reproducible robotics research and to make Deep Reinforcement Learning in robot manipulation accessible to everyone, by introducing an open-source, reproducible and scalable distributed reinforcement learning framework.

CORSMAL will provide the research community with a benchmark for multi-modal perception. Follow us on Twitter @corsmal for updates.

CHIST-ERA Projects Seminar 2019

The CORSMAL team contributed to the Open Research Data & Data Sharing special session of the CHIST-ERA Projects Seminar 2019, held on 3 and 4 April in Bucharest (link to the slides of our presentation).

Open Research Data and Data Sharing are part of the Open Science movement, which aims to enable sustainable and reproducible research. CORSMAL aims to share Data and Models that enable human-robot handover of unknown objects.

Significant effort is required to properly obtain, annotate and curate Data, and an Open platform can accelerate both the design and validation of new solutions and the spread of novel ideas. Along with Data, an Open Evaluation Methodology and Experiment Reproducibility are fundamental to formally assess performance. Examples of such platforms already exist for computer vision and audio analysis (research areas directly related to CORSMAL), as well as for specific robotic tasks. However, the human-in-the-loop scenario considered in our project makes Reproducibility very challenging.

CORSMAL is committed to Open Science, and we aim to create and distribute Datasets, Models and Evaluation Protocols for the development of collaborative solutions for human-robot object handover through visual, auditory and tactile sensing. We hope that our efforts will advance research and lead to new solutions that predict the movements of a person and estimate the physical properties of objects, and in turn allow accurate, robust and safe planning for the handover.

Stay tuned!

CHISTERA meeting slide