CORSMAL | Human-to-Robot Handovers Sub-track | 9th Robotics Grasping and Manipulation Competition

Human-to-Robot Handovers
(Essential Skills Sub-Track 4)

We provide details, instructions, and documentation for preparing the solution and trials of the handovers in your own lab.

Submission

A valid submission includes a) the .csv file with the results for each configuration; and b) a single video, or one video per configuration, to assess the validity of the results. Videos can be uploaded to an online platform (e.g., YouTube, Google Drive).
Teams should send an email to Dr. Alessio Xompero with the .csv file and the link to the video(s) for each submission.
Teams may optionally submit a report describing their solution and results.

Template for submitting the results: here.
The .csv file must be named according to the following format: rgmc2024_est4_phase1_teamname_submission_vX.csv (replace teamname with the name of your team, and X with the submission number, e.g., v1 for the first submission).
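The naming pattern can be assembled programmatically to avoid typos. The following is a minimal sketch in Python, assuming that the submission number replaces the digit after "v" and that team names are reduced to lower-case letters and digits; the function name results_filename and the sanitising rule are illustrative assumptions, not part of the rules.

```python
import re


def results_filename(team_name: str, submission: int) -> str:
    """Build a results file name following the pattern described above."""
    # Assumption: keep only lower-case letters and digits in the team name;
    # the competition rules do not specify how to sanitise it.
    team = re.sub(r"[^a-z0-9]", "", team_name.lower())
    if not team:
        raise ValueError("team name must contain at least one letter or digit")
    if submission < 1:
        raise ValueError("submission number must be 1 or greater")
    return f"rgmc2024_est4_phase1_{team}_submission_v{submission}.csv"


print(results_filename("My Team", 2))
# rgmc2024_est4_phase1_myteam_submission_v2.csv
```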

For each configuration, the file must provide the results for the following columns:

Target location [mm]: the position (x,y) where the robot should deliver the object. We recommend providing a picture that clearly shows the location of this point.
Final location [mm]: the position (x,y) of the centre of the base of the container at the end of the task.
Handover time [ms]: the total execution time from the moment the person is instructed to grasp the container to the moment the robot opens the gripper at the delivery location to place the container after the handover (unless the handover failed).
Initial mass [g]: the measured mass of the (filled) container before the execution of the configuration.
Final mass [g]: the measured mass of the (filled) container after the execution of the configuration.
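As an illustration of how these columns could be logged, the sketch below writes one row per configuration with Python's standard csv module. The header labels, the split of each location into x and y columns, and the "Configuration ID" column are assumptions; the official template may label or order the columns differently and should take precedence. The numeric values are made up.

```python
import csv
import io

# Assumed header labels derived from the column list above; check them
# against the official template before submitting.
COLUMNS = [
    "Configuration ID",
    "Target location x [mm]",
    "Target location y [mm]",
    "Final location x [mm]",
    "Final location y [mm]",
    "Handover time [ms]",
    "Initial mass [g]",
    "Final mass [g]",
]


def results_to_csv(rows):
    """Serialise a list of per-configuration dicts to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up measurements for a single configuration, for illustration only.
example_rows = [{
    "Configuration ID": 1,
    "Target location x [mm]": 500,
    "Target location y [mm]": 300,
    "Final location x [mm]": 512,
    "Final location y [mm]": 295,
    "Handover time [ms]": 8400,
    "Initial mass [g]": 134,
    "Final mass [g]": 134,
}]

csv_text = results_to_csv(example_rows)
print(csv_text)
```

To save the file, the returned text can be written to disk under the file name required above.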

Objects

The set of objects for the preparation phase is defined in the CORSMAL Benchmark and consists of four drinking cups with different properties: high deformability, medium transparency (Cup 1); average deformability, low transparency (Cup 2); average deformability, high transparency (Cup 3); and no deformability, high transparency (Cup 4). Cup 4 is the plastic wine glass from the YCB object database. These cups are inexpensive and available worldwide, have different shapes, sizes, and degrees of deformability, and include textureless regions, transparencies, and reflections that make vision-based pose estimation challenging.

Purchase links for the set of known drinking cups
Cup 1: https://shorturl.at/pFUVY
Cup 2: https://amzn.to/2QrsXH5
Cup 3: https://amzn.to/2JwRk3l
Cup 4: https://amzn.to/33zw4AY

Those who do not have or cannot purchase the containers provided in the above links can purchase local objects that resemble the characteristics of the substituted items. Teams must inform the organiser of the selected local objects, or of any other objects used to prepare the solution, before the start of the competition for official acceptance.

Filling. To vary the properties of each cup (mass and deformability), containers are filled with two different amounts of rice, which is easy to purchase and, unlike liquids, harmless for the hardware: 0% (empty) and 90% (filled) of the total volume of the cup. The filling amounts are rounded down to the nearest quarter of 100 ml (i.e., a multiple of 25 ml) to ease the replicability of the configurations. The filled amounts are 125 ml (Cup 1), 400 ml (Cup 2), 450 ml (Cup 3), and 300 ml (Cup 4).
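For teams using substitute cups, the filling rule can be worked out as in the sketch below. It reads "quarter of 100 ml" as a multiple of 25 ml and rounds down, which is an interpretation of the text above; the 350 ml capacity is a hypothetical value, not a measurement of any of the listed cups.

```python
# Filled amounts stated above, in millilitres, keyed by cup number.
FILL_ML = {1: 125, 2: 400, 3: 450, 4: 300}


def filled_volume_ml(capacity_ml: int) -> int:
    """Return 90% of the capacity, rounded down to a multiple of 25 ml."""
    # Integer arithmetic avoids floating-point rounding surprises.
    ninety_percent = capacity_ml * 9 // 10
    return ninety_percent // 25 * 25


# Hypothetical 350 ml cup: 90% is 315 ml, which rounds down to 300 ml.
print(filled_volume_ml(350))  # 300
```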

Configurations

For the execution of the configurations, we recommend preparing all the objects in advance on a table near the area where the handovers are executed. This speeds up the execution of all configurations.

There are 8 configurations, repeated 3 times (three blocks of 8 trials each), with a shuffled order for each block. We recommend inviting a different volunteer for each block of configurations to account for the variability in the execution of the handovers.

ID	Object	Level	Points
1	Empty wine glass (Cup 4)	Easy	5
2	Empty red cup (Cup 2)	Easy	5
3	Empty beer cup (Cup 3)	Medium	10
4	Empty white cup (Cup 1)	Medium	10
5	Filled wine glass (Cup 4)	Difficult	15
6	Filled red cup (Cup 2)	Difficult	15
7	Filled beer cup (Cup 3)	Hard	20
8	Filled white cup (Cup 1)	Hard	20
9	Filled red cup (Cup 2)	Difficult	15
10	Filled beer cup (Cup 3)	Hard	20
11	Empty wine glass (Cup 4)	Easy	5
12	Empty red cup (Cup 2)	Easy	5
13	Filled white cup (Cup 1)	Hard	20
14	Empty beer cup (Cup 3)	Medium	10
15	Filled wine glass (Cup 4)	Difficult	15
16	Empty white cup (Cup 1)	Medium	10
17	Filled red cup (Cup 2)	Difficult	15
18	Empty beer cup (Cup 3)	Medium	10
19	Filled wine glass (Cup 4)	Difficult	15
20	Empty red cup (Cup 2)	Easy	5
21	Empty wine glass (Cup 4)	Easy	5
22	Filled beer cup (Cup 3)	Hard	20
23	Filled white cup (Cup 1)	Hard	20
24	Empty white cup (Cup 1)	Medium	10
TOTAL			300
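The structure of the table (eight base configurations, three blocks, 100 points per block) can be reproduced with the short sketch below, which may help when rehearsing with a different order. The official order is the one listed in the table; the function make_schedule and the random seed are illustrative assumptions.

```python
import random

# The eight base configurations and their points, taken from the table above.
BASE_CONFIGS = [
    ("Empty wine glass (Cup 4)", "Easy", 5),
    ("Empty red cup (Cup 2)", "Easy", 5),
    ("Empty beer cup (Cup 3)", "Medium", 10),
    ("Empty white cup (Cup 1)", "Medium", 10),
    ("Filled wine glass (Cup 4)", "Difficult", 15),
    ("Filled red cup (Cup 2)", "Difficult", 15),
    ("Filled beer cup (Cup 3)", "Hard", 20),
    ("Filled white cup (Cup 1)", "Hard", 20),
]


def make_schedule(num_blocks=3, seed=0):
    """Return (ID, object, level, points) tuples for all blocks."""
    rng = random.Random(seed)
    schedule = []
    for block in range(num_blocks):
        order = list(BASE_CONFIGS)
        if block > 0:
            # As in the table, the first block keeps the base order and
            # the later blocks are shuffled.
            rng.shuffle(order)
        schedule.extend(order)
    return [(i + 1, *cfg) for i, cfg in enumerate(schedule)]


schedule = make_schedule()
total_points = sum(points for _, _, _, points in schedule)
print(len(schedule), total_points)  # 24 300
```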

Procedure
For each configuration:

Prepare the container either empty or filled with its predefined content type and level
Weigh the (filled) container before the execution of the task
Place the container at the centre of the table, at a distance not reachable by the robotic arm (safety)
The volunteer grasps the container from its location with a natural grasp
The volunteer carries the container with the intention of handing it over to the robot
The robot should track and predict the pose of the container to move the arm towards the handover area
The volunteer hands the container over to the robot
The robot closes the end effector and grasps the container
The robot delivers the container upright within the predefined area.
Measure the distance between the initial location (e.g., the centre of the table) and the delivery location of the container (if not failed)
Weigh the (filled) container after the execution of the task
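The two measurements taken at the end of the procedure reduce to simple arithmetic, sketched below. This is not the official scoring: the function names are hypothetical, positions are assumed to be (x, y) pairs in millimetres in a common table frame, and the example values are made up.

```python
import math


def planar_distance_mm(reference_xy, final_xy):
    """Euclidean distance in mm between two (x, y) points on the table."""
    dx = final_xy[0] - reference_xy[0]
    dy = final_xy[1] - reference_xy[1]
    return math.hypot(dx, dy)


def mass_change_g(initial_g, final_g):
    """Mass lost during the trial in grams (positive if content was spilled)."""
    return initial_g - final_g


# Made-up measurements for one trial.
print(planar_distance_mm((0.0, 0.0), (30.0, 40.0)))  # 50.0
print(mass_change_g(412.0, 409.5))                   # 2.5
```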

Note that the volunteer should avoid assisting the robot (i.e., remaining still at a location until the robot can pick up the container) or assuming an adversarial behaviour (i.e., making it harder for the robot to reach the object).

This procedure has been revised from the CORSMAL Human-to-Robot Handover Protocol document.

Setting up instructions

The setup includes a robotic arm with at least 6 degrees of freedom (e.g., UR5, KUKA), equipped with a 2-finger parallel gripper (e.g., Robotiq 2F-85); a table where the handover takes place and where the robot is placed; the selected containers and contents; up to two cameras (e.g., Intel RealSense D435i); and a digital scale to weigh the container. The table is covered by a white tablecloth. The two cameras should be placed 40 cm from the robotic arm (e.g., using tripods) and oriented so that both view the centre of the table. The illustration below represents the 3D layout of the setup within a space of 4.5 × 4.5 m. The table has the following dimensions: W1800 x D600 x H700 mm.

Teams must prepare the sensing setup such that the cameras are synchronised, calibrated and localised with respect to a calibration board. We recommend the cameras recording RGB sequences at 30 Hz with a resolution of 1280 × 720 pixels (based on the setup used in the CORSMAL Benchmark).
Teams should verify the behaviour of the robotic arm prior to the execution of the task (e.g., end-effector, speed, kinematics).
Teams will prepare all configurations with their corresponding container and filling before starting the task.
Teams must measure the mass of the container and its content, if any, for each configuration before and after executing the handover to the robot, using a digital scale.
A volunteer from the team will be the person who will hand the container over to the robot using a random/natural grasp for each configuration.
Any initial robot pose can be chosen with respect to the environment setup; however, the subject is expected to stand on the opposite side of the table with respect to the robot.
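One simple way to check the synchronisation requirement is to compare the frame timestamps of the two cameras. The sketch below pairs frames whose timestamps differ by at most half a frame period at 30 Hz. It assumes both cameras report timestamps in milliseconds on a common clock; the function name, the tolerance, and the example timestamps are illustrative assumptions and not part of the benchmark.

```python
def pair_frames(ts_a_ms, ts_b_ms, fps=30.0):
    """Pair sorted timestamps of camera A with the closest ones of camera B.

    Returns (index_a, index_b) pairs whose timestamps differ by at most
    half a frame period.
    """
    tolerance_ms = 0.5 * 1000.0 / fps
    pairs = []
    j = 0
    for i, t_a in enumerate(ts_a_ms):
        # Move forward in B while the next timestamp is at least as close.
        while (j + 1 < len(ts_b_ms)
               and abs(ts_b_ms[j + 1] - t_a) <= abs(ts_b_ms[j] - t_a)):
            j += 1
        if ts_b_ms and abs(ts_b_ms[j] - t_a) <= tolerance_ms:
            pairs.append((i, j))
    return pairs


# Made-up timestamps: camera B dropped the frame near 66.7 ms.
cam_a = [0.0, 33.3, 66.7, 100.0]
cam_b = [2.0, 35.0, 90.0]
print(pair_frames(cam_a, cam_b))  # [(0, 0), (1, 1), (3, 2)]
```

Frames of camera A without a partner (index 2 in this example) indicate dropped or poorly aligned frames that may need attention before running the trials.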

These instructions have been revised from the CORSMAL Human-to-Robot Handover Benchmark document.

Starting kit and documentation

Benchmark for human-to-robot handovers of unseen containers with unknown filling
R. Sanchez-Matilla, K. Chatzilygeroudis, A. Modas, N. F. Duarte, A. Xompero, P. Frossard, A. Billard, A. Cavallaro
IEEE Robotics and Automation Letters, 5(2), pp.1642-1649, 2020
[Open Access]

Towards safe human-to-robot handovers of unknown containers
Y. L. Pang, A. Xompero, C. Oh, A. Cavallaro
IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), Virtual, 8-12 Aug 2021
[Open Access] [code] [webpage]

Vision baseline for CORSMAL Benchmark: a vision-based algorithm, part of a larger system, proposed for localising, tracking and estimating the dimensions of a container with a stereo camera.
[paper] [code] [webpage]

LoDE: a method that jointly localises container-like objects and estimates their dimensions with a generative 3D sampling model and a multi-view 3D-2D iterative shape fitting, using two wide-baseline, calibrated RGB cameras.
[paper] [code] [webpage]

The CORSMAL Challenge contains perception solutions for the estimation of the physical properties of manipulated objects prior to a handover to a robot arm.
[challenge] [paper 1] [paper 2]

Additional references
[document]
