Human-to-Robot Handovers
(Essential Skills Sub-Track 4)

Solution design and development
  • Inferences must be generated automatically by a model that uses images or videos as input
  • No prior knowledge of the specific objects is available: the only prior available to the models is the high-level set of container categories (cup, drinking glass, food box).
  • The use of prior 3D object models (e.g., reconstructed 3D shapes of the containers) is not allowed.
  • Each subject is instructed to use only one hand, and always the same one.
  • The location where the robot must deliver the container must be inferred via perception from the initial location, not hard-coded (to be confirmed).
  • Learning across executions of configurations is not allowed: participants must not update or fine-tune the vision/robotic algorithms using data or measurements captured while the subjects execute the benchmark configurations (i.e., at test time).
  • Online solutions - i.e., solutions that can run on a continuous stream, as in a human-to-robot handover - are preferred (see the sketch after this list). To encourage this type of solution, the organisers will refer to the CORSMAL real-to-simulation framework, which allows participants to observe how their models would perform in a human-to-robot handover.
  • Calls to existing Large Language Models are allowed.
  • Tactile sensors are allowed as part of the solution.
  • Both RGB and RGB-D inputs can be used for the solution.
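
As a concrete illustration of the preferred online setting, the following is a minimal sketch of a per-frame inference loop over an RGB(-D) stream with a fixed, pre-trained model and no test-time updates. All names here (frame_source, estimate_container_properties, Estimate) are hypothetical placeholders, not part of the benchmark or the CORSMAL framework API.

    # Minimal sketch of an online, per-frame inference loop.
    # All function and class names are illustrative assumptions.
    from dataclasses import dataclass
    from typing import Iterator, Optional, Tuple

    import numpy as np


    @dataclass
    class Estimate:
        category: str                             # one of the allowed priors: cup / drinking glass / food box
        grasp_point: Tuple[float, float, float]   # 3D point in the camera frame
        confidence: float


    def frame_source() -> Iterator[Tuple[np.ndarray, Optional[np.ndarray]]]:
        """Yield (rgb, depth) pairs; depth is None for RGB-only setups."""
        while True:
            rgb = np.zeros((480, 640, 3), dtype=np.uint8)    # placeholder frame
            depth = np.zeros((480, 640), dtype=np.float32)   # placeholder depth
            yield rgb, depth


    def estimate_container_properties(rgb: np.ndarray,
                                      depth: Optional[np.ndarray]) -> Estimate:
        """Hypothetical perception model: fixed weights, no test-time learning."""
        return Estimate(category="cup", grasp_point=(0.0, 0.0, 0.5), confidence=0.9)


    def run_online(max_frames: int = 5) -> None:
        for i, (rgb, depth) in enumerate(frame_source()):
            if i >= max_frames:
                break
            est = estimate_container_properties(rgb, depth)  # per-frame inference only
            # hand `est` to the robot controller; the model is never updated here


    if __name__ == "__main__":
        run_online()

The loop consumes one frame at a time and never writes back to the model, which is consistent with the no-learning-at-test-time rule above and with either RGB or RGB-D input.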