Human-to-Robot Handovers
(Essential Skills Sub-Track 4)
Solution design and development
- Inferences must be generated automatically by a model that takes images or videos as input.
- No prior knowledge of the specific objects is available; the only prior information provided to the models is the high-level set of container categories (cup, drinking glass, food box).
- The use of prior 3D object models (e.g., reconstructed 3D shapes of the containers) is not allowed.
- Each subject is instructed to use only one hand, and always the same one.
- The location where the robot needs to deliver the container must be inferred through perception from the initial location and must not be hard-coded (to be confirmed).
- Learning across executions of configurations is not allowed: participants must not update or fine-tune the vision/robotic algorithms using data or measurements captured while the subjects execute the benchmark configurations (i.e., at test time).
- Online solutions, i.e., solutions that can run on a continuous stream as in an actual human-to-robot handover, are preferred (see the sketch after this list for a minimal example of per-frame, online inference). To encourage this type of solution, the organisers will refer to the CORSMAL real-to-simulation framework, which allows participants to observe how their models would perform in a human-to-robot handover.
- Calls to existing Large Language Models are allowed.
- Tactile sensors are allowed as part of the solution.
- Both RGB and RGB-D inputs can be used for the solution.
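As an illustration of the preferred online setting, the following minimal sketch processes a camera stream frame by frame and commits to a container-category estimate during the stream rather than after it. Everything model-related here is an assumption for illustration: the interface (`estimate_container_properties`, `ContainerEstimate`), the confidence threshold, and the camera index are hypothetical placeholders, not part of the benchmark; only OpenCV's standard `VideoCapture` API is real. The same loop would apply to RGB-D input by additionally reading an aligned depth frame per time step.

```python
# Minimal sketch of online, per-frame inference (hypothetical interface).
from dataclasses import dataclass
from typing import Optional

import cv2  # OpenCV, used only for frame grabbing


@dataclass
class ContainerEstimate:
    category: str      # one of the allowed priors: "cup", "drinking glass", "food box"
    confidence: float  # model confidence in [0, 1]


def estimate_container_properties(frame) -> Optional[ContainerEstimate]:
    """Hypothetical per-frame inference stub; a real entry would run a
    pre-trained vision model here. Per the rules, the only prior is the
    high-level category set, and no test-time updates are permitted."""
    return None  # placeholder until a real model is plugged in


def run_online(camera_index: int = 0, commit_threshold: float = 0.8) -> None:
    """Consume a live stream and commit to an estimate as soon as the model
    is confident enough, without buffering the whole video for a second pass."""
    cap = cv2.VideoCapture(camera_index)
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break  # stream ended or camera unavailable
            estimate = estimate_container_properties(frame)
            if estimate is not None and estimate.confidence >= commit_threshold:
                print(f"Committing to '{estimate.category}' "
                      f"(confidence {estimate.confidence:.2f})")
                break  # online: decide during the stream, not offline afterwards
    finally:
        cap.release()


if __name__ == "__main__":
    run_online()
```

The property the rules reward is visible in the loop: the decision is taken while frames arrive, with no second pass over the recording and no model updates between benchmark executions.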