CORSMAL explores the fusion of multiple sensing modalities (touch, sound, and first- and third-person vision) to accurately and robustly estimate the physical properties of objects in noisy and potentially ambiguous environments.

We aim to develop a framework for recognising and manipulating objects in cooperation with humans, mimicking the human capability of learning and adapting across different manipulators, tasks, sensing configurations and environments.

Our focus is to define learning architectures for multimodal sensory data and for aggregated data from different environments. The goal is to continually improve the adaptability and robustness of the learned models, and to generalise capabilities across tasks and sites.
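As one illustration of a learning architecture for multimodal sensory data, per-modality features can be combined by late fusion: each modality is encoded independently and the embeddings are concatenated into a joint representation. The sketch below is a minimal, self-contained assumption of how such a pipeline might look; the modality names, embedding sizes, and fixed random projections are illustrative placeholders, not CORSMAL's actual design.

```python
import numpy as np

# Illustrative embedding sizes per modality (assumptions for this sketch,
# not CORSMAL's actual values).
MODALITY_DIMS = {"vision": 8, "audio": 4, "touch": 3}

def encode(modality, raw):
    """Stand-in per-modality encoder: project raw input to a fixed-size embedding.

    A real system would use a learned network (e.g. a CNN for vision);
    here a deterministic random projection keeps the example self-contained.
    """
    dim = MODALITY_DIMS[modality]
    rng = np.random.default_rng(sum(modality.encode()))  # deterministic per modality
    w = rng.normal(size=(dim, len(raw)))
    return w @ np.asarray(raw, dtype=float)

def late_fusion(features):
    """Concatenate per-modality embeddings into one joint feature vector."""
    return np.concatenate([features[m] for m in sorted(features)])

feats = {
    "vision": encode("vision", [0.2, 0.5, 0.1]),
    "audio": encode("audio", [0.9, 0.3]),
    "touch": encode("touch", [0.4]),
}
joint = late_fusion(feats)
print(joint.shape)  # (15,) — the sum of the per-modality embedding sizes
```

A downstream property estimator would then consume the joint vector; making that estimator robust when a modality is missing or degraded is part of what the adaptability goal above entails.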