Visually-Grounded Library of Behaviours for Generalizing Manipulation Across Objects, Configurations, and Views
In Submission
We propose a visually-grounded library of behaviors approach for manipulating diverse objects across a wide range of initial and goal configurations and camera placements. We disentangle the standard image-to-action or image-to-state-to-action mapping into two separate modules: (1) a behavior selector, built on affordance-aware and view-invariant visual feature representations, which conditions on the invariant object properties to select a behavior, and (2) a library of behaviors, each of which conditions on the variable object properties to act in the environment. Our framework outperforms various learning-based and non-learning-based baselines on grasping and pushing tasks.
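To make the two-module decomposition concrete, below is a minimal Python sketch of how a behavior selector could dispatch to a library of behaviors. The names (`BehaviorSelector`, `behavior_library`, `extract_features`), the placeholder scoring, and the dummy policies are illustrative assumptions, not the paper's actual implementation.

```python
# Sketch of the selector-plus-library decomposition described above.
# All names and the scoring logic are hypothetical placeholders.

from typing import Callable, Dict

import numpy as np


def extract_features(image: np.ndarray) -> np.ndarray:
    """Hypothetical affordance-aware, view-invariant feature extractor."""
    # A real implementation would use a learned encoder; here we just flatten.
    return image.reshape(-1).astype(np.float32)


class BehaviorSelector:
    """Scores each behavior from invariant object features and picks the best."""

    def __init__(self, behavior_names):
        self.behavior_names = list(behavior_names)

    def select(self, features: np.ndarray) -> str:
        # Placeholder scoring: a trained selector would map features to
        # per-behavior compatibility scores.
        scores = {name: float(np.random.rand()) for name in self.behavior_names}
        return max(scores, key=scores.get)


# Library of behaviors: each entry maps the variable object properties
# observed in the current view to a robot action (here, a dummy 7-DoF command).
behavior_library: Dict[str, Callable[[np.ndarray], np.ndarray]] = {
    "grasp": lambda obs: np.zeros(7),  # placeholder grasping policy
    "push": lambda obs: np.ones(7),    # placeholder pushing policy
}


def act(image: np.ndarray) -> np.ndarray:
    """Select a behavior from invariant features, then execute it on the observation."""
    features = extract_features(image)
    behavior = BehaviorSelector(behavior_library).select(features)
    return behavior_library[behavior](image)
```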
[ Paper ]