Robot Learning
Publications
- AAAI
GO-DICE: Goal-conditioned Option-aware Offline Imitation Learning
In AAAI Conference on Artificial Intelligence (AAAI), 2024
Offline imitation learning (IL) refers to learning expert behavior solely from demonstrations, without any additional interaction with the environment. Despite significant advances in offline IL, existing techniques find it challenging to learn policies for long-horizon tasks and require significant re-training when task specifications change. Towards addressing these limitations, we present GO-DICE, an offline IL technique for goal-conditioned long-horizon sequential tasks. GO-DICE discerns a hierarchy of sub-tasks from demonstrations and uses these to learn separate policies for sub-task transitions and action execution, respectively; this hierarchical policy learning facilitates long-horizon reasoning. Inspired by the expansive DICE family of techniques, policy learning at both levels takes place within the space of stationary distributions. Further, both policies are learned with goal conditioning to minimize the need for retraining when task goals change. Experimental results substantiate that GO-DICE outperforms recent baselines, as evidenced by a marked improvement in the completion rate of increasingly challenging pick-and-place MuJoCo robotic tasks. GO-DICE can also leverage imperfect demonstrations and partial task segmentation when available, both of which boost task performance relative to learning from expert demonstrations alone.
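To make the hierarchical, goal-conditioned structure described in the abstract concrete, here is a minimal sketch (not the paper's implementation; all names, shapes, and the toy linear-softmax parameterization are assumptions): a high-level policy picks the next sub-task (option) from state and goal, and a low-level policy picks actions conditioned on state, goal, and the active sub-task.

```python
# Illustrative two-level goal-conditioned policy; shapes and parameters are toy assumptions.
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, GOAL_DIM, N_SUBTASKS, N_ACTIONS = 6, 3, 4, 5

# Toy linear-softmax parameterizations for the two policy levels.
W_high = rng.normal(size=(N_SUBTASKS, STATE_DIM + GOAL_DIM))
W_low = rng.normal(size=(N_ACTIONS, STATE_DIM + GOAL_DIM + N_SUBTASKS))

def softmax(logits):
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def select_subtask(state, goal):
    """High-level policy: distribution over sub-tasks given state and goal."""
    x = np.concatenate([state, goal])
    return softmax(W_high @ x)

def select_action(state, goal, subtask):
    """Low-level policy: distribution over actions given the active sub-task."""
    onehot = np.eye(N_SUBTASKS)[subtask]
    x = np.concatenate([state, goal, onehot])
    return softmax(W_low @ x)

# Roll the hierarchy forward for one decision step.
state, goal = rng.normal(size=STATE_DIM), rng.normal(size=GOAL_DIM)
subtask = int(np.argmax(select_subtask(state, goal)))
action_probs = select_action(state, goal, subtask)
print("active sub-task:", subtask, "action distribution:", np.round(action_probs, 3))
```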
- ICRA
Human-Guided Motion Planning in Partially Observable Environments
Carlos Quintero-Pena*, Constantinos Chamzas*, Zhanyi Sun, Vaibhav Unhelkar, Lydia E Kavraki
In International Conference on Robotics and Automation (ICRA), 2022
Motion planning is a core problem in robotics, with a range of existing methods aimed at addressing its diverse set of challenges. However, most existing methods rely on complete knowledge of the robot environment, an assumption that seldom holds true due to inherent limitations of robot perception. To enable tractable motion planning for high-DOF robots under partial observability, we introduce BLIND, an algorithm that leverages human guidance. BLIND utilizes inverse reinforcement learning to derive motion-level guidance from human critiques. The algorithm overcomes the computational challenge of reward learning for high-DOF robots by projecting the robot's continuous configuration space onto a motion-planner-guided discrete task model. The learned reward is in turn used as guidance to generate robot motion with a novel motion planner. We demonstrate BLIND using the Fetch robot and perform two simulation experiments with partial observability. Our experiments demonstrate that, despite the challenge of partial observability and high dimensionality, BLIND is capable of generating safe robot motion and outperforms baselines on metrics of teaching efficiency, success rate, and path quality.
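The sketch below illustrates, under assumptions, the core idea of learning from human critiques over a discretized task model: the continuous configuration space is abstracted into a handful of hypothetical regions, human critiques lower the reward of flagged regions, and a planner over that discrete model then avoids them. This is not the BLIND algorithm itself; the region names, penalty scheme, and Dijkstra-style planner are illustrative stand-ins.

```python
# Toy critique-driven reward update over a hypothetical discrete task model.
import heapq

regions = ["start", "near_shelf", "open_area", "behind_box", "goal"]
edges = {
    "start": ["near_shelf", "open_area"],
    "near_shelf": ["behind_box", "goal"],
    "open_area": ["goal"],
    "behind_box": ["goal"],
    "goal": [],
}

reward = {r: 0.0 for r in regions}

def apply_critique(critiqued_region, penalty=1.0):
    """Lower the reward of a region the human flagged as unsafe or undesirable."""
    reward[critiqued_region] -= penalty

def best_path(start, goal):
    """Dijkstra over regions, using negative reward as traversal cost."""
    frontier = [(0.0, start, [start])]
    visited = set()
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, cost
        if node in visited:
            continue
        visited.add(node)
        for nxt in edges[node]:
            step_cost = 1.0 - reward[nxt]  # critiqued regions become expensive
            heapq.heappush(frontier, (cost + step_cost, nxt, path + [nxt]))
    return None, float("inf")

print(best_path("start", "goal"))   # may pass through near_shelf
apply_critique("near_shelf", 5.0)   # human critiques that region
print(best_path("start", "goal"))   # planner now prefers the open area
```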
- ICRA
Learning Dense Rewards for Contact-Rich Manipulation Tasks
Zheng Wu, Wenzhao Lian, Vaibhav Unhelkar, Masayoshi Tomizuka, Stefan Schaal
In International Conference on Robotics and Automation (ICRA), 2021
Rewards play a crucial role in reinforcement learning. To arrive at the desired policy, the design of a suitable reward function often requires significant domain expertise as well as trial and error. Here, we aim to minimize the effort involved in designing reward functions for contact-rich manipulation tasks. In particular, we provide an approach capable of extracting dense reward functions algorithmically from robots' high-dimensional observations, such as images and tactile feedback. In contrast to state-of-the-art high-dimensional reward-learning methodologies, our approach does not leverage adversarial training, and is thus less prone to the associated training instabilities. Instead, our approach learns rewards by estimating task progress in a self-supervised manner. We demonstrate the effectiveness and efficiency of our approach on two contact-rich manipulation tasks, namely peg-in-hole and USB insertion. The experimental results indicate that the policies trained with the learned reward function achieve better performance and faster convergence compared to the baselines.
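A minimal sketch of the self-supervised progress idea follows, under stated assumptions: each frame's normalized index within a successful demonstration serves as its progress label, a regressor maps observations to that label, and the predicted progress is used directly as a dense reward. The synthetic features and tiny least-squares regressor stand in for the paper's image/tactile encoder and are not the actual model.

```python
# Toy self-supervised progress regressor used as a dense reward; data is synthetic.
import numpy as np

rng = np.random.default_rng(1)
OBS_DIM, TRAJ_LEN, N_TRAJS = 8, 50, 20

# Synthetic "successful" trajectories: observations drift along a direction as the task progresses.
direction = rng.normal(size=OBS_DIM)
observations, progress_labels = [], []
for _ in range(N_TRAJS):
    noise = rng.normal(scale=0.1, size=(TRAJ_LEN, OBS_DIM))
    t = np.linspace(0.0, 1.0, TRAJ_LEN)[:, None]
    observations.append(t * direction + noise)
    progress_labels.append(t.ravel())  # self-supervised label: frame index / trajectory length
X = np.vstack(observations)
y = np.concatenate(progress_labels)

# Fit the progress regressor (with a bias term) by least squares.
Xb = np.hstack([X, np.ones((len(X), 1))])
w, *_ = np.linalg.lstsq(Xb, y, rcond=None)

def dense_reward(obs):
    """Predicted task progress in [0, 1], used directly as a dense reward."""
    pred = np.append(obs, 1.0) @ w
    return float(np.clip(pred, 0.0, 1.0))

print("reward early in task:", dense_reward(0.1 * direction))
print("reward near completion:", dense_reward(0.9 * direction))
```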
- CoRL
Semi-Supervised Learning of Decision-Making Models for Human-Robot Collaboration
Vaibhav Unhelkar*, Shen Li*, Julie Shah
In Conference on Robot Learning (CoRL), 2019
We consider human-robot collaboration in sequential tasks with known task objectives. For interaction planning in this setting, the utility of models for decision-making under uncertainty has been demonstrated across domains. However, in practice, specifying the model parameters remains challenging and requires significant effort from the robot developer. To alleviate this challenge, we present ADACORL, a framework to specify decision-making models and generate robot behavior for interaction. Central to our approach are a factored task model and a semi-supervised algorithm to learn models of human behavior. We demonstrate that our specification approach, despite requiring significantly fewer labels, generates models (and policies) that perform as well as or better than models learned with fully supervised data. By leveraging pre-computed performance bounds and an online planner, ADACORL can generate robot behavior for collaborative tasks with large state spaces (> 1 million states) and short planning times (< 0.5 s).
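For intuition on the semi-supervised human-model learning mentioned above, here is a small, hypothetical sketch (not ADACORL itself): a few labeled demonstrations fix initial per-type action distributions, and unlabeled demonstrations refine them with EM-style soft assignments. The data, number of types, and update rules are illustrative assumptions.

```python
# Toy semi-supervised estimation of a discrete human behavior model via EM-style updates.
import numpy as np

rng = np.random.default_rng(2)
N_TYPES, N_ACTIONS = 2, 4

# Labeled data: (human_type, action) pairs; unlabeled data: actions only.
labeled = [(0, 0), (0, 1), (0, 0), (1, 3), (1, 2), (1, 3)]
unlabeled = [0, 0, 1, 3, 3, 2, 0, 3]

# Initialize per-type action distributions from the labeled counts (with smoothing).
counts = np.ones((N_TYPES, N_ACTIONS))
for t, a in labeled:
    counts[t, a] += 1
theta = counts / counts.sum(axis=1, keepdims=True)
prior = np.full(N_TYPES, 1.0 / N_TYPES)

for _ in range(20):  # EM iterations over the unlabeled actions
    # E-step: soft type responsibility for each unlabeled action.
    resp = prior[:, None] * theta[:, unlabeled]          # shape (N_TYPES, len(unlabeled))
    resp /= resp.sum(axis=0, keepdims=True)
    # M-step: re-estimate distributions from labeled counts plus soft counts.
    new_counts = counts.copy()
    for j, a in enumerate(unlabeled):
        new_counts[:, a] += resp[:, j]
    theta = new_counts / new_counts.sum(axis=1, keepdims=True)
    prior = resp.sum(axis=1) / resp.sum()

print("learned action distributions per human type:\n", np.round(theta, 2))
```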