Bootstrapping Object-level Planning with Large Language Models
TL;DR – This paper formalizes the concept of object-level planning and discusses how this level of planning naturally integrates with large language models (LLMs).
TL;DR – Building on prior work (Lang2LTL - CoRL 2023), this paper introduces a modular system that enables robots to follow natural language commands with spatiotemporal referring expressions. This system leverages multi-modal foundation models as well as the formal language LTL (linear temporal logic).
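To make the language-to-LTL step concrete, here is a minimal sketch of how a command with a referring expression could be grounded to an LTL formula (the function names, toy grounding table, and translation rule below are illustrative placeholders, not the actual Lang2LTL-2 pipeline):

```python
# Hypothetical sketch of a modular "language -> LTL" pipeline.
# The grounding table and translation rule are illustrative only, not the Lang2LTL-2 API.

def ground_referring_expressions(command: str, landmark_map: dict) -> tuple[str, dict]:
    """Replace referring expressions with placeholder symbols (a, b, ...)."""
    lifted, bindings = command, {}
    for i, (phrase, landmark) in enumerate(landmark_map.items()):
        if phrase in lifted:
            symbol = chr(ord("a") + i)
            lifted = lifted.replace(phrase, symbol)
            bindings[symbol] = landmark
    return lifted, bindings

def lifted_command_to_ltl(lifted: str) -> str:
    """Toy 'translation' step; a real system would use a learned model here."""
    if lifted.startswith("go to a after visiting b"):
        return "F (b & F a)"        # eventually b, and after that eventually a
    return "F a"                    # default: eventually reach a

command = "go to the red couch after visiting the kitchen sink"
landmarks = {"the red couch": "couch_3", "the kitchen sink": "sink_1"}

lifted, bindings = ground_referring_expressions(command, landmarks)
formula = lifted_command_to_ltl(lifted)
for symbol, landmark in bindings.items():
    formula = formula.replace(symbol, landmark)
print(formula)   # e.g., "F (sink_1 & F couch_3)"
```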
TL;DR – In this paper, we introduce CAPE: an approach to correct errors encountered during robot plan execution. We exploit the ability of large language models to generate high-level plans and to reason about causes of errors.
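As a rough sketch of the re-prompting idea (the prompt wording and the `llm` and `execute` callables are placeholders, not CAPE's actual interface), the key step is feeding the execution error back to the LLM along with the plan so far and asking for a corrective action:

```python
# Illustrative sketch of corrective re-prompting after a failed action.
# The prompt text and callables are placeholders, not CAPE's actual interface.

def corrective_prompt(goal: str, plan_so_far: list[str], failed_action: str, error: str) -> str:
    return (
        f"Task: {goal}\n"
        f"Executed so far: {', '.join(plan_so_far) or 'nothing'}\n"
        f"The action '{failed_action}' failed because: {error}\n"
        "Suggest an action that resolves this error before retrying."
    )

def execute_with_recovery(goal, plan, execute, llm, max_retries=3):
    """execute(action) -> (success: bool, error: str); llm(prompt) -> str."""
    done = []
    for action in plan:
        success, error = execute(action)
        retries = 0
        while not success and retries < max_retries:
            fix = llm(corrective_prompt(goal, done, action, error))
            execute(fix)                       # attempt the corrective action
            success, error = execute(action)   # then retry the original action
            retries += 1
        if not success:
            raise RuntimeError(f"could not recover from: {error}")
        done.append(action)
    return done
```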
TL;DR – This paper introduces a deep learning-based method for learning about the effects of verbs – more specifically, looking at initiation and termination conditions, as in Markov Decision Processes (MDPs).
TL;DR – In this paper, we introduce the idea of connecting FOONs to robotic task and motion planning. We automatically transform a FOON graph, which exists at the object level (i.e., it is a representation that uses meaningful labels or expressions close to human language), into task planning specifications written in PDDL (which is not a very intuitive way to communicate about tasks).
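As a simplified sketch of how such a translation could look (the functional-unit fields and predicate names below are illustrative assumptions, not our exact compiler), each functional unit's input object states become preconditions and its output object states become effects of a PDDL operator:

```python
# Simplified sketch: turning one FOON functional unit into a PDDL-style action.
# The functional-unit fields and predicate naming are illustrative assumptions.

def unit_to_pddl(unit: dict) -> str:
    pre = " ".join(f"(is-{state} {obj})" for obj, state in unit["inputs"])
    eff = " ".join(f"(is-{state} {obj})" for obj, state in unit["outputs"])
    del_eff = " ".join(f"(not (is-{state} {obj}))" for obj, state in unit["inputs"])
    return (
        f"(:action {unit['motion']}\n"
        f"  :precondition (and {pre})\n"
        f"  :effect (and {eff} {del_eff}))"
    )

slicing = {
    "motion": "slice",
    "inputs": [("tomato", "whole"), ("knife", "clean")],
    "outputs": [("tomato", "sliced"), ("knife", "dirty")],
}
print(unit_to_pddl(slicing))
```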
TL;DR – This workshop paper (specifically, a blue-sky submission) highlights the importance of object-level planning and representation as an additional layer on top of task and motion planning. I present several benefits of object-level planning for long-term use in robotics.
TL;DR – In this paper, we introduce the idea of connecting FOONs to robotic task and motion planning. We automatically transform a FOON graph, which exists at the object level (i.e., it is a representation that uses meaningful labels or expressions close to human language), into task planning specifications written in PDDL (which is not a very intuitive way to communicate about tasks).
TL;DR – This was a collaboration with Clemson University’s Yunyi Jia and Yi Chen, who were interested in using FOONs for representing assembly tasks. They successfully adapted a FOON for robotic assembly execution.
TL;DR – In this paper, we attempt to execute task plan sequences extracted from FOONs. However, these sequences may contain actions that are not executable by a robot. Therefore, a human is introduced into the planning and execution loop, and the robot and human assistant work together to solve the task.
TL;DR – This work uses the features from the motion taxonomy to improve action recognition on egocentric videos from the EPIC-KITCHENS dataset. This is done by integrating motion code detection for action sequences.
TL;DR – In this work, we showed how motion codes (which can be constructed using the motion taxonomy proposed in our RSS 2020 paper) can be used to improve action recognition with deep neural networks.
TL;DR – In this work, we introduce new changes to the features of the motion taxonomy and show how action verbs encoded as motion codes better capture differences between them than conventional word embeddings (such as word2vec).
TL;DR – This paper introduces the motion taxonomy, a collection of robot-relevant features that are better suited for verb or action embedding than conventional word embeddings. Motion codes are constructed per verb using the taxonomy. In this work, we show that motion codes assigned to verbs are closely related to one another, as supported by force and trajectory data.
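To give a flavour of the idea, here is a toy example of motion codes (the features and code assignments are made up for illustration and are not the actual taxonomy from the paper): mechanically similar verbs end up with nearby binary codes, which simple distances such as Hamming distance can capture:

```python
# Toy illustration of motion codes: each verb gets a binary string over a handful
# of robot-relevant features. The features and codes here are made-up examples,
# not the actual taxonomy from the paper.

FEATURES = ["in_contact", "rigid_engagement", "prismatic_trajectory", "revolute_trajectory"]
# bit i of a code corresponds to FEATURES[i]

motion_codes = {
    "cut":   "1110",
    "slice": "1110",
    "pour":  "0001",
    "stir":  "1001",
}

def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))

# Verbs that are mechanically similar end up with nearby codes:
print(hamming(motion_codes["cut"], motion_codes["slice"]))  # 0 -> near-identical motions
print(hamming(motion_codes["cut"], motion_codes["pour"]))   # 4 -> very different motions
```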
TL;DR – This was my first survey paper that covers knowledge representations for service robotics. Although it is dated, it covers an extensive list of approaches used to represent knowledge for several robot sub-tasks.
TL;DR – This work leverages functional object-oriented networks and deep learning for video understanding. In addition, within the deep network framework, we jointly recognize object and action types, which can then be used to construct new FOON structures.
TL;DR – In this paper, we explore methods in natural language processing (NLP) – specifically semantic similarity – for expanding or generalizing knowledge contained in a FOON. This alleviates the need for demonstrating and annotating graphs by other means.
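A toy sketch of that expansion step is below (the similarity table stands in for the paper's actual NLP similarity measures, such as WordNet- or embedding-based scores): a functional unit known for one object can be copied for a sufficiently similar object.

```python
# Toy sketch of expanding FOON knowledge via semantic similarity.
# The similarity table below is a stand-in for WordNet/embedding-based measures.

similarity = {
    ("cucumber", "zucchini"): 0.9,
    ("cucumber", "milk"): 0.1,
}

def sim(a: str, b: str) -> float:
    return 1.0 if a == b else similarity.get((a, b), similarity.get((b, a), 0.0))

def expand_unit(unit: dict, known_object: str, new_object: str, threshold: float = 0.8):
    """Copy a functional unit for a new object if it is similar enough to a known one."""
    if sim(known_object, new_object) < threshold:
        return None
    swap = lambda pairs: [(new_object if o == known_object else o, s) for o, s in pairs]
    return {**unit, "inputs": swap(unit["inputs"]), "outputs": swap(unit["outputs"])}

slicing = {"motion": "slice",
           "inputs": [("cucumber", "whole")],
           "outputs": [("cucumber", "sliced")]}
print(expand_unit(slicing, "cucumber", "zucchini"))  # new unit for zucchini
print(expand_unit(slicing, "cucumber", "milk"))      # None: not similar enough
```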
TL;DR – This was the very first paper on FOON: the functional object-oriented network. Here, we introduced what FOONs are and how they can be used for task planning. They are advantageous for their flexibility and human interpretability.