NOTES:

  • This work extends Lang2LTL: a modular system that leverages language models to ground concepts, or referents, in natural language commands to a formal logic known as linear temporal logic (LTL).
    • Lang2LTL version 1 (Liu and Yang et al. 2023) could only ground commands with temporal constraints.
    • Lang2LTL version 2 (i.e., this paper) adds the capability of grounding spatiotemporal commands, which require reasoning about spatial relations between referents while also respecting temporal ordering constraints. For example, "go to the bookstore near the bank, then visit the park" requires resolving the spatial relation "near" before the visit order can be encoded in LTL.
  • Our system combines the text and image modalities to perform language grounding (a toy sketch of this modular pipeline follows the list).
  • We evaluate the system in both simulation and real-robot experiments (see videos on our website).
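
A minimal, self-contained sketch of the modular grounding idea above. All function names and the rule-based stand-ins are illustrative assumptions, not the actual Lang2LTL-2 API; the real system uses large language and vision-language models for the extraction, grounding, and translation steps:

```python
# Toy sketch of a lift-then-translate grounding pipeline (names are hypothetical).
import re

# Toy "spatial grounding": maps referent phrases (which the real system resolves
# with a vision-language model over environment images) to landmark symbols.
TOY_SPATIAL_GROUNDING = {
    "the bookstore near the bank": "a",
    "the park": "b",
}


def lift_command(command: str, grounding: dict[str, str]) -> str:
    """Replace grounded referent phrases with propositional symbols so the
    translator sees a domain-independent ("lifted") command."""
    lifted = command
    for phrase, symbol in grounding.items():
        lifted = lifted.replace(phrase, symbol)
    return lifted


def translate_to_ltl(lifted: str) -> str:
    """Toy translation of 'go to X, then visit Y' into the sequenced-visit
    LTL formula F(X & F(Y)); the real system would use an LLM here."""
    match = re.fullmatch(r"go to (\w+), then visit (\w+)", lifted)
    if not match:
        raise ValueError(f"unsupported command pattern: {lifted!r}")
    first, second = match.groups()
    return f"F({first} & F({second}))"


command = "go to the bookstore near the bank, then visit the park"
lifted = lift_command(command, TOY_SPATIAL_GROUNDING)
print(lifted)                    # go to a, then visit b
print(translate_to_ltl(lifted))  # F(a & F(b))
```

Lifting before translation is what makes the pipeline modular: the translator only ever sees abstract symbols, so the same translation step works across environments once the spatial grounding module maps referents to that environment's landmarks.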

Citation:

J. X. Liu, A. Shah, G. Konidaris, S. Tellex, and D. Paulius (2024). “Lang2LTL-2: Grounding Spatiotemporal Navigation Commands Using Large Language and Vision-Language Models”. In: 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).