Navigating the Nuances: A Fine-grained Evaluation of Vision-Language Navigation

ACL ARR 2024 April Submission 175 Authors

14 Apr 2024 (modified: 20 May 2024), ACL ARR 2024 April Submission, CC BY 4.0
Abstract: This study presents a novel evaluation framework for the Vision-Language Navigation (VLN) task. It aims to assess the limitations of current models across instruction categories at a finer-grained level. To ensure comprehensive coverage of instruction categories during dataset design, we first iteratively construct a context-free grammar (CFG) for VLN instructions with the help of Large Language Models (LLMs). Based on the CFG, we induce and generate data spanning five principal instruction categories (i.e., direction change, landmark recognition, region recognition, vertical movement, and numerical comprehension). Our analysis of different models reveals notable performance discrepancies and recurrent issues. Findings such as the stagnation of numerical comprehension and heavy selection biases toward directional concepts can inform the development of future language-guided navigation systems.
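To illustrate the CFG-based generation idea described in the abstract, here is a minimal sketch of sampling VLN-style instructions from a toy context-free grammar. The grammar, nonterminal names, and vocabulary below are assumptions for illustration only; the paper's actual grammar (refined iteratively with LLM assistance) and its generation pipeline are not reproduced here.

```python
import random

# Toy CFG for VLN-style instructions, touching the five instruction
# categories named in the abstract (direction change, landmark recognition,
# region recognition, vertical movement, numerical comprehension).
# Nonterminals are uppercase keys; anything not in the dict is a terminal.
GRAMMAR = {
    "INSTR": [
        ["DIRECTION_CHANGE", "and", "LANDMARK_STEP"],
        ["VERTICAL_MOVE", "then", "REGION_STEP"],
        ["NUMERIC_STEP"],
    ],
    "DIRECTION_CHANGE": [["turn", "TURN_DIR"]],
    "TURN_DIR": [["left"], ["right"], ["around"]],
    "LANDMARK_STEP": [["walk", "past", "the", "LANDMARK"]],
    "LANDMARK": [["sofa"], ["piano"], ["dining", "table"]],
    "REGION_STEP": [["enter", "the", "REGION"]],
    "REGION": [["kitchen"], ["hallway"], ["bedroom"]],
    "VERTICAL_MOVE": [["go", "VERT_DIR", "the", "stairs"]],
    "VERT_DIR": [["up"], ["down"]],
    "NUMERIC_STEP": [["take", "COUNT", "steps", "forward"]],
    "COUNT": [["two"], ["three"], ["five"]],
}


def expand(symbol: str, rng: random.Random) -> list:
    """Recursively expand a grammar symbol into a list of terminal words."""
    if symbol not in GRAMMAR:  # terminal word, emit as-is
        return [symbol]
    production = rng.choice(GRAMMAR[symbol])
    words = []
    for sym in production:
        words.extend(expand(sym, rng))
    return words


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(" ".join(expand("INSTR", rng)))
```

In this sketch, each top-level production maps to one or two instruction categories, so category labels for generated instructions can be assigned directly from the production chosen; the actual dataset's category assignment may differ.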
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: benchmarking; vision language navigation;
Contribution Types: Model analysis & interpretability, Data resources
Languages Studied: English
Section 2 Permission To Publish Peer Reviewers Content Agreement: Authors grant permission for ACL to publish peer reviewers' content
Submission Number: 175