Navigating to Objects Specified by Images

This Image Goal Navigation task requires reasoning over the relation of objects in the scene (e.g., disambiguating between instances of similar appearance) and exploring efficiently to discover where the goal is (e.g., entering bedrooms while searching for the bed).

Sub-tasks

- exploration
- goal instance re-identification
- goal localization
- local navigation

Image Goal Navigation (ImageNav) admits ambiguous image goals (e.g., captures of nondescript walls) and is detached from potential user applications.
Instance-based ImageNav task (InstanceImageNav):
- goal images depict an object instance
- goal images are independent of agent embodiment
Limitations of end-to-end methods:
- high sample complexity
- overfitting
- poor sim-to-real transfer
- skills relating to visual scene understanding, semantic exploration, and long-term memory tend to be difficult to learn end-to-end

InstanceImageNav Task Setup

Methodology

Exploration
- Efficiently visiting locations in the environment that may afford the goal view, i.e., maximizing coverage of the observable space, can lead to successful navigation
- Frontier-based exploration (FBE)

  FBE operates on a top-down 2D map that tracks occupancy, free space, and frontiers, where frontiers delineate the boundary between explored and unexplored regions. FBE greedily selects the nearest frontier in the map and navigates to it.
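A minimal sketch of the frontier selection step, assuming a hypothetical grid encoding (`UNKNOWN`, `FREE`, `OCCUPIED`) and 4-connectivity. "Nearest" is taken here to mean fewest breadth-first steps through known free space, which is an assumption; Euclidean or geodesic distance would serve equally well. The helper names are illustrative and not from the original implementation.

```python
from collections import deque

# Hypothetical cell encoding for the top-down map.
UNKNOWN, FREE, OCCUPIED = -1, 0, 1
NEIGHBORS = ((-1, 0), (1, 0), (0, -1), (0, 1))


def frontier_cells(grid):
    """Return free cells that touch at least one unknown cell (4-connectivity)."""
    rows, cols = len(grid), len(grid[0])
    frontiers = set()
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != FREE:
                continue
            for dr, dc in NEIGHBORS:
                i, j = r + dr, c + dc
                if 0 <= i < rows and 0 <= j < cols and grid[i][j] == UNKNOWN:
                    frontiers.add((r, c))
                    break
    return frontiers


def nearest_frontier(grid, start):
    """Breadth-first search over known free space from `start`.

    Returns the frontier cell reachable in the fewest steps,
    or None when no frontier is reachable (exploration is finished).
    """
    frontiers = frontier_cells(grid)
    rows, cols = len(grid), len(grid[0])
    seen = {start}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        if (r, c) in frontiers:
            return (r, c)
        for dr, dc in NEIGHBORS:
            i, j = r + dr, c + dc
            if (0 <= i < rows and 0 <= j < cols
                    and grid[i][j] == FREE and (i, j) not in seen):
                seen.add((i, j))
                queue.append((i, j))
    return None


grid = [
    [FREE, FREE, UNKNOWN],
    [FREE, OCCUPIED, UNKNOWN],
    [FREE, FREE, FREE],
]
print(nearest_frontier(grid, (0, 0)))  # (0, 1)
```

After the agent reaches the selected frontier, the map is updated with new observations and the selection is repeated.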
Goal Instance Re-Identification
- SuperPoint extracts keypoints
- SuperGlue matches keypoint features between the goal image and the agent's current observation
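The re-identification decision can be sketched as counting keypoint correspondences and comparing the count against a detection threshold. To keep the example self-contained, this sketch swaps SuperGlue for plain mutual nearest-neighbour matching on descriptor vectors, which is a much simpler matcher; `max_dist` and `min_matches` are hypothetical values, not the paper's.

```python
import math


def mutual_nn_matches(desc_goal, desc_obs, max_dist=0.7):
    """Pair descriptors that are each other's nearest neighbour (L2 distance).

    Simplified stand-in for SuperGlue; `max_dist` is a hypothetical cutoff.
    """
    if not desc_goal or not desc_obs:
        return []
    nn_goal = [min(range(len(desc_obs)), key=lambda j: math.dist(g, desc_obs[j]))
               for g in desc_goal]
    nn_obs = [min(range(len(desc_goal)), key=lambda i: math.dist(desc_goal[i], o))
              for o in desc_obs]
    matches = []
    for i, j in enumerate(nn_goal):
        # Keep a pair only if the match is mutual and close enough.
        if nn_obs[j] == i and math.dist(desc_goal[i], desc_obs[j]) <= max_dist:
            matches.append((i, j))
    return matches


def goal_detected(desc_goal, desc_obs, min_matches=20):
    """Declare re-identification when enough keypoints match (hypothetical threshold)."""
    return len(mutual_nn_matches(desc_goal, desc_obs)) >= min_matches


goal = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
obs = [(0.0, 1.05), (1.0, 0.02), (5.0, 5.0)]
print(mutual_nn_matches(goal, obs))  # [(0, 1), (1, 0)]
```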
Goal Localization
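- Outputs the goal's position relative to the agent's current pose
- When the goal is observed (i.e., the number of matched keypoints exceeds the detection threshold), Detic performs instance segmentation; the mask containing the image center point is selected as the object mask, and the keypoints inside the mask are back-projected to obtain the goal position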
- The point cloud of goal points is coalesced along the height dimension and voxelized into a 2D map channel. This channel is then concatenated with the map constructed during exploration and treated as a navigation target
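The localization steps can be sketched as: pick the mask that covers the image center, back-project the keypoints inside it with a pinhole camera model, drop the height component, and bin the points into map cells. This sketch assumes the masks, keypoints, and depth all refer to the same frame, masks are sets of `(u, v)` pixels, the camera looks along the agent's heading, and the agent pose is `(x, y, yaw)`; all of these formats and function names are hypothetical, and the Detic segmentation itself is taken as given.

```python
import math


def select_center_mask(masks, width, height):
    """Pick the first instance mask that contains the image centre pixel.

    masks: list of sets of (u, v) pixel coordinates (hypothetical format).
    """
    center = (width // 2, height // 2)
    for mask in masks:
        if center in mask:
            return mask
    return None


def goal_cells(keypoints, mask, depth, intrinsics, pose, resolution):
    """Back-project masked keypoints and drop them into 2D map cells.

    keypoints:  list of (u, v) pixels; depth[v][u] is metric depth in metres.
    intrinsics: (fx, fy, cx, cy) of a pinhole camera.
    pose:       agent pose (x, y, yaw) in the map frame, yaw in radians.
    resolution: map cell size in metres.
    """
    fx, fy, cx, cy = intrinsics
    x0, y0, yaw = pose
    cells = set()
    for u, v in keypoints:
        if (u, v) not in mask:
            continue  # keypoint lies outside the selected object mask
        d = depth[v][u]
        if d <= 0:
            continue  # missing depth reading
        forward = d                # camera z axis
        left = -(u - cx) * d / fx  # camera x axis points right
        # The vertical component (v - cy) * d / fy is discarded, which
        # collapses the point cloud along the height dimension.
        wx = x0 + forward * math.cos(yaw) - left * math.sin(yaw)
        wy = y0 + forward * math.sin(yaw) + left * math.cos(yaw)
        cells.add((math.floor(wx / resolution), math.floor(wy / resolution)))
    return cells


masks = [{(0, 0)}, {(2, 2), (2, 1), (3, 2)}]
mask = select_center_mask(masks, 4, 4)
depth = [[2.0] * 4 for _ in range(4)]
cells = goal_cells([(2, 2), (3, 2), (0, 0)], mask, depth,
                   (2.0, 2.0, 2.0, 2.0), (0.0, 0.0, 0.0), 0.5)
print(sorted(cells))
```

The resulting cell set plays the role of the goal map channel that is stacked with the exploration map.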
Local Navigation
- Fast Marching Method (FMM)
  - Given a partial occupancy map and a set of goal points, plan a path in the agent's action space $\mathcal{A}$ that conveys the agent to the nearest reachable goal point.
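A compact first-order Fast Marching Method on a 4-connected grid with unit speed, seeded at every goal cell so that the arrival-time map `T` approximates the geodesic distance to the nearest goal. The grid format and the greedy `next_cell` descent are assumptions for illustration; mapping a cell-to-cell step onto discrete actions in $\mathcal{A}$ (e.g., turning and moving forward) is omitted.

```python
import heapq
import math

INF = math.inf
STEPS = ((-1, 0), (1, 0), (0, -1), (0, 1))


def fast_marching(free, goals):
    """First-order Fast Marching Method with unit speed on a 2D grid.

    free:  2D list of bools, True where the cell is traversable.
    goals: list of (row, col) goal cells; non-traversable goals are ignored.
    Returns T, the arrival time from every cell to its nearest goal;
    unreachable cells keep T = inf.
    """
    rows, cols = len(free), len(free[0])
    T = [[INF] * cols for _ in range(rows)]
    frozen = [[False] * cols for _ in range(rows)]
    heap = []
    for r, c in goals:
        if free[r][c]:
            T[r][c] = 0.0
            heapq.heappush(heap, (0.0, r, c))

    def known(i, j):
        # Only accepted ("frozen") cells contribute to the upwind update.
        if 0 <= i < rows and 0 <= j < cols and frozen[i][j]:
            return T[i][j]
        return INF

    def update(r, c):
        a = min(known(r - 1, c), known(r + 1, c))  # vertical neighbours
        b = min(known(r, c - 1), known(r, c + 1))  # horizontal neighbours
        if abs(a - b) >= 1.0:  # one-sided update (also when a or b is inf)
            return min(a, b) + 1.0
        # Two-sided update: solve (t - a)^2 + (t - b)^2 = 1 for t.
        return (a + b + math.sqrt(2.0 - (a - b) ** 2)) / 2.0

    while heap:
        _, r, c = heapq.heappop(heap)
        if frozen[r][c]:
            continue  # stale heap entry
        frozen[r][c] = True
        for dr, dc in STEPS:
            i, j = r + dr, c + dc
            if 0 <= i < rows and 0 <= j < cols and free[i][j] and not frozen[i][j]:
                t = update(i, j)
                if t < T[i][j]:
                    T[i][j] = t
                    heapq.heappush(heap, (t, i, j))
    return T


def next_cell(T, pos):
    """Greedy descent: move to the 4-neighbour with the smallest arrival time."""
    r, c = pos
    best = pos
    for dr, dc in STEPS:
        i, j = r + dr, c + dc
        if 0 <= i < len(T) and 0 <= j < len(T[0]) and T[i][j] < T[best[0]][best[1]]:
            best = (i, j)
    return best


# A wall in the middle column forces the path around the bottom row.
free = [[True, False, True], [True, False, True], [True, True, True]]
T = fast_marching(free, [(0, 2)])
print(next_cell(T, (0, 0)))
```

`next_cell` returns the current position unchanged when no neighbour improves on it, i.e., at a goal or in a region from which no goal is reachable.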

Experiments

Each object instance depicted by a goal image belongs to one of the following categories: {chair, couch, plant, bed, toilet, television}

Failure Analysis

Causes of failure (for a random subset of 100 failed episodes):
- Instance Re-ID: False Negative
- Instance Re-ID: False Positive
  - Incorrect detections often come from visual features correctly matched to the goal image background
  - This failure mode could be mitigated with additional background vs. foreground reasoning
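- Exploration Error: the agent does not see the goal object within the time limit
- Localization Error: the projected points do not belong to the goal object, possibly because of keypoint mismatches or an incorrect mask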
