Navigating to Objects Specified by Images
This Image Goal Navigation task requires reasoning over the relation of objects in the scene (e.g., disambiguating between instances of similar appearance) and exploring efficiently to discover where the goal is (e.g., entering bedrooms while searching for the bed).
sub-tasks
- exploration
- goal instance re-identification
- goal localization
- local navigation
Related Work
- Image Goal Navigation (ImageNav) suffers from ambiguous image goals (e.g., captures of nondescript walls) and is detached from potential user applications
- instance-based ImageNav task (InstanceImageNav)
- goal images depict an object instance
- goal images are independent of agent embodiment
- limitations of end-to-end methods
- high sample complexity
- overfitting
- poor sim-to-real transfer
- skills relating to visual scene understanding, semantic exploration, and long-term memory tend to be difficult to learn end-to-end
InstanceImageNav Task Setup
Methodology
Exploration
Efficiently visiting locations in the environment that may afford the goal view, i.e., maximizing coverage of the observable space, can lead to successful navigation
frontier-based exploration (FBE)
operates on a top-down 2D map tracking occupancy, free space, and frontiers, where frontiers delineate the boundary between explored and unexplored regions. FBE greedily selects the nearest frontier in the map to navigate to.
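The greedy frontier selection described above can be sketched as follows. This is a minimal illustration and not the paper's implementation: the cell encoding (`UNKNOWN`, `FREE`, `OCCUPIED`), the 4-connected neighbourhood, and the function names are assumptions made for the example, and "nearest" is taken to mean shortest path through known free space.

```python
from collections import deque

import numpy as np

# Assumed cell encoding for the top-down map (hypothetical, for illustration).
UNKNOWN, FREE, OCCUPIED = -1, 0, 1
NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def find_frontiers(grid):
    """Return free cells that touch at least one unknown cell (4-connected)."""
    free = grid == FREE
    # Pad with False so cells outside the map never count as unknown.
    unknown = np.pad(grid == UNKNOWN, 1, constant_values=False)
    near_unknown = (
        unknown[:-2, 1:-1] | unknown[2:, 1:-1] | unknown[1:-1, :-2] | unknown[1:-1, 2:]
    )
    rows, cols = np.nonzero(free & near_unknown)
    return [(int(r), int(c)) for r, c in zip(rows, cols)]


def nearest_frontier(grid, start):
    """Breadth-first search through free space; the first frontier reached is the nearest."""
    frontiers = set(find_frontiers(grid))
    if not frontiers:
        return None  # nothing left to explore
    seen = {start}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        if (r, c) in frontiers:
            return (r, c)
        for dr, dc in NEIGHBOURS:
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < grid.shape[0] and 0 <= nc < grid.shape[1]
            if inside and grid[nr, nc] == FREE and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append((nr, nc))
    return None  # frontiers exist but none is reachable from start
```

Calling `nearest_frontier(grid, agent_cell)` after every map update gives the next exploration target; a `None` result means the reachable space has been fully covered.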
Goal Instance Re-Identification
- SuperPoint extracts keypoints
- SuperGlue matches keypoint features
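One way to turn the matcher output into a yes/no re-identification decision is to count confident matches and compare the count with a threshold. The sketch below does not call SuperPoint or SuperGlue; it assumes the matcher has already produced, for each goal-image keypoint, the index of its match in the current frame (or `-1` for no match) and a confidence score. The function names and both threshold values are hypothetical.

```python
import numpy as np


def count_confident_matches(matches, scores, min_score=0.5):
    """Count keypoints that are matched (index >= 0) with confidence >= min_score."""
    matches = np.asarray(matches)
    scores = np.asarray(scores)
    return int(np.sum((matches >= 0) & (scores >= min_score)))


def goal_detected(matches, scores, min_score=0.5, min_matches=3):
    """Declare the goal instance visible when enough confident matches exist."""
    return count_confident_matches(matches, scores, min_score) >= min_matches
```

Both thresholds trade false negatives against false positives, so in practice they would need to be tuned on held-out episodes.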
Goal Localization
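- Outputs the goal's position relative to the agent's current pose
- When the goal is observed (the keypoint matching result exceeds the detection threshold), Detic is used for instance segmentation; the mask containing the image center point is selected as the object mask, and the keypoints inside the mask are back-projected to obtain the goal position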
- The point cloud of goal points is coalesced along the height dimension and voxelized into a 2D map channel. This channel is then concatenated with the map constructed during exploration and treated as a navigation target
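The back-projection and map-channel steps above can be sketched with a pinhole camera model. This is an illustrative sketch under stated assumptions, not the authors' code: the intrinsics (`fx`, `fy`, `cx`, `cy`), an untilted camera whose frame coincides with the agent frame (x right, y down, z forward), an agent-centred square map, and all function names are assumptions.

```python
import numpy as np


def backproject_keypoints(keypoints, mask, depth, fx, fy, cx, cy):
    """Lift masked (u, v) keypoints with valid depth to 3D camera-frame points."""
    keypoints = np.asarray(keypoints, dtype=int)
    u, v = keypoints[:, 0], keypoints[:, 1]
    # Keep only keypoints inside the object mask that have a positive depth reading.
    keep = mask[v, u] & (depth[v, u] > 0)
    u, v = u[keep], v[keep]
    z = depth[v, u]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)


def goal_map_channel(points, map_size, cell_size):
    """Collapse the height axis (y) and bin x/z into a 2D channel centred on the agent."""
    channel = np.zeros((map_size, map_size), dtype=np.uint8)
    centre = map_size // 2
    cols = centre + np.floor(points[:, 0] / cell_size).astype(int)
    rows = centre - np.floor(points[:, 2] / cell_size).astype(int)  # forward is up
    inside = (rows >= 0) & (rows < map_size) & (cols >= 0) & (cols < map_size)
    channel[rows[inside], cols[inside]] = 1
    return channel
```

The resulting channel has the same layout as an agent-centred occupancy map, so it can be stacked with the exploration map and handed to the planner as the set of goal cells.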
Local Navigation
- fast marching method
- given a partial occupancy map and a set of goal points, plan a path in the agent’s action space $\mathcal{A}$ that conveys the agent to the nearest reachable point.
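The planning idea (compute a distance field from the goal points over free space, then descend it) can be illustrated with a small grid example. Note that this sketch swaps in Dijkstra's algorithm on a 4-connected grid as a discrete stand-in for the fast marching method, which solves a continuous eikonal equation; it also returns the next grid cell rather than an action from $\mathcal{A}$. Function names are hypothetical.

```python
import heapq

import numpy as np

NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def distance_to_goals(occupied, goals):
    """Shortest-path distance (in cells) from every free cell to its nearest goal cell."""
    dist = np.full(occupied.shape, np.inf)
    heap = []
    for g in goals:
        if not occupied[g]:
            dist[g] = 0.0
            heap.append((0.0, g))
    heapq.heapify(heap)
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if d > dist[r, c]:
            continue  # stale queue entry
        for dr, dc in NEIGHBOURS:
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < occupied.shape[0] and 0 <= nc < occupied.shape[1]
            if inside and not occupied[nr, nc] and d + 1.0 < dist[nr, nc]:
                dist[nr, nc] = d + 1.0
                heapq.heappush(heap, (d + 1.0, (nr, nc)))
    return dist


def next_cell(dist, pos):
    """Greedy descent: move to the neighbour closest to a goal; None if unreachable."""
    if not np.isfinite(dist[pos]):
        return None
    best = pos
    for dr, dc in NEIGHBOURS:
        nr, nc = pos[0] + dr, pos[1] + dc
        inside = 0 <= nr < dist.shape[0] and 0 <= nc < dist.shape[1]
        if inside and dist[nr, nc] < dist[best]:
            best = (nr, nc)
    return best
```

Because the field is seeded from all goal cells at once, descending it always leads to the nearest reachable goal point, and recomputing it after each map update handles newly discovered obstacles.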
Experiments
- The object instances depicted by the goal images have a category belonging to one of the following: {chair, couch, plant, bed, toilet, television}
Failure Analysis
causes of failure (for a random subset of 100 failed episodes)
Instance Re-ID: False Negative
Instance Re-ID: False Positive
- Incorrect detections often come from visual features correctly matched to the goal image background
- This failure mode could be mitigated with additional background vs. foreground reasoning
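Exploration Error: the goal object was never observed within the time limit
Localization Error: the projected points do not belong to the goal object, possibly due to incorrect keypoint matches or an incorrect mask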