Navigating to Objects Specified by Images

This Image Goal Navigation task requires reasoning over the relation of objects in the scene (e.g., disambiguating between instances of similar appearance) and exploring efficiently to discover where the goal is (e.g., entering bedrooms while searching for the bed).

Sub-tasks

  • exploration
  • goal instance re-identification
  • goal localization
  • local navigation

  • Image Goal Navigation (ImageNav) exhibits ambiguous image goals (e.g., captures of nondescript walls) and is detached from potential user applications
  • Instance-based ImageNav task (InstanceImageNav)
    • goal images depict an object instance
    • goal images are independent of agent embodiment
  • limitations of end-to-end methods
    • high sample complexity
    • overfitting
    • poor sim-to-real transfer
    • skills relating to visual scene understanding, semantic exploration, and long-term memory tend to be difficult to learn end-to-end

InstanceImageNav Task Setup

Methodology

  • Exploration

    • Efficiently visiting locations in the environment that may afford the goal view, i.e., maximizing coverage of the observable space, can lead to successful navigation

    • frontier-based exploration

      Frontier-based exploration (FBE) operates on a top-down 2D map tracking occupancy, free space, and frontiers, where frontiers delineate the boundary between explored and unexplored regions. FBE greedily selects the nearest frontier in the map and navigates to it

  • Goal Instance Re-Identification

    • Keypoints are extracted with SuperPoint
    • Keypoint features are matched with SuperGlue
  • Goal Localization

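    • Outputs the position of the goal relative to the agent's current pose
    • When the goal is observed (the keypoint matching result exceeds the detection threshold), Detic is used for instance segmentation. The mask that contains the image center point is selected as the object mask, and the keypoints inside the mask are back-projected to obtain the goal position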
    • The point cloud of goal points is coalesced along the height dimension and voxelized into a 2D map channel. This channel is then concatenated with the map constructed during exploration and treated as a navigation target
  • Local Navigation

    • fast marching method
      • given a partial occupancy map and a set of goal points, plan a path in the agent’s action space $\mathcal{A}$ that conveys the agent to the nearest reachable point.
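
The frontier-based exploration step described above can be sketched on a small grid. This is a minimal illustration, not the implementation used in the paper: the cell labels `UNKNOWN`, `FREE`, `OCCUPIED`, the helper names, and the choice of path length through free cells (breadth-first search) as the meaning of "nearest" are all assumptions made for the example.

```python
from collections import deque

import numpy as np

# Hypothetical cell labels for the top-down 2D map (not from the notes).
UNKNOWN, FREE, OCCUPIED = -1, 0, 1


def find_frontiers(grid):
    """Return (row, col) of free cells that touch an unknown 4-neighbour."""
    free = grid == FREE
    # Pad with False so cells outside the map never count as unknown.
    unk = np.pad(grid == UNKNOWN, 1, constant_values=False)
    near_unknown = (
        unk[:-2, 1:-1] | unk[2:, 1:-1] | unk[1:-1, :-2] | unk[1:-1, 2:]
    )
    return [tuple(map(int, rc)) for rc in np.argwhere(free & near_unknown)]


def nearest_frontier(grid, start):
    """Greedy choice: BFS over free cells, return the first frontier reached."""
    frontiers = set(find_frontiers(grid))
    queue, seen = deque([start]), {start}
    while queue:
        r, c = queue.popleft()
        if (r, c) in frontiers:
            return (r, c)
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < grid.shape[0] and 0 <= nc < grid.shape[1]
            if inside and (nr, nc) not in seen and grid[nr, nc] == FREE:
                seen.add((nr, nc))
                queue.append((nr, nc))
    return None  # no reachable frontier: the reachable space is fully explored


grid = np.array([
    [0, 0, 0, -1],
    [0, 1, 0, -1],
    [0, 0, 0, 1],
])
frontiers = find_frontiers(grid)
target = nearest_frontier(grid, (2, 0))
```

In this toy map the two free cells in the third column that border the unknown column are frontiers, and the agent starting at the bottom-left corner would head for the one reachable in fewer steps.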
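
The goal localization step (back-projecting the matched keypoints that fall inside the object mask, collapsing height, and writing the result into a 2D map channel) can be sketched as follows. This is a simplified illustration under stated assumptions: a pinhole camera with focal length `fx` and principal point `cx`, a depth image in metres, and an agent-relative grid with the agent at row 0 in the centre column. The function name `goal_map_channel` and all parameters are hypothetical.

```python
import numpy as np


def goal_map_channel(keypoints_uv, mask, depth, fx, cx, map_size, cell_m):
    """Project masked keypoints into an agent-relative top-down boolean grid.

    keypoints_uv: (N, 2) integer pixel coordinates (u = column, v = row).
    mask:         (H, W) boolean instance mask of the goal object.
    depth:        (H, W) depth in metres; 0 marks an invalid reading.
    """
    u, v = keypoints_uv[:, 0], keypoints_uv[:, 1]
    # Keep only keypoints on the object mask that have a valid depth.
    keep = mask[v, u] & (depth[v, u] > 0)
    u, v = u[keep], v[keep]

    # Pinhole back-projection. The height coordinate is never computed,
    # because the point cloud is collapsed along the height dimension.
    z = depth[v, u]              # forward distance from the camera
    x = (u - cx) * z / fx        # lateral offset (positive to the right)

    # Voxelize into grid cells: row = forward, column = lateral offset.
    rows = np.floor(z / cell_m).astype(int)
    cols = np.floor(x / cell_m).astype(int) + map_size // 2
    ok = (rows >= 0) & (rows < map_size) & (cols >= 0) & (cols < map_size)

    channel = np.zeros((map_size, map_size), dtype=bool)
    channel[rows[ok], cols[ok]] = True
    return channel


depth = np.full((4, 4), 2.0)
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True
keypoints = np.array([[1, 1], [2, 2], [0, 3]])  # the last one is off-mask
channel = goal_map_channel(keypoints, mask, depth, fx=2.0, cx=2.0,
                           map_size=5, cell_m=1.0)
```

The resulting channel would then be stacked with the exploration map and handed to the local planner as the set of goal points. A real system would also transform the points by the agent's pose before writing them into a global map, which this sketch omits.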

Experiments

  • The object instances depicted by the goal images have a category belonging to one of the following: {chair, couch, plant, bed, toilet, television}

Failure Analysis

  • Causes of failure (for a random subset of 100 failed episodes)

    • Instance Re-ID: False Negative

    • Instance Re-ID: False Positive

      • Incorrect detections often come from visual features correctly matched to the goal image background
      • This failure mode could be mitigated with additional background vs. foreground reasoning
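    • Exploration Error: the goal object was not seen within the time limit

    • Localization Error: the projected points do not belong to the goal object, possibly because of incorrect keypoint matches or an incorrect mask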
